i was thinking a bit about this recursion more generally and
i observed that those very long depth chains are kind of a problem for this recursion, because maybe it is more fitted to be run in parallel
Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
am. It's a cool scene! Dig it.
i was thinking a bit about this recursion more generally and
i observed that those very long depth chains are kind of a problem for
this recursion, because maybe it is more fitted to be run in parallel
I wasn't going to post this here, since it's really an algorithm
issue, rather than a C issue. But the thread seems to have gone on for
some time with you seeming to be unable to solve this. So I'll give
you this as a clue.
The (or, at least, a) solution is only partially recursive. What I
have used is a line-based algorithm, each line being filled
iteratively (in a simple loop) from left to right. Recursion from
line to line completes the algorithm. Thus, the recursion level is
greatly reduced. And you should find that this approach fills an area
of any shape. Note, however, that for some pathological cases (very
large and complex shapes), this can still create a fairly large level
of recursion. Maybe a more complex approach can deal with this. What I present here is just a very simple one which, in most cases, should
have a level of recursion well within reason.
I use a two part approach. The first part (called floodfill in the
code below) just sets up for the second part. The second part (called r_floodfill here, for recursive floodfill) does the actual work, but
is only called by floodfill(). It goes something like this (although
this is incomplete, untested and not code I've actually used, just an example):
static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr,
                        pixel_t old_clr)
{
    unsigned start, end;

    /* Find start and end of line within floodfill area.
       (get_pixel() is assumed to cope with out-of-range
       coordinates by returning a value != old_clr.) */
    start = end = x;
    while(old_clr == get_pixel(y, start - 1))
        --start;
    while(old_clr == get_pixel(y, end + 1))
        ++end;

    /* Fill line with new colour. */
    for(x = start; x <= end; x++)
        set_pixel(y, x, new_clr);

    /* Run along again, checking pixel colours above and below,
       and recursing if appropriate. */
    for(x = start; x <= end; x++)
    {
        if(old_clr == get_pixel(y - 1, x))
            r_floodfill(y - 1, x, new_clr, old_clr);
        if(old_clr == get_pixel(y + 1, x))
            r_floodfill(y + 1, x, new_clr, old_clr);
    }
}

void floodfill(unsigned y, unsigned x, pixel_t new_clr)
{
    pixel_t old_clr = get_pixel(y, x);

    /* Only proceed if colours differ. */
    if(new_clr != old_clr)
        r_floodfill(y, x, new_clr, old_clr);
}
To use this, simply call floodfill() passing the coordinates of the starting point for the fill (y and x) and the fill colour (new_clr).
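For example, with WHITE standing in for some value of the (undefined
here) pixel_t type:

floodfill(10, 20, WHITE);  /* fill the region containing (10, 20) */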
On Fri, 22 Mar 2024 13:04:39 +1100
Peter 'Shaggy' Haywood <phaywood@alphalink.com.au> wrote:
To use this, simply call floodfill() passing the coordinates of
the starting point for the fill (y and x) and the fill colour
(new_clr).
It looks like an anisotropic variant of my St. George Cross algorithm.
Or like a recursive variant of Malcolm's algorithm.
Being anisotropic, it has more glass jaws. In particular,
it would be very slow for the not-uncommon 'jail window' patterns:
* *** *** *** ***
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
*** *** *** *** *
Also, the implementation is still recursive and the worst-case
recursion depth is still O(N), where N is the total number of recolored
pixels, so unlike many other solutions presented here, you didn't solve
fir's original problem.
And in the presented form it's not thread-safe. Which is not a problem
for fir, but is undesirable for the rest of us.
Conclusion: sorry, you aren't going to get a cookie for your effort.
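To make the depth problem concrete, here is an editorial sketch (not
code from any post in this thread) of the same four-neighbour fill
driven by an explicit, heap-allocated stack, so that the C call stack
stays at constant depth no matter how large the shape is. pixel_t,
get_pixel() and set_pixel() are the same hypothetical primitives as in
the code above, again assumed to cope with out-of-range coordinates by
returning something other than old_clr:

#include <stdlib.h>

typedef unsigned char pixel_t;          /* representative choice */
extern pixel_t get_pixel(unsigned y, unsigned x);
extern void set_pixel(unsigned y, unsigned x, pixel_t clr);

typedef struct { unsigned y, x; } fill_pt;

void floodfill_explicit(unsigned y, unsigned x, pixel_t new_clr)
{
    pixel_t old_clr = get_pixel(y, x);
    size_t cap = 1024, top = 0;
    fill_pt *stk;

    if (new_clr == old_clr)
        return;
    if ((stk = malloc(cap * sizeof *stk)) == NULL)
        return;

    /* Recolor a pixel when it is pushed so it can never be
       pushed twice; the stack replaces the recursive calls. */
    set_pixel(y, x, new_clr);
    stk[top++] = (fill_pt){ y, x };

    while (top > 0) {
        static const int dy[4] = { -1, 1, 0, 0 };
        static const int dx[4] = { 0, 0, -1, 1 };
        fill_pt p = stk[--top];

        for (int k = 0; k < 4; k++) {
            unsigned ny = p.y + dy[k], nx = p.x + dx[k];
            if (get_pixel(ny, nx) != old_clr)
                continue;
            if (top == cap) {       /* grow the stack on demand */
                fill_pt *t = realloc(stk, 2 * cap * sizeof *t);
                if (t == NULL) { free(stk); return; }
                stk = t;
                cap *= 2;
            }
            set_pixel(ny, nx, new_clr);
            stk[top++] = (fill_pt){ ny, nx };
        }
    }
    free(stk);
}

Note that this only moves the cost: the explicit stack can still grow
to O(number of recolored pixels) in the worst case, which is the point
made above.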
On 17/03/2024 11:25, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
On 16/03/2024 13:55, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
Recursion makes programs harder to reason about and prove correct.
Are you prepared to offer any evidence to support this astonishing
statement or can we just assume it's another Malcolmism?
Example given. A recursive algorithm which is hard to reason about and
prove correct, because we don't really know whether under perfectly
reasonable assumptions it will or will not blow the stack.
Had you offered a proof that your code neither "blows the stack" nor
runs out of any other resource we'd have a starting point for
comparison, but you have not done that.
Mind you, had you done that, we would have something that might
eventually become only one piece of evidence for what is an
astonishingly general remark. Broadly applicable remarks require either
broadly applicable evidence or a wealth of distinct cases.
Your "rule" suggests that all reasoning is impeded by the presence of
recursion and I don't think you can support that claim. This is
characteristic of many of your remarks -- they are general "rules" that
often remain rules even when there is evidence to the contrary.
I'll make another point in the hope of clarifying the matter. An
algorithm or code is usually proved correct (or not!) under the
assumption that it has adequate resources -- usually time and storage.
Further reasoning may then be done to determine the resource
requirements, since this is so often dependent on context. This
separation is helpful as you don't usually want to tie "correctness" to
some specific installation. The code might run on a system with a
dynamically allocated stack, for example, that has very similar
limitations to "heap" memory.
To put it more generally, we often want to prove properties of code
that are independent of physical constraints. Your remark includes this
kind of reasoning. Did you intend it to?
The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited whilst stacks are not.
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited while stacks are not.
That's not actually true. The size of both are bounded, yes.
It's certainly possible (in POSIX, anyway) for the stack bounds
to be unlimited (given sufficient real memory and/or backing
store) and the heap size to be bounded. See 'setrlimit'.
Michael S <already5chosen@yahoo.com> writes:
On Tue, 19 Mar 2024 21:40:22 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 18 Mar 2024 22:42:14 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
Here is the refinement that uses a resizing rather than
fixed-size buffer.
[code]
This variant is significantly slower than Malcolm's.
2x slower for a solid rectangle, 6x slower for a snake shape.
Slower with some shapes, faster in others.
In my small test suite I found no cases where this specific code is
measurably faster than Malcolm's code.
My test cases include pixel fields of 32k by 32k, with, for example,
filling the entire field starting at the center point.
Kind of a stress test, but it turned up some interesting results.
I did find one case in which they are approximately equal. I call
it the "slalom shape" and it's more or less designed to be the worst
case for algorithms that try to speed themselves up by taking
advantage of straight lines.
The slalom shape is generated by following code:
[code]
Thanks, I may try that.
In any case
the code was written for clarity of presentation, with
no attention paid to low-level performance.
Yes, your code is easy to understand. It could have been easier
still if the persistent indices had more meaningful names.
I have a different view on that question. However I take your
point.
In another post I showed an optimized variant of your algorithm:
- The 4-neighbour loop unrolled. The majority of the speedup comes
not from the unrolling itself, but from the specialization of the
in-rectangle check that the unroll enables.
- The todo queue implemented as a circular buffer.
- The initial size of the queue increased.
This optimized variant is more competitive with 'line-grabby'
algorithms in filling solid shapes, and faster than them in the
'slalom' case.
Yes, unrolling is an obvious improvement. I deliberately chose a
simple (and non-optimized) method to make it easier to see how it
works. Simple optimizations are left as an exercise for the
reader. :)
Generally, I like your algorithm.
It was surprising to me that a queue can work better than a stack; my
intuition suggested otherwise, but facts are facts.
Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" of at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
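Concretely, a queue-based fill is a near-mirror of a stack-based one;
only the discipline of the work list changes. Here is a minimal
editorial sketch (not Tim's elided code) using the same hypothetical
pixel_t/get_pixel/set_pixel primitives as earlier in the thread, with
a ring buffer that doubles when full:

#include <stdlib.h>

typedef unsigned char pixel_t;          /* representative choice */
extern pixel_t get_pixel(unsigned y, unsigned x);
extern void set_pixel(unsigned y, unsigned x, pixel_t clr);

typedef struct { unsigned y, x; } fill_pt;

void floodfill_bfs(unsigned y, unsigned x, pixel_t new_clr)
{
    pixel_t old_clr = get_pixel(y, x);
    size_t cap = 1024, head = 0, n = 0;
    fill_pt *q;

    if (new_clr == old_clr)
        return;
    if ((q = malloc(cap * sizeof *q)) == NULL)
        return;

    /* Recolor on enqueue so each pixel enters the queue at most
       once; head and n index a circular buffer of capacity cap. */
    set_pixel(y, x, new_clr);
    q[(head + n++) % cap] = (fill_pt){ y, x };

    while (n > 0) {
        static const int dy[4] = { -1, 1, 0, 0 };
        static const int dx[4] = { 0, 0, -1, 1 };
        fill_pt p = q[head];

        head = (head + 1) % cap;
        n--;
        for (int k = 0; k < 4; k++) {
            unsigned ny = p.y + dy[k], nx = p.x + dx[k];
            if (get_pixel(ny, nx) != old_clr)
                continue;
            if (n == cap) {     /* ring full: double it, unwrapped */
                fill_pt *t = malloc(2 * cap * sizeof *t);
                if (t == NULL) { free(q); return; }
                for (size_t i = 0; i < n; i++)
                    t[i] = q[(head + i) % cap];
                free(q);
                q = t;
                head = 0;
                cap *= 2;
            }
            set_pixel(ny, nx, new_clr);
            q[(head + n++) % cap] = (fill_pt){ ny, nx };
        }
    }
    free(q);
}

The live entries form the breadth-first "frontier" described above,
which is what keeps memory use down when filling large simple regions.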
On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.
Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder whether the existence of this or a
similar limit can be proven theoretically.
Michael S <already5chosen@yahoo.com> writes:
On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.
One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.
On Wed, 20 Mar 2024 07:27:38 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
[...]
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.
One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.
To be fair, none of the presented algorithms is abysmally slow. When
compared by number of visited points, they all appear to be within a
factor of 2 or 2.5 of each other.
Some of them could, for some patterns, be 10-15 times slower than
others, but that does not happen for all patterns, and when it does
happen it's because of a problematic implementation rather than
because of differences in the algorithms.
The big difference between the algorithms is not speed, but the
amount of auxiliary memory used in the worst case. Your algorithm
appears to be the best in that department, [...]
But even by that metric, the difference between different
implementations of the same algorithm is often much bigger than the
difference between algorithms.
For example, a solid 200x200 image with the starting point in the
corner [...]
scott@slp53.sl.home (Scott Lurndal) writes:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited while stacks are not.
That's not actually true. The size of both are bounded, yes.
It's certainly possible (in POSIX, anyway) for the stack bounds
to be unlimited (given sufficient real memory and/or backing
store) and the heap size to be bounded. See 'setrlimit'.
The sizes of both heaps and stacks are bounded, because
pointers have a fixed number of bits. Certainly these
sizes can be very very large, but they are not unbounded.
Michael S <already5chosen@yahoo.com> writes:
On Wed, 20 Mar 2024 07:27:38 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
[...]
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.
One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.
To be fair, none of presented algorithms is abysmally slow. When
compared by number of visited points, they all appear to be within
factor of 2 or 2.5 of each other.
Certainly "abysmally slow" is subjective, but working in a large
pixel field, filling the whole field starting at the center,
Malcolm's code runs slower than my unoptimized code by a factor of
10 (and a tad slower than that compared to my optimized code).
Some of them for some patterns could be 10-15 times slower than
others, but it does not happen for all patterns and when it
happens it's because of problematic implementation rather because
of differences in algorithms.
In the case of Malcolm's code I think it's the algorithm, because
it doesn't scale linearly. Malcolm's code runs faster than mine
for small colorings, but slows down dramatically as the image
being colored gets bigger.
The big difference between algorithms is not a speed, but amount of
auxiliary memory used in the worst case. Your algorithm appears to be
the best in that department, [...]
Yes, my unoptimized algorithm was designed to use as little
memory as possible. The optimized version traded space for
speed: it runs a little bit faster but incurs a non-trivial cost
in terms of space used. I think it's still not too bad, an upper
bound of a small multiple of N for an NxN pixel field.
But even by that metric, the difference between different
implementations of the same algorithm is often much bigger than
difference between algorithms.
If I am not mistaken the original naive recursive algorithm has a
space cost that is O( N**2 ) for an NxN pixel field. The big-O
difference swamps everything else, just like the big-O difference
in runtime does for that metric.
For example, solid 200x200 image with starting point in the corner
[...]
On small pixel fields almost any algorithm is probably not too bad.
These days any serious algorithm should scale well up to at least
4K by 4K, and be tested up to at least 16K x 16K.
Tricks that make some things faster for small images sometimes
fall on their face when confronted with a larger image. My code
isn't likely to win many races on small images, but on large
images I expect it will always be competitive even if it doesn't
finish in first place.
Michael S <already5chosen@yahoo.com> writes:
On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.
Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
For my test cases the FIFO depth of your algorithm never exceeds min(width,height)*2+2. I wonder if existence of this or similar
limit can be proven theoretically.
I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.
As for finding a worst case, try this (expressed in pseudo code):
let pc = { width/2, height/2 }
// assume pixel field 'field' starts out as all zeroes
color 8 "legs" with the value '1' as follows:
leg from { 1, pc.y-1 } to { pc.x-1, pc.y-1 }
leg from { 1, pc.y+1 } to { pc.x-1, pc.y+1 }
leg from { pc.x+1, pc.y-1 } to { width-2, pc.y-1 }
leg from { pc.x+1, pc.y+1 } to { width-2, pc.y+1 }
leg from { pc.x-1, 1 } to { pc.x-1, pc.y-1 }
leg from { pc.x+1, 1 } to { pc.x+1, pc.y-1 }
leg from { pc.x-1, pc.y+1 } to { pc.x-1, height-2 }
leg from { pc.x+1, pc.y+1 } to { pc.x+1, height-2 }
So the pixel field should look like this (with longer legs for a
bigger pixel field), with '-' being 0 and '*' being 1:
+-----------------------+
| - - - - - - - - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - * * * * - * * * * - |
| - - - - - - - - - - - |
| - * * * * - * * * * - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - * - * - - - - |
| - - - - - - - - - - - |
+-----------------------+
Now start coloring at the center point with a new value
of 2.
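For anyone wanting to reproduce the test, the setup might be coded
like this (draw_leg and make_legs are names invented here; the field
is assumed to be a zeroed width x height array of bytes, with '-' as
0 and '*' as 1):

/* Draw one axis-aligned leg of 1s from (x0,y0) to (x1,y1),
   inclusive, stepping one pixel at a time. */
static void draw_leg(unsigned char *f, int width,
                     int x0, int y0, int x1, int y1)
{
    int dx = (x1 > x0) - (x1 < x0);
    int dy = (y1 > y0) - (y1 < y0);
    for (;;) {
        f[(long)y0 * width + x0] = 1;
        if (x0 == x1 && y0 == y1)
            break;
        x0 += dx;
        y0 += dy;
    }
}

/* Lay out the 8 legs described above around the center point. */
static void make_legs(unsigned char *f, int width, int height)
{
    int cx = width / 2, cy = height / 2;

    draw_leg(f, width, 1,      cy - 1, cx - 1,    cy - 1);
    draw_leg(f, width, 1,      cy + 1, cx - 1,    cy + 1);
    draw_leg(f, width, cx + 1, cy - 1, width - 2, cy - 1);
    draw_leg(f, width, cx + 1, cy + 1, width - 2, cy + 1);
    draw_leg(f, width, cx - 1, 1,      cx - 1,    cy - 1);
    draw_leg(f, width, cx + 1, 1,      cx + 1,    cy - 1);
    draw_leg(f, width, cx - 1, cy + 1, cx - 1,    height - 2);
    draw_leg(f, width, cx + 1, cy + 1, cx + 1,    height - 2);
}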
On Sun, 24 Mar 2024 13:26:16 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 20 Mar 2024 07:27:38 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Tue, 19 Mar 2024 11:57:53 +0000
Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
[...]
The nice thing about Tim's method is that we can expect that
performance depends on number of recolored pixels and almost
nothing else.
One aspect that I consider a significant plus is my code never
does poorly. Certainly it isn't the fastest in all cases, but
it's never abysmally slow.
To be fair, none of presented algorithms is abysmally slow. When compared by number of visited points, they all appear to be within
factor of 2 or 2.5 of each other.
Certainly "abysmally slow" is subjective, but working in a large
pixel field, filling the whole field starting at the center,
Malcolm's code runs slower than my unoptimized code by a factor of
10 (and a tad slower than that compared to my optimized code).
Some of them for some patterns could be 10-15 times slower than
others, but it does not happen for all patterns and when it
happens it's because of problematic implementation rather because
of differences in algorithms.
In the case of Malcolm's code I think it's the algorithm, because
it doesn't scale linearly. Malcolm's code runs faster than mine
for small colorings, but slows down dramatically as the image
being colored gets bigger.
The big difference between algorithms is not a speed, but amount
of auxiliary memory used in the worst case. Your algorithm
appears to be the best in that department, [...]
Yes, my unoptimized algorithm was designed to use as little
memory as possible. The optimized version traded space for
speed: it runs a little bit faster but incurs a non-trivial cost
in terms of space used. I think it's still not too bad, an upper
bound of a small multiple of N for an NxN pixel field.
But even by that metric, the difference between different
implementations of the same algorithm is often much bigger than difference between algorithms.
If I am not mistaken the original naive recursive algorithm has a
space cost that is O( N**2 ) for an NxN pixel field. The big-O
difference swamps everything else, just like the big-O difference
in runtime does for that metric.
For example, solid 200x200 image with starting point in the corner
[...]
On small pixel fields almost any algorithm is probably not too
bad. These days any serious algorithm should scale well up
to at least 4K by 4K, and tested up to at least 16K x 16K.
Tricks that make some things faster for small images sometimes
fall on their face when confronted with a larger image. My code
isn't likely to win many races on small images, but on large
images I expect it will always be competitive even if it doesn't
finish in first place.
You are right. At 1920x1080, except for a few special patterns, your
code is faster than Malcolm's by a factor of 1.5x to 1.8x. Same for
4K. Malcolm's auxiliary memory arrays are still quite small at these
image sizes, but speed suffers.
I wonder if it is a problem of the algorithm or of the implementation.
Since I still haven't figured out his idea, I can't improve his
implementation in order to check.
One thing that I was not expecting at these bigger picture sizes is
the good performance of the simple recursive algorithm. Of course, not
in its original form, but in a variation with an explicit stack.
For many shapes it has quite a large memory footprint, and despite
that it is not slow. Probably the stack has very good locality of
reference.
Here is the code:
<snip>
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
scott@slp53.sl.home (Scott Lurndal) writes:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
The conventional wisdom is the opposite. But here, conventional wisdom
fails. Because heaps are unlimited while stacks are not.
That's not actually true. The size of both are bounded, yes.
It's certainly possible (in POSIX, anyway) for the stack bounds
to be unlimited (given sufficient real memory and/or backing
store) and the heap size to be bounded. See 'setrlimit'.
The sizes of both heaps and stacks are bounded, because
pointers have a fixed number of bits. Certainly these
sizes can be very very large, but they are not unbounded.
I was referring to the term of art used in POSIX, where
unlimited simply means that the operating system doesn't
limit them [.. elaboration ..]
Michael S <already5chosen@yahoo.com> writes:
[..various fill algorithms and how they scale..]
One thing that I were not expecting at this bigger pictures, is good performance of simple recursive algorithm. Of course, not of
original form of it, but of variation with explicit stack.
For many shapes it has quite large memory footprint and despite
that it is not slow. Probably the stack has very good locality of reference.
[algorithm]
You are indeed a very clever fellow. I'm impressed.
Intrigued by your idea, I wrote something along the same lines,
only shorter and (at least for me) a little easier to grok.
If someone is interested I can post it.
I see you have also done a revised algorithm based on the same
idea, but more elaborate (to save on runtime footprint?).
Still working on formulating a response to that one...
On Thu, 28 Mar 2024 23:04:36 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[..various fill algorithms and how they scale..]
One thing that I were not expecting at this bigger pictures, is
good performance of simple recursive algorithm. Of course, not of
original form of it, but of variation with explicit stack. For
many shapes it has quite large memory footprint and despite that
it is not slow. Probably the stack has very good locality of
reference.
[algorithm]
You are indeed a very clever fellow. I'm impressed.
Yes, the use of switch is clever :(
It more resembles a computed GO TO in old FORTRAN, or indirect jumps
in asm, than an idiomatic C switch. But it is legal* C.
Intrigued by your idea, I wrote something along the same lines,
only shorter and (at least for me) a little easier to grok.
If someone is interested I can post it.
If non-trivially different, why not?
I see you have also done a revised algorithm based on the same
idea, but more elaborate (to save on runtime footprint?).
Still working on formulating a response to that one...
The original purpose of the enhancement was to amortize the
non-trivial, and probably not very fast, call-stack emulation logic
over more than one pixel. 2x2 just happens to be the biggest block
that still has very simple in-block recoloring logic. The ~4x
reduction in the size of auxiliary memory is just a pleasant side
effect.
Exactly the same 4x reduction in memory size could have been achieved
with the single-pixel variant by using a packed array for the 2-bit
state (== trace-back) stack elements. But that would have been the
same speed as the original or slower, while the enhanced variant is
robustly faster than the original.
After implementing the first enhancement I noticed that at 4K size
the timing (per pixel) for a few of my test cases is significantly
worse than at smaller image sizes. So I added another enhancement
aiming to minimize cache-thrashing effects, by never looking back at
the immediate parent of the current block. The info about the
location of the parent fitted nicely into the remaining 2 bits of the
stack octet.
The most robust code that I have found so far, performing well with
small pictures as well as with large and huge ones, is a variation on
the same theme of an explicit stack, maybe more properly called a
trace-back. It operates on 2x2 squares instead of individual pixels.
The worst-case auxiliary memory footprint of this variant is rather
big, up to picture_size/4 bytes. The code is *not* simple, but the
complexity appears to be necessary for robust performance across
various shapes and sizes.
[...]
I did program in FORTRAN briefly but don't remember ever using
computed GO TO. And yes, I found that missing semicolon and put it
back. Is there some reason you don't always use -pedantic? I
pretty much always do.
An alternate idea is to use a 64-bit integer for 32 "top of stack"
elements, or up to 32 I should say, and a stack with 64-bit values.
Just an idea, it may not turn out to be useful.
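The arithmetic behind that idea might look like the following sketch
(names invented here; the spill area is shown as a fixed array for
brevity, where a real version would let it grow):

#include <stdint.h>
#include <stddef.h>

/* Up to 32 two-bit trace-back codes packed into one 64-bit word;
   when the word fills up it is spilled to a stack of words. */
typedef struct {
    uint64_t word;        /* current partially filled word      */
    unsigned count;       /* number of 2-bit codes in 'word'    */
    uint64_t spill[1024]; /* hypothetical fixed-size spill area */
    size_t nspill;
} tb_stack;

static void tb_push(tb_stack *s, unsigned code)  /* code in 0..3 */
{
    if (s->count == 32) {             /* word full: spill it */
        s->spill[s->nspill++] = s->word;
        s->word = 0;
        s->count = 0;
    }
    s->word |= (uint64_t)(code & 3) << (2 * s->count);
    s->count++;
}

static unsigned tb_pop(tb_stack *s)
{
    unsigned code;

    if (s->count == 0) {              /* refill from the spill stack */
        s->word = s->spill[--s->nspill];
        s->count = 32;
    }
    s->count--;
    code = (unsigned)(s->word >> (2 * s->count)) & 3;
    s->word &= ~((uint64_t)3 << (2 * s->count));  /* clear stale bits */
    return code;
}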
The few measurements I have done don't show a big difference in
performance between the two methods. But I admit I wasn't paying
close attention, and like I said only a few patterns of filling were
exercised.
After implementing the first enhancement I paid attention that at
4K size the timing (per pixel) for few of my test cases is
significantly worse than at smaller images. So, I added another enhancement aiming to minimize cache trashing effects by never
looking back at immediate parent of current block. The info about
location of the parent nicely fitted into remaining 2 bits of
stack octet.
The idea of not going back to the originator (what you call the
parent) is something I developed independently before looking at
your latest code (and mostly I still haven't). Seems like a good
idea.
Two further comments.
One, the new code is a lot more complicated than the previous
code. I'm not sure the performance gain is worth the cost
in complexity. What kind of speed improvements do you see,
in terms of percent?
Two, and more important, the new algorithm still uses O(NxN) memory
for an N by N pixel field. We really would like to get that down to
O(N) memory (and of course run just as fast). Have you looked into
how that might be done?
Michael S <already5chosen@yahoo.com> writes:
[...]
The most robust code that I found so far that performs well both
with small pictures and with large and huge, is a variation on the
same theme of explicit stack, may be, more properly called trace
back. It operates on 2x2 squares instead of individual pixels.
The worst case auxiliary memory footprint of this variant is rather
big, up to picture_size/4 bytes. The code is *not* simple, but
complexity appears to be necessary for robust performance with
various shapes and sizes.
[...]
I took a cursory look just now, after reading your other later
posting. I think I have a general sense, especially in conjunction
with the explanatory comments.
I'm still hoping to find a method that is both fast and has
good memory use, which is to say O(N) for an NxN pixel field.
Something that would help is to have a library of test cases,
by which I mean patterns to be colored, so that a set of
methods could be tried, and timed, over all the patterns in
the library. Do you have something like that? So far all
my testing has been ad hoc.
Incidentally, it looks like your code assumes X varies more rapidly
than Y, so a "by row" order, whereas my code assumes Y varies more
rapidly than X, a "by column" order.
The difference doesn't matter
as long as the pixel field is square and the test cases either are
symmetric about the X == Y axis or duplicate a non-symmetric pattern
about the X == Y axis. I would like to be able to run comparisons
between different methods and get usable results without having
to jump around because of different orientations. I'm not sure
how to accommodate that.
On Fri, 29 Mar 2024 23:58:26 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
One, the new code is a lot more complicated than the previous
code. I'm not sure the performance gain is worth the cost
in complexity. What kind of speed improvements do you see,
in terms of percent?
On my 11-year-old (and not top-of-the-line even then) home PC, for a
4K image (3840 x 2160) with the cross-in-cross shape that I took from
one of your previous posts, it is 2.43 times faster.
I don't remember how it compares on more modern systems. Anyway, right
now I have no test system more modern than a 3-year-old Zen3.
On Sat, 30 Mar 2024 00:54:19 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
Something that would help is to have a library of test cases,
by which I mean patterns to be colored, so that a set of
methods could be tried, and timed, over all the patterns in
the library. Do you have something like that? So far all
my testing has been ad hoc.
I am not 100% sure about the meaning of 'ad hoc', but I'd guess that
mine are ad hoc too. Below are the shapes that I use apart from solid
rectangles. I run them at 5 sizes: 25x19, 200x200, 1280x720,
1920x1080, 3840x2160. That is certainly not enough for correctness
tests, but I feel that it is sufficient for speed tests.
[code]
On Fri, 29 Mar 2024 23:58:26 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I did program in FORTRAN briefly but don't remember ever using
computed GO TO. And yes, I found that missing semicolon and put it
back. Is there some reason you don't always use -pedantic? I
pretty much always do.
Just a habit.
In "real" work, as opposed to hobby, I use gcc almost exclusively for
small embedded targets, and quite often with third-party libraries in
source form. In such an environment, raising the warning level above
-Wall would be counterproductive, because it would be hard to see the
relevant warnings behind walls of false alarms.
Maybe for hobby work, where I have full control over everything,
switching to -Wpedantic is not a bad idea.
The idea of not going back to the originator (what you call the
parent) is something I developed independently before looking at
your latest code (and mostly I still haven't). Seems like a good
idea.
I call it the principle of Lot's wife.
That is yet another reason not to grow blocks above 2x2.
For bigger blocks it does not apply.
On Sat, 30 Mar 2024 21:15:06 +0300
Michael S <already5chosen@yahoo.com> wrote:
On Fri, 29 Mar 2024 23:58:26 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
One, the new code is a lot more complicated than the previous
code. I'm not sure the performance gain is worth the cost
in complexity. What kind of speed improvements do you see,
in terms of percent?
On my 11 y.o. and not top-of-the-line even then home PC for 4K
image (3840 x 2160) with cross-in-cross shape that I took from one of
your previous post, it is 2.43 times faster.
I don't remember how it compares on more modern systems. Anyway, right
now I have no test systems more modern than 3 y.o. Zen3.
I tested on newer hardware - Intel Coffee Lake (Xeon-E 2176G) and AMD
Zen3 (EPYC 7543P).
Here I no longer see a significant drop in the speed of the 1x1
variant at 4K size, but I still see that the more complicated variant
provides a nice speedup: up to 1.56x on Coffee Lake and up to 3x on
Zen3.
On Sun, 24 Mar 2024 10:24:45 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Generally, I like your algorithm.
It was surprising for me that queue can work better than stack, my
intuition suggested otherwise, but facts are facts.
Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder if existence of this or similar
limit can be proven theoretically.
I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.
It seems that in the worst case the strict FIFO algorithm is the same
as the rest of them, i.e. O(NN), where NN is the number of re-colored
points. Below is an example of a shape for which I measured the
memory consumption for a 3840x2160 image to be almost exactly 4x that
for 1920x1080.
Michael S <already5chosen@yahoo.com> writes:
On Sun, 24 Mar 2024 10:24:45 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
On Wed, 20 Mar 2024 10:01:10 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Generally, I like your algorithm.
It was surprising for me that queue can work better than stack,
my intuition suggested otherwise, but facts are facts.
Using a stack is like a depth-first search, and a queue is like a
breadth-first search. For a pixel field of size N x N, doing a
depth-first search can lead to memory usage of order N**2,
whereas a breadth-first search has a "frontier" at most O(N).
Another way to think of it is that breadth-first gets rid of
visited nodes as fast as it can, but depth-first keeps them
around for a long time when everything is reachable from anywhere
(as will be the case in large simple regions).
For my test cases the FIFO depth of your algorithm never exceeds
min(width,height)*2+2. I wonder if existence of this or similar
limit can be proven theoretically.
I believe it is possible to prove the strict FIFO algorithm is
O(N) for an N x N pixel field, but I haven't tried to do so in
any rigorous way, nor do I know what the constant is. It does
seem to be larger than 2.
Before I do anything else I should correct a bug in my earlier
FIFO algorithm. The initialization of the variable jx should
read
Index const jx = used*3 < open ? k : j+open/3 &m;
rather than what it used to. (The type may have changed but that
is incidental; what matters is the value of the initializing
expression.) I don't know what I was thinking when I wrote the
previous version, it's just completely wrong.
It seems that in worst case the strict FIFO algorithm is the same as
the rest of them, i.e. O(NN) where NN is the number of re-colored
points. Below is an example of the shape for which I measured
memory consumption for 3840x2160 image almost exactly 4x as much as
for 1920x1080.
I agree, the empirical evidence here and in my own tests is quite
compelling.
That said, the constant factor for the FIFO algorithm is lower
than that of the stack-based algorithms, even taking into account the
difference in sizes between queue and stack elements. Moreover, cases
where FIFO algorithms are O( NxN ) are unusual and sparse,
whereas the stack-based algorithms tend to use a lot of memory
in lots of common and routine cases. On average, FIFO
algorithms typically use a lot less memory (or so I conjecture).
[code to generate fractal tree pattern]
Thank you for this. I incorporated it into my set of test
patterns more or less as soon as it was posted.
Now that I have taken some time to play around with different
algorithms and have been more systematic in doing speed
comparisons between different algorithms, on different patterns,
and with a good range of sizes, I have some general thoughts
to offer.
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
I've been playing around with a more elaborate, mostly FIFO
method, in hopes of getting something that offers the best
of both worlds. The results so far are encouraging, but a
fair amount of tuning has been necessary (and perhaps more
still is), and comparisons have been done on just the one
test server I have available. So I don't know how well it
would hold up on other hardware, including especially more
recent hardware. Under these circumstances I feel it is
premature to post actual code, especially since the code
is still in flux.
This topic has been more interesting than I was expecting, and
also more challenging.
I have a strong rule against writing
functions more than about 60 lines long. For the problem of
writing an acceptably quick flood-fill algorithm, I think it would
at the very least be a lot of work to write code to do that while
still observing a limit on function length of even 100 lines, let
alone 60.
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Before I do anything else I should correct a bug in my earlier
FIFO algorithm. The initialization of the variable jx should
read
Index const jx = used*3 < open ? k : j+open/3 &m;
I lost track, sorry. I cannot find your code that contains a line
similar to this.
Can you point to the specific post?
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
It seems that in worst case the strict FIFO algorithm is the same
as the rest of them, i.e. O(NN) where NN is the number of
re-colored points. Below is an example of the shape for which I
measured memory consumption for 3840x2160 image almost exactly 4x
as much as for 1920x1080.
I agree, the empirical evidence here and in my own tests is quite
compelling.
BTW, I no longer agree with myself about "the rest of them".
By now I know at least one method that is O(W*log(H)). It is even
quite fast for the majority of my test shapes. Unfortunately, [in its
current form] it is abysmally slow (100x) for a minority of tests.
[In its current form] it has other disadvantages as well, like
consuming a non-trivial amount of memory when handling a small spot
in a big image. But that can be improved. I am less sure that the
worst-case speed can be improved enough to make it generally
acceptable.
I think I have said enough for you to figure out the general principle
of this algorithm. I don't want to post code here before I try a few
improvements.
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware, with this sort of shape it is easily faster
than anything else. The difference vs the best of the rest is
especially big for 4K images on AMD Zen3 based hardware, but even on
Intel Skylake, which generally serves as an equalizer between
different algorithms, the speed advantage of the 2x2 stack is
significant.
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
This topic has been more interesting that I was expecting, and
also more challenging.
That's not the first time in my practice where a problem with a
simple formulation begets interesting challenges.
Didn't Donald Knuth write 300 or 400 pages about sorting and
still end up quite far from exhausting the topic?
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I have a strong rule against writing
functions more than about 60 lines long. For the problem of
writing an acceptably quick flood-fill algorithm, I think it would
at the very least be a lot of work to write code to do that while
still observing a limit on function length of even 100 lines, let
alone 60.
So why not break it down into smaller pieces?
Michael S <already5chosen@yahoo.com> writes:
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware with this sort of shapes it is easily faster
than anything else. The difference vs the best of the rest is
especially big at 4K images on AMD Zen3 based hardware, but even on
Intel Skylake which generally serves as equalizer between different algorithms, the speed advantage of 2x2 stack is significant.
This comment makes me wonder if I should post my timing results.
Maybe I will (including an appropriate disclaimer).
I do timings over these sizes:
25 x 19
19 x 25
200 x 200
1280 x 760
760 x 1280
1920 x 1080
1080 x 1920
3840 x 2160
2160 x 3840
4096 x 4096
38400 x 21600
21600 x 38400
32767 x 32767
32768 x 32768
with these patterns:
fractal
slalom
rotated slalom
horizontal snake and vertical snake
cross in cross
donut aka ellipse with hole
entire field starting from center
If you have other patterns to suggest that would be great,
I can try to incorporate them (especially if there is
code to generate the pattern).
Patterns are allowed to include a nominal start point,
otherwise I can make an arbitrary choice and assign one.
On Thu, 11 Apr 2024 22:09:51 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
I do timings over these sizes:
25 x 19
19 x 25
200 x 200
1280 x 760
760 x 1280
1920 x 1080
1080 x 1920
3840 x 2160
2160 x 3840
4096 x 4096
38400 x 21600
21600 x 38400
32767 x 32767
32768 x 32768
I didn't go that far up (I ended at 4K),
and I only test landscape sizes. Maybe I'll add a portrait option
to see anisotropic behaviors.
with these patterns:
fractal
slalom
rotated slalom
horizontal snake and vertical snake
cross in cross
donut aka ellipse with hole
entire field starting from center
If you have other patterns to suggest that would be great,
I can try to incorporate them (especially if there is
code to generate the pattern).
Patterns are allowed to include a nominal start point,
otherwise I can make an arbitrary choice and assign one.
My suite is about the same, with the following exceptions:
1. I didn't add the donut yet.
2. + 3 grids with cell sizes 2, 3 and 4.
3. + a fractal tree.
4. + the entire field starting from a corner.
It seems neither of us tests cases in which the linear dimensions of
the shape are much smaller than those of the field.
#include <string.h>  /* memset */

static void make_grid(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c, int cell_sz)
{
    for (int y = 0; y < height; ++y) {
        unsigned char *p = &image[y*width];
        if (y % cell_sz == 0) {
            memset(p, pen_c, width);
        } else {
            for (int x = 0; x < width; ++x)
                p[x] = x % cell_sz ? background_c : pen_c;
        }
    }
}
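A hypothetical driver, just to show the intended call (needs
<stdlib.h>; the fill under test would then be started from any
background pixel, e.g. (1, 1)):

unsigned char *img = malloc((size_t)1920 * 1080);
if (img != NULL) {
    make_grid(img, 1920, 1080, 0, 1, 3);  /* 3-pixel grid cells */
    /* ... run the flood fill under test here ... */
    free(img);
}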
Michael S <already5chosen@yahoo.com> writes:
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware with this sort of shapes it is easily faster
than anything else. The difference vs the best of the rest is
especially big at 4K images on AMD Zen3 based hardware, but even
on Intel Skylake which generally serves as equalizer between
different algorithms, the speed advantage of 2x2 stack is
significant.
I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.
On Fri, 12 Apr 2024 11:59:25 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.
This particular server is turned off right now.
Hopefully, next Monday I will be able to test on it.
It would help if in the meantime you point me to the specific post
with the code.
Michael S <already5chosen@yahoo.com> writes:
On Fri, 12 Apr 2024 11:59:25 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.
This particular server turned off right now.
Hopefully, next Monday I would be able to test on it.
It would help if in the mean time you point me to specific post
with code.
Does this help? Message-ID: <8634ru3ofo.fsf@linuxsc.com>
Michael S <already5chosen@yahoo.com> writes:
On Wed, 10 Apr 2024 19:47:11 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Stack-based methods tend to do well on long skinny patterns and
tend to do not as well on fatter patterns such as circles or
squares. The fractal pattern is ideal for a stack-based method.
Conversely, patterns that are mostly solid shapes don't fare as
well under stack-based methods, at least not the ones that have
been posted in this thread, and also they tend to use more memory
in those cases.
Indeed, with solid shapes it uses more memory. But at least in my
tests on my hardware with this sort of shapes it is easily faster
than anything else. The difference vs the best of the rest is
especially big at 4K images on AMD Zen3 based hardware, but even
on Intel Skylake which generally serves as equalizer between
different algorithms, the speed advantage of 2x2 stack is
significant.
I'm curious to know how your 2x2 algorithm compares to my
second (longer) stack-based algorithm when run on the Zen3.
On my test hardware they are roughly comparable, depending
on size and pattern. My curiosity includes the fatter
patterns as well as the long skinny ones.
Finally found the time for speed measurements. [...]
Michael S <already5chosen@yahoo.com> writes:
[...]
Finally found the time for speed measurements. [...]
I got these. Thank you.
The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.
The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.
Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.
On Wed, 17 Apr 2024 10:47:25 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Finally found the time for speed measurements. [...]
I got these. Thank you.
The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.
The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.
What are your absolute results?
Are they much faster, much slower, or similar to mine?
Also, it would help if you gave the characteristics of your
test hardware.
Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.
I am still interested in a combination of speed that does
not suck with an O(N) worst-case memory footprint.
I already have a couple of variants of the latter,
but so far they are all unreasonably slow - ~5 times slower than
the best.
Michael S <already5chosen@yahoo.com> writes:
On Wed, 17 Apr 2024 10:47:25 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Finally found the time for speed measurements. [...]
I got these. Thank you.
The format used didn't make it easy to do any automated
processing. I was able to get around that, although it
would have been nicer if that had been easier.
The results you got are radically different than my own,
to the point where I wonder if there is something else
going on.
What are your absolute result?
Are they much faster, much slower or similar to mine?
Also it would help if you find out characteristics of your
test hardware.
I think trying to look at those wouldn't tell me anything
helpful. Too many unknowns. And still no way to test or
measure any changes to the various algorithms.
Considering that, since I now have no way of doing any
useful measuring, it seems there is little point in any
further development or investigation on my part. It's
been fun, even if ultimately inconclusive.
I am still interested in a combination of speed that does
not suck with an O(N) worst-case memory footprint.
I already have a couple of variants of the latter,
Did you mean you have some algorithms whose worst-case memory
behavior is strictly less than O( total number of pixels )?
I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N, otherwise I'm not
sure what O(N) refers to.
but so
far they are all unreasonably slow - ~5 times slower than
the best.
I'm no longer working on the problem but I'm interested to
hear what you come up with.
On Fri, 19 Apr 2024 14:59:20 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Did you mean you some algorithms whose worst case memory
behavior is strictly less than O( total number of pixels )?
I think it would be helpful to adopt a standard terminology
where the pixel field is of size M x N, otherwise I'm not
sure what O(N) refers to.
No, I mean O(max(M,N)), plus possibly some logarithmic component that
loses significance as images grow bigger.
More so, if the bounding rectangle of the shape is A x B, then I'd
like the memory requirements to be O(max(A,B)), but so far that does
not appear to be possible, or at least not possible without
significant complications and further slowdown. So, as an
intermediate goal, I am willing to accept that the allocation is
O(max(M,N)) but the amount of touched memory is O(max(A,B)).
but so
far they are all unreasonably slow - ~5 times slower than
the best.
I'm no longer working on the problem but I'm interested to
hear what you come up with.