• Re: filling area by color stack safety

    From Peter 'Shaggy' Haywood@3:633/280.2 to All on Fri Mar 22 13:04:39 2024
    Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
    am. It's a cool scene! Dig it.

    i was thinking a bit about this recursion more generally and
    i observed that those very long depth chains are kind of a problem of
    this recursion, because maybe it is more fitted to be run in parallel

    I wasn't going to post this here, since it's really an algorithm
    issue, rather than a C issue. But the thread seems to have gone on for
    some time with you seeming to be unable to solve this. So I'll give you
    this as a clue.
    The (or, at least, a) solution is only partially recursive. What I
    have used is a line-based algorithm, each line being filled iteratively
    (in a simple loop) from left to right. Recursion from line to line
    completes the algorithm. Thus, the recursion level is greatly reduced.
    And you should find that this approach fills an area of any shape.
    Note, however, that for some pathological cases (very large and
    complex shapes), this can still create a fairly large level of
    recursion. Maybe a more complex approach can deal with this. What I
    present here is just a very simple one which, in most cases, should
    have a level of recursion well within reason.
    I use a two part approach. The first part (called floodfill in the
    code below) just sets up for the second part. The second part (called r_floodfill here, for recursive floodfill) does the actual work, but is
    only called by floodfill(). It goes something like this (although this
    is incomplete, untested and not code I've actually used, just an
    example):

    static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr, pixel_t old_clr)
    {
        unsigned start, end;

        /* Find start and end of line within floodfill area. */
        start = end = x;
        while(old_clr == get_pixel(y, start - 1))
            --start;
        while(old_clr == get_pixel(y, end + 1))
            ++end;

        /* Fill line with new colour. */
        for(x = start; x <= end; x++)
            set_pixel(y, x, new_clr);

        /* Run along again, checking pixel colours above and below,
           and recursing if appropriate. */
        for(x = start; x <= end; x++)
        {
            if(old_clr == get_pixel(y - 1, x))
                r_floodfill(y - 1, x, new_clr, old_clr);
            if(old_clr == get_pixel(y + 1, x))
                r_floodfill(y + 1, x, new_clr, old_clr);
        }
    }

    void floodfill(unsigned y, unsigned x, pixel_t new_clr)
    {
        pixel_t old_clr = get_pixel(y, x);

        /* Only proceed if colours differ. */
        if(new_clr != old_clr)
            r_floodfill(y, x, new_clr, old_clr);
    }

    To use this, simply call floodfill() passing the coordinates of the
    starting point for the fill (y and x) and the fill colour (new_clr).
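
    For concreteness, here is a minimal sketch of a driver for the above
    (my illustration, not part of the original post): pixel_t, get_pixel()
    and set_pixel() are stand-ins for whatever the real graphics code
    provides, backed here by a small hypothetical framebuffer. Out-of-range
    reads return a sentinel value, which is one way to keep the line scans
    and the recursion inside the borders, as long as that value is never
    used as a fill source colour.

    #include <stdio.h>

    #define WIDTH  16
    #define HEIGHT  8

    typedef unsigned char pixel_t;
    static pixel_t fb[HEIGHT][WIDTH];   /* all zeroes initially */

    static pixel_t get_pixel(unsigned y, unsigned x)
    {
        /* Sentinel for out-of-range coordinates (also catches the
           wrap-around of start - 1 and y - 1 at the edges). */
        if (y >= HEIGHT || x >= WIDTH)
            return 0xFF;
        return fb[y][x];
    }

    static void set_pixel(unsigned y, unsigned x, pixel_t c)
    {
        if (y < HEIGHT && x < WIDTH)
            fb[y][x] = c;
    }

    /* ... r_floodfill() and floodfill() exactly as above ... */

    int main(void)
    {
        floodfill(3, 5, 1);   /* fill the region containing (y=3, x=5) */
        for (unsigned y = 0; y < HEIGHT; y++, putchar('\n'))
            for (unsigned x = 0; x < WIDTH; x++)
                putchar(fb[y][x] ? '#' : '.');
        return 0;
    }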

    --


    ----- Dig the NEW and IMPROVED news sig!! -----


    -------------- Shaggy was here! ---------------
    Ain't I'm a dawg!!

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sat Mar 23 01:55:26 2024
    On Fri, 22 Mar 2024 13:04:39 +1100
    Peter 'Shaggy' Haywood <phaywood@alphalink.com.au> wrote:

    Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
    am. It's a cool scene! Dig it.

    i was thinking a bit about this recursion more generally and
    i observed that those very long depth chains are kind of a problem of
    this recursion, because maybe it is more fitted to be run in parallel

    I wasn't going to post this here, since it's really an algorithm
    issue, rather than a C issue. But the thread seems to have gone on for
    some time with you seeming to be unable to solve this. So I'll give
    you this as a clue.
    The (or, at least, a) solution is only partially recursive. What I
    have used is a line-based algorithm, each line being filled
    iteratively (in a simple loop) from left to right. Recursion from
    line to line completes the algorithm. Thus, the recursion level is
    greatly reduced. And you should find that this approach fills an area
    of any shape. Note, however, that for some pathological cases (very
    large and complex shapes), this can still create a fairly large level
    of recursion. Maybe a more complex approach can deal with this. What I present here is just a very simple one which, in most cases, should
    have a level of recursion well within reason.
    I use a two part approach. The first part (called floodfill in the
    code below) just sets up for the second part. The second part (called r_floodfill here, for recursive floodfill) does the actual work, but
    is only called by floodfill(). It goes something like this (although
    this is incomplete, untested and not code I've actually used, just an example):

    static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr,
    pixel_t old_clr)
    {
    unsigned start, end;

    /* Find start and end of line within floodfill area. */
    start = end = x;
    while(old_clr == get_pixel(y, start - 1))
    --start;
    while(old_clr == get_pixel(y, end + 1))
    ++end;

    /* Fill line with new colour. */
    for(x = start; x <= end; x++)
    set_pixel(y, x, new_clr);

    /* Run along again, checking pixel colours above and below,
    and recursing if appropriate. */
    for(x = start; x <= end; x++)
    {
    if(old_clr == get_pixel(y - 1, x))
    r_floodfill(y - 1, x, new_clr, old_clr);
    if(old_clr == get_pixel(y + 1, x))
    r_floodfill(y + 1, x, new_clr, old_clr);
    }
    }

    void floodfill(unsigned y, unsigned x, pixel_t new_clr)
    {
    pixel_t old_clr = get_pixel(y, x);

    /* Only proceed if colours differ. */
    if(new_clr != old_clr)
    r_floodfill(y, x, new_clr, old_clr);
    }

    To use this, simply call floodfill() passing the coordinates of the starting point for the fill (y and x) and the fill colour (new_clr).


    It looks like an anisotropic variant of my St. George Cross algorithm,
    or like a recursive variant of Malcolm's algorithm.
    Being anisotropic, it has a larger number of glass jaws. In particular,
    it would be very slow for the not-uncommon 'jail window' pattern:
    * *** *** *** ***
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    *** *** *** *** *
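
    (For anyone who wants to reproduce the effect, a small sketch that
    paints such a pattern into a width x height buffer could look like the
    following; this is my illustration, and the Color type, bar spacing and
    connector placement are assumptions read off the picture above.)

    #include <stddef.h>

    typedef unsigned char Color;

    /* Draw a 'jail window': vertical bars every `spacing` columns, with the
       gap between neighbouring bars closed alternately at the bottom and at
       the top, so the bars form one long, serpentine, 1-pixel-wide region. */
    static void draw_jail_window(Color *image, int width, int height,
                                 Color bar, int spacing)
    {
        for (int x = 0; x < width; x += spacing) {
            for (int y = 0; y < height; y++)                /* the bar itself */
                image[(size_t)y*width + x] = bar;
            int y = ((x / spacing) & 1) ? 0 : height - 1;   /* connector row */
            for (int i = 1; i < spacing && x + i < width; i++)
                image[(size_t)y*width + x + i] = bar;
        }
    }

    Flood-filling the bar colour of such an image (e.g. with spacing 2)
    forces a line-oriented fill to recurse essentially once per pixel of
    every bar.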

    Also, the implementation is still recursive and the worst-case recursion
    depth is still O(N), where N is the total number of recolored pixels, so
    unlike many other solutions presented here, you didn't solve fir's
    original problem.
    And in the presented form it's not thread-safe, which is not a problem
    for fir, but is undesirable for the rest of us.

    Conclusion: sorry, you aren't going to get a cookie for your effort.






    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sat Mar 23 02:31:16 2024
    On Fri, 22 Mar 2024 17:55:26 +0300
    Michael S <already5chosen@yahoo.com> wrote:

    On Fri, 22 Mar 2024 13:04:39 +1100
    Peter 'Shaggy' Haywood <phaywood@alphalink.com.au> wrote:


    To use this, simply call floodfill() passing the coordinates of
    the starting point for the fill (y and x) and the fill colour
    (new_clr).

    It looks like an anisotropic variant of my St. George Cross algorithm,
    or like a recursive variant of Malcolm's algorithm.
    Being anisotropic, it has a larger number of glass jaws. In particular,
    it would be very slow for the not-uncommon 'jail window' pattern:
    * *** *** *** ***
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    * * * * * * * * *
    *** *** *** *** *

    Also, the implementation is still recursive and the worst-case recursion
    depth is still O(N), where N is the total number of recolored pixels, so
    unlike many other solutions presented here, you didn't solve fir's
    original problem.
    And in the presented form it's not thread-safe, which is not a problem
    for fir, but is undesirable for the rest of us.

    Conclusion: sorry, you aren't going to get a cookie for your effort.



    So, what is my own practical answer?
    Assuming that speed is not a top priority, that simplicity is pretty
    high on the priority scale, and that it should work with big images and
    the default stack size under Windows, I would go with the following
    not particularly fast and not particularly slow algorithm that I call
    "deferred stack". That is, it is mostly an explicit stack, but the
    (explicit) recursion is deferred until all four neighbors of the
    current pixel have been saved on the todo stack.
    "Not particularly slow" means that I did see cases where some other
    algorithm is 2 times faster, but I have never seen a 3x difference.
    In case x and y are known to fit in uint16_t, the UI type could be
    redefined accordingly. It will make execution faster, but not by much.

    #include <stdlib.h>
    #include <stddef.h>
    #include <string.h>

    typedef unsigned char Color;
    typedef int UI;

    int floodfill_r(
        Color *image,
        int width,
        int height,
        int x0,
        int y0,
        Color old,
        Color new)
    {
        if (width < 0 || height < 0)
            return 0;

        if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
            return 0;

        size_t x = x0;
        size_t y = y0;
        if (image[y*width+x] != old)
            return 0;

        const ptrdiff_t INITIAL_TODO_SIZE = 128;
        struct Point { UI x, y; };
        struct Point *todo = malloc(INITIAL_TODO_SIZE * sizeof *todo);
        if (!todo)
            return -1;
        struct Point* todo_end = &todo[INITIAL_TODO_SIZE];

        todo[0].x = x; todo[0].y = y;
        struct Point* sp = &todo[1];
        do {
            x = sp[-1].x; y = sp[-1].y;
            --sp;
            if (image[y*width+x] == old) {
                image[y*width+x] = new;
                if (x > 0 && image[y*width+x-1] == old) {
                    sp->x = x - 1; sp->y = y; ++sp;
                }
                if (y > 0 && image[y*width+x-width] == old) {
                    sp->x = x; sp->y = y - 1; ++sp;
                }
                if (x+1 < width && image[y*width+x+1] == old) {
                    sp->x = x + 1; sp->y = y; ++sp;
                }
                if (y+1 < height && image[y*width+x+width] == old) {
                    sp->x = x; sp->y = y + 1; ++sp;
                }

                if (todo_end-sp < 4) {
                    ptrdiff_t used = sp-todo;
                    ptrdiff_t size = todo_end - todo;
                    size += size/4;
                    struct Point* new_todo = realloc(todo, size * sizeof *todo);
                    if (!new_todo) {
                        free(todo);
                        return -1;
                    }
                    todo = new_todo;
                    sp = &todo[used];
                    todo_end = &todo[size];
                }
            }
        } while (sp != todo);

        free(todo);
        return 1;
    }
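
    A call might look like this (my example, compiled together with the
    function above):

    #include <stdio.h>

    int main(void)
    {
        enum { W = 32, H = 16 };
        static Color image[W * H];              /* all zeroes */
        int rc = floodfill_r(image, W, H, W/2, H/2, 0, 5);
        /* rc == 1: filled, rc == 0: nothing to do or bad coordinates,
           rc == -1: allocation failure */
        printf("floodfill_r returned %d, centre pixel = %d\n",
               rc, image[(H/2)*W + W/2]);
        return 0;
    }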





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Ben Bacarisse@3:633/280.2 to All on Sat Mar 23 11:21:00 2024
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

    On 17/03/2024 11:25, Ben Bacarisse wrote:
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

    On 16/03/2024 13:55, Ben Bacarisse wrote:
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

    Recursion make programs harder to reason about and prove correct.
    Are you prepared to offer any evidence to support this astonishing
    statement or can we just assume it's another Malcolmism?

    Example given. A recursive algorithm which is hard to reason about and
    prove correct, because we don't really know whether under perfectly
    reasonable assumptions it will or will not blow the stack.
    Had you offered a proof that your code neither "blows the stack" nor
    runs out of any other resource we'd have a starting point for
    comparison, but you have not done that.
    Mind you, had you done that, we would have something that might
    eventually become only one piece of evidence for what is an
    astonishingly general remark. Broadly applicable remarks require either
    broadly applicable evidence or a wealth of distinct cases.
    Your "rule" suggests that all reasoning is impeded by the presence of
    recursion and I don't think you can support that claim. This is
    characteristic of many of your remarks -- they are general "rules" that
    often remain rules even when there is evidence to the contrary.
    I'll make another point in the hope of clarifying the matter. An
    algorithm or code is usually proved correct (or not!) under the
    assumption that it has adequate resources -- usually time and storage.
    Further reasoning may then be done to determine the resource
    requirements since this is so often dependent on context. This
    separation is helpful as you don't usually want to tie "correctness" to
    some specific installation. The code might run on a system with a
    dynamically allocated stack, for example, that has very similar
    limitations to "heap" memory.
    To put is more generally, we often want to prove properties of code that
    are independent of physical constraints. Your remark includes this kind
    reasoning. Did you intend it to?

    The conventional wisdom is the opposite, but here, conventional wisdom
    fails. Because heaps are unlimited whilst stacks are not.

    I put off answering for enough time that I now don't care anymore. I
    think that's a win for everyone.

    --
    Ben.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From fir@3:633/280.2 to All on Sat Mar 23 21:06:42 2024
    Peter 'Shaggy' Haywood wrote:
    Groovy hepcat fir was jivin' in comp.lang.c on Wed, 20 Mar 2024 11:48
    am. It's a cool scene! Dig it.

    i was thinking a bit about this recursion more generally and
    i observed that those very long depth chains are kind of a problem of
    this recursion, because maybe it is more fitted to be run in parallel

    I wasn't going to post this here, since it's really an algorithm
    issue, rather than a C issue. But the thread seems to have gone on for
    some time with you seeming to be unable to solve this. So I'll give you
    this as a clue.
    The (or, at least, a) solution is only partially recursive. What I
    have used is a line-based algorithm, each line being filled iteratively
    (in a simple loop) from left to right. Recursion from line to line
    completes the algorithm. Thus, the recursion level is greatly reduced.
    And you should find that this approach fills an area of any shape.
    Note, however, that for some pathological cases (very large and
    complex shapes), this can still create a fairly large level of
    recursion. Maybe a more complex approach can deal with this. What I
    present here is just a very simple one which, in most cases, should
    have a level of recursion well within reason.
    I use a two part approach. The first part (called floodfill in the
    code below) just sets up for the second part. The second part (called r_floodfill here, for recursive floodfill) does the actual work, but is
    only called by floodfill(). It goes something like this (although this
    is incomplete, untested and not code I've actually used, just an
    example):

    static void r_floodfill(unsigned y, unsigned x, pixel_t new_clr, pixel_t old_clr)
    {
    unsigned start, end;

    /* Find start and end of line within floodfill area. */
    start = end = x;
    while(old_clr == get_pixel(y, start - 1))
    --start;
    while(old_clr == get_pixel(y, end + 1))
    ++end;

    /* Fill line with new colour. */
    for(x = start; x <= end; x++)
    set_pixel(y, x, new_clr);

    /* Run along again, checking pixel colours above and below,
    and recursing if appropriate. */
    for(x = start; x <= end; x++)
    {
    if(old_clr == get_pixel(y - 1, x))
    r_floodfill(y - 1, x, new_clr, old_clr);
    if(old_clr == get_pixel(y + 1, x))
    r_floodfill(y + 1, x, new_clr, old_clr);
    }
    }

    void floodfill(unsigned y, unsigned x, pixel_t new_clr)
    {
    pixel_t old_clr = get_pixel(y, x);

    /* Only proceed if colours differ. */
    if(new_clr != old_clr)
    r_floodfill(y, x, new_clr, old_clr);
    }

    To use this, simply call floodfill() passing the coordinates of the starting point for the fill (y and x) and the fill colour (new_clr).


    well this is an ok improvement for consideration - i however already
    resolved the problem in 3 ways, as you could note reading more carefully:
    1) set the stack to 100 MB and forget about it
    2) i wrote a straightforward iterative version (in draft) (the one with
    AddPixel(..))
    3) i noticed that the best method would be to introduce a so-called call
    queue in c (probably the best solution imo)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: i2pn2 (i2pn.org) (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sun Mar 24 05:48:16 2024
    scott@slp53.sl.home (Scott Lurndal) writes:

    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

    The conventional wisdom is the opposite, but here, conventional wisdom
    fails. Because heaps are unlimited while stacks are not.

    That's not actually true. The size of both are bounded, yes.

    It's certainly possible (in POSIX, anyway) for the stack bounds
    to be unlimited (given sufficient real memory and/or backing
    store) and the heap size to be bounded. See 'setrlimit'.

    The sizes of both heaps and stacks are bounded, because
    pointers have a fixed number of bits. Certainly these
    sizes can be very very large, but they are not unbounded.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 03:33:52 2024
    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 19 Mar 2024 21:40:22 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 18 Mar 2024 22:42:14 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    [...]

    Here is the refinement that uses a resizing rather than
    fixed-size buffer.
    [code]

    This variant is significantly slower than Malcolm's.
    2x slower for solid rectangle, 6x slower for snake shape.

    Slower with some shapes, faster in others.

    In my small test suite I found no cases where this specific code is
    measurably faster than Malcolm's code.

    My test cases include pixel fields of 32k by 32k, with for
    example filling the entire field starting at the center point.
    Kind of a stress test but it turned up some interesting results.

    I did find one case in which they are approximately equal. I call
    it "slalom shape" and it's more or less designed to be the worst
    case for algorithms that try to speed themselves up by taking
    advantage of straight lines.
    The slalom shape is generated by the following code:
    [code]

    Thanks, I may try that.

    In any case
    the code was written for clarity of presentation, with
    no attention paid to low-level performance.

    Yes, your code is easy to understand. Could have been easier
    still if persistent indices had more meaningful names.

    I have a different view on that question. However I take your
    point.

    In another post I showed an optimized variant of your algorithm:
    - 4-neighbors loop unrolled. The majority of the speed-up comes not
    from the unrolling itself, but from the specialization of the
    in-rectangle check enabled by the unroll.
    - Todo queue implemented as a circular buffer.
    - Initial size of the queue increased.
    This optimized variant is more competitive with 'line-grabby'
    algorithms in filling solid shapes, and faster than them in the
    'slalom' case.

    Yes, unrolling is an obvious improvement. I deliberately chose a
    simple (and non-optimized) method to make it easier to see how it
    works. Simple optimizations are left as an exercise for the
    reader. :)

    Generally, I like your algorithm.
    It was surprising to me that a queue can work better than a stack; my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).


    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder whether the existence of this or a
    similar limit can be proven theoretically.
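
    (Tim's actual code was snipped as "[code]" upthread, so purely to make
    the stack-versus-queue point concrete, here is a minimal breadth-first
    fill of my own with a plain circular FIFO; the Point type, the initial
    capacity and the grow-by-doubling policy are my choices, not his.)

    #include <stdlib.h>
    #include <stddef.h>

    typedef unsigned char Color;

    /* Pixels are recolored when they are enqueued, so each pixel enters the
       queue at most once and the queue only ever holds the current
       "frontier" of the fill. */
    int floodfill_bfs(Color *image, int width, int height,
                      int x0, int y0, Color old, Color new)
    {
        if (width <= 0 || height <= 0) return 0;
        if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height) return 0;
        if (old == new || image[(size_t)y0*width + x0] != old) return 0;

        struct Point { int x, y; };
        size_t cap = 1024, head = 0, tail = 0, count = 0;
        struct Point *q = malloc(cap * sizeof *q);
        if (!q) return -1;

        image[(size_t)y0*width + x0] = new;
        q[tail] = (struct Point){ x0, y0 }; tail = (tail+1) % cap; count++;

        while (count) {
            struct Point p = q[head]; head = (head+1) % cap; count--;
            static const int dx[4] = { -1, 1, 0, 0 }, dy[4] = { 0, 0, -1, 1 };
            for (int k = 0; k < 4; k++) {
                int x = p.x + dx[k], y = p.y + dy[k];
                if (x < 0 || x >= width || y < 0 || y >= height) continue;
                if (image[(size_t)y*width + x] != old) continue;
                image[(size_t)y*width + x] = new;
                if (count == cap) {                      /* grow the ring */
                    struct Point *nq = malloc(2*cap * sizeof *nq);
                    if (!nq) { free(q); return -1; }
                    for (size_t i = 0; i < count; i++)
                        nq[i] = q[(head + i) % cap];
                    free(q); q = nq; head = 0; tail = count; cap *= 2;
                }
                q[tail] = (struct Point){ x, y }; tail = (tail+1) % cap; count++;
            }
        }
        free(q);
        return 1;
    }

    Swapping the FIFO for a LIFO here (pop from the tail instead of the
    head) turns it into the depth-first variant with the much larger
    worst-case footprint that Tim describes.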



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Mon Mar 25 04:24:45 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising for me that queue can work better than stack, my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).

    For my test cases the FIFO depth of your algorithm never exceeds min(width,height)*2+2. I wonder if existence of this or similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.

    As for finding a worst case, try this (expressed in pseudo code):

    let pc = { width/2, height/2 }
    // assume pixel field 'field' starts out as all zeroes
    color 8 "legs" with the value '1' as follows:

    leg from { 1, pc.y-1 } to { pc.x-1, pc.y-1 }
    leg from { 1, pc.y+1 } to { pc.x-1, pc.y+1 }
    leg from { pc.x+1, pc.y-1 } to { width-2, pc.y-1 }
    leg from { pc.x+1, pc.y+1 } to { width-2, pc.y+1 }

    leg from { pc.x-1, 1 } to { pc.x-1, pc.y-1 }
    leg from { pc.x+1, 1 } to { pc.x+1, pc.y-1 }
    leg from { pc.x-1, pc.y+1 } to { pc.x-1, height-2 }
    leg from { pc.x+1, pc.y+1 } to { pc.x+1, height-2 }


    So the pixel field should look like this (with longer legs for a
    bigger pixel field), with '-' being 0 and '*' being 1:

    +-----------------------+
    | - - - - - - - - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - * * * * - * * * * - |
    | - - - - - - - - - - - |
    | - * * * * - * * * * - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - - - - - - - - |
    +-----------------------+

    Now start coloring at the center point with a new value
    of 2.
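
    (If it helps anyone reproduce this, my reading of the pseudo code above
    translates to roughly the following sketch; the exact leg endpoints are
    my interpretation of the picture.)

    #include <stddef.h>
    #include <string.h>

    typedef unsigned char Color;

    static void set_px(Color *f, int width, int x, int y, Color v)
    {
        f[(size_t)y*width + x] = v;
    }

    /* Paint the 8 "legs" with value 1 into an all-zero width x height field. */
    static void make_legs(Color *field, int width, int height)
    {
        int cx = width/2, cy = height/2;
        memset(field, 0, (size_t)width * height);

        for (int x = 1; x <= cx-1; x++) {            /* left horizontal legs */
            set_px(field, width, x, cy-1, 1);
            set_px(field, width, x, cy+1, 1);
        }
        for (int x = cx+1; x <= width-2; x++) {      /* right horizontal legs */
            set_px(field, width, x, cy-1, 1);
            set_px(field, width, x, cy+1, 1);
        }
        for (int y = 1; y <= cy-1; y++) {            /* upper vertical legs */
            set_px(field, width, cx-1, y, 1);
            set_px(field, width, cx+1, y, 1);
        }
        for (int y = cy+1; y <= height-2; y++) {     /* lower vertical legs */
            set_px(field, width, cx-1, y, 1);
            set_px(field, width, cx+1, y, 1);
        }
        /* Then flood-fill the zeroes starting at (cx, cy) with the value 2. */
    }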

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 04:27:58 2024
    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:


    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.


    To be fair, none of the presented algorithms is abysmally slow.
    When compared by the number of visited points, they all appear to be
    within a factor of 2 or 2.5 of each other.
    Some of them could be 10-15 times slower than others for some patterns,
    but it does not happen for all patterns, and when it happens it's
    because of a problematic implementation rather than because of
    differences between the algorithms.
    Even the original naive recursive algorithm is not too slow when
    implemented in optimized asm - 2.2x slower than the fastest for a solid
    square shape and closer than that for other shapes.

    The big difference between the algorithms is not speed, but the amount
    of auxiliary memory used in the worst case. Your algorithm appears to be
    the best in that department, Malcolm's algorithm is also quite good,
    and all the others (plain recursion, stacks, my deferred stack, all my
    cross variants, the line-oriented recursion of Peter 'Shaggy' Haywood)
    are a lot worse.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than the
    difference between algorithms.

    For example, a solid 200x200 image with the starting point in the
    corner, recolored by the code presented in Malcolm's first post (not
    his own algorithm, but the recursive algorithm that he presented as a
    reference point), on x86-64/gcc consumes 5,094,784 bytes of stack.
    After a small modification (all non-changing parameters aggregated in
    a structure and passed by reference) the footprint falls to 2,547,328 B.
    Coding the same algorithm (well, almost the same) in asm reduces it to
    320,000 B. Coding it with an explicit stack cuts it to 40,000 B. Now I
    didn't actually code it, but I know how to compress the explicit stack
    down to 2 bits per level of recursion. If implemented, it would be
    10,000 B, i.e. comparable with Malcolm's much more economical algorithm
    and 512x smaller than the original implementation of [well, almost] the
    same algorithm!
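
    (That 2-bits-per-level version wasn't posted; presumably the idea is
    that each level of the emulated recursion only needs to remember one of
    four "return to" states, so four levels fit in a byte. A sketch of what
    the packed storage might look like - my guess at the representation,
    not Michael's code:)

    #include <stddef.h>
    #include <assert.h>

    /* Store 2-bit return states, four per byte: 2 bits per recursion level,
       so 40,000 levels (a 200x200 solid fill) need only 10,000 bytes. */
    static void push2(unsigned char *buf, size_t depth, unsigned state)
    {
        assert(state < 4);
        size_t   byte  = depth / 4;
        unsigned shift = (unsigned)(depth % 4) * 2;
        buf[byte] = (unsigned char)((buf[byte] & ~(3u << shift)) | (state << shift));
    }

    static unsigned pop2(const unsigned char *buf, size_t depth)
    {
        return (buf[depth / 4] >> ((depth % 4) * 2)) & 3u;
    }

    The fill loop would keep its own depth counter and use these in place
    of a byte-per-level *sp++ / *--sp pair.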



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Mon Mar 25 07:26:16 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
    [...]
    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.

    To be fair, none of presented algorithms is abysmally slow. When
    compared by number of visited points, they all appear to be within
    factor of 2 or 2.5 of each other.

    Certainly "abysmally slow" is subjective, but working in a large
    pixel field, filling the whole field starting at the center,
    Malcolm's code runs slower than my unoptimized code by a factor of
    10 (and a tad slower than that compared to my optimized code).

    Some of them for some patterns could be 10-15 times slower than
    others, but it does not happen for all patterns and when it
    happens it's because of problematic implementation rather because
    of differences in algorithms.

    In the case of Malcolm's code I think it's the algorithm, because
    it doesn't scale linearly. Malcolm's code runs faster than mine
    for small colorings, but slows down dramatically as the image
    being colored gets bigger.

    The big difference between algorithms is not a speed, but amount of
    auxiliary memory used in the worst case. Your algorithm appears to be
    the best in that department, [...]

    Yes, my unoptimized algorithm was designed to use as little
    memory as possible. The optimized version traded space for
    speed: it runs a little bit faster but incurs a non-trivial cost
    in terms of space used. I think it's still not too bad, an upper
    bound of a small multiple of N for an NxN pixel field.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than
    difference between algorithms.

    If I am not mistaken the original naive recursive algorithm has a
    space cost that is O( N**2 ) for an NxN pixel field. The big-O
    difference swamps everything else, just like the big-O difference
    in runtime does for that metric.


    For example, solid 200x200 image with starting point in the corner
    [...]

    On small pixel fields almost any algorithm is probably not too
    bad. These days any serious algorithm should scale well up
    to at least 4K by 4K, and tested up to at least 16K x 16K.
    Tricks that make some things faster for small images sometimes
    fall on their face when confronted with a larger image. My code
    isn't likely to win many races on small images, but on large
    images I expect it will always be competitive even if it doesn't
    finish in first place.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Mon Mar 25 07:48:52 2024
    Reply-To: slp53@pacbell.net

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    scott@slp53.sl.home (Scott Lurndal) writes:

    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

    The conventional wisdom is the opposite, but here, conventional wisdom
    fails. Because heaps are unlimited while stacks are not.

    That's not actually true. The size of both are bounded, yes.

    It's certainly possible (in POSIX, anyway) for the stack bounds
    to be unlimited (given sufficient real memory and/or backing
    store) and the heap size to be bounded. See 'setrlimit'.

    The sizes of both heaps and stacks are bounded, because
    pointers have a fixed number of bits. Certainly these
    sizes can be very very large, but they are not unbounded.

    I was referring to the term of art used in POSIX, where
    unlimited simply means that the operating system doesn't
    limit them (and as I pointed out, physical limits, including
    address space size (which is often only 48 bits, regardless
    of the 64-bit pointer space)) will dominate.

    $ ulimit -a
    address space limit (Kibytes) (-M) unlimited
    core file size (blocks) (-c) 0
    cpu time (seconds) (-t) unlimited
    data size (Kibytes) (-d) unlimited
    file size (blocks) (-f) unlimited
    locks (-x) unlimited

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Mon Mar 25 08:26:53 2024
    On 3/24/2024 1:26 PM, Tim Rentsch wrote:
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
    [...]
    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.

    To be fair, none of presented algorithms is abysmally slow. When
    compared by number of visited points, they all appear to be within
    factor of 2 or 2.5 of each other.

    Certainly "abysmally slow" is subjective, but working in a large
    pixel field, filling the whole field starting at the center,
    Malcolm's code runs slower than my unoptimized code by a factor of
    10 (and a tad slower than that compared to my optimized code).

    Some of them for some patterns could be 10-15 times slower than
    others, but it does not happen for all patterns and when it
    happens it's because of problematic implementation rather because
    of differences in algorithms.

    In the case of Malcolm's code I think it's the algorithm, because
    it doesn't scale linearly. Malcolm's code runs faster than mine
    for small colorings, but slows down dramatically as the image
    being colored gets bigger.

    The big difference between algorithms is not a speed, but amount of
    auxiliary memory used in the worst case. Your algorithm appears to be
    the best in that department, [...]

    Yes, my unoptimized algorithm was designed to use as little
    memory as possible. The optimized version traded space for
    speed: it runs a little bit faster but incurs a non-trivial cost
    in terms of space used. I think it's still not too bad, an upper
    bound of a small multiple of N for an NxN pixel field.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than
    difference between algorithms.

    If I am not mistaken the original naive recursive algorithm has a
    space cost that is O( N**2 ) for an NxN pixel field. The big-O
    difference swamps everything else, just like the big-O difference
    in runtime does for that metric.


    For example, solid 200x200 image with starting point in the corner
    [...]

    On small pixel fields almost any algorithm is probably not too
    bad. These days any serious algorithm should scale well up
    to at least 4K by 4K, and tested up to at least 16K x 16K.
    Tricks that make some things faster for small images sometimes
    fall on their face when confronted with a larger image. My code
    isn't likely to win many races on small images, but on large
    images I expect it will always be competitive even if it doesn't
    finish in first place.

    This might be relevant:

    https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9111-a-new-direct-connected-component-labeling-and-analysis-algorithm-for-gpus.pdf

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 09:04:32 2024
    On Sun, 24 Mar 2024 10:24:45 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising for me that queue can work better than stack, my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).

    For my test cases the FIFO depth of your algorithm never exceeds min(width,height)*2+2. I wonder if existence of this or similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.

    As for finding a worst case, try this (expressed in pseudo code):

    let pc = { width/2, height/2 }
    // assume pixel field 'field' starts out as all zeroes
    color 8 "legs" with the value '1' as follows:

    leg from { 1, pc.y-1 } to { pc.x-1, pc.y-1 }
    leg from { 1, pc.y+1 } to { pc.x-1, pc.y+1 }
    leg from { pc.x+1, pc.y-1 } to { width-2, pc.y-1 }
    leg from { pc.x+1, pc.y+1 } to { width-2, pc.y+1 }

    leg from { pc.x-1, 1 } to { pc.x-1, pc.y-1 }
    leg from { pc.x+1, 1 } to { pc.x+1, pc.y-1 }
    leg from { pc.x-1, pc.y+1 } to { pc.x-1, height-2 }
    leg from { pc.x+1, pc.y+1 } to { pc.x+1, height-2 }


    So the pixel field should look like this (with longer legs for a
    bigger pixel field), with '-' being 0 and '*' being 1:

    +-----------------------+
    | - - - - - - - - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - * * * * - * * * * - |
    | - - - - - - - - - - - |
    | - * * * * - * * * * - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - * - * - - - - |
    | - - - - - - - - - - - |
    +-----------------------+

    Now start coloring at the center point with a new value
    of 2.


    Yes, I see. It is close to min(width,height)*4.
    Thank you.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 09:28:44 2024
    On Sun, 24 Mar 2024 13:26:16 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
    [...]
    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.

    To be fair, none of presented algorithms is abysmally slow. When
    compared by number of visited points, they all appear to be within
    factor of 2 or 2.5 of each other.

    Certainly "abysmally slow" is subjective, but working in a large
    pixel field, filling the whole field starting at the center,
    Malcolm's code runs slower than my unoptimized code by a factor of
    10 (and a tad slower than that compared to my optimized code).

    Some of them for some patterns could be 10-15 times slower than
    others, but it does not happen for all patterns and when it
    happens it's because of problematic implementation rather because
    of differences in algorithms.

    In the case of Malcolm's code I think it's the algorithm, because
    it doesn't scale linearly. Malcolm's code runs faster than mine
    for small colorings, but slows down dramatically as the image
    being colored gets bigger.

    The big difference between algorithms is not a speed, but amount of auxiliary memory used in the worst case. Your algorithm appears to
    be the best in that department, [...]

    Yes, my unoptimized algorithm was designed to use as little
    memory as possible. The optimized version traded space for
    speed: it runs a little bit faster but incurs a non-trivial cost
    in terms of space used. I think it's still not too bad, an upper
    bound of a small multiple of N for an NxN pixel field.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than
    difference between algorithms.

    If I am not mistaken the original naive recursive algorithm has a
    space cost that is O( N**2 ) for an NxN pixel field. The big-O
    difference swamps everything else, just like the big-O difference
    in runtime does for that metric.


    For example, solid 200x200 image with starting point in the corner
    [...]

    On small pixel fields almost any algorithm is probably not too
    bad. These days any serious algorithm should scale well up
    to at least 4K by 4K, and tested up to at least 16K x 16K.
    Tricks that make some things faster for small images sometimes
    fall on their face when confronted with a larger image. My code
    isn't likely to win many races on small images, but on large
    images I expect it will always be competitive even if it doesn't
    finish in first place.

    You are right. At 1920x1080, except for a few special patterns, your
    code is faster than Malcolm's by a factor of 1.5x to 1.8x. Same for 4K.
    Malcolm's auxiliary memory arrays are still quite small at these image
    sizes, but speed suffers.
    I wonder whether it is a problem of the algorithm or of the
    implementation. Since I still haven't figured out his idea, I can't
    improve his implementation in order to check it.

    One thing that I was not expecting at these bigger picture sizes is the
    good performance of the simple recursive algorithm. Of course, not of
    its original form, but of a variation with an explicit stack.
    For many shapes it has quite a large memory footprint, and despite that
    it is not slow. Probably the stack has very good locality of reference.

    Here is the code:

    #include <stdlib.h>
    #include <stddef.h>

    typedef unsigned char Color;

    int floodfill4(
        Color *image,
        int width,
        int height,
        int x0,
        int y0,
        Color old,
        Color new)
    {
        if (width <= 0 || height <= 0)
            return 0;

        if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
            return 0;

        const size_t w = width;
        Color* image_end = &image[w*height];

        size_t x = x0;
        Color* row = &image[w*y0];
        if (row[x] != old)
            return 0;

        const ptrdiff_t INITIAL_STACK_SZ = 256;
        unsigned char* stack = malloc(INITIAL_STACK_SZ*sizeof(*stack));
        if (!stack)
            return -1;
        unsigned char* sp = stack;
        unsigned char* end_stack = &stack[INITIAL_STACK_SZ];

        enum { ST_LEFT, ST_RIGHT, ST_UP, ST_DOWN, ST_BEG };

    recursive_call:
        row[x] = new;
        if (sp == end_stack) {
            ptrdiff_t size = sp - stack;
            ptrdiff_t new_size = size + size/2;
            unsigned char* new_stack = realloc(stack, new_size * sizeof(*stack));
            if (!new_stack) {
                free(stack);
                return -1;
            }
            stack = new_stack;
            sp = &stack[size];
            end_stack = &stack[new_size];
        }

        for (unsigned state = ST_BEG;;) {
            switch (state) {
            case ST_BEG:

                ++x;
                if (x != width) {
                    if (row[x] == old) {
                        *sp++ = ST_RIGHT; goto recursive_call; // recursive call
                    }
                }
            case ST_RIGHT:
                --x;

                if (x > 0) {
                    --x;
                    if (row[x] == old) {
                        *sp++ = ST_LEFT; goto recursive_call; // recursive call
                    }
            case ST_LEFT:
                    ++x;
                }

                if (row != image) {
                    row -= w;
                    if (row[x] == old) {
                        *sp++ = ST_UP; goto recursive_call; // recursive call
                    }
            case ST_UP:
                    row += w;
                }

                row += w;
                if (row != image_end) {
                    if (row[x] == old) {
                        *sp++ = ST_DOWN; goto recursive_call; // recursive call
                    }
            case ST_DOWN:
                }
                row -= w; /* undo the row += w of the down check */
                break;
            }

            if (sp == stack)
                break;

            state = *--sp; // pop stack (back to caller)
        }

        free(stack);
        return 1; // done
    }







    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Wed Mar 27 02:52:18 2024
    On Mon, 25 Mar 2024 01:28:44 +0300
    Michael S <already5chosen@yahoo.com> wrote:

    On Sun, 24 Mar 2024 13:26:16 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 07:27:38 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Tue, 19 Mar 2024 11:57:53 +0000
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:
    [...]
    The nice thing about Tim's method is that we can expect that
    performance depends on number of recolored pixels and almost
    nothing else.

    One aspect that I consider a significant plus is my code never
    does poorly. Certainly it isn't the fastest in all cases, but
    it's never abysmally slow.

    To be fair, none of presented algorithms is abysmally slow. When compared by number of visited points, they all appear to be within
    factor of 2 or 2.5 of each other.

    Certainly "abysmally slow" is subjective, but working in a large
    pixel field, filling the whole field starting at the center,
    Malcolm's code runs slower than my unoptimized code by a factor of
    10 (and a tad slower than that compared to my optimized code).

    Some of them for some patterns could be 10-15 times slower than
    others, but it does not happen for all patterns and when it
    happens it's because of problematic implementation rather because
    of differences in algorithms.

    In the case of Malcolm's code I think it's the algorithm, because
    it doesn't scale linearly. Malcolm's code runs faster than mine
    for small colorings, but slows down dramatically as the image
    being colored gets bigger.

    The big difference between algorithms is not a speed, but amount
    of auxiliary memory used in the worst case. Your algorithm
    appears to be the best in that department, [...]

    Yes, my unoptimized algorithm was designed to use as little
    memory as possible. The optimized version traded space for
    speed: it runs a little bit faster but incurs a non-trivial cost
    in terms of space used. I think it's still not too bad, an upper
    bound of a small multiple of N for an NxN pixel field.

    But even by that metric, the difference between different
    implementations of the same algorithm is often much bigger than difference between algorithms.

    If I am not mistaken the original naive recursive algorithm has a
    space cost that is O( N**2 ) for an NxN pixel field. The big-O
    difference swamps everything else, just like the big-O difference
    in runtime does for that metric.


    For example, solid 200x200 image with starting point in the corner
    [...]

    On small pixel fields almost any algorithm is probably not too
    bad. These days any serious algorithm should scale well up
    to at least 4K by 4K, and tested up to at least 16K x 16K.
    Tricks that make some things faster for small images sometimes
    fall on their face when confronted with a larger image. My code
    isn't likely to win many races on small images, but on large
    images I expect it will always be competitive even if it doesn't
    finish in first place.

    You are right. At 1920*1080 except for few special patterns, your
    code is faster than Malcolm's by factor of 1.5x to 1.8. Same for 4K. Auxiliary memory arrays of Malcolm are still quite small at these
    image sizes, but speed suffers.
    I wonder if it is a problem of algorithm or of implementation. Since I
    still didn't figure out his idea, I can't improve his implementation
    in order check it.

    One thing that I were not expecting at this bigger pictures, is good performance of simple recursive algorithm. Of course, not of original
    form of it, but of variation with explicit stack.
    For many shapes it has quite large memory footprint and despite that
    it is not slow. Probably the stack has very good locality of
    reference.

    Here is the code:
    <snip>


    The most robust code that I have found so far, performing well both
    with small pictures and with large and huge ones, is a variation on the
    same theme of an explicit stack, perhaps more properly called
    trace-back. It operates on 2x2 squares instead of individual pixels.
    The worst-case auxiliary memory footprint of this variant is rather
    big, up to picture_size/4 bytes. The code is *not* simple, but the
    complexity appears to be necessary for robust performance with various
    shapes and sizes.

    Todo-queue-based variants have a very low memory footprint and perform
    well for as long as the recolored shape fits in the fast levels of the
    cache hierarchy, but they suffer a sharp slowdown when the shape grows
    beyond that. It seems the problem with these algorithms is that the
    front of recoloring is interleaved and the focus of processing jumps
    randomly across the front, which leads to poor locality and to
    thrashing of the cache. Maybe, for huge pictures, some sort of priority
    queue would perform better than a simple FIFO? Maybe implemented as a
    binary heap? https://en.wikipedia.org/wiki/Binary_heap

    These thoughts are interesting, but it's unlikely that they could lead
    to faster code than the one presented below.

    #include <stdlib.h>
    #include <stddef.h>

    typedef unsigned char Color;

    static __inline
    unsigned check_column(Color *row, size_t x, size_t w, Color *end_image,
    Color old)
    {
    unsigned b = row[x+0] == old ? 1<<0 : 0;
    if (row+w != end_image && row[x+w] == old)
    b |= 1 << 2;
    return b;
    }

    static __inline
    unsigned check_row(Color *row, size_t x, size_t w, Color old)
    {
    unsigned b = row[x+0] == old ? 1<<0 : 0;
    if (x+1 != w && row[x+1] == old)
    b |= 1 << 1;
    return b;
    }

    int floodfill4(
    Color *image,
    int width,
    int height,
    int x0,
    int y0,
    Color old,
    Color new)
    {
    if (width <= 0 || height <= 0)
    return 0;

    if (x0 < 0 || x0 >= width || y0 < 0 || y0 >= height)
    return 0;

    const size_t w = width;

    size_t col0 = x0;
    Color* row0 = &image[w * y0];
    if (row0[col0] != old)
    return 0;

    int offs = 0;
    if (y0 & 1) {
    row0 -= w;
    offs = 2;
    }
    if (col0 & 1) {
    col0 -= 1;
    offs |= 1;
    }

    Color* end_image = &image[w * height];

    const ptrdiff_t INITIAL_STACK_SZ = 256;
    unsigned char* stack = malloc(INITIAL_STACK_SZ*sizeof(*stack));
    if (!stack)
    return -1;
    unsigned char* sp = stack;
    unsigned char* end_stack = &stack[INITIAL_STACK_SZ];

    enum {
    // state
    ST_LEFT, ST_RIGHT, ST_UP, ST_DOWN,
    ST_BEG,
    STATE_BITS = 3,
    // mask
    MSK_B00 = 1 << 2, MSK_B01 = 1 << 3,
    MSK_B10 = 1 << 4, MSK_B11 = 1 << 5,
    MSK_B0x = MSK_B00 | MSK_B01,
    MSK_B1x = MSK_B10 | MSK_B11,
    MSK_Bx0 = MSK_B00 | MSK_B10,
    MSK_Bx1 = MSK_B01 | MSK_B11,
    MSK_Bxx = MSK_Bx0 | MSK_Bx1,
    MSK_BITS = MSK_Bxx,
    // from
    FROM_LEFT = 0 << 6,
    FROM_RIGHT = 1 << 6,
    FROM_UP = 2 << 6,
    FROM_DOWN = 3 << 6,
    FROM_BITS = 3 << 6,
    };

    unsigned bit_mask0 = check_row(row0, col0, w, old)*MSK_B00;
    if (row0+w != end_image)
    bit_mask0 |= check_row(row0+w, col0, w, old)*MSK_B10;
    static const unsigned char kill_diag_tab[4][2] = {
    {MSK_B01 | MSK_B10, ~MSK_B11},
    {MSK_B00 | MSK_B11, ~MSK_B10},
    {MSK_B00 | MSK_B11, ~MSK_B01},
    {MSK_B01 | MSK_B10, ~MSK_B00},
    };
    if ((bit_mask0 & kill_diag_tab[offs][0])==0)
    bit_mask0 &= kill_diag_tab[offs][1];

    for (int rep = 0; rep < 2; ++rep) {
    unsigned bit_mask = bit_mask0;
    Color* row = row0;
    size_t x = col0;
    unsigned from = rep == 0 ? FROM_DOWN : FROM_LEFT;

    recursive_call:
    if (bit_mask & MSK_B00) row[x+0] = new;
    if (bit_mask & MSK_B01) row[x+1] = new;
    if (bit_mask & MSK_B10) row[x+w+0] = new;
    if (bit_mask & MSK_B11) row[x+w+1] = new;

    if (sp==end_stack) {
    ptrdiff_t size = sp - stack;
    ptrdiff_t new_size = size+size/2;
    unsigned char* new_stack = realloc(stack, new_size *
    sizeof(*stack));
    if (!new_stack) {
    free(stack);
    return -1;
    }
    stack = new_stack;
    sp = &stack[size];
    end_stack = &stack[new_size];
    }

    for (unsigned state = ST_BEG;;) {
    switch (state) {
    case ST_BEG:

    if (from != FROM_RIGHT && bit_mask & MSK_Bx1) { // look right
    x += 2;
    if (x != w) {
    unsigned bx0 = check_column(row, x, w, end_image, old);
    if (bx0 & (bit_mask/MSK_B01)) {
    // recursive call
    *sp++ = bit_mask | from | ST_RIGHT;
    bit_mask = bx0*MSK_B00;
    x += 1;
    if (x != w) {
    unsigned bx1 = check_column(row, x, w, end_image, old);
    if (bx0 & bx1)
    bit_mask |= bx1*MSK_B01;
    }
    x -= 1;
    from = FROM_LEFT;
    goto recursive_call;
    }
    }
    case ST_RIGHT:
    x -= 2;
    }

    if (from != FROM_LEFT && bit_mask & MSK_Bx0) { // look left
    if (x > 0) {
    unsigned bx1 = check_column(row, x-1, w, end_image, old);
    if (bx1 & (bit_mask/MSK_B00)) {
    // recursive call
    *sp++ = bit_mask | from | ST_LEFT;
    bit_mask = bx1*MSK_B01;
    x -= 2;
    unsigned bx0 = check_column(row, x, w, end_image, old);
    if (bx0 & bx1)
    bit_mask |= bx0*MSK_B00;
    from = FROM_RIGHT;
    goto recursive_call;
    case ST_LEFT:
    x += 2;
    }
    }
    }

    if (from != FROM_UP && bit_mask & MSK_B0x) { // look up
    if (row != image) {
    row -= w;
    unsigned b1x = check_row(row, x, w, old);
    row -= w;
    if (b1x & (bit_mask/MSK_B00)) {
    // recursive call
    *sp++ = bit_mask | from | ST_UP;
    bit_mask = b1x*MSK_B10;
    unsigned b0x = check_row(row, x, w, old);
    if (b0x & b1x)
    bit_mask |= b0x*MSK_B00;
    from = FROM_DOWN;
    goto recursive_call;
    case ST_UP:
    }
    row += w*2;
    }
    }

    if (from != FROM_DOWN && bit_mask & MSK_B1x) { // look down
    row += w*2;
    if (row != end_image) {
    unsigned b0x = check_row(row, x, w, old);
    if (b0x & (bit_mask/MSK_B10)) {
    // recursive call
    *sp++ = bit_mask | from | ST_DOWN;
    bit_mask = b0x*MSK_B00;
    row += w;
    if (row != end_image) {
    unsigned b1x = check_row(row, x, w, old);
    if (b0x & b1x)
    bit_mask |= b1x*MSK_B10;
    }
    row -= w;
    from = FROM_UP;
    goto recursive_call;
    }
    }
    case ST_DOWN:
    row -= w*2;
    }
    break;
    }

    if (sp == stack)
    break;

    unsigned stack_val = *--sp; // pop stack (back to caller)
    state = stack_val & STATE_BITS;
    bit_mask = stack_val & MSK_BITS;
    from = stack_val & FROM_BITS;
    }
    }

    free(stack);
    return 1; // done
    }







    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Mar 29 16:51:21 2024
    scott@slp53.sl.home (Scott Lurndal) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    scott@slp53.sl.home (Scott Lurndal) writes:

    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

    The conventional wisdom is the opposite, but here, conventional wisdom fails. Because heaps are unlimited while stacks are not.

    That's not actually true. The size of both are bounded, yes.

    It's certainly possible (in POSIX, anyway) for the stack bounds
    to be unlimited (given sufficient real memory and/or backing
    store) and the heap size to be bounded. See 'setrlimit'.

    The sizes of both heaps and stacks are bounded, because
    pointers have a fixed number of bits. Certainly these
    sizes can be very very large, but they are not unbounded.

    I was referring to the term of art used in POSIX, where
    unlimited simply means that the operating system doesn't
    limit them [.. elaboration ..]

    The earlier sentence was confusing, as the sentence construction
    suggested "unlimited" was a general term rather than one with a
    specific meaning in POSIX. In any case thank you for the
    education.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Mar 29 17:04:36 2024
    Michael S <already5chosen@yahoo.com> writes:

    [..various fill algorithms and how they scale..]

    One thing that I were not expecting at this bigger pictures, is good performance of simple recursive algorithm. Of course, not of original
    form of it, but of variation with explicit stack.
    For many shapes it has quite large memory footprint and despite that it
    is not slow. Probably the stack has very good locality of reference.

    [algorithm]

    You are indeed a very clever fellow. I'm impressed.

    Intrigued by your idea, I wrote something along the same lines,
    only shorter and (at least for me) a little easier to grok.
    If someone is interested I can post it.

    I see you have also done a revised algorithm based on the same
    idea, but more elaborate (to save on runtime footprint?).
    Still working on formulating a response to that one...

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sat Mar 30 00:21:41 2024
    On Thu, 28 Mar 2024 23:04:36 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..various fill algorithms and how they scale..]

    One thing that I was not expecting at these bigger picture sizes is the
    good performance of the simple recursive algorithm. Of course, not in its
    original form, but in a variation with an explicit stack.
    For many shapes it has a quite large memory footprint, and despite that
    it is not slow. Probably the stack has very good locality of reference.

    [algorithm]

    You are indeed a very clever fellow. I'm impressed.


    Yes, the use of switch is clever :(
    It more resembles a computed GO TO in old FORTRAN or indirect jumps in
    asm than an idiomatic C switch. But it is legal* C.


    Intrigued by your idea, I wrote something along the same lines,
    only shorter and (at least for me) a little easier to grok.
    If someone is interested I can post it.

    If non-trivially different, why not?


    I see you have also done a revised algorithm based on the same
    idea, but more elaborate (to save on runtime footprint?).
    Still working on formulating a response to that one...

    The original purpose of the enhancement was to amortize the non-trivial
    and probably not very fast call-stack emulation logic over more than one
    pixel. 2x2 just happens to be the biggest block that still has very
    simple in-block recoloring logic. The ~4x reduction in the size of
    auxiliary memory is just a pleasant side effect.

    Exactly the same 4x reduction in memory size could have been achieved
    with the single-pixel variant by using a packed array for the 2-bit
    state (== trace-back) stack elements. But it would be the same speed as
    the original or slower, while the enhanced variant is robustly faster
    than the original.
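
    For illustration, a packed trace-back stack can be as simple as this (a
    minimal sketch, not code from this thread; the helper names push2/pop2
    are hypothetical, and the caller must size 'buf' for the deepest stack):

    #include <stddef.h>

    /* Four 2-bit trace-back codes per byte; *n is the current element count.
       'buf' does not need to be zero-initialized: pop2 only reads entries
       that push2 has already written. */
    static void push2(unsigned char *buf, size_t *n, unsigned code)
    {
        size_t i = (*n)++;
        unsigned shift = (unsigned)(i & 3) * 2;
        buf[i >> 2] = (unsigned char)((buf[i >> 2] & ~(3u << shift))
                                      | ((code & 3u) << shift));
    }

    static unsigned pop2(const unsigned char *buf, size_t *n)
    {
        size_t i = --*n;
        return (buf[i >> 2] >> ((i & 3) * 2)) & 3u;
    }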

    After implementing the first enhancement I noticed that at 4K size the
    timing (per pixel) for a few of my test cases is significantly worse
    than at smaller image sizes. So I added another enhancement aiming to
    minimize cache-thrashing effects by never looking back at the immediate
    parent of the current block. The info about the location of the parent
    nicely fitted into the remaining 2 bits of the stack octet.


    ------
    * - the versions I posted are not exactly legal C; they are illegal C non-rejected by gcc. But they can be trivially made into legal C by
    adding semicolon after one of the case labels.
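
    For illustration, a minimal standalone example of the rule involved (the
    function is hypothetical, not taken from the fill code): before C23 a
    label must be followed by a statement, and a declaration is not a
    statement, so the lone ';' after the case label is exactly what makes
    the code strictly conforming.

    /* Rejected by gcc -pedantic (pre-C23) without the ';' after 'case 0:',
       accepted with it. */
    int first_double(int state, int v)
    {
        switch (state) {
        case 0: ;               /* the added semicolon (empty statement) */
            int tmp = v * 2;    /* declaration now follows a statement   */
            return tmp;
        default:
            return v;
        }
    }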




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Mar 30 17:58:26 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Thu, 28 Mar 2024 23:04:36 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [..various fill algorithms and how they scale..]

    One thing that I was not expecting at these bigger picture sizes is the
    good performance of the simple recursive algorithm. Of course, not in its
    original form, but in a variation with an explicit stack.
    For many shapes it has a quite large memory footprint, and despite that
    it is not slow. Probably the stack has very good locality of reference.

    [algorithm]

    You are indeed a very clever fellow. I'm impressed.

    Yes, the use of switch is clever :(

    In my view the cleverness is how "recursion" is accomplished by means
    of a combination of using a stack to store a "return address" and
    restoring state by undoing a change rather than storing the old value.
    Using a switch() is just a detail (and to my way of thinking how the
    switch() is done here obscures the basic idea and makes the code harder
    to understand, but never mind that).

    It more resembles a computed GO TO in old FORTRAN or indirect jumps
    in asm than an idiomatic C switch. But it is legal* C.

    I did program in FORTRAN briefly but don't remember ever using
    computed GO TO. And yes, I found that missing semicolon and put it
    back. Is there some reason you don't always use -pedantic? I
    pretty much always do.

    Intrigued by your idea, I wrote something along the same lines,
    only shorter and (at least for me) a little easier to grok.
    If someone is interested I can post it.

    If non-trivially different, why not?

    I hope to soon but am unable to right now (and maybe for a week
    or so due to circumstances beyond my control). For sure the
    code is different; whether it is non-trivially different I
    leave for others to judge.

    I see you have also done a revised algorithm based on the same
    idea, but more elaborate (to save on runtime footprint?).
    Still working on formulating a response to that one...

    The original purpose of the enhancement was to amortize the non-trivial
    and probably not very fast call-stack emulation logic over more than
    one pixel. 2x2 just happens to be the biggest block that still has
    very simple in-block recoloring logic. The ~4x reduction in the size
    of auxiliary memory is just a pleasant side effect.

    Exactly the same 4x reduction in memory size could have been achieved
    with the single-pixel variant by using a packed array for the 2-bit
    state (== trace-back) stack elements. But it would be the same speed
    as the original or slower, while the enhanced variant is robustly
    faster than the original.

    An alternate idea is to use a 64-bit integer for 32 "top of stack"
    elements, or up to 32 I should say, and a stack with 64-bit values.
    Just an idea, it may not turn out to be useful.
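
    A rough sketch of that idea might look like the following (hypothetical
    names; the spill array is assumed to be large enough; this is only an
    illustration of the packing, not measured code):

    #include <stddef.h>
    #include <stdint.h>

    /* Up to 32 two-bit codes live in 'top'; full words spill to 'words'. */
    typedef struct {
        uint64_t top;        /* packed top-of-stack codes             */
        unsigned fill;       /* how many codes are currently in 'top' */
        uint64_t *words;     /* spill stack, assumed large enough     */
        size_t nwords;
    } PackedStack;

    static void ps_push(PackedStack *s, unsigned code)
    {
        if (s->fill == 32) {             /* spill a full word */
            s->words[s->nwords++] = s->top;
            s->top = 0;
            s->fill = 0;
        }
        s->top |= (uint64_t)(code & 3u) << (2 * s->fill++);
    }

    static unsigned ps_pop(PackedStack *s)
    {
        if (s->fill == 0) {              /* refill from the spill stack */
            s->top = s->words[--s->nwords];
            s->fill = 32;
        }
        unsigned code = (unsigned)(s->top >> (2 * --s->fill)) & 3u;
        s->top &= ~((uint64_t)3u << (2 * s->fill));
        return code;
    }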

    The few measurements I have done don't show a big difference in
    performance between the two methods. But I admit I wasn't paying
    close attention, and like I said only a few patterns of filling were
    exercised.

    After implementing the first enhancement I noticed that at 4K size
    the timing (per pixel) for a few of my test cases is significantly
    worse than at smaller image sizes. So I added another enhancement
    aiming to minimize cache-thrashing effects by never looking back at
    the immediate parent of the current block. The info about the location
    of the parent nicely fitted into the remaining 2 bits of the stack
    octet.

    The idea of not going back to the originator (what you call the
    parent) is something I developed independently before looking at
    your latest code (and mostly I still haven't). Seems like a good
    idea.

    Two further comments.

    One, the new code is a lot more complicated than the previous
    code. I'm not sure the performance gain is worth the cost
    in complexity. What kind of speed improvements do you see,
    in terms of percent?

    Two, and more important, the new algorithm still uses O(NxN) memory
    for an N by N pixel field. We really would like to get that down to
    O(N) memory (and of course run just as fast). Have you looked into
    how that might be done?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Mar 30 18:54:19 2024
    Michael S <already5chosen@yahoo.com> writes:

    [...]

    The most robust code that I have found so far, one that performs well
    both with small pictures and with large and huge ones, is a variation
    on the same theme of an explicit stack, maybe more properly called
    trace-back. It operates on 2x2 squares instead of individual pixels.

    The worst case auxiliary memory footprint of this variant is rather
    big, up to picture_size/4 bytes. The code is *not* simple, but
    complexity appears to be necessary for robust performance with
    various shapes and sizes.

    [...]

    I took a cursory look just now, after reading your other later
    posting. I think I have a general sense, especially in conjunction
    with the explanatory comments.

    I'm still hoping to find a method that is both fast and has
    good memory use, which is to say O(N) for an NxN pixel field.

    Something that would help is to have a library of test cases,
    by which I mean patterns to be colored, so that a set of
    methods could be tried, and timed, over all the patterns in
    the library. Do you have something like that? So far all
    my testing has been ad hoc.

    Incidentally, it looks like your code assumes X varies more rapidly
    than Y, so a "by row" order, whereas my code assumes Y varies more
    rapidly than X, a "by column" order. The difference doesn't matter
    as long as the pixel field is square and the test cases either are
    symmetric about the X == Y axis or duplicate a non-symmetric pattern
    about the X == Y axis. I would like to be able to run comparisons
    between different methods and get usable results without having
    to jump around because of different orientations. I'm not sure
    how to accommodate that.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sun Mar 31 05:15:06 2024
    On Fri, 29 Mar 2024 23:58:26 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    I did program in FORTRAN briefly but don't remember ever using
    computed GO TO. And yes, I found that missing semicolon and put it
    back. Is there some reason you don't always use -pedantic? I
    pretty much always do.



    Just a habit.
    In "real" work, as opposed to hobby, I use gcc almost exclusively for
    small embedded targets and quite often with third-party libraries in
    source form. In such an environment raising the warnings level above
    -Wall would be counterproductive, because it would be hard to see the
    relevant warnings behind walls of false alarms.
    Maybe for hobby work, where I have full control over everything,
    switching to -Wpedantic is not a bad idea.



    An alternate idea is to use a 64-bit integer for 32 "top of stack"
    elements, or up to 32 I should say, and a stack with 64-bit values.
    Just an idea, it may not turn out to be useful.


    That's just a detail of how to do pack/unpack with minimal overhead. It
    does not change the principle that the 'packed' version would be less
    memory hungry, but on a modern PC with GBs of RAM it will not be faster
    than the original.
    Memory footprint can directly affect speed when access patterns have
    poor locality or when the rate of access exceeds 10-20 GB/s. In our
    case the locality of stack access is very good and the rate of stack
    access, even on an ultra-fast processor, is less than 1 GB/s.


    The few measurements I have done don't show a big difference in
    performance between the two methods. But I admit I wasn't paying
    close attention, and like I said only a few patterns of filling were exercised.

    After implementing the first enhancement I noticed that at 4K size
    the timing (per pixel) for a few of my test cases is significantly
    worse than at smaller image sizes. So I added another enhancement
    aiming to minimize cache-thrashing effects by never looking back at
    the immediate parent of the current block. The info about the location
    of the parent nicely fitted into the remaining 2 bits of the stack
    octet.

    The idea of not going back to the originator (what you call the
    parent) is something I developed independently before looking at
    your latest code (and mostly I still haven't). Seems like a good
    idea.


    I call it a principle of Lot's wife.
    That is yet another reason to not grow blocks above 2x2.
    For bigger blocks it does not apply.
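
    As a minimal, self-contained illustration of the principle (a sketch
    only, with hypothetical direction names, not the posted 2x2 code): when
    scanning a block's neighbours, the direction the block was entered from
    is simply skipped.

    #include <stdio.h>

    enum dir { DIR_NONE, DIR_LEFT, DIR_RIGHT, DIR_UP, DIR_DOWN };

    /* Visit every neighbour except the one we arrived from. */
    static void scan_neighbours(enum dir parent_dir)
    {
        static const char *name[] = { "none", "left", "right", "up", "down" };
        for (int d = DIR_LEFT; d <= DIR_DOWN; d++) {
            if (d == parent_dir)
                continue;                /* never look back at the parent */
            printf("would examine the %s neighbour\n", name[d]);
        }
    }

    int main(void)
    {
        scan_neighbours(DIR_DOWN);       /* block was entered from below */
        return 0;
    }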


    Two further comments.

    One, the new code is a lot more complicated than the previous
    code. I'm not sure the performance gain is worth the cost
    in complexity. What kind of speed improvements do you see,
    in terms of percent?


    On my 11-year-old (and not top-of-the-line even then) home PC, for a 4K
    image (3840 x 2160) with the cross-in-cross shape that I took from one
    of your previous posts, it is 2.43 times faster.
    I don't remember how it compares on more modern systems. Anyway, right
    now I have no test systems more modern than 3 y.o. Zen3.


    Two, and more important, the new algorithm still uses O(NxN) memory
    for an N by N pixel field. We really would like to get that down to
    O(N) memory (and of course run just as fast). Have you looked into
    how that might be done?

    Using this particular principle of not saving (x,y) in auxiliary
    storage, I don't believe that it is possible to have a footprint
    smaller than O(W*H).





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sun Mar 31 05:26:57 2024
    On Sat, 30 Mar 2024 00:54:19 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    The most robust code that I have found so far, one that performs well
    both with small pictures and with large and huge ones, is a variation
    on the same theme of an explicit stack, maybe more properly called
    trace-back. It operates on 2x2 squares instead of individual pixels.

    The worst case auxiliary memory footprint of this variant is rather
    big, up to picture_size/4 bytes. The code is *not* simple, but
    complexity appears to be necessary for robust performance with
    various shapes and sizes.

    [...]

    I took a cursory look just now, after reading your other later
    posting. I think I have a general sense, especially in conjunction
    with the explanatory comments.

    I'm still hoping to find a method that is both fast and has
    good memory use, which is to say O(N) for an NxN pixel field.

    Something that would help is to have a library of test cases,
    by which I mean patterns to be colored, so that a set of
    methods could be tried, and timed, over all the patterns in
    the library. Do you have something like that? So far all
    my testing has been ad hoc.


    I am not 100% sure about the meaning of 'ad hoc', but I'd guess that
    mine are ad hoc too. Below are the shapes that I use apart from solid
    rectangles. I run them at 5 sizes: 25x19, 200x200, 1280x720, 1920x1080,
    3840x2160. That is certainly not enough for correctness tests, but I
    feel that it is sufficient for speed tests.

    static void make_standing_snake(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    for (int y = 0; y < height; ++y) {
    unsigned char* p = &image[y*width];
    if (y % 2 == 0) {
    memset(p, pen_c, width);
    } else {
    memset(p, background_c, width);
    if (y % 4 == 1)
    p[width-1] = pen_c;
    else
    p[0] = pen_c;
    }
    }
    }

    static void make_prostrate_snake(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    memset(image, background_c, sizeof(*image)*width*height);
    // vertical bars
    for (int y = 0; y < height; ++y)
    for (int x = 0; x < width; x += 2)
    image[y*width+x] = pen_c;

    // connect bars at top
    for (int x = 3; x < width; x += 4)
    image[x] = pen_c;

    // connect bars at bottom
    for (int x = 1; x < width; x += 4)
    image[(height-1)*width+x] = pen_c;
    }


    static void make_slalom(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    const int n_col = width/3;
    const int n_row = (height-3)/4;

    // top row
    // P B B P P P
    for (int col = 0; col < n_col; ++col) {
    unsigned char c = (col & 1)==0 ? background_c : pen_c;
    image[col*3] = pen_c; image[col*3+1] = c; image[col*3+2] = c;
    }
    for (int x = n_col*3; x < width; ++x)
    image[x] = image[n_col*3-1];

    // main image: consists of 3x4 blocks filled by following pattern
    // P B B
    // P P B
    // B P B
    // P P B
    for (int row = 0; row < n_row; ++row) {
    for (int col = 0; col < n_col; ++col) {
    unsigned char* p = &image[(row*4+1)*width+col*3];
    p[0] = pen_c; p[1] = background_c; p[2] = background_c; p += width;
    p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p += width;
    p[0] = background_c; p[1] = pen_c; p[2] = background_c; p += width;
    p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p += width;
    }
    }

    // near-bottom rows
    // P B B
    for (int y = n_row*4+1; y < height-1; ++y) {
    for (int col = 0; col < n_col; ++col) {
    unsigned char* p = &image[y*width+col*3];
    p[0] = pen_c; p[1] = background_c; p[2] = background_c;
    }
    }

    // bottom row - all P
    // P P P P B B
    unsigned char *b_row = &image[width*(height-1)];
    for (int col = 0; col < n_col; ++col) {
    unsigned char c = (col & 1)==1 ? background_c : pen_c;
    b_row[col*3+0] = pen_c;
    b_row[col*3+1] = c;
    b_row[col*3+2] = c;
    }
    for (int x = n_col*3; x < width; ++x)
    b_row[x] = b_row[n_col*3-1];

    // rightmost columns
    for (int x = n_col*3; x < width; ++x) {
    for (int y = 1; y < height-1; ++y)
    image[y*width+x] = background_c;
    }
    }

    static void make_slalom90(
    unsigned char *image,
    int width, int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    const int n_col = (width-3)/4;
    const int n_row = height/3;

    // leftmost column
    // P
    // B
    // B
    // P
    // P
    // P
    for (int row = 0; row < n_row; ++row) {
    unsigned char c = (row & 1)==0 ? background_c : pen_c;
    image[(row*3+0)*width] = pen_c;
    image[(row*3+1)*width] = c;
    image[(row*3+2)*width] = c;
    }
    for (int y = n_row*3; y < height; ++y)
    image[y*width] = image[(n_row*3-1)*width];

    // main image: consists of 4x3 blocks filled by following pattern
    // P P B P
    // B P P P
    // B B B B
    for (int row = 0; row < n_row; ++row) {
    for (int col = 0; col < n_col; ++col) {
    unsigned char* p = &image[(row*3*width)+(col*4+1)];
    p[0] = pen_c; p[1] = pen_c; p[2] = background_c; p[3] = pen_c; p += width;
    p[0] = background_c; p[1] = pen_c; p[2] = pen_c; p[3] = pen_c; p += width;
    p[0] = background_c; p[1] = background_c; p[2] = background_c; p[3] = background_c;
    }
    }

    // near-rightmost column
    // P
    // B
    // B
    for (int row = 0; row < n_row; ++row) {
    for (int x = n_col*4+1; x < width-1; ++x) {
    unsigned char* p = &image[row*width*3+x];
    p[0*width] = pen_c;
    p[1*width] = background_c;
    p[2*width] = background_c;
    }
    }

    // rightmost column
    // P
    // P
    // P
    // P
    // B
    // B
    unsigned char *r_col = &image[width-1];
    for (int row = 0; row < n_row; ++row) {
    unsigned char c = (row & 1)==1 ? background_c : pen_c;
    r_col[(row*3+0)*width] = pen_c;
    r_col[(row*3+1)*width] = c;
    r_col[(row*3+2)*width] = c;
    }
    for (int y = n_row*3; y < height; ++y)
    r_col[y*width] = r_col[(n_row*3-1)*width];

    // bottom rows
    for (int y = n_row*3; y < height; ++y) {
    for (int x = 1; x < width-1; ++x)
    image[y*width+x] = background_c;
    }
    }

    static void make_crosss_in_cross(
    unsigned char* image,
    int width,
    int height,
    int xc,
    int yc,
    unsigned char background_c,
    unsigned char pen_c)
    {
    memset(image, pen_c, width*height);

    if (xc > 1 && xc+1 < width-1 && yc > 1 && yc+1 < height-1) {
    memset(&image[(yc-1)*width+1], background_c, xc-1);
    memset(&image[(yc+1)*width+1], background_c, xc-1);
    memset(&image[(yc-1)*width+xc+1], background_c, width-xc-2);
    memset(&image[(yc+1)*width+xc+1], background_c, width-xc-2);
    for (int y = 1; y < yc; ++y) {
    image[y*width+xc-1] = background_c;
    image[y*width+xc+1] = background_c;
    }
    for (int y = yc+1; y < height-1; ++y) {
    image[y*width+xc-1] = background_c;
    image[y*width+xc+1] = background_c;
    }
    }
    }


    Incidentally, it looks like your code assumes X varies more rapidly
    than Y, so a "by row" order, whereas my code assumes Y varies more
    rapidly than X, a "by column" order.

    It is not so much about what I assume as about what is cheaper for
    CPU hardware.

    The difference doesn't matter
    as long as the pixel field is square and the test cases either are
    symmetric about the X == Y axis or duplicate a non-symmetric pattern
    about the X == Y axis. I would like to be able to run comparisons
    between different methods and get usable results without having
    to jump around because of different orientations. I'm not sure
    how to accommodate that.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sun Mar 31 05:59:03 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Thu, 28 Mar 2024 23:04:36 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Intrigued by your idea, I wrote something along the same lines,
    only shorter and (at least for me) a little easier to grok.
    If someone is interested I can post it.

    If non-trivially different, why not?

    Here is the code:

    void
    stack_fill( UI w0, UI h0, UC pixels[], Point p0, UC old, UC new ){
    U64 w = ( assert( w0 > 0 ), w0 );
    U64 h = ( assert( h0 > 0 ), h0 );
    U64 px = ( assert( p0.x < w ), p0.x );
    U64 py = ( assert( p0.y < h ), p0.y );

    UC *x0 = ( assert( pixels ), pixels );
    UC *x = x0 + px*h;
    UC *xm = x0 + h*w - h;

    U64 y0 = 0;
    U64 y = py;
    U64 ym = h-1;

    UC *s0 = malloc( sizeof *s0 );
    UC *s = s0;
    UC *sn = s0 ? s0+1 : s0;

    if( s0 && x[y] == old ) do {
    x[y] = new;
    if( s == sn ){
    U64 s_offset = s - s0;
    U64 n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }

    if( y < ym && x[y+1] == old ){
    y += 1, *s++ = 2; continue; UNDO_UP:
    y -= 1;
    }
    if( y > y0 && x[y-1] == old ){
    y -= 1, *s++ = 3; continue; UNDO_DOWN:
    y += 1;
    }
    if( x < xm && y[x+h] == old ){
    x += h, *s++ = 0; continue; UNDO_LEFT:
    x -= h;
    }
    if( x > x0 && y[x-h] == old ){
    x -= h, *s++ = 1; continue; UNDO_RIGHT:
    x += h;
    }

    if( s == s0 ) break;

    switch( *--s & 3 ){
    case 0: goto UNDO_LEFT;
    case 1: goto UNDO_RIGHT;
    case 2: goto UNDO_UP;
    case 3: goto UNDO_DOWN;
    }
    } while( 1 );

    free( s0 );
    }
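
    The supporting declarations are not shown in the post; presumably
    something along these lines is intended (an assumption, not part of the
    original):

    #include <assert.h>
    #include <stdlib.h>

    typedef unsigned UI;
    typedef unsigned char UC;
    typedef unsigned long long U64;
    typedef struct { UI x, y; } Point;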

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sun Mar 31 19:54:38 2024
    On Sat, 30 Mar 2024 21:15:06 +0300
    Michael S <already5chosen@yahoo.com> wrote:

    On Fri, 29 Mar 2024 23:58:26 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    One, the new code is a lot more complicated than the previous
    code. I'm not sure the performance gain is worth the cost
    in complexity. What kind of speed improvements do you see,
    in terms of percent?


    On my 11 y.o. and not top-of-the-line even then home PC for 4K
    image (3840 x 2160) with cross-in-cross shape that I took from one of
    your previous posts, it is 2.43 times faster.
    I don't remember how it compares on more modern systems. Anyway, right
    now I have no test systems more modern than 3 y.o. Zen3.



    I tested on newer hardware - Intel Coffee Lake (Xeon-E 2176G) and AMD
    Zen3 (EPYC 7543P).
    Here I no longer see a significant drop in speed of the 1x1 variant at
    4K size, but I still see that the more complicated variant provides a
    nice speedup: up to 1.56x on Coffee Lake and up to 3x on Zen3.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sat Apr 6 01:30:33 2024
    On Sun, 24 Mar 2024 10:24:45 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising to me that a queue can work better than a stack; my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).

    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder if the existence of this or a similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.


    It seems that in the worst case the strict FIFO algorithm is the same
    as the rest of them, i.e. O(NN) where NN is the number of re-colored
    points. Below is an example of a shape for which I measured memory
    consumption for a 3840x2160 image almost exactly 4x as much as for a
    1920x1080 one.

    static void make_fractal_tree_recursive(
    unsigned char* image,
    int width,
    int nx,
    int ny,
    unsigned char pen_c)
    {
    if (nx < 3 && ny < 3) {
    // small rectangle - solid fill
    for (int y = 0; y < ny; ++y)
    for (int x = 0; x < nx; ++x)
    image[width*y+x] = pen_c;
    return;
    }
    if (nx >= ny) {
    int xc = (nx-1)/2;
    if (xc - 1 > 0) { // left sub-plot
    make_fractal_tree_recursive(image, width, xc - 1, ny, pen_c);
    }
    if (xc + 2 < nx) { // right sub-plot
    make_fractal_tree_recursive(&image[xc+2], width,
    nx - (xc + 2), ny, pen_c);
    }
    // draw vertical cross
    for (int y = 0; y < ny; ++y)
    image[width*y+xc] = pen_c;
    int yc = (ny-1)/2;
    image[width*yc+xc-1] = pen_c;
    image[width*yc+xc+1] = pen_c;
    } else {
    int yc = (ny-1)/2;
    if (yc - 1 > 0) { // upper sub-plot
    make_fractal_tree_recursive(image, width, nx, yc - 1, pen_c);
    }
    if (yc + 2 < ny) { // lower sub-plot
    make_fractal_tree_recursive(&image[(yc+2)*width], width, nx,
    ny -(yc + 2), pen_c);
    }
    // draw horizontal cross
    for (int x = 0; x < nx; ++x)
    image[width*yc+x] = pen_c;
    int xc = (nx-1)/2;
    image[width*(yc-1)+xc] = pen_c;
    image[width*(yc+1)+xc] = pen_c;
    }
    }

    static void make_fractal_tree(
    unsigned char* image,
    int width,
    int height,
    unsigned char background_c,
    unsigned char pen_c)
    {
    memset(image, background_c, width*height);
    make_fractal_tree_recursive(image, width, width, height, pen_c);
    }



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Apr 9 18:00:34 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Sat, 30 Mar 2024 00:54:19 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    [...]
    Something that would help is to have a library of test cases,
    by which I mean patterns to be colored, so that a set of
    methods could be tried, and timed, over all the patterns in
    the library. Do you have something like that? So far all
    my testing has been ad hoc.

    I am not 100% sure about the meaning of 'ad hoc', but I'd guess that
    mine are ad hoc too. Below are the shapes that I use apart from solid
    rectangles. I run them at 5 sizes: 25x19, 200x200, 1280x720, 1920x1080,
    3840x2160. That is certainly not enough for correctness tests, but I
    feel that it is sufficient for speed tests.

    [code]

    I got these, thank you.

    Here is a pattern generating function I wrote for my own
    testing. Disclaimer: slightly changed from my original
    source, hopefully any errors inadvertently introduced can
    be corrected easily. Also, it uses the value 0 for the
    background and the value 1 for the pattern to be colored.

    #include <math.h>
    #include <stddef.h>
    #include <string.h>

    typedef unsigned char Pixel;

    extern void
    ellipse_with_hole( Pixel *field, unsigned w, unsigned h ){
    size_t i, j;
    double wc = w/2, hc = h/2;

    double a = (w > h ? wc : hc) -1;
    double b = (w > h ? hc : wc) -1;

    double b3 = 1+6*b/8;
    double radius = b/2.5;
    double cx = w > h ? wc : b3+1;
    double cy = w > h ? b3+1 : hc;

    double focus = sqrt( a*a - b*b );
    double f1x = w > h ? wc - focus : wc;
    double f1y = w > h ? hc : hc - focus;
    double f2x = w > h ? wc + focus : wc;
    double f2y = w > h ? hc : hc + focus;

    memset( field, 0, w*h );

    for( i = 0; i < w; i++ ){
    for( j = 0; j < h; j++ ){
    double dx = i - cx, dy = j - cy;
    double r2 = radius * radius;
    if( dx * dx + dy*dy <= r2 ) continue;
    double dx1 = i - f1x, dy1 = j - f1y;
    double dx2 = i - f2x, dy2 = j - f2y;
    double sum2 = a*2;
    double d1 = sqrt( dx1*dx1 + dy1*dy1 );
    double d2 = sqrt( dx2*dx2 + dy2*dy2 );
    if( d1 + d2 > 2*a ) continue;
    field[ i+j*w ] = 1;
    }}
    }
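
    A tiny driver for eyeballing the pattern might look like this (it
    assumes the ellipse_with_hole() definition above is in the same
    translation unit; the size and the characters are arbitrary):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        unsigned w = 40, h = 20;
        Pixel *field = malloc((size_t)w * h);
        if (!field)
            return 1;
        ellipse_with_hole(field, w, h);
        for (unsigned j = 0; j < h; j++) {          /* print row by row */
            for (unsigned i = 0; i < w; i++)
                putchar(field[i + j*w] ? '#' : '.');
            putchar('\n');
        }
        free(field);
        return 0;
    }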

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Apr 9 18:55:39 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 29 Mar 2024 23:58:26 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    I did program in FORTRAN briefly but don't remember ever using
    computed GO TO. And yes, I found that missing semicolon and put it
    back. Is there some reason you don't always use -pedantic? I
    pretty much always do.

    Just a habit.
    In "real" work, as opposed to hobby, I use gcc almost exclusively for
    small embedded targets and quite often with third-party libraries in
    source form. In such an environment raising the warnings level above
    -Wall would be counterproductive, because it would be hard to see the
    relevant warnings behind walls of false alarms.
    Maybe for hobby work, where I have full control over everything,
    switching to -Wpedantic is not a bad idea.

    My experience with third party libraries is that sometimes they use
    extensions, probably mostly gcc-isms. Not much to be done in that
    case. Of course turning on -pedantic could be done selectively.

    It might be worth an experiment of turning off -Wall while turning
    on -pedantic to see how big or how little the problem is.

    The idea of not going back to the originator (what you call the
    parent) is something I developed independently before looking at
    your latest code (and mostly I still haven't). Seems like a good
    idea.

    I call it a principle of Lot's wife.
    That is yet another reason to not grow blocks above 2x2.
    For bigger blocks it does not apply.

    Here is an updated version of my "stacking" code. On my test
    system (and I don't even know exactly what CPU it has, probably
    about 5 years old) this code runs about 30% faster than your 2x2
    version, averaged across all patterns and all sizes above the
    smallest ones (25x19 and 19x25).

    #include <assert.h>
    #include <stdlib.h>

    typedef unsigned char UC, Color;
    typedef size_t Index, Count;
    typedef struct { Index x, y; } Point;

    extern Count
    stack_plus( UC *field, Index w, Index h, Point p0, Color old, Color new ){
    Index px = ( assert( p0.x < w ), p0.x );
    Index py = ( assert( p0.y < h ), p0.y );

    Index x0 = 0;
    Index x = px;
    Index xm = w-1;

    UC *y0 = field;
    UC *y = y0 + py*w;
    UC *ym = y0 + h*w - w;

    UC *s0 = malloc( 8 * sizeof *s0 );
    UC *s = s0;
    UC *sn = s0 ? s0+8 : s0;

    Count r = 0;

    if( s0 ) goto START_FOUR;

    while( s != s0 ){
    switch( *--s & 15 ){
    case 0: goto UNDO_START_LEFT;
    case 1: goto UNDO_START_RIGHT;
    case 2: goto UNDO_START_UP;
    case 3: goto UNDO_START_DOWN;

    case 4: goto UNDO_LEFT_DOWN;
    case 5: goto UNDO_LEFT_LEFT;
    case 6: goto UNDO_LEFT_UP;

    case 7: goto UNDO_UP_LEFT;
    case 8: goto UNDO_UP_UP;
    case 9: goto UNDO_UP_RIGHT;

    case 10: goto UNDO_RIGHT_UP;
    case 11: goto UNDO_RIGHT_RIGHT;
    case 12: goto UNDO_RIGHT_DOWN;

    case 13: goto UNDO_DOWN_RIGHT;
    case 14: goto UNDO_DOWN_DOWN;
    case 15: goto UNDO_DOWN_LEFT;
    }

    START_FOUR:
    if( y[x] != old ) continue;
    y[x] = new; r++;
    if( x < xm && y[x+1] == old ){
    x += 1, *s++ = 0; goto START_LEFT; UNDO_START_LEFT:
    x -= 1;
    }
    if( x > x0 && y[x-1] == old ){
    x -= 1, *s++ = 1; goto START_RIGHT; UNDO_START_RIGHT:
    x += 1;
    }
    if( y < ym && x[y+w] == old ){
    y += w, *s++ = 2; goto START_UP; UNDO_START_UP:
    y -= w;
    }
    if( y > y0 && x[y-w] == old ){
    y -= w, *s++ = 3; goto START_DOWN; UNDO_START_DOWN:
    y += w;
    }
    continue;

    START_LEFT:
    y[x] = new; r++;
    if( s == sn ){
    Index s_offset = s - s0;
    Index n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }
    if( x < xm && y[x+1] == old ){
    x += 1, *s++ = 5; goto START_LEFT; UNDO_LEFT_LEFT:
    x -= 1;
    }
    if( y > y0 && x[y-w] == old ){
    y -= w, *s++ = 4; goto START_DOWN; UNDO_LEFT_DOWN:
    y += w;
    }
    if( y < ym && x[y+w] == old ){
    y += w, *s++ = 6; goto START_UP; UNDO_LEFT_UP:
    y -= w;
    }
    continue;

    START_UP:
    y[x] = new; r++;
    if( s == sn ){
    Index s_offset = s - s0;
    Index n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }
    if( x < xm && y[x+1] == old ){
    x += 1, *s++ = 7; goto START_LEFT; UNDO_UP_LEFT:
    x -= 1;
    }
    if( x > x0 && y[x-1] == old ){
    x -= 1, *s++ = 9; goto START_RIGHT; UNDO_UP_RIGHT:
    x += 1;
    }
    if( y < ym && x[y+w] == old ){
    y += w, *s++ = 8; goto START_UP; UNDO_UP_UP:
    y -= w;
    }
    continue;

    START_RIGHT:
    y[x] = new; r++;
    if( s == sn ){
    Index s_offset = s - s0;
    Index n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }
    if( x > x0 && y[x-1] == old ){
    x -= 1, *s++ = 11; goto START_RIGHT; UNDO_RIGHT_RIGHT:
    x += 1;
    }
    if( y < ym && x[y+w] == old ){
    y += w, *s++ = 10; goto START_UP; UNDO_RIGHT_UP:
    y -= w;
    }
    if( y > y0 && x[y-w] == old ){
    y -= w, *s++ = 12; goto START_DOWN; UNDO_RIGHT_DOWN:
    y += w;
    }
    continue;

    START_DOWN:
    y[x] = new; r++;
    if( s == sn ){
    Index s_offset = s - s0;
    Index n = (sn-s0+1) *3 /2;
    UC *new_s0 = realloc( s0, n * sizeof *new_s0 );

    if( ! new_s0 ) break;
    s0 = new_s0, s = s0 + s_offset, sn = s0 + n;
    }
    if( x > x0 && y[x-1] == old ){
    x -= 1, *s++ = 13; goto START_RIGHT; UNDO_DOWN_RIGHT:
    x += 1;
    }
    if( x < xm && y[x+1] == old ){
    x += 1, *s++ = 15; goto START_LEFT; UNDO_DOWN_LEFT:
    x -= 1;
    }
    if( y > y0 && x[y-w] == old ){
    y -= w, *s++ = 14; goto START_DOWN; UNDO_DOWN_DOWN:
    y += w;
    }
    continue;

    }

    return free( s0 ), r;
    }

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Tue Apr 9 19:32:31 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Sat, 30 Mar 2024 21:15:06 +0300
    Michael S <already5chosen@yahoo.com> wrote:

    On Fri, 29 Mar 2024 23:58:26 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    One, the new code is a lot more complicated than the previous
    code. I'm not sure the performance gain is worth the cost
    in complexity. What kind of speed improvements do you see,
    in terms of percent?

    On my 11 y.o. and not top-of-the-line even then home PC for 4K
    image (3840 x 2160) with cross-in-cross shape that I took from one of
    your previous posts, it is 2.43 times faster.
    I don't remember how it compares on more modern systems. Anyway, right
    now I have no test systems more modern than 3 y.o. Zen3.

    I tested on newer hardware - Intel Coffee Lake (Xeon-E 2176G) and AMD
    Zen3 (EPYC 7543P).
    Here I no longer see a significant drop in speed of the 1x1 variant at
    4K size, but I still see that the more complicated variant provides a
    nice speedup: up to 1.56x on Coffee Lake and up to 3x on Zen3.

    On my test system the numbers are closer and also more evenly
    balanced: ratios range from about 0.70 to about 1.40, roughly
    evenly split with the 2x2 version somewhat better. There was
    one outlier at approximately 1.48. More precisely, the ratios
    have an average of 1.06 (which means the 1x1 version is about
    6 percent slower on average), with a standard deviation of 0.21.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Apr 11 12:47:11 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Sun, 24 Mar 2024 10:24:45 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising to me that a queue can work better than a stack; my
    intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).

    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder if the existence of this or a similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.

    Before I do anything else I should correct a bug in my earlier
    FIFO algorithm. The initialization of the variable jx should
    read

    Index const jx = used*3 < open ? k : j+open/3 &m;

    rather than what it used to. (The type may have changed but that
    is incidental; what matters is the value of the initializing
    expression.) I don't know what I was thinking when I wrote the
    previous version, it's just completely wrong.

    It seems that in the worst case the strict FIFO algorithm is the same
    as the rest of them, i.e. O(NN) where NN is the number of re-colored
    points. Below is an example of a shape for which I measured memory
    consumption for a 3840x2160 image almost exactly 4x as much as for a
    1920x1080 one.

    I agree, the empirical evidence here and in my own tests is quite
    compelling.

    That said, the constant factor for the FIFO algorithm is lower
    than the stack-based algorithms, even taking into account the
    difference in sizes for queue and stack elements. Moreover cases
    where FIFO algorithms are O( NxN ) are unusual and sparse,
    whereas the stack-based algorithms tend to use a lot of memory
    in lots of common and routine cases. On the average FIFO
    algorithms typically use a lot less memory (or so I conjecture).

    [code to generate fractal tree pattern]

    Thank you for this. I incorporated it into my set of test
    patterns more or less as soon as it was posted.

    Now that I have taken some time to play around with different
    algorithms and have been more systematic in doing speed
    comparisons between different algorithms, on different patterns,
    and with a good range of sizes, I have some general thoughts
    to offer.

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    I've been playing around with a more elaborate, mostly FIFO
    method, in hopes of getting something that offers the best
    of both worlds. The results so far are encouraging, but a
    fair amount of tuning has been necessary (and perhaps more
    still is), and comparisons have been done on just the one
    test server I have available. So I don't know how well it
    would hold up on other hardware, including especially more
    recent hardware. Under these circumstances I feel it is
    premature to post actual code, especially since the code
    is still in flux.

    This topic has been more interesting than I was expecting, and
    also more challenging. I have a strong rule against writing
    functions more than about 60 lines long. For the problem of
    writing an acceptably quick flood-fill algorithm, I think it would
    at the very least be a lot of work to write code to do that while
    still observing a limit on function length of even 100 lines, let
    alone 60.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Thu Apr 11 22:20:33 2024
    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Sun, 24 Mar 2024 10:24:45 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 20 Mar 2024 10:01:10 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Generally, I like your algorithm.
    It was surprising to me that a queue can work better than a stack;
    my intuition suggested otherwise, but facts are facts.

    Using a stack is like a depth-first search, and a queue is like a
    breadth-first search. For a pixel field of size N x N, doing a
    depth-first search can lead to memory usage of order N**2,
    whereas a breadth-first search has a "frontier" at most O(N).
    Another way to think of it is that breadth-first gets rid of
    visited nodes as fast as it can, but depth-first keeps them
    around for a long time when everything is reachable from anywhere
    (as will be the case in large simple regions).

    For my test cases the FIFO depth of your algorithm never exceeds
    min(width,height)*2+2. I wonder if the existence of this or a similar
    limit can be proven theoretically.

    I believe it is possible to prove the strict FIFO algorithm is
    O(N) for an N x N pixel field, but I haven't tried to do so in
    any rigorous way, nor do I know what the constant is. It does
    seem to be larger than 2.

    Before I do anything else I should correct a bug in my earlier
    FIFO algorithm. The initialization of the variable jx should
    read

    Index const jx = used*3 < open ? k : j+open/3 &m;


    I lost track, sorry. I cannot find your code that contains a line
    similar to this.
    Can you point to a specific post?

    rather than what it used to. (The type may have changed but that
    is incidental; what matters is the value of the initializing
    expression.) I don't know what I was thinking when I wrote the
    previous version, it's just completely wrong.

    It seems that in the worst case the strict FIFO algorithm is the same
    as the rest of them, i.e. O(NN) where NN is the number of re-colored
    points. Below is an example of a shape for which I measured memory
    consumption for a 3840x2160 image almost exactly 4x as much as for a
    1920x1080 one.

    I agree, the empirical evidence here and in my own tests is quite
    compelling.


    BTW, I no longer agree with myself about "the rest of them".
    By now, I know at least one method that is O(W*log(H)). It is even
    quite fast for the majority of my test shapes. Unfortunately, [in its
    current form] it is abysmally slow (100x) for a minority of tests.
    [In its current form] it has other disadvantages as well, like
    consuming a non-trivial amount of memory when handling a small spot in
    a big image. But that can be improved. I am less sure that the
    worst-case speed can be improved enough to make it generally
    acceptable.

    I think I have said enough for you to figure out the general principle
    of this algorithm. I don't want to post code here before I try a few
    improvements.

    That said, the constant factor for the FIFO algorithm is lower
    than the stack-based algorithms, even taking into account the
    difference in sizes for queue and stack elements. Moreover cases
    where FIFO algorithms are O( NxN ) are unusual and sparse,
    whereas the stack-based algorithms tend to use a lot of memory
    in lots of common and routine cases. On the average FIFO
    algorithms typically use a lot less memory (or so I conjecture).

    [code to generate fractal tree pattern]

    Thank you for this. I incorporated it into my set of test
    patterns more or less as soon as it was posted.

    Now that I have taken some time to play around with different
    algorithms and have been more systematic in doing speed
    comparisons between different algorithms, on different patterns,
    and with a good range of sizes, I have some general thoughts
    to offer.

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.


    Indeed, with solid shapes it uses more memory. But at least in my tests
    on my hardware, with this sort of shape it is easily faster than
    anything else. The difference vs the best of the rest is especially big
    at 4K images on AMD Zen3-based hardware, but even on Intel Skylake,
    which generally serves as an equalizer between different algorithms,
    the speed advantage of the 2x2 stack is significant.

    I've been playing around with a more elaborate, mostly FIFO
    method, in hopes of getting something that offers the best
    of both worlds. The results so far are encouraging, but a
    fair amount of tuning has been necessary (and perhaps more
    still is), and comparisons have been done on just the one
    test server I have available. So I don't know how well it
    would hold up on other hardware, including especially more
    recent hardware. Under these circumstances I feel it is
    premature to post actual code, especially since the code
    is still in flux.

    This topic has been more interesting than I was expecting, and
    also more challenging.

    That's not the first time in my practice where a problem with a simple
    formulation begets interesting challenges.
    Didn't Donald Knuth write 300 or 400 pages about sorting and still end
    up quite far from exhausting the topic?

    I have a strong rule against writing
    functions more than about 60 lines long. For the problem of
    writing an acceptably quick flood-fill algorithm, I think it would
    at the very least be a lot of work to write code to do that while
    still observing a limit on function length of even 100 lines, let
    alone 60.

    So why not break it down into smaller pieces?
    Myself, I have no such rule. In my real work I am quite happy with
    dispatchers of network messages that are 250-300 lines long. But if I
    had this sort of rule, I'd certainly decompose.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Apr 12 14:06:38 2024
    Michael S <already5chosen@yahoo.com> writes:

    (I'm replying in pieces.)

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Before I do anything else I should correct a bug in my earlier
    FIFO algorithm. The initialization of the variable jx should
    read

    Index const jx = used*3 < open ? k : j+open/3 &m;

    I lost track, sorry. I cannot find your code that contains a line
    similar to this.
    Can you point to a specific post?

    Easier for me just to repost the corrected algorithm. The
    type UC is an unsigned char, the types Index and Count are
    size_t (or maybe unsigned long long), the type U32 is a
    32-bit unsigned type.

    Please excuse any minor glitches, I have done some hand
    editing to take out various bits of diagnostic code.

    extern Count
    fifo_fill( UC *field, Index w, Index h, Point p0, UC old, UC new ){
    Index const xm = w-1;
    Index const ym = h-1;

    Index j = 0;
    Index k = 0;
    Index n = 1u << 10;
    Index m = n-1;
    U32 *todo = malloc( n * sizeof *todo );
    Index x = p0.x;
    Index y = p0.y;

    if( !todo || x >= w || y >= h || field[ x+y*w ] != old ) return 0;

    field[ x + y*w ] = new;  /* recolour the start pixel as it is enqueued */
    todo[ k++ ] = x<<16 | y;

    while( j != k ){
    Index used = j < k ? k-j : k+n-j;
    Index open = n - used;
    if( open < used/16 ){
    Index new_n = n*2;
    Index new_m = new_n-1;
    Index new_j = j < k ? j : j+n;
    U32 *t = realloc( todo, new_n * sizeof *t );
    if( ! t ) break;
    if( j != new_j ) memcpy( t+new_j, t+j, (n-j) * sizeof *t );
    todo = t, n = new_n, m = new_m, j = new_j, open = n-used;
    }
    assert( (k-j&m) == used && open+used == n );

    Index const jx = used*3 < open ? k : j+open/3 &m; // here it is!
    while( j != jx ){
    U32 p = todo[j]; j = j+1 &m;
    x = p >> 16, y = p & 0xFFFF;
    if( x > 0 && field[ x-1 + y*w ] == old ){
    todo[k] = x-1<<16 | y, k = k+1&m, field[ x-1 + y*w ] = new;
    }
    if( y > 0 && field[ x + (y-1)*w ] == old ){
    todo[k] = x<<16 | y-1, k = k+1&m, field[ x + (y-1)*w ] = new;
    }
    if( x < xm && field[ x+1 + y*w ] == old ){
    todo[k] = x+1<<16 | y, k = k+1&m, field[ x+1 + y*w ] = new;
    }
    if( y < ym && field[ x + (y+1)*w ] == old ){
    todo[k] = x<<16 | y+1, k = k+1&m, field[ x + (y+1)*w ] = new;
    }
    }
    }

    return free( todo ), 0;
    }

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Apr 12 14:55:22 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    It seems that in worst case the strict FIFO algorithm is the same
    as the rest of them, i.e. O(NN) where NN is the number of
    re-colored points. Below is an example of the shape for which I
    measured memory consumption for 3840x2160 image almost exactly 4x
    as much as for 1920x1080.

    I agree, the empirical evidence here and in my own tests is quite
    compelling.

    BTW, I no longer agree with myself about "the rest of them".
    By now, I know at least one method that is O(W*log(H)). It is even
    quite fast for the majority of my test shapes. Unfortunately, [in its
    current form] it is abysmally slow (100x) for a minority of tests.
    [In its current form] it has other disadvantages as well, like
    consuming a non-trivial amount of memory when handling a small spot in
    a big image. But that can be improved. I am less sure that the
    worst-case speed can be improved enough to make it generally
    acceptable.

    I think I have said enough for you to figure out the general principle
    of this algorithm. I don't want to post code here before I try a few
    improvements.

    Thank you for the implied compliment. At this point I think the
    probability that I will figure it out anytime soon is pretty low.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Apr 12 15:09:51 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my tests
    on my hardware, with this sort of shape it is easily faster than
    anything else. The difference vs the best of the rest is especially big
    at 4K images on AMD Zen3-based hardware, but even on Intel Skylake,
    which generally serves as an equalizer between different algorithms,
    the speed advantage of the 2x2 stack is significant.

    This comment makes me wonder if I should post my timing results.
    Maybe I will (and including an appropriate disclaimer).

    I do timings over these sizes:

    25 x 19
    19 x 25
    200 x 200
    1280 x 760
    760 x 1280
    1920 x 1080
    1080 x 1920
    3840 x 2160
    2160 x 3840
    4096 x 4096
    38400 x 21600
    21600 x 38400
    32767 x 32767
    32768 x 32768

    with these patterns:

    fractal
    slalom
    rotated slalom
    horizontal snake and vertical snake
    cross in cross
    donut aka ellipse with hole
    entire field starting from center

    If you have other patterns to suggest that would be great,
    I can try to incorporate them (especially if there is
    code to generate the pattern).

    Patterns are allowed to include a nominal start point,
    otherwise I can make an arbitrary choice and assign one.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Apr 12 15:38:59 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    This topic has been more interesting than I was expecting, and
    also more challenging.

    That's not the first time in my practice where a problem with a simple
    formulation begets interesting challenges.
    Didn't Donald Knuth write 300 or 400 pages about sorting and still end
    up quite far from exhausting the topic?

    In my copy of volume 3 of TAOCP, the chapter on sorting takes up
    388 pages. On the other hand, only 108 pages of that deals with
    what we normally think of as sorting algorithms today, and even
    that part is longer than it needs to be because of Knuth's
    exhaustive (and exhausting) writing style. Don Knuth would
    never write a book in the style of The C Programming Language.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Fri Apr 12 15:43:10 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    I have a strong rule against writing
    functions more than about 60 lines long. For the problem of
    writing an acceptably quick flood-fill algorithm, I think it would
    at the very least be a lot of work to write code to do that while
    still observing a limit on function length of even 100 lines, let
    alone 60.

    So why not break it down to smaller pieces ?

    The better algorithms I have done are long and also make liberal
    use of goto's. Maybe it isn't impossible to break one or more
    of these algorithms into smaller pieces, but C doesn't make it
    easy.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Fri Apr 12 18:13:05 2024
    On Thu, 11 Apr 2024 22:09:51 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with these sorts of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even on
    Intel Skylake, which generally serves as an equalizer between
    different algorithms, the speed advantage of the 2x2 stack is
    significant.

    This comment makes me wonder if I should post my timing results.
    Maybe I will (and include an appropriate disclaimer).

    I do timings over these sizes:

    25 x 19
    19 x 25
    200 x 200
    1280 x 760
    760 x 1280
    1920 x 1080
    1080 x 1920
    3840 x 2160
    2160 x 3840
    4096 x 4096
    38400 x 21600
    21600 x 38400
    32767 x 32767
    32768 x 32768


    I didn't go that far up (I ended at 4K), and I only test landscape
    sizes. Maybe I'll add a portrait option to see anisotropic behaviors.
    For bigger sizes, correctness is interesting; speed, not so much,
    since they are unlikely to be edited in an interactive manner.

    with these patterns:

    fractal
    slalom
    rotated slalom
    horizontal snake and vertical snake
    cross in cross
    donut aka ellipse with hole
    entire field starting from center

    If you have other patterns to suggest that would be great,
    I can try to incorporate them (especially if there is
    code to generate the pattern).

    Patterns are allowed to include a nominal start point,
    otherwise I can make an arbitrary choice and assign one.

    My suite is about the same, with the following exceptions:
    1. I didn't add donut yet
    2. + 3 greeds with cell size 2, 3 and 4
    3. + fractal tree
    4. + entire field starting from corner
    It seems neither of us tests the cases in which the linear dimensions
    of the shape are much smaller than those of the field.

    /* Grid ("greed") test pattern: every cell_sz-th row is a solid pen
       line; the other rows have a pen pixel every cell_sz columns.
       Needs <string.h> for memset(). */
    static void make_grid(
        unsigned char *image,
        int width, int height,
        unsigned char background_c,
        unsigned char pen_c, int cell_sz)
    {
        for (int y = 0; y < height; ++y) {
            unsigned char* p = &image[y*width];
            if (y % cell_sz == 0) {
                memset(p, pen_c, width);
            } else {
                for (int x = 0; x < width; ++x)
                    p[x] = x % cell_sz ? background_c : pen_c;
            }
        }
    }
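    A minimal harness for the snippet above might look like the
    following; the image size, the colour values and the cell size are
    placeholders, and the actual flood-fill call is left as a comment
    because the fill routine under test varies from post to post:

    #include <stdlib.h>

    /* Hypothetical driver: build one grid ("greed") image and hand it
       to whatever fill routine is being timed. */
    int main(void)
    {
        int width = 200, height = 200;
        unsigned char *image = malloc((size_t)width * height);
        if (!image)
            return 1;
        make_grid(image, width, height, 0 /* background */, 1 /* pen */, 3);
        /* ... run the flood fill under test, e.g. starting at (0, 0) ... */
        free(image);
        return 0;
    }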





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Apr 13 04:59:25 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with these sorts of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even
    on Intel Skylake, which generally serves as an equalizer between
    different algorithms, the speed advantage of the 2x2 stack is
    significant.

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sun Apr 14 01:30:03 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Thu, 11 Apr 2024 22:09:51 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    [...]

    I do timings over these sizes:

    25 x 19
    19 x 25
    200 x 200
    1280 x 760
    760 x 1280
    1920 x 1080
    1080 x 1920
    3840 x 2160
    2160 x 3840
    4096 x 4096
    38400 x 21600
    21600 x 38400
    32767 x 32767
    32768 x 32768

    I didn't go that far up (I ended at 4K)

    I test large sizes for three reasons. One, even if viewable
    area is smaller, virtual displays might be much larger. Two,
    to see how the algorithms scale. Three, larger areas have
    relatively less influence from edge effects.

    Also I have now added

    275 x 25 25 x 275
    400 x 300 300 x 400
    640 x 480 480 x 640
    1600 x 900 900 x 1600
    16000 x 9000 9000 x 16000


    and I only test landscape sizes. Maybe I'll add a portrait option
    to see anisotropic behaviors.

    I decided to do both, one, for symmetry (and there are still some
    applications for portrait mode), and two, to see whether that has
    an effect on behavior (indeed my latest algorithm is anisotropic,
    so it is good to test the flipped sizes).

    with these patterns:

    fractal
    slalom
    rotated slalom
    horizontal snake and vertical snake
    cross in cross
    donut aka ellipse with hole
    entire field starting from center

    If you have other patterns to suggest that would be great,
    I can try to incorporate them (especially if there is
    code to generate the pattern).

    Patterns are allowed to include a nominal start point,
    otherwise I can make an arbitrary choice and assign one.

    My suite is about the same, with the following exceptions:
    1. I didn't add donut yet
    2. + 3 greeds with cell size 2, 3 and 4
    3. + fractal tree

    By "fractal" I meant fractal tree. Sorry if that was confusing.

    4. + entire field starting from corner

    I used to do that but took it out as redundant. I've added
    it back now. :)

    It seems neither of us tests the cases in which the linear dimensions
    of the shape are much smaller than those of the field.

    Shouldn't make a difference (for any of the algorithms shown) as
    long as there is at least a 1-pixel border around the pattern.
    Maybe I will add that variation (ick, a lot of work). By the way,
    the donut pattern already has a 1-pixel border, i.e., it does not
    touch any edge.

    static void make_grid(
        unsigned char *image,
        int width, int height,
        unsigned char background_c,
        unsigned char pen_c, int cell_sz)
    {
        for (int y = 0; y < height; ++y) {
            unsigned char* p = &image[y*width];
            if (y % cell_sz == 0) {
                memset(p, pen_c, width);
            } else {
                for (int x = 0; x < width; ++x)
                    p[x] = x % cell_sz ? background_c : pen_c;
            }
        }
    }

    Ahh, this is what you meant by greed. A nice set of patterns.
    I wrote a variation where the "line width" as well as the
    "hole width" is variable, and added a bunch of those to my
    tests (so a full timing suite now runs for several hours).
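    To make that variation concrete, here is one way such a generator
    could look; make_grid2 and its parameter names are only an
    illustration of the idea described above, not the code actually
    used for these tests:

    #include <string.h>

    /* Hypothetical generalisation of make_grid: each cell is line_w
       pixels of pen colour followed by hole_w pixels of background,
       along both axes.  line_w = 1 reproduces the original pattern. */
    static void make_grid2(
        unsigned char *image,
        int width, int height,
        unsigned char background_c,
        unsigned char pen_c,
        int line_w, int hole_w)
    {
        int cell_sz = line_w + hole_w;
        for (int y = 0; y < height; ++y) {
            unsigned char *p = &image[y*width];
            if (y % cell_sz < line_w) {
                memset(p, pen_c, width);            /* solid "line" row */
            } else {
                for (int x = 0; x < width; ++x)     /* row with holes   */
                    p[x] = (x % cell_sz < line_w) ? pen_c : background_c;
            }
        }
    }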

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sun Apr 14 03:26:39 2024
    On Fri, 12 Apr 2024 11:59:25 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with these sorts of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even
    on Intel Skylake, which generally serves as an equalizer between
    different algorithms, the speed advantage of the 2x2 stack is
    significant.

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    That particular server is turned off right now.
    Hopefully, next Monday I will be able to test on it.
    It would help if in the meantime you pointed me to the specific
    post with the code.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sun Apr 14 03:54:46 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 12 Apr 2024 11:59:25 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    That particular server is turned off right now.
    Hopefully, next Monday I will be able to test on it.
    It would help if in the meantime you pointed me to the specific
    post with the code.

    Does this help? Message-ID: <8634ru3ofo.fsf@linuxsc.com>

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sun Apr 14 06:11:59 2024
    On Sat, 13 Apr 2024 10:54:46 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Fri, 12 Apr 2024 11:59:25 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    That particular server is turned off right now.
    Hopefully, next Monday I will be able to test on it.
    It would help if in the meantime you pointed me to the specific
    post with the code.

    Does this help? Message-ID: <8634ru3ofo.fsf@linuxsc.com>

    Yes, it does.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Wed Apr 17 07:47:22 2024
    On Fri, 12 Apr 2024 11:59:25 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 10 Apr 2024 19:47:11 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Stack-based methods tend to do well on long skinny patterns and
    tend to do not as well on fatter patterns such as circles or
    squares. The fractal pattern is ideal for a stack-based method.
    Conversely, patterns that are mostly solid shapes don't fare as
    well under stack-based methods, at least not the ones that have
    been posted in this thread, and also they tend to use more memory
    in those cases.

    Indeed, with solid shapes it uses more memory. But at least in my
    tests on my hardware with these sorts of shapes it is easily faster
    than anything else. The difference vs the best of the rest is
    especially big at 4K images on AMD Zen3 based hardware, but even
    on Intel Skylake, which generally serves as an equalizer between
    different algorithms, the speed advantage of the 2x2 stack is
    significant.

    I'm curious to know how your 2x2 algorithm compares to my
    second (longer) stack-based algorithm when run on the Zen3.
    On my test hardware they are roughly comparable, depending
    on size and pattern. My curiosity includes the fatter
    patterns as well as the long skinny ones.

    Finally found the time for speed measurements.

    I tested four algorithms:
    1. stack_2x2 - stack-like processing where each element is a 2x2
       rectangle, with the Lot's wife amendment.
    2. stack_timr1 - first variant of stack by Tim Rentsch
    3. stack_timr2 - second variant of stack by Tim Rentsch
    4. queue_timr - "take no prisoners" queue by Tim Rentsch, the one
       with a power-of-two circular buffer, (x,y) packed into 32 bits,
       and the inner loop optimized for solid shapes (sketched below).
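    For readers who have not followed the earlier posts, the packing and
    ring-buffer idea mentioned in item 4 can be sketched roughly as
    follows; the 16-bit split, the queue capacity and the helper names
    are illustrative assumptions, not the exact layout of the queue_timr
    code:

    #include <stdint.h>

    /* Sketch: pack (x,y) into one 32-bit word and keep pending pixels
       in a power-of-two circular buffer, so wrap-around is a mask
       rather than a division.  Overflow (more than QCAP pending
       entries) is not handled here. */
    #define QCAP         (1u << 20)    /* must be a power of two */
    #define PACK(x, y)   ((uint32_t)(y) << 16 | (uint32_t)(x))
    #define UNPACK_X(v)  ((v) & 0xFFFFu)
    #define UNPACK_Y(v)  ((v) >> 16)

    static uint32_t queue[QCAP];
    static uint32_t q_head, q_tail;    /* indices grow; wrap via mask */

    static void q_push(unsigned x, unsigned y)
    {
        queue[q_tail++ & (QCAP - 1)] = PACK(x, y);
    }

    static int q_pop(unsigned *x, unsigned *y)
    {
        if (q_head == q_tail)
            return 0;                  /* queue empty */
        uint32_t v = queue[q_head++ & (QCAP - 1)];
        *x = UNPACK_X(v);
        *y = UNPACK_Y(v);
        return 1;
    }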

    Tests were run on four CPUs:
    1. IVB - Intel Core i7-3570 at 3700 MHz. As far as CPUs go, a
       rather old thing.
    2. HSW - Intel Xeon E3-1271 v3 at 4000 MHz. Only a couple of years
       younger than the above.
    3. SKC - Intel Xeon E-2176G at 4250 MHz. Significantly younger, but
       its microarchitecture has existed since 2015.
    4. ZN3 - AMD EPYC 7543P at 3700 MHz. The only one on my roster whose
       microarchitecture can be considered relatively modern.

    As you can see, with the exception of the oldest CPU, your second
    stack variant is not an improvement over the first.

    What surprised me after I put all the results together was the poor
    showing of SKC. I can't remember any other of my microbenchmarks
    (and I do plenty) where this CPU was so decisively beaten by its
    older cousin.

    The columns are as follows (see the sketch after this list for how
    the last two are computed):
    1. Shape name
    2. Starting point (x,y)
    3. Number of points to recolor
    4. Total test duration, seconds
    5. Time per pixel, normalized to image area, nsec
    6. Time per pixel, normalized to the number of points to recolor, nsec
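    To be explicit about the last two columns, the per-pixel figures can
    be computed as in the sketch below; the variable names are mine, and
    the repetition count corresponds to the "* N" shown next to each
    image size:

    #include <stdio.h>

    /* total_seconds is the wall time for 'reps' repetitions of one
       fill; column 5 divides by the whole image area, column 6 only by
       the pixels actually recoloured (both reported in nanoseconds). */
    static void report(double total_seconds, int width, int height,
                       long reps, long n_recolored)
    {
        double per_area    = total_seconds * 1e9 / ((double)width * height * reps);
        double per_recolor = total_seconds * 1e9 / ((double)n_recolored * reps);
        printf("%9.3f %6.2f %6.2f\n", total_seconds, per_area, per_recolor);
    }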


    IVB,stack_2x2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.547 2.73 2.73
    Solid square ( 0, 0) 475 0.537 2.68 2.68
    standing snake-like shape ( 0, 0) 259 0.522 2.61 4.79
    prostrate snake-like shape ( 0, 0) 259 0.528 2.64 4.84
    slalom shape ( 0, 0) 233 0.459 2.29 4.68
    slalom shape(rotated) ( 0, 0) 223 0.455 2.27 4.85
    cross-in-cross ( 0, 0) 403 0.515 2.57 3.04
    fractal tree ( 12, 0) 247 0.469 2.34 4.51
    greed(2) ( 0, 0) 367 0.558 2.79 3.61
    greed(3) ( 0, 0) 283 0.463 2.31 3.89
    greed(4) ( 0, 0) 223 0.399 1.99 4.25
    donut ( 23, 9) 238 0.305 1.52 3.04
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.461 2.30 2.30
    Solid square ( 0, 0) 40000 0.460 2.30 2.30
    standing snake-like shape ( 0, 0) 20100 0.382 1.91 3.80
    prostrate snake-like shape ( 0, 0) 20100 0.474 2.37 4.71
    slalom shape ( 0, 0) 19802 0.435 2.17 4.39
    slalom shape(rotated) ( 0, 0) 19802 0.450 2.25 4.54
    cross-in-cross ( 0, 0) 39216 0.470 2.35 2.40
    fractal tree ( 99, 0) 18674 0.432 2.16 4.62
    greed(2) ( 0, 0) 30000 0.458 2.29 3.05
    greed(3) ( 0, 0) 22311 0.413 2.06 3.70
    greed(4) ( 0, 0) 17500 0.348 1.74 3.98
    donut ( 199, 100) 25830 0.315 1.57 2.44
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.450 2.24 2.24
    Solid square ( 0, 0) 921600 0.450 2.24 2.24
    standing snake-like shape ( 0, 0) 461160 0.371 1.85 3.69
    prostrate snake-like shape ( 0, 0) 461440 0.469 2.33 4.66
    slalom shape ( 0, 0) 460082 0.437 2.18 4.36
    slalom shape(rotated) ( 0, 0) 460800 0.448 2.23 4.46
    cross-in-cross ( 0, 0) 917616 0.452 2.25 2.26
    fractal tree ( 639, 0) 445860 0.460 2.29 4.73
    greed(2) ( 0, 0) 691200 0.468 2.33 3.11
    greed(3) ( 0, 0) 512160 0.406 2.02 3.64
    greed(4) ( 0, 0) 403200 0.344 1.71 3.91
    donut (1279, 360) 655856 0.326 1.62 2.28
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.453 2.23 2.23
    Solid square ( 0, 0) 2073600 0.457 2.25 2.25
    standing snake-like shape ( 0, 0) 1037340 0.374 1.84 3.68
    prostrate snake-like shape ( 0, 0) 1037760 0.474 2.33 4.66
    slalom shape ( 0, 0) 1036800 0.443 2.18 4.36
    slalom shape(rotated) ( 0, 0) 1036800 0.452 2.22 4.45
    cross-in-cross ( 0, 0) 2067616 0.457 2.25 2.26
    fractal tree ( 959, 0) 1034612 0.453 2.23 4.47
    greed(2) ( 0, 0) 1555200 0.450 2.21 2.95
    greed(3) ( 0, 0) 1152000 0.407 2.00 3.61
    greed(4) ( 0, 0) 907200 0.346 1.70 3.89
    donut (1919, 540) 1477788 0.326 1.60 2.25
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.500 2.32 2.32
    Solid square ( 0, 0) 8294400 0.539 2.50 2.50
    standing snake-like shape ( 0, 0) 4148280 0.449 2.08 4.16
    prostrate snake-like shape ( 0, 0) 4149120 0.746 3.46 6.92
    slalom shape ( 0, 0) 4147200 0.703 3.26 6.52
    slalom shape(rotated) ( 0, 0) 4147200 0.537 2.49 4.98
    cross-in-cross ( 0, 0) 8282416 0.518 2.40 2.41
    fractal tree (1919, 0) 4135652 0.514 2.38 4.78
    greed(2) ( 0, 0) 6220800 0.533 2.47 3.30
    greed(3) ( 0, 0) 4608000 0.468 2.17 3.91
    greed(4) ( 0, 0) 3628800 0.386 1.79 4.09
    donut (3839,1080) 5919706 0.356 1.65 2.31

    IVB,stack_timr1:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 1.132 5.66 5.66
    Solid square ( 0, 0) 475 1.171 5.85 5.85
    standing snake-like shape ( 0, 0) 259 0.724 3.62 6.64
    prostrate snake-like shape ( 0, 0) 259 0.712 3.56 6.53
    slalom shape ( 0, 0) 233 0.632 3.16 6.44
    slalom shape(rotated) ( 0, 0) 223 0.632 3.16 6.73
    cross-in-cross ( 0, 0) 403 0.931 4.65 5.49
    fractal tree ( 12, 0) 247 0.537 2.68 5.16
    greed(2) ( 0, 0) 367 0.866 4.33 5.60
    greed(3) ( 0, 0) 283 0.724 3.62 6.08
    greed(4) ( 0, 0) 223 0.618 3.09 6.58
    donut ( 23, 9) 238 0.632 3.16 6.31
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.764 3.82 3.82
    Solid square ( 0, 0) 40000 0.759 3.79 3.79
    standing snake-like shape ( 0, 0) 20100 0.389 1.94 3.87
    prostrate snake-like shape ( 0, 0) 20100 0.400 2.00 3.98
    slalom shape ( 0, 0) 19802 0.388 1.94 3.92
    slalom shape(rotated) ( 0, 0) 19802 0.388 1.94 3.92
    cross-in-cross ( 0, 0) 39216 0.763 3.81 3.89
    fractal tree ( 99, 0) 18674 0.372 1.86 3.98
    greed(2) ( 0, 0) 30000 0.591 2.95 3.94
    greed(3) ( 0, 0) 22311 0.445 2.22 3.99
    greed(4) ( 0, 0) 17500 0.345 1.72 3.94
    donut ( 199, 100) 25830 0.517 2.58 4.00
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.793 3.95 3.95
    Solid square ( 0, 0) 921600 0.868 4.32 4.32
    standing snake-like shape ( 0, 0) 461160 0.420 2.09 4.18
    prostrate snake-like shape ( 0, 0) 461440 0.441 2.20 4.38
    slalom shape ( 0, 0) 460082 0.434 2.16 4.33
    slalom shape(rotated) ( 0, 0) 460800 0.429 2.14 4.27
    cross-in-cross ( 0, 0) 917616 0.801 3.99 4.00
    fractal tree ( 639, 0) 445860 0.389 1.94 4.00
    greed(2) ( 0, 0) 691200 0.614 3.06 4.07
    greed(3) ( 0, 0) 512160 0.424 2.11 3.80
    greed(4) ( 0, 0) 403200 0.334 1.66 3.80
    donut (1279, 360) 655856 0.572 2.85 4.00
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.793 3.90 3.90
    Solid square ( 0, 0) 2073600 0.909 4.47 4.47
    standing snake-like shape ( 0, 0) 1037340 0.415 2.04 4.08
    prostrate snake-like shape ( 0, 0) 1037760 0.442 2.18 4.35
    slalom shape ( 0, 0) 1036800 0.431 2.12 4.24
    slalom shape(rotated) ( 0, 0) 1036800 0.425 2.09 4.18
    cross-in-cross ( 0, 0) 2067616 0.843 4.15 4.16
    fractal tree ( 959, 0) 1034612 0.395 1.94 3.90
    greed(2) ( 0, 0) 1555200 0.614 3.02 4.03
    greed(3) ( 0, 0) 1152000 0.430 2.12 3.81
    greed(4) ( 0, 0) 907200 0.341 1.68 3.84
    donut (1919, 540) 1477788 0.571 2.81 3.94
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.923 4.28 4.28
    Solid square ( 0, 0) 8294400 1.109 5.14 5.14
    standing snake-like shape ( 0, 0) 4148280 0.521 2.42 4.83
    prostrate snake-like shape ( 0, 0) 4149120 1.186 5.50 10.99
    slalom shape ( 0, 0) 4147200 0.938 4.35 8.70
    slalom shape(rotated) ( 0, 0) 4147200 0.545 2.53 5.05
    cross-in-cross ( 0, 0) 8282416 0.999 4.63 4.64
    fractal tree (1919, 0) 4135652 0.433 2.01 4.03
    greed(2) ( 0, 0) 6220800 0.738 3.42 4.56
    greed(3) ( 0, 0) 4608000 0.529 2.45 4.42
    greed(4) ( 0, 0) 3628800 0.417 1.93 4.42
    donut (3839,1080) 5919706 0.666 3.09 4.33


    IVB,stack_timr2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.963 4.81 4.81
    Solid square ( 0, 0) 475 0.990 4.95 4.95
    standing snake-like shape ( 0, 0) 259 0.615 3.07 5.64
    prostrate snake-like shape ( 0, 0) 259 0.673 3.36 6.17
    slalom shape ( 0, 0) 233 0.761 3.80 7.76
    slalom shape(rotated) ( 0, 0) 223 0.815 4.07 8.68
    cross-in-cross ( 0, 0) 403 1.160 5.80 6.84
    fractal tree ( 12, 0) 247 0.740 3.70 7.12
    greed(2) ( 0, 0) 367 1.093 5.46 7.07
    greed(3) ( 0, 0) 283 0.762 3.81 6.39
    greed(4) ( 0, 0) 223 0.621 3.10 6.61
    donut ( 23, 9) 238 0.753 3.76 7.51
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.587 2.93 2.93
    Solid square ( 0, 0) 40000 0.588 2.94 2.94
    standing snake-like shape ( 0, 0) 20100 0.299 1.49 2.97
    prostrate snake-like shape ( 0, 0) 20100 0.311 1.55 3.09
    slalom shape ( 0, 0) 19802 0.481 2.40 4.86
    slalom shape(rotated) ( 0, 0) 19802 0.586 2.93 5.92
    cross-in-cross ( 0, 0) 39216 0.609 3.04 3.10
    fractal tree ( 99, 0) 18674 0.539 2.69 5.77
    greed(2) ( 0, 0) 30000 0.909 4.54 6.06
    greed(3) ( 0, 0) 22311 0.468 2.34 4.19
    greed(4) ( 0, 0) 17500 0.371 1.85 4.24
    donut ( 199, 100) 25830 0.418 2.09 3.24
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.602 3.00 3.00
    Solid square ( 0, 0) 921600 0.741 3.69 3.69
    standing snake-like shape ( 0, 0) 461160 0.326 1.62 3.24
    prostrate snake-like shape ( 0, 0) 461440 0.342 1.70 3.40
    slalom shape ( 0, 0) 460082 0.519 2.58 5.17
    slalom shape(rotated) ( 0, 0) 460800 0.626 3.12 6.23
    cross-in-cross ( 0, 0) 917616 0.666 3.31 3.33
    fractal tree ( 639, 0) 445860 0.565 2.81 5.81
    greed(2) ( 0, 0) 691200 0.938 4.67 6.23
    greed(3) ( 0, 0) 512160 0.491 2.44 4.40
    greed(4) ( 0, 0) 403200 0.352 1.75 4.00
    donut (1279, 360) 655856 0.450 2.24 3.15
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.611 3.01 3.01
    Solid square ( 0, 0) 2073600 0.759 3.74 3.74
    standing snake-like shape ( 0, 0) 1037340 0.330 1.62 3.25
    prostrate snake-like shape ( 0, 0) 1037760 0.350 1.72 3.44
    slalom shape ( 0, 0) 1036800 0.525 2.58 5.17
    slalom shape(rotated) ( 0, 0) 1036800 0.636 3.13 6.26
    cross-in-cross ( 0, 0) 2067616 0.674 3.32 3.33
    fractal tree ( 959, 0) 1034612 0.605 2.98 5.97
    greed(2) ( 0, 0) 1555200 0.923 4.54 6.06
    greed(3) ( 0, 0) 1152000 0.463 2.28 4.10
    greed(4) ( 0, 0) 907200 0.359 1.77 4.04
    donut (1919, 540) 1477788 0.431 2.12 2.98
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.703 3.26 3.26
    Solid square ( 0, 0) 8294400 0.847 3.93 3.93
    standing snake-like shape ( 0, 0) 4148280 0.400 1.85 3.71
    prostrate snake-like shape ( 0, 0) 4149120 0.815 3.78 7.55
    slalom shape ( 0, 0) 4147200 0.871 4.04 8.08
    slalom shape(rotated) ( 0, 0) 4147200 0.734 3.40 6.81
    cross-in-cross ( 0, 0) 8282416 0.774 3.59 3.59
    fractal tree (1919, 0) 4135652 0.658 3.05 6.12
    greed(2) ( 0, 0) 6220800 1.023 4.74 6.32
    greed(3) ( 0, 0) 4608000 0.554 2.57 4.62
    greed(4) ( 0, 0) 3628800 0.451 2.09 4.78
    donut (3839,1080) 5919706 0.498 2.31 3.24

    IVB,queue_timr:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.828 4.14 4.14
    Solid square ( 0, 0) 475 0.890 4.45 4.45
    standing snake-like shape ( 0, 0) 259 0.642 3.21 5.89
    prostrate snake-like shape ( 0, 0) 259 0.709 3.54 6.50
    slalom shape ( 0, 0) 233 0.589 2.94 6.00
    slalom shape(rotated) ( 0, 0) 223 0.573 2.86 6.10
    cross-in-cross ( 0, 0) 403 0.713 3.56 4.20
    fractal tree ( 12, 0) 247 0.448 2.24 4.31
    greed(2) ( 0, 0) 367 0.675 3.37 4.37
    greed(3) ( 0, 0) 283 0.512 2.56 4.30
    greed(4) ( 0, 0) 223 0.409 2.04 4.36
    donut ( 23, 9) 238 0.439 2.19 4.38

    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.893 4.46 4.46
    Solid square ( 0, 0) 40000 0.786 3.93 3.93
    standing snake-like shape ( 0, 0) 20100 0.555 2.77 5.52
    prostrate snake-like shape ( 0, 0) 20100 0.557 2.78 5.54
    slalom shape ( 0, 0) 19802 0.571 2.85 5.76
    slalom shape(rotated) ( 0, 0) 19802 0.548 2.74 5.53
    cross-in-cross ( 0, 0) 39216 0.736 3.68 3.75
    fractal tree ( 99, 0) 18674 0.569 2.84 6.09
    greed(2) ( 0, 0) 30000 0.615 3.07 4.10
    greed(3) ( 0, 0) 22311 0.453 2.26 4.06
    greed(4) ( 0, 0) 17500 0.357 1.78 4.08
    donut ( 199, 100) 25830 0.531 2.65 4.11

    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.785 3.91 3.91
    Solid square ( 0, 0) 921600 0.761 3.79 3.79
    standing snake-like shape ( 0, 0) 461160 0.551 2.74 5.48
    prostrate snake-like shape ( 0, 0) 461440 0.552 2.75 5.49
    slalom shape ( 0, 0) 460082 0.564 2.81 5.62
    slalom shape(rotated) ( 0, 0) 460800 0.557 2.77 5.54
    cross-in-cross ( 0, 0) 917616 0.755 3.76 3.77
    fractal tree ( 639, 0) 445860 0.448 2.23 4.61
    greed(2) ( 0, 0) 691200 0.645 3.21 4.28
    greed(3) ( 0, 0) 512160 0.481 2.39 4.31
    greed(4) ( 0, 0) 403200 0.377 1.88 4.29
    donut (1279, 360) 655856 0.572 2.85 4.00

    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.854 4.20 4.20
    Solid square ( 0, 0) 2073600 0.829 4.08 4.08
    standing snake-like shape ( 0, 0) 1037340 0.557 2.74 5.48
    prostrate snake-like shape ( 0, 0) 1037760 0.574 2.82 5.64
    slalom shape ( 0, 0) 1036800 0.583 2.87 5.74
    slalom shape(rotated) ( 0, 0) 1036800 0.563 2.77 5.54
    cross-in-cross ( 0, 0) 2067616 0.822 4.05 4.06
    fractal tree ( 959, 0) 1034612 0.468 2.30 4.62
    greed(2) ( 0, 0) 1555200 0.664 3.27 4.36
    greed(3) ( 0, 0) 1152000 0.483 2.38 4.28
    greed(4) ( 0, 0) 907200 0.389 1.91 4.38
    donut (1919, 540) 1477788 0.617 3.04 4.26

    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 1.407 6.52 6.52
    Solid square ( 0, 0) 8294400 1.555 7.21 7.21
    standing snake-like shape ( 0, 0) 4148280 0.596 2.76 5.53
    prostrate snake-like shape ( 0, 0) 4149120 0.851 3.95 7.89
    slalom shape ( 0, 0) 4147200 0.802 3.72 7.44
    slalom shape(rotated) ( 0, 0) 4147200 0.600 2.78 5.56
    cross-in-cross ( 0, 0) 8282416 1.522 7.06 7.07
    fractal tree (1919, 0) 4135652 1.151 5.34 10.70
    greed(2) ( 0, 0) 6220800 1.410 6.54 8.72
    greed(3) ( 0, 0) 4608000 1.450 6.72 12.10
    greed(4) ( 0, 0) 3628800 1.432 6.64 15.18
    donut (3839,1080) 5919706 1.114 5.17 7.24

    HSW,stack_2x2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.391 1.95 1.95
    Solid square ( 0, 0) 475 0.402 2.01 2.01
    standing snake-like shape ( 0, 0) 259 0.389 1.94 3.57
    prostrate snake-like shape ( 0, 0) 259 0.351 1.75 3.22
    slalom shape ( 0, 0) 233 0.360 1.80 3.67
    slalom shape(rotated) ( 0, 0) 223 0.366 1.83 3.90
    cross-in-cross ( 0, 0) 403 0.385 1.92 2.27
    fractal tree ( 12, 0) 247 0.382 1.91 3.67
    greed(2) ( 0, 0) 367 0.416 2.08 2.69
    greed(3) ( 0, 0) 283 0.361 1.80 3.03
    greed(4) ( 0, 0) 223 0.306 1.53 3.26
    donut ( 23, 9) 238 0.225 1.12 2.25
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.344 1.72 1.72
    Solid square ( 0, 0) 40000 0.343 1.71 1.71
    standing snake-like shape ( 0, 0) 20100 0.274 1.37 2.73
    prostrate snake-like shape ( 0, 0) 20100 0.311 1.55 3.09
    slalom shape ( 0, 0) 19802 0.333 1.66 3.36
    slalom shape(rotated) ( 0, 0) 19802 0.344 1.72 3.47
    cross-in-cross ( 0, 0) 39216 0.352 1.76 1.79
    fractal tree ( 99, 0) 18674 0.497 2.48 5.32
    greed(2) ( 0, 0) 30000 0.338 1.69 2.25
    greed(3) ( 0, 0) 22311 0.317 1.58 2.84
    greed(4) ( 0, 0) 17500 0.247 1.23 2.82
    donut ( 199, 100) 25830 0.237 1.18 1.83
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.334 1.66 1.66
    Solid square ( 0, 0) 921600 0.332 1.65 1.65
    standing snake-like shape ( 0, 0) 461160 0.263 1.31 2.62
    prostrate snake-like shape ( 0, 0) 461440 0.342 1.70 3.40
    slalom shape ( 0, 0) 460082 0.352 1.75 3.51
    slalom shape(rotated) ( 0, 0) 460800 0.346 1.72 3.44
    cross-in-cross ( 0, 0) 917616 0.336 1.67 1.68
    fractal tree ( 639, 0) 445860 0.437 2.18 4.50
    greed(2) ( 0, 0) 691200 0.326 1.62 2.16
    greed(3) ( 0, 0) 512160 0.303 1.51 2.71
    greed(4) ( 0, 0) 403200 0.245 1.22 2.79
    donut (1279, 360) 655856 0.243 1.21 1.70
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.337 1.66 1.66
    Solid square ( 0, 0) 2073600 0.335 1.65 1.65
    standing snake-like shape ( 0, 0) 1037340 0.265 1.30 2.61
    prostrate snake-like shape ( 0, 0) 1037760 0.333 1.64 3.27
    slalom shape ( 0, 0) 1036800 0.344 1.69 3.39
    slalom shape(rotated) ( 0, 0) 1036800 0.342 1.68 3.37
    cross-in-cross ( 0, 0) 2067616 0.338 1.66 1.67
    fractal tree ( 959, 0) 1034612 0.472 2.32 4.66
    greed(2) ( 0, 0) 1555200 0.328 1.61 2.15
    greed(3) ( 0, 0) 1152000 0.305 1.50 2.70
    greed(4) ( 0, 0) 907200 0.245 1.21 2.76
    donut (1919, 540) 1477788 0.244 1.20 1.68
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.375 1.74 1.74
    Solid square ( 0, 0) 8294400 0.402 1.86 1.86
    standing snake-like shape ( 0, 0) 4148280 0.323 1.50 2.99
    prostrate snake-like shape ( 0, 0) 4149120 0.561 2.60 5.20
    slalom shape ( 0, 0) 4147200 0.574 2.66 5.32
    slalom shape(rotated) ( 0, 0) 4147200 0.407 1.89 3.77
    cross-in-cross ( 0, 0) 8282416 0.384 1.78 1.78
    fractal tree (1919, 0) 4135652 0.508 2.36 4.72
    greed(2) ( 0, 0) 6220800 0.395 1.83 2.44
    greed(3) ( 0, 0) 4608000 0.350 1.62 2.92
    greed(4) ( 0, 0) 3628800 0.275 1.28 2.91
    donut (3839,1080) 5919706 0.262 1.21 1.70

    HSW,stack_timr1:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.801 4.00 4.00
    Solid square ( 0, 0) 475 0.845 4.22 4.22
    standing snake-like shape ( 0, 0) 259 0.511 2.55 4.69
    prostrate snake-like shape ( 0, 0) 259 0.516 2.58 4.73
    slalom shape ( 0, 0) 233 0.520 2.60 5.30
    slalom shape(rotated) ( 0, 0) 223 0.476 2.38 5.07
    cross-in-cross ( 0, 0) 403 0.694 3.47 4.09
    fractal tree ( 12, 0) 247 0.414 2.07 3.98
    greed(2) ( 0, 0) 367 0.645 3.22 4.17
    greed(3) ( 0, 0) 283 0.552 2.76 4.63
    greed(4) ( 0, 0) 223 0.476 2.38 5.07
    donut ( 23, 9) 238 0.469 2.34 4.68
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.447 2.23 2.23
    Solid square ( 0, 0) 40000 0.444 2.22 2.22
    standing snake-like shape ( 0, 0) 20100 0.229 1.14 2.28
    prostrate snake-like shape ( 0, 0) 20100 0.250 1.25 2.49
    slalom shape ( 0, 0) 19802 0.270 1.35 2.73
    slalom shape(rotated) ( 0, 0) 19802 0.260 1.30 2.62
    cross-in-cross ( 0, 0) 39216 0.459 2.29 2.34
    fractal tree ( 99, 0) 18674 0.260 1.30 2.78
    greed(2) ( 0, 0) 30000 0.387 1.93 2.58
    greed(3) ( 0, 0) 22311 0.295 1.47 2.64
    greed(4) ( 0, 0) 17500 0.231 1.15 2.64
    donut ( 199, 100) 25830 0.316 1.58 2.45
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.457 2.27 2.27
    Solid square ( 0, 0) 921600 0.515 2.56 2.56
    standing snake-like shape ( 0, 0) 461160 0.248 1.23 2.47
    prostrate snake-like shape ( 0, 0) 461440 0.321 1.60 3.19
    slalom shape ( 0, 0) 460082 0.312 1.55 3.11
    slalom shape(rotated) ( 0, 0) 460800 0.285 1.42 2.84
    cross-in-cross ( 0, 0) 917616 0.466 2.32 2.33
    fractal tree ( 639, 0) 445860 0.271 1.35 2.79
    greed(2) ( 0, 0) 691200 0.406 2.02 2.69
    greed(3) ( 0, 0) 512160 0.278 1.38 2.49
    greed(4) ( 0, 0) 403200 0.223 1.11 2.54
    donut (1279, 360) 655856 0.335 1.67 2.34
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.454 2.23 2.23
    Solid square ( 0, 0) 2073600 0.554 2.73 2.73
    standing snake-like shape ( 0, 0) 1037340 0.240 1.18 2.36
    prostrate snake-like shape ( 0, 0) 1037760 0.317 1.56 3.12
    slalom shape ( 0, 0) 1036800 0.306 1.51 3.01
    slalom shape(rotated) ( 0, 0) 1036800 0.293 1.44 2.88
    cross-in-cross ( 0, 0) 2067616 0.511 2.51 2.52
    fractal tree ( 959, 0) 1034612 0.276 1.36 2.72
    greed(2) ( 0, 0) 1555200 0.402 1.98 2.64
    greed(3) ( 0, 0) 1152000 0.283 1.39 2.51
    greed(4) ( 0, 0) 907200 0.224 1.10 2.52
    donut (1919, 540) 1477788 0.331 1.63 2.29
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.566 2.62 2.62
    Solid square ( 0, 0) 8294400 0.708 3.28 3.28
    standing snake-like shape ( 0, 0) 4148280 0.337 1.56 3.12
    prostrate snake-like shape ( 0, 0) 4149120 0.930 4.31 8.62
    slalom shape ( 0, 0) 4147200 0.735 3.41 6.82
    slalom shape(rotated) ( 0, 0) 4147200 0.387 1.79 3.59
    cross-in-cross ( 0, 0) 8282416 0.630 2.92 2.93
    fractal tree (1919, 0) 4135652 0.302 1.40 2.81
    greed(2) ( 0, 0) 6220800 0.516 2.39 3.19
    greed(3) ( 0, 0) 4608000 0.367 1.70 3.06
    greed(4) ( 0, 0) 3628800 0.286 1.33 3.03
    donut (3839,1080) 5919706 0.398 1.85 2.59

    HSW,stack_timr2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.746 3.73 3.73
    Solid square ( 0, 0) 475 0.781 3.90 3.90
    standing snake-like shape ( 0, 0) 259 0.479 2.39 4.39
    prostrate snake-like shape ( 0, 0) 259 0.525 2.62 4.81
    slalom shape ( 0, 0) 233 0.546 2.73 5.57
    slalom shape(rotated) ( 0, 0) 223 0.532 2.66 5.67
    cross-in-cross ( 0, 0) 403 0.804 4.02 4.74
    fractal tree ( 12, 0) 247 0.466 2.33 4.48
    greed(2) ( 0, 0) 367 0.668 3.34 4.32
    greed(3) ( 0, 0) 283 0.552 2.76 4.63
    greed(4) ( 0, 0) 223 0.446 2.23 4.75
    donut ( 23, 9) 238 0.521 2.60 5.20
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.432 2.16 2.16
    Solid square ( 0, 0) 40000 0.430 2.15 2.15
    standing snake-like shape ( 0, 0) 20100 0.223 1.11 2.22
    prostrate snake-like shape ( 0, 0) 20100 0.271 1.35 2.70
    slalom shape ( 0, 0) 19802 0.360 1.80 3.63
    slalom shape(rotated) ( 0, 0) 19802 0.385 1.92 3.89
    cross-in-cross ( 0, 0) 39216 0.468 2.34 2.39
    fractal tree ( 99, 0) 18674 0.457 2.28 4.89
    greed(2) ( 0, 0) 30000 0.468 2.34 3.12
    greed(3) ( 0, 0) 22311 0.353 1.76 3.16
    greed(4) ( 0, 0) 17500 0.275 1.37 3.14
    donut ( 199, 100) 25830 0.340 1.70 2.63
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.450 2.24 2.24
    Solid square ( 0, 0) 921600 0.560 2.79 2.79
    standing snake-like shape ( 0, 0) 461160 0.244 1.21 2.43
    prostrate snake-like shape ( 0, 0) 461440 0.282 1.40 2.80
    slalom shape ( 0, 0) 460082 0.412 2.05 4.11
    slalom shape(rotated) ( 0, 0) 460800 0.414 2.06 4.12
    cross-in-cross ( 0, 0) 917616 0.497 2.47 2.48
    fractal tree ( 639, 0) 445860 0.426 2.12 4.38
    greed(2) ( 0, 0) 691200 0.496 2.47 3.29
    greed(3) ( 0, 0) 512160 0.373 1.86 3.34
    greed(4) ( 0, 0) 403200 0.282 1.40 3.21
    donut (1279, 360) 655856 0.312 1.55 2.18
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.437 2.15 2.15
    Solid square ( 0, 0) 2073600 0.559 2.75 2.75
    standing snake-like shape ( 0, 0) 1037340 0.235 1.16 2.31
    prostrate snake-like shape ( 0, 0) 1037760 0.274 1.35 2.69
    slalom shape ( 0, 0) 1036800 0.396 1.95 3.90
    slalom shape(rotated) ( 0, 0) 1036800 0.407 2.00 4.01
    cross-in-cross ( 0, 0) 2067616 0.490 2.41 2.42
    fractal tree ( 959, 0) 1034612 0.457 2.25 4.51
    greed(2) ( 0, 0) 1555200 0.486 2.39 3.19
    greed(3) ( 0, 0) 1152000 0.346 1.70 3.06
    greed(4) ( 0, 0) 907200 0.268 1.32 3.01
    donut (1919, 540) 1477788 0.297 1.46 2.05
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.523 2.43 2.43
    Solid square ( 0, 0) 8294400 0.649 3.01 3.01
    standing snake-like shape ( 0, 0) 4148280 0.307 1.42 2.85
    prostrate snake-like shape ( 0, 0) 4149120 0.614 2.85 5.69
    slalom shape ( 0, 0) 4147200 0.666 3.09 6.18
    slalom shape(rotated) ( 0, 0) 4147200 0.495 2.30 4.59
    cross-in-cross ( 0, 0) 8282416 0.585 2.71 2.72
    fractal tree (1919, 0) 4135652 0.435 2.02 4.05
    greed(2) ( 0, 0) 6220800 0.583 2.70 3.60
    greed(3) ( 0, 0) 4608000 0.426 1.98 3.56
    greed(4) ( 0, 0) 3628800 0.342 1.59 3.62
    donut (3839,1080) 5919706 0.369 1.71 2.40

    HSW,queue_timr:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.698 3.49 3.49
    Solid square ( 0, 0) 475 0.709 3.54 3.54
    standing snake-like shape ( 0, 0) 259 0.517 2.58 4.74
    prostrate snake-like shape ( 0, 0) 259 0.518 2.59 4.75
    slalom shape ( 0, 0) 233 0.478 2.39 4.87
    slalom shape(rotated) ( 0, 0) 223 0.447 2.23 4.76
    cross-in-cross ( 0, 0) 403 0.577 2.88 3.40
    fractal tree ( 12, 0) 247 0.374 1.87 3.60
    greed(2) ( 0, 0) 367 0.515 2.57 3.33
    greed(3) ( 0, 0) 283 0.409 2.04 3.43
    greed(4) ( 0, 0) 223 0.336 1.68 3.58
    donut ( 23, 9) 238 0.379 1.89 3.78
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.662 3.31 3.31
    Solid square ( 0, 0) 40000 0.619 3.09 3.09
    standing snake-like shape ( 0, 0) 20100 0.443 2.21 4.41
    prostrate snake-like shape ( 0, 0) 20100 0.446 2.23 4.44
    slalom shape ( 0, 0) 19802 0.440 2.20 4.44
    slalom shape(rotated) ( 0, 0) 19802 0.439 2.19 4.43
    cross-in-cross ( 0, 0) 39216 0.629 3.14 3.21
    fractal tree ( 99, 0) 18674 0.618 3.09 6.62
    greed(2) ( 0, 0) 30000 0.477 2.38 3.18
    greed(3) ( 0, 0) 22311 0.364 1.82 3.26
    greed(4) ( 0, 0) 17500 0.289 1.44 3.30
    donut ( 199, 100) 25830 0.482 2.41 3.73
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.669 3.33 3.33
    Solid square ( 0, 0) 921600 0.628 3.13 3.13
    standing snake-like shape ( 0, 0) 461160 0.444 2.21 4.42
    prostrate snake-like shape ( 0, 0) 461440 0.445 2.21 4.42
    slalom shape ( 0, 0) 460082 0.444 2.21 4.43
    slalom shape(rotated) ( 0, 0) 460800 0.441 2.20 4.39
    cross-in-cross ( 0, 0) 917616 0.631 3.14 3.15
    fractal tree ( 639, 0) 445860 0.373 1.86 3.84
    greed(2) ( 0, 0) 691200 0.492 2.45 3.27
    greed(3) ( 0, 0) 512160 0.377 1.88 3.38
    greed(4) ( 0, 0) 403200 0.304 1.51 3.46
    donut (1279, 360) 655856 0.505 2.51 3.53
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.711 3.50 3.50
    Solid square ( 0, 0) 2073600 0.664 3.27 3.27
    standing snake-like shape ( 0, 0) 1037340 0.448 2.20 4.41
    prostrate snake-like shape ( 0, 0) 1037760 0.460 2.26 4.52
    slalom shape ( 0, 0) 1036800 0.450 2.21 4.43
    slalom shape(rotated) ( 0, 0) 1036800 0.448 2.20 4.41
    cross-in-cross ( 0, 0) 2067616 0.666 3.28 3.29
    fractal tree ( 959, 0) 1034612 0.391 1.92 3.86
    greed(2) ( 0, 0) 1555200 0.509 2.50 3.34
    greed(3) ( 0, 0) 1152000 0.378 1.86 3.35
    greed(4) ( 0, 0) 907200 0.306 1.51 3.44
    donut (1919, 540) 1477788 0.517 2.54 3.57
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 1.093 5.07 5.07
    Solid square ( 0, 0) 8294400 1.240 5.75 5.75
    standing snake-like shape ( 0, 0) 4148280 0.481 2.23 4.46
    prostrate snake-like shape ( 0, 0) 4149120 0.643 2.98 5.96
    slalom shape ( 0, 0) 4147200 0.597 2.77 5.54
    slalom shape(rotated) ( 0, 0) 4147200 0.477 2.21 4.42
    cross-in-cross ( 0, 0) 8282416 1.116 5.17 5.18
    fractal tree (1919, 0) 4135652 0.937 4.34 8.71
    greed(2) ( 0, 0) 6220800 1.133 5.25 7.01
    greed(3) ( 0, 0) 4608000 1.102 5.11 9.20
    greed(4) ( 0, 0) 3628800 1.020 4.73 10.81
    donut (3839,1080) 5919706 0.892 4.14 5.80


    SKC,stack_2x2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.425 2.12 2.12
    Solid square ( 0, 0) 475 0.422 2.11 2.11
    standing snake-like shape ( 0, 0) 259 0.376 1.88 3.45
    prostrate snake-like shape ( 0, 0) 259 0.370 1.85 3.39
    slalom shape ( 0, 0) 233 0.360 1.80 3.67
    slalom shape(rotated) ( 0, 0) 223 0.339 1.69 3.61
    cross-in-cross ( 0, 0) 403 0.398 1.99 2.35
    fractal tree ( 12, 0) 247 0.355 1.77 3.41
    greed(2) ( 0, 0) 367 0.400 2.00 2.59
    greed(3) ( 0, 0) 283 0.354 1.77 2.97
    greed(4) ( 0, 0) 223 0.305 1.52 3.25
    donut ( 23, 9) 238 0.243 1.21 2.42
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.390 1.95 1.95
    Solid square ( 0, 0) 40000 0.389 1.94 1.94
    standing snake-like shape ( 0, 0) 20100 0.335 1.67 3.33
    prostrate snake-like shape ( 0, 0) 20100 0.342 1.71 3.40
    slalom shape ( 0, 0) 19802 0.337 1.68 3.40
    slalom shape(rotated) ( 0, 0) 19802 0.337 1.68 3.40
    cross-in-cross ( 0, 0) 39216 0.400 2.00 2.04
    fractal tree ( 99, 0) 18674 0.342 1.71 3.66
    greed(2) ( 0, 0) 30000 0.379 1.89 2.53
    greed(3) ( 0, 0) 22311 0.320 1.60 2.87
    greed(4) ( 0, 0) 17500 0.264 1.32 3.02
    donut ( 199, 100) 25830 0.271 1.35 2.10
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.392 1.95 1.95
    Solid square ( 0, 0) 921600 0.391 1.95 1.95
    standing snake-like shape ( 0, 0) 461160 0.336 1.67 3.34
    prostrate snake-like shape ( 0, 0) 461440 0.344 1.71 3.42
    slalom shape ( 0, 0) 460082 0.348 1.73 3.47
    slalom shape(rotated) ( 0, 0) 460800 0.333 1.66 3.31
    cross-in-cross ( 0, 0) 917616 0.389 1.94 1.94
    fractal tree ( 639, 0) 445860 0.344 1.71 3.54
    greed(2) ( 0, 0) 691200 0.366 1.82 2.43
    greed(3) ( 0, 0) 512160 0.315 1.57 2.82
    greed(4) ( 0, 0) 403200 0.262 1.30 2.98
    donut (1279, 360) 655856 0.276 1.37 1.93
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.389 1.91 1.91
    Solid square ( 0, 0) 2073600 0.389 1.91 1.91
    standing snake-like shape ( 0, 0) 1037340 0.333 1.64 3.28
    prostrate snake-like shape ( 0, 0) 1037760 0.339 1.67 3.33
    slalom shape ( 0, 0) 1036800 0.343 1.69 3.38
    slalom shape(rotated) ( 0, 0) 1036800 0.332 1.63 3.27
    cross-in-cross ( 0, 0) 2067616 0.390 1.92 1.92
    fractal tree ( 959, 0) 1034612 0.347 1.71 3.42
    greed(2) ( 0, 0) 1555200 0.370 1.82 2.43
    greed(3) ( 0, 0) 1152000 0.316 1.56 2.80
    greed(4) ( 0, 0) 907200 0.261 1.28 2.94
    donut (1919, 540) 1477788 0.279 1.37 1.93
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.429 1.99 1.99
    Solid square ( 0, 0) 8294400 0.456 2.11 2.11
    standing snake-like shape ( 0, 0) 4148280 0.393 1.82 3.64
    prostrate snake-like shape ( 0, 0) 4149120 0.483 2.24 4.48
    slalom shape ( 0, 0) 4147200 0.473 2.19 4.39
    slalom shape(rotated) ( 0, 0) 4147200 0.394 1.83 3.65
    cross-in-cross ( 0, 0) 8282416 0.439 2.04 2.04
    fractal tree (1919, 0) 4135652 0.374 1.73 3.48
    greed(2) ( 0, 0) 6220800 0.431 2.00 2.66
    greed(3) ( 0, 0) 4608000 0.363 1.68 3.03
    greed(4) ( 0, 0) 3628800 0.290 1.34 3.07
    donut (3839,1080) 5919706 0.304 1.41 1.98

    SKC,stack_timr1:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.863 4.31 4.31
    Solid square ( 0, 0) 475 0.913 4.56 4.56
    standing snake-like shape ( 0, 0) 259 0.588 2.94 5.39
    prostrate snake-like shape ( 0, 0) 259 0.584 2.92 5.36
    slalom shape ( 0, 0) 233 0.551 2.75 5.62
    slalom shape(rotated) ( 0, 0) 223 0.535 2.67 5.70
    cross-in-cross ( 0, 0) 403 0.771 3.85 4.54
    fractal tree ( 12, 0) 247 0.449 2.24 4.32
    greed(2) ( 0, 0) 367 0.737 3.68 4.77
    greed(3) ( 0, 0) 283 0.616 3.08 5.17
    greed(4) ( 0, 0) 223 0.535 2.67 5.70
    donut ( 23, 9) 238 0.532 2.66 5.31
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.588 2.94 2.94
    Solid square ( 0, 0) 40000 0.588 2.94 2.94
    standing snake-like shape ( 0, 0) 20100 0.301 1.50 2.99
    prostrate snake-like shape ( 0, 0) 20100 0.316 1.58 3.14
    slalom shape ( 0, 0) 19802 0.300 1.50 3.03
    slalom shape(rotated) ( 0, 0) 19802 0.298 1.49 3.01
    cross-in-cross ( 0, 0) 39216 0.596 2.98 3.04
    fractal tree ( 99, 0) 18674 0.303 1.51 3.24
    greed(2) ( 0, 0) 30000 0.467 2.33 3.11
    greed(3) ( 0, 0) 22311 0.353 1.76 3.16
    greed(4) ( 0, 0) 17500 0.281 1.40 3.21
    donut ( 199, 100) 25830 0.407 2.03 3.15
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.570 2.84 2.84
    Solid square ( 0, 0) 921600 0.618 3.08 3.08
    standing snake-like shape ( 0, 0) 461160 0.285 1.42 2.83
    prostrate snake-like shape ( 0, 0) 461440 0.302 1.50 3.00
    slalom shape ( 0, 0) 460082 0.287 1.43 2.86
    slalom shape(rotated) ( 0, 0) 460800 0.287 1.43 2.86
    cross-in-cross ( 0, 0) 917616 0.569 2.83 2.84
    fractal tree ( 639, 0) 445860 0.296 1.47 3.05
    greed(2) ( 0, 0) 691200 0.454 2.26 3.01
    greed(3) ( 0, 0) 512160 0.337 1.68 3.02
    greed(4) ( 0, 0) 403200 0.266 1.32 3.03
    donut (1279, 360) 655856 0.408 2.03 2.85
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.611 3.01 3.01
    Solid square ( 0, 0) 2073600 0.694 3.42 3.42
    standing snake-like shape ( 0, 0) 1037340 0.323 1.59 3.18
    prostrate snake-like shape ( 0, 0) 1037760 0.341 1.68 3.35
    slalom shape ( 0, 0) 1036800 0.325 1.60 3.20
    slalom shape(rotated) ( 0, 0) 1036800 0.323 1.59 3.18
    cross-in-cross ( 0, 0) 2067616 0.643 3.16 3.17
    fractal tree ( 959, 0) 1034612 0.300 1.48 2.96
    greed(2) ( 0, 0) 1555200 0.489 2.41 3.21
    greed(3) ( 0, 0) 1152000 0.353 1.74 3.13
    greed(4) ( 0, 0) 907200 0.279 1.37 3.14
    donut (1919, 540) 1477788 0.439 2.16 3.03
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.702 3.26 3.26
    Solid square ( 0, 0) 8294400 0.801 3.71 3.71
    standing snake-like shape ( 0, 0) 4148280 0.391 1.81 3.63
    prostrate snake-like shape ( 0, 0) 4149120 0.634 2.94 5.88
    slalom shape ( 0, 0) 4147200 0.527 2.44 4.89
    slalom shape(rotated) ( 0, 0) 4147200 0.398 1.85 3.69
    cross-in-cross ( 0, 0) 8282416 0.751 3.48 3.49
    fractal tree (1919, 0) 4135652 0.323 1.50 3.00
    greed(2) ( 0, 0) 6220800 0.575 2.67 3.56
    greed(3) ( 0, 0) 4608000 0.415 1.92 3.46
    greed(4) ( 0, 0) 3628800 0.319 1.48 3.38
    donut (3839,1080) 5919706 0.493 2.29 3.20

    SKC,stack_timr2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.904 4.52 4.52
    Solid square ( 0, 0) 475 0.945 4.72 4.72
    standing snake-like shape ( 0, 0) 259 0.568 2.84 5.21
    prostrate snake-like shape ( 0, 0) 259 0.575 2.87 5.27
    slalom shape ( 0, 0) 233 0.564 2.82 5.75
    slalom shape(rotated) ( 0, 0) 223 0.527 2.63 5.61
    cross-in-cross ( 0, 0) 403 0.874 4.37 5.15
    fractal tree ( 12, 0) 247 0.427 2.13 4.11
    greed(2) ( 0, 0) 367 0.692 3.46 4.48
    greed(3) ( 0, 0) 283 0.569 2.84 4.78
    greed(4) ( 0, 0) 223 0.465 2.32 4.95
    donut ( 23, 9) 238 0.564 2.82 5.63
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.657 3.28 3.28
    Solid square ( 0, 0) 40000 0.655 3.27 3.27
    standing snake-like shape ( 0, 0) 20100 0.333 1.66 3.31
    prostrate snake-like shape ( 0, 0) 20100 0.313 1.56 3.11
    slalom shape ( 0, 0) 19802 0.344 1.72 3.47
    slalom shape(rotated) ( 0, 0) 19802 0.342 1.71 3.45
    cross-in-cross ( 0, 0) 39216 0.665 3.32 3.39
    fractal tree ( 99, 0) 18674 0.331 1.65 3.54
    greed(2) ( 0, 0) 30000 0.468 2.34 3.12
    greed(3) ( 0, 0) 22311 0.343 1.71 3.07
    greed(4) ( 0, 0) 17500 0.266 1.33 3.04
    donut ( 199, 100) 25830 0.444 2.22 3.44
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.640 3.19 3.19
    Solid square ( 0, 0) 921600 0.752 3.74 3.74
    standing snake-like shape ( 0, 0) 461160 0.384 1.91 3.82
    prostrate snake-like shape ( 0, 0) 461440 0.371 1.85 3.69
    slalom shape ( 0, 0) 460082 0.408 2.03 4.07
    slalom shape(rotated) ( 0, 0) 460800 0.390 1.94 3.88
    cross-in-cross ( 0, 0) 917616 0.698 3.47 3.49
    fractal tree ( 639, 0) 445860 0.340 1.69 3.50
    greed(2) ( 0, 0) 691200 0.510 2.54 3.38
    greed(3) ( 0, 0) 512160 0.377 1.88 3.38
    greed(4) ( 0, 0) 403200 0.257 1.28 2.92
    donut (1279, 360) 655856 0.459 2.28 3.21
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.690 3.40 3.40
    Solid square ( 0, 0) 2073600 0.792 3.90 3.90
    standing snake-like shape ( 0, 0) 1037340 0.365 1.80 3.59
    prostrate snake-like shape ( 0, 0) 1037760 0.345 1.70 3.39
    slalom shape ( 0, 0) 1036800 0.384 1.89 3.78
    slalom shape(rotated) ( 0, 0) 1036800 0.375 1.85 3.69
    cross-in-cross ( 0, 0) 2067616 0.727 3.58 3.59
    fractal tree ( 959, 0) 1034612 0.356 1.75 3.51
    greed(2) ( 0, 0) 1555200 0.495 2.44 3.25
    greed(3) ( 0, 0) 1152000 0.346 1.70 3.06
    greed(4) ( 0, 0) 907200 0.266 1.31 2.99
    donut (1919, 540) 1477788 0.476 2.34 3.29
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.760 3.52 3.52
    Solid square ( 0, 0) 8294400 0.856 3.97 3.97
    standing snake-like shape ( 0, 0) 4148280 0.415 1.92 3.85
    prostrate snake-like shape ( 0, 0) 4149120 0.483 2.24 4.48
    slalom shape ( 0, 0) 4147200 0.516 2.39 4.79
    slalom shape(rotated) ( 0, 0) 4147200 0.432 2.00 4.01
    cross-in-cross ( 0, 0) 8282416 0.806 3.74 3.74
    fractal tree (1919, 0) 4135652 0.380 1.76 3.53
    greed(2) ( 0, 0) 6220800 0.568 2.63 3.51
    greed(3) ( 0, 0) 4608000 0.400 1.85 3.34
    greed(4) ( 0, 0) 3628800 0.321 1.49 3.40
    donut (3839,1080) 5919706 0.540 2.50 3.51

    SKC,queue_timr:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.557 2.78 2.78
    Solid square ( 0, 0) 475 0.589 2.94 2.94
    standing snake-like shape ( 0, 0) 259 0.479 2.39 4.39
    prostrate snake-like shape ( 0, 0) 259 0.490 2.45 4.49
    slalom shape ( 0, 0) 233 0.464 2.32 4.73
    slalom shape(rotated) ( 0, 0) 223 0.427 2.13 4.55
    cross-in-cross ( 0, 0) 403 0.491 2.45 2.89
    fractal tree ( 12, 0) 247 0.304 1.52 2.92
    greed(2) ( 0, 0) 367 0.432 2.16 2.80
    greed(3) ( 0, 0) 283 0.348 1.74 2.92
    greed(4) ( 0, 0) 223 0.282 1.41 3.00
    donut ( 23, 9) 238 0.310 1.55 3.09
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.536 2.68 2.68
    Solid square ( 0, 0) 40000 0.513 2.56 2.56
    standing snake-like shape ( 0, 0) 20100 0.405 2.02 4.03
    prostrate snake-like shape ( 0, 0) 20100 0.397 1.98 3.95
    slalom shape ( 0, 0) 19802 0.409 2.04 4.13
    slalom shape(rotated) ( 0, 0) 19802 0.414 2.07 4.18
    cross-in-cross ( 0, 0) 39216 0.518 2.59 2.64
    fractal tree ( 99, 0) 18674 0.447 2.23 4.79
    greed(2) ( 0, 0) 30000 0.391 1.95 2.61
    greed(3) ( 0, 0) 22311 0.297 1.48 2.66
    greed(4) ( 0, 0) 17500 0.239 1.19 2.73
    donut ( 199, 100) 25830 0.383 1.91 2.96
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.588 2.93 2.93
    Solid square ( 0, 0) 921600 0.546 2.72 2.72
    standing snake-like shape ( 0, 0) 461160 0.405 2.02 4.03
    prostrate snake-like shape ( 0, 0) 461440 0.402 2.00 4.00
    slalom shape ( 0, 0) 460082 0.411 2.05 4.10
    slalom shape(rotated) ( 0, 0) 460800 0.417 2.08 4.15
    cross-in-cross ( 0, 0) 917616 0.541 2.69 2.70
    fractal tree ( 639, 0) 445860 0.312 1.55 3.21
    greed(2) ( 0, 0) 691200 0.423 2.11 2.81
    greed(3) ( 0, 0) 512160 0.323 1.61 2.89
    greed(4) ( 0, 0) 403200 0.258 1.28 2.94
    donut (1279, 360) 655856 0.413 2.06 2.89
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.713 3.51 3.51
    Solid square ( 0, 0) 2073600 0.572 2.81 2.81
    standing snake-like shape ( 0, 0) 1037340 0.414 2.04 4.07
    prostrate snake-like shape ( 0, 0) 1037760 0.414 2.04 4.07
    slalom shape ( 0, 0) 1036800 0.417 2.05 4.10
    slalom shape(rotated) ( 0, 0) 1036800 0.425 2.09 4.18
    cross-in-cross ( 0, 0) 2067616 0.571 2.81 2.82
    fractal tree ( 959, 0) 1034612 0.323 1.59 3.19
    greed(2) ( 0, 0) 1555200 0.439 2.16 2.88
    greed(3) ( 0, 0) 1152000 0.327 1.61 2.90
    greed(4) ( 0, 0) 907200 0.263 1.29 2.96
    donut (1919, 540) 1477788 0.455 2.24 3.14
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.944 4.38 4.38
    Solid square ( 0, 0) 8294400 1.036 4.80 4.80
    standing snake-like shape ( 0, 0) 4148280 0.434 2.01 4.02
    prostrate snake-like shape ( 0, 0) 4149120 0.590 2.74 5.47
    slalom shape ( 0, 0) 4147200 0.551 2.56 5.11
    slalom shape(rotated) ( 0, 0) 4147200 0.452 2.10 4.19
    cross-in-cross ( 0, 0) 8282416 0.974 4.52 4.52
    fractal tree (1919, 0) 4135652 0.413 1.92 3.84
    greed(2) ( 0, 0) 6220800 0.813 3.77 5.03
    greed(3) ( 0, 0) 4608000 0.662 3.07 5.53
    greed(4) ( 0, 0) 3628800 0.543 2.52 5.76
    donut (3839,1080) 5919706 0.674 3.13 4.38


    ZN3,stack_2x2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.408 2.04 2.04
    Solid square ( 0, 0) 475 0.382 1.91 1.91
    standing snake-like shape ( 0, 0) 259 0.343 1.71 3.15
    prostrate snake-like shape ( 0, 0) 259 0.327 1.63 3.00
    slalom shape ( 0, 0) 233 0.295 1.47 3.01
    slalom shape(rotated) ( 0, 0) 223 0.311 1.55 3.31
    cross-in-cross ( 0, 0) 403 0.356 1.78 2.10
    fractal tree ( 12, 0) 247 0.310 1.55 2.98
    greed(2) ( 0, 0) 367 0.393 1.96 2.54
    greed(3) ( 0, 0) 283 0.329 1.64 2.76
    greed(4) ( 0, 0) 223 0.285 1.42 3.04

    * Message split, to be continued *

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Wed Apr 17 07:47:22 2024
    * Continuation 1 of a split message *

    donut ( 23, 9) 238 0.220 1.10 2.20
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.332 1.66 1.66
    Solid square ( 0, 0) 40000 0.333 1.66 1.66
    standing snake-like shape ( 0, 0) 20100 0.288 1.44 2.86
    prostrate snake-like shape ( 0, 0) 20100 0.286 1.43 2.84
    slalom shape ( 0, 0) 19802 0.268 1.34 2.71
    slalom shape(rotated) ( 0, 0) 19802 0.288 1.44 2.91
    cross-in-cross ( 0, 0) 39216 0.340 1.70 1.73
    fractal tree ( 99, 0) 18674 0.344 1.72 3.68
    greed(2) ( 0, 0) 30000 0.323 1.61 2.15
    greed(3) ( 0, 0) 22311 0.271 1.35 2.43
    greed(4) ( 0, 0) 17500 0.223 1.11 2.55
    donut ( 199, 100) 25830 0.236 1.18 1.83
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.327 1.63 1.63
    Solid square ( 0, 0) 921600 0.320 1.59 1.59
    standing snake-like shape ( 0, 0) 461160 0.281 1.40 2.80
    prostrate snake-like shape ( 0, 0) 461440 0.276 1.37 2.74
    slalom shape ( 0, 0) 460082 0.266 1.32 2.65
    slalom shape(rotated) ( 0, 0) 460800 0.284 1.41 2.83
    cross-in-cross ( 0, 0) 917616 0.329 1.64 1.64
    fractal tree ( 639, 0) 445860 0.314 1.56 3.23
    greed(2) ( 0, 0) 691200 0.317 1.58 2.10
    greed(3) ( 0, 0) 512160 0.266 1.32 2.38
    greed(4) ( 0, 0) 403200 0.224 1.11 2.55
    donut (1279, 360) 655856 0.237 1.18 1.66
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.329 1.62 1.62
    Solid square ( 0, 0) 2073600 0.327 1.61 1.61
    standing snake-like shape ( 0, 0) 1037340 0.284 1.40 2.79
    prostrate snake-like shape ( 0, 0) 1037760 0.283 1.39 2.78
    slalom shape ( 0, 0) 1036800 0.277 1.36 2.73
    slalom shape(rotated) ( 0, 0) 1036800 0.289 1.42 2.84
    cross-in-cross ( 0, 0) 2067616 0.334 1.64 1.65
    fractal tree ( 959, 0) 1034612 0.353 1.74 3.48
    greed(2) ( 0, 0) 1555200 0.326 1.60 2.14
    greed(3) ( 0, 0) 1152000 0.273 1.34 2.42
    greed(4) ( 0, 0) 907200 0.235 1.16 2.64
    donut (1919, 540) 1477788 0.245 1.21 1.69
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.358 1.66 1.66
    Solid square ( 0, 0) 8294400 0.394 1.83 1.83
    standing snake-like shape ( 0, 0) 4148280 0.338 1.57 3.13
    prostrate snake-like shape ( 0, 0) 4149120 0.354 1.64 3.28
    slalom shape ( 0, 0) 4147200 0.372 1.72 3.45
    slalom shape(rotated) ( 0, 0) 4147200 0.352 1.63 3.26
    cross-in-cross ( 0, 0) 8282416 0.365 1.69 1.69
    fractal tree (1919, 0) 4135652 0.372 1.72 3.46
    greed(2) ( 0, 0) 6220800 0.397 1.84 2.45
    greed(3) ( 0, 0) 4608000 0.316 1.47 2.64
    greed(4) ( 0, 0) 3628800 0.254 1.18 2.69
    donut (3839,1080) 5919706 0.253 1.17 1.64

    ZN3,stack_timr1:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.902 4.51 4.51
    Solid square ( 0, 0) 475 0.874 4.37 4.37
    standing snake-like shape ( 0, 0) 259 0.521 2.60 4.78
    prostrate snake-like shape ( 0, 0) 259 0.545 2.72 5.00
    slalom shape ( 0, 0) 233 0.513 2.56 5.23
    slalom shape(rotated) ( 0, 0) 223 0.505 2.52 5.38
    cross-in-cross ( 0, 0) 403 0.737 3.68 4.34
    fractal tree ( 12, 0) 247 0.436 2.18 4.19
    greed(2) ( 0, 0) 367 0.675 3.37 4.37
    greed(3) ( 0, 0) 283 0.558 2.79 4.68
    greed(4) ( 0, 0) 223 0.481 2.40 5.12
    donut ( 23, 9) 238 0.514 2.57 5.13
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.620 3.10 3.10
    Solid square ( 0, 0) 40000 0.654 3.27 3.27
    standing snake-like shape ( 0, 0) 20100 0.300 1.50 2.98
    prostrate snake-like shape ( 0, 0) 20100 0.268 1.34 2.67
    slalom shape ( 0, 0) 19802 0.273 1.36 2.76
    slalom shape(rotated) ( 0, 0) 19802 0.277 1.38 2.80
    cross-in-cross ( 0, 0) 39216 0.592 2.96 3.02
    fractal tree ( 99, 0) 18674 0.298 1.49 3.19
    greed(2) ( 0, 0) 30000 0.447 2.23 2.98
    greed(3) ( 0, 0) 22311 0.340 1.70 3.05
    greed(4) ( 0, 0) 17500 0.259 1.29 2.96
    donut ( 199, 100) 25830 0.408 2.04 3.16
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.554 2.76 2.76
    Solid square ( 0, 0) 921600 0.609 3.03 3.03
    standing snake-like shape ( 0, 0) 461160 0.266 1.32 2.65
    prostrate snake-like shape ( 0, 0) 461440 0.262 1.30 2.60
    slalom shape ( 0, 0) 460082 0.262 1.30 2.61
    slalom shape(rotated) ( 0, 0) 460800 0.264 1.31 2.63
    cross-in-cross ( 0, 0) 917616 0.558 2.78 2.79
    fractal tree ( 639, 0) 445860 0.283 1.41 2.91
    greed(2) ( 0, 0) 691200 0.417 2.08 2.77
    greed(3) ( 0, 0) 512160 0.307 1.53 2.75
    greed(4) ( 0, 0) 403200 0.241 1.20 2.74
    donut (1279, 360) 655856 0.437 2.18 3.06
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.585 2.88 2.88
    Solid square ( 0, 0) 2073600 0.732 3.60 3.60
    standing snake-like shape ( 0, 0) 1037340 0.308 1.52 3.03
    prostrate snake-like shape ( 0, 0) 1037760 0.301 1.48 2.96
    slalom shape ( 0, 0) 1036800 0.296 1.46 2.91
    slalom shape(rotated) ( 0, 0) 1036800 0.306 1.51 3.01
    cross-in-cross ( 0, 0) 2067616 0.665 3.27 3.28
    fractal tree ( 959, 0) 1034612 0.297 1.46 2.93
    greed(2) ( 0, 0) 1555200 0.466 2.29 3.06
    greed(3) ( 0, 0) 1152000 0.309 1.52 2.74
    greed(4) ( 0, 0) 907200 0.244 1.20 2.74
    donut (1919, 540) 1477788 0.452 2.22 3.12
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.709 3.29 3.29
    Solid square ( 0, 0) 8294400 0.896 4.15 4.15
    standing snake-like shape ( 0, 0) 4148280 0.418 1.94 3.88
    prostrate snake-like shape ( 0, 0) 4149120 0.461 2.14 4.27
    slalom shape ( 0, 0) 4147200 0.445 2.06 4.13
    slalom shape(rotated) ( 0, 0) 4147200 0.443 2.05 4.11
    cross-in-cross ( 0, 0) 8282416 0.836 3.88 3.88
    fractal tree (1919, 0) 4135652 0.328 1.52 3.05
    greed(2) ( 0, 0) 6220800 0.596 2.76 3.68
    greed(3) ( 0, 0) 4608000 0.421 1.95 3.51
    greed(4) ( 0, 0) 3628800 0.316 1.47 3.35
    donut (3839,1080) 5919706 0.536 2.49 3.48

    ZN3,stack_timr2:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.990 4.95 4.95
    Solid square ( 0, 0) 475 1.043 5.21 5.21
    standing snake-like shape ( 0, 0) 259 0.617 3.08 5.66
    prostrate snake-like shape ( 0, 0) 259 0.580 2.90 5.32
    slalom shape ( 0, 0) 233 0.558 2.79 5.69
    slalom shape(rotated) ( 0, 0) 223 0.549 2.74 5.85
    cross-in-cross ( 0, 0) 403 0.831 4.15 4.90
    fractal tree ( 12, 0) 247 0.435 2.17 4.18
    greed(2) ( 0, 0) 367 0.757 3.78 4.90
    greed(3) ( 0, 0) 283 0.630 3.15 5.29
    greed(4) ( 0, 0) 223 0.514 2.57 5.47
    donut ( 23, 9) 238 0.534 2.67 5.33
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.762 3.81 3.81
    Solid square ( 0, 0) 40000 0.763 3.81 3.81
    standing snake-like shape ( 0, 0) 20100 0.386 1.93 3.84
    prostrate snake-like shape ( 0, 0) 20100 0.395 1.97 3.93
    slalom shape ( 0, 0) 19802 0.390 1.95 3.94
    slalom shape(rotated) ( 0, 0) 19802 0.364 1.82 3.67
    cross-in-cross ( 0, 0) 39216 0.761 3.80 3.88
    fractal tree ( 99, 0) 18674 0.385 1.92 4.12
    greed(2) ( 0, 0) 30000 0.549 2.74 3.66
    greed(3) ( 0, 0) 22311 0.417 2.08 3.74
    greed(4) ( 0, 0) 17500 0.328 1.64 3.75
    donut ( 199, 100) 25830 0.510 2.55 3.95
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.749 3.73 3.73
    Solid square ( 0, 0) 921600 0.820 4.08 4.08
    standing snake-like shape ( 0, 0) 461160 0.376 1.87 3.74
    prostrate snake-like shape ( 0, 0) 461440 0.384 1.91 3.82
    slalom shape ( 0, 0) 460082 0.398 1.98 3.97
    slalom shape(rotated) ( 0, 0) 460800 0.357 1.78 3.55
    cross-in-cross ( 0, 0) 917616 0.750 3.73 3.75
    fractal tree ( 639, 0) 445860 0.381 1.90 3.92
    greed(2) ( 0, 0) 691200 0.539 2.68 3.58
    greed(3) ( 0, 0) 512160 0.408 2.03 3.65
    greed(4) ( 0, 0) 403200 0.324 1.61 3.69
    donut (1279, 360) 655856 0.536 2.67 3.75
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.848 4.17 4.17
    Solid square ( 0, 0) 2073600 1.032 5.08 5.08
    standing snake-like shape ( 0, 0) 1037340 0.471 2.32 4.63
    prostrate snake-like shape ( 0, 0) 1037760 0.462 2.27 4.54
    slalom shape ( 0, 0) 1036800 0.476 2.34 4.68
    slalom shape(rotated) ( 0, 0) 1036800 0.452 2.22 4.45
    cross-in-cross ( 0, 0) 2067616 0.932 4.59 4.60
    fractal tree ( 959, 0) 1034612 0.403 1.98 3.97
    greed(2) ( 0, 0) 1555200 0.649 3.19 4.26
    greed(3) ( 0, 0) 1152000 0.461 2.27 4.08
    greed(4) ( 0, 0) 907200 0.339 1.67 3.81
    donut (1919, 540) 1477788 0.588 2.89 4.06
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.922 4.28 4.28
    Solid square ( 0, 0) 8294400 1.089 5.05 5.05
    standing snake-like shape ( 0, 0) 4148280 0.520 2.41 4.82
    prostrate snake-like shape ( 0, 0) 4149120 0.548 2.54 5.08
    slalom shape ( 0, 0) 4147200 0.539 2.50 5.00
    slalom shape(rotated) ( 0, 0) 4147200 0.507 2.35 4.70
    cross-in-cross ( 0, 0) 8282416 0.992 4.60 4.61
    fractal tree (1919, 0) 4135652 0.428 1.98 3.98
    greed(2) ( 0, 0) 6220800 0.709 3.29 4.38
    greed(3) ( 0, 0) 4608000 0.518 2.40 4.32
    greed(4) ( 0, 0) 3628800 0.417 1.93 4.42
    donut (3839,1080) 5919706 0.647 3.00 4.20

    ZN3,queue_timr:
    [25 x 19] * 421054
    Solid square ( 12, 9) 475 0.639 3.19 3.19
    Solid square ( 0, 0) 475 0.655 3.27 3.27
    standing snake-like shape ( 0, 0) 259 0.437 2.18 4.01
    prostrate snake-like shape ( 0, 0) 259 0.441 2.20 4.04
    slalom shape ( 0, 0) 233 0.397 1.98 4.05
    slalom shape(rotated) ( 0, 0) 223 0.374 1.87 3.98
    cross-in-cross ( 0, 0) 403 0.521 2.60 3.07
    fractal tree ( 12, 0) 247 0.499 2.49 4.80
    greed(2) ( 0, 0) 367 0.778 3.89 5.03
    greed(3) ( 0, 0) 283 0.623 3.11 5.23
    greed(4) ( 0, 0) 223 0.495 2.47 5.27
    donut ( 23, 9) 238 0.344 1.72 3.43
    [200 x 200] * 5002
    Solid square ( 100, 100) 40000 0.578 2.89 2.89
    Solid square ( 0, 0) 40000 0.581 2.90 2.90
    standing snake-like shape ( 0, 0) 20100 0.378 1.89 3.76
    prostrate snake-like shape ( 0, 0) 20100 0.369 1.84 3.67
    slalom shape ( 0, 0) 19802 0.366 1.83 3.70
    slalom shape(rotated) ( 0, 0) 19802 0.365 1.82 3.69
    cross-in-cross ( 0, 0) 39216 0.577 2.88 2.94
    fractal tree ( 99, 0) 18674 0.479 2.39 5.13
    greed(2) ( 0, 0) 30000 0.679 3.39 4.52
    greed(3) ( 0, 0) 22311 0.547 2.73 4.90
    greed(4) ( 0, 0) 17500 0.456 2.28 5.21
    donut ( 199, 100) 25830 0.402 2.01 3.11
    [1280 x 720] * 218
    Solid square ( 640, 360) 921600 0.595 2.96 2.96
    Solid square ( 0, 0) 921600 0.596 2.97 2.97
    standing snake-like shape ( 0, 0) 461160 0.373 1.86 3.71
    prostrate snake-like shape ( 0, 0) 461440 0.377 1.88 3.75
    slalom shape ( 0, 0) 460082 0.371 1.85 3.70
    slalom shape(rotated) ( 0, 0) 460800 0.362 1.80 3.60
    cross-in-cross ( 0, 0) 917616 0.593 2.95 2.96
    fractal tree ( 639, 0) 445860 0.474 2.36 4.88
    greed(2) ( 0, 0) 691200 0.459 2.28 3.05
    greed(3) ( 0, 0) 512160 0.339 1.69 3.04
    greed(4) ( 0, 0) 403200 0.447 2.22 5.09
    donut (1279, 360) 655856 0.437 2.18 3.06
    [1920 x 1080] * 98
    Solid square ( 960, 540) 2073600 0.609 3.00 3.00
    Solid square ( 0, 0) 2073600 0.611 3.01 3.01
    standing snake-like shape ( 0, 0) 1037340 0.383 1.88 3.77
    prostrate snake-like shape ( 0, 0) 1037760 0.379 1.87 3.73
    slalom shape ( 0, 0) 1036800 0.375 1.85 3.69
    slalom shape(rotated) ( 0, 0) 1036800 0.372 1.83 3.66
    cross-in-cross ( 0, 0) 2067616 0.610 3.00 3.01
    fractal tree ( 959, 0) 1034612 0.455 2.24 4.49
    greed(2) ( 0, 0) 1555200 0.468 2.30 3.07
    greed(3) ( 0, 0) 1152000 0.348 1.71 3.08
    greed(4) ( 0, 0) 907200 0.276 1.36 3.10
    donut (1919, 540) 1477788 0.453 2.23 3.13
    [3840 x 2160] * 26
    Solid square (1920,1080) 8294400 0.717 3.32 3.32
    Solid square ( 0, 0) 8294400 0.689 3.19 3.19
    standing snake-like shape ( 0, 0) 4148280 0.404 1.87 3.75
    prostrate snake-like shape ( 0, 0) 4149120 0.406 1.88 3.76
    slalom shape ( 0, 0) 4147200 0.400 1.85 3.71
    slalom shape(rotated) ( 0, 0) 4147200 0.394 1.83 3.65
    cross-in-cross ( 0, 0) 8282416 0.674 3.13 3.13
    fractal tree (1919, 0) 4135652 0.441 2.04 4.10
    greed(2) ( 0, 0) 6220800 0.537 2.49 3.32
    greed(3) ( 0, 0) 4608000 0.399 1.85 3.33
    greed(4) ( 0, 0) 3628800 0.317 1.47 3.36
    donut (3839,1080) 5919706 0.491 2.28 3.19





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Apr 18 03:47:25 2024
    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.

    Considering that, since I now have no way of doing any
    useful measuring, it seems there is little point in any
    further development or investigation on my part. It's
    been fun, even if ultimately inconclusive.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Thu Apr 18 05:41:26 2024
    On Wed, 17 Apr 2024 10:47:25 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.



    What are your absolute results?
    Are they much faster, much slower or similar to mine?
    Also, it would help if you could find out the characteristics of your
    test hardware.

    Considering that, since I now have no way of doing any
    useful measuring, it seems there is little point in any
    further development or investigation on my part. It's
    been fun, even if ultimately inconclusive.

    I am still interested in a combination of speed that does not suck
    with an O(N) worst-case memory footprint.
    I already have a couple of variants of the former, but so far they are
    all unreasonably slow - ~5 times slower than the best.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Apr 20 07:59:20 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 17 Apr 2024 10:47:25 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.

    What are your absolute results?
    Are they much faster, much slower or similar to mine?
    Also, it would help if you could find out the characteristics of your
    test hardware.

    I think trying to look at those wouldn't tell me anything
    helpful. Too many unknowns. And still no way to test or
    measure any changes to the various algorithms.

    Considering that, since I now have no way of doing any
    useful measuring, it seems there is little point in any
    further development or investigation on my part. It's
    been fun, even if ultimately inconclusive.

    I am still interested in a combination of speed that does
    not suck with an O(N) worst-case memory footprint.
    I already have a couple of variants of the former,

    Did you mean you have some algorithms whose worst-case memory
    behavior is strictly less than O( total number of pixels )?

    I think it would be helpful to adopt a standard terminology
    where the pixel field is of size M x N; otherwise I'm not
    sure what O(N) refers to.

    but so
    far they are all unreasonably slow - ~5 times slower than
    the best.

    I'm no longer working on the problem but I'm interested to
    hear what you come up with.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Sun Apr 21 04:10:23 2024
    On Fri, 19 Apr 2024 14:59:20 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    On Wed, 17 Apr 2024 10:47:25 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Michael S <already5chosen@yahoo.com> writes:

    [...]

    Finally found the time for speed measurements. [...]

    I got these. Thank you.

    The format used didn't make it easy to do any automated
    processing. I was able to get around that, although it
    would have been nicer if that had been easier.

    The results you got are radically different than my own,
    to the point where I wonder if there is something else
    going on.

    What are your absolute results?
    Are they much faster, much slower or similar to mine?
    Also, it would help if you could find out the characteristics of your
    test hardware.

    I think trying to look at those wouldn't tell me anything
    helpful. Too many unknowns. And still no way to test or
    measure any changes to the various algorithms.


    Frankly, I don't understand.
    If you have trouble with testing on shared hardware then you can
    always test on hardware that you own and fully control.
    Even if it is a little old, the trends tend to be the same. At least I
    clearly see the same trends on my almost 12-year-old home PC and on a
    relatively modern EPYC3.

    Considering that, since I now have no way of doing any
    useful measuring, it seems there is little point in any
    further development or investigation on my part. It's
    been fun, even if ultimately inconclusive.

    I am still interested in a combination of speed that does
    not suck with an O(N) worst-case memory footprint.
    I already have a couple of variants of the former,

    Did you mean you have some algorithms whose worst-case memory
    behavior is strictly less than O( total number of pixels )?

    I think it would be helpful to adopt a standard terminology
    where the pixel field is of size M x N; otherwise I'm not
    sure what O(N) refers to.


    No, I mean O(max(M,N)), plus possibly some logarithmic component that
    loses significance as images grow bigger.
    More specifically, if the bounding rectangle of the shape is A x B
    then I'd like the memory requirements to be O(max(A,B)), but so far
    that does not appear to be possible, or at least not possible without
    significant complications and further slowdown. So, as an intermediate
    goal I am willing to accept that the allocation is O(max(M,N)) but the
    amount of touched memory is O(max(A,B)).
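
    To put rough numbers on that, taking the largest test image from the
    tables above (just an illustration of the two bounds, not a
    measurement):

    M x N    = 3840 x 2160
    M * N    = 8294400 entries for a method that needs O(number of pixels)
    max(M,N) =    3840 entries for the desired O(max(M,N)) bound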

    but so
    far they are all unreasonably slow - ~5 times slower than
    the best.

    I'm no longer working on the problem but I'm interested to
    hear what you come up with.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Fri Apr 26 00:56:06 2024
    On Sat, 20 Apr 2024 21:10:23 +0300
    Michael S <already5chosen@yahoo.com> wrote:

    On Fri, 19 Apr 2024 14:59:20 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:


    Did you mean you have some algorithms whose worst-case memory
    behavior is strictly less than O( total number of pixels )?

    I think it would be helpful to adopt a standard terminology
    where the pixel field is of size M x N; otherwise I'm not
    sure what O(N) refers to.


    No, I mean O(max(M,N)), plus possibly some logarithmic component that
    loses significance as images grow bigger.
    More specifically, if the bounding rectangle of the shape is A x B
    then I'd like the memory requirements to be O(max(A,B)), but so far
    that does not appear to be possible, or at least not possible without
    significant complications and further slowdown. So, as an intermediate
    goal I am willing to accept that the allocation is O(max(M,N)) but the
    amount of touched memory is O(max(A,B)).

    but so
    far they are all unreasonably slow - ~5 times slower than
    the best.

    I'm no longer working on the problem but I'm interested to
    hear what you come up with.



    Here is what I had in mind.
    I tried to optimize as little as I could in order to keep it as simple
    as I could. Unfortunately, I am not particularly good at that, so the
    code still contains a few unnecessary "tricks" that make it a little
    harder to understand.
    The code uses VLAs and recursion for the same purpose of keeping it
    less tricky.
    If desired, the memory footprint could easily be reduced by a factor
    of 8 by using packed bit arrays instead of arrays of _Bool.
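
    Just to illustrate that point, a packed bit array could look like the
    sketch below (the helper names are invented and are not used by the
    code that follows):

    /* n flags of type _Bool become (n+7)/8 bytes of packed bits */
    static inline void bit_set(unsigned char *bits, int i)
    { bits[i >> 3] |= (unsigned char)(1u << (i & 7)); }
    static inline void bit_clear(unsigned char *bits, int i)
    { bits[i >> 3] &= (unsigned char)~(1u << (i & 7)); }
    static inline int bit_get(const unsigned char *bits, int i)
    { return (bits[i >> 3] >> (i & 7)) & 1; }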

    Even in this relatively crude form, this code is blazingly fast for
    the majority of shapes.
    Unfortunately, in the worst case (both 'slalom' shapes) the execution
    time is O(max(A,B)**3), which makes it unfit as a general-purpose
    routine.
    At the moment I don't see a solution for this problem. Overall, it's
    probably a dead end.

    #include <stddef.h>
    #include <string.h>

    typedef unsigned char Color;

    struct floodfill4_state {
    Color* image;
    ptrdiff_t width;
    _Bool *l_todo, *r_todo, *u_todo, *d_todo;
    int nx, ny;
    int x, y;
    Color old_color, new_color;
    };

    enum {
    more_r = 1, more_l = 2, more_d = 4, more_u = 8,
    more_lr = more_r+more_l, more_ud=more_u+more_d,
    };

    static
    int floodfill4_expand_lr(struct floodfill4_state* s, int exp_x,
    _Bool* src_todo, _Bool* exp_todo, int lr);
    static
    int floodfill4_expand_ud(struct floodfill4_state* s, int exp_x,
    _Bool* src_todo, _Bool* exp_todo, int ud);

    int floodfill4(Color* image, int width, int height, int x, int y,
    Color old_color, Color new_color)
    {
    if (width <= 0 || height <= 0)
    return 0;

    if (x < 0 || x >= width || y < 0 || y >= height)
    return 0;

    Color* beg = &image[(size_t)width*y+x];
    if (*beg != old_color)
    return 0;

    *beg = new_color;
    // Color* last_row = &image[(size_t)width*(height-1)];
    _Bool lr_todo[2][height];
    _Bool ud_todo[2][width];

    struct floodfill4_state s = {
    .image = beg,
    .width = width,
    .l_todo = &lr_todo[0][y],
    .r_todo = &lr_todo[1][y],
    .u_todo = &ud_todo[0][x],
    .d_todo = &ud_todo[1][x],
    .x = 0, .y = 0, .nx = 1, .ny = 1,
    .old_color = old_color,
    .new_color = new_color,
    };
    *s.l_todo = *s.r_todo = *s.u_todo = *s.d_todo = 1;

    // expansion loop
    for (int more = more_lr+more_ud; more != 0;) {
    if (more & more_lr) {
    _Bool exp_todo[s.ny];
    do {
    if (more & more_r) {
    while (x+s.nx != width) {
    // try to expand to the right
    s.x = s.nx-1;
    int ret = floodfill4_expand_lr(&s, s.nx, s.r_todo,
    exp_todo, more_r);
    if (!ret)
    break;
    more |= ret;
    ++s.nx;
    }
    more &= ~more_r;
    }
    if (more & more_l) {
    while (x != 0) {
    // try to expand to the left
    s.x = 0;
    int ret = floodfill4_expand_lr(&s, -1, s.l_todo, exp_todo,
    more_l);
    if (!ret)
    break;
    more |= ret;
    ++s.nx;
    --s.image;
    --s.u_todo;
    --s.d_todo;
    --x;
    }
    more &= ~more_l;
    }
    } while (more & more_lr);
    }

    if (more & more_ud) {
    _Bool exp_todo[s.nx];
    do {
    if (more & more_d) {
    while (y+s.ny != height) {
    // try to expand down
    s.y = s.ny-1;
    int ret = floodfill4_expand_ud(&s, s.ny, s.d_todo,
    exp_todo, more_d);
    if (!ret)
    break;
    more |= ret;
    ++s.ny;
    }
    more &= ~more_d;
    }
    if (more & more_u) {
    while (y != 0) {
    // try to expand up
    s.y = 0;
    int ret = floodfill4_expand_ud(&s, -1, s.u_todo, exp_todo,
    more_u);
    if (!ret)
    break;
    more |= ret;
    ++s.ny;
    s.image -= s.width;
    --s.l_todo;
    --s.r_todo;
    --y;
    }
    more &= ~more_u;
    }
    } while (more & more_ud);
    }
    }
    return 1;
    }

    // floodfill4_core - floodfill4 recursively, in divide-and-conquer
    // fashion.
    // s.*_todo arrays are initialized by the caller. floodfill4_core sets
    // values in them that indicate the need for further action, but never
    // clears values that were already set.
    static void floodfill4_core(const struct floodfill4_state* arg)
    {
    const int nx = arg->nx;
    const int ny = arg->ny;
    if (nx+ny == 2) { // nx==ny==1
    *arg->l_todo = *arg->r_todo = *arg->u_todo = *arg->d_todo = 1;
    *arg->image = arg->new_color;
    return;
    }

    struct floodfill4_state args[2];
    args[0] = args[1] = *arg;
    if (nx > ny) {
    // split vertically
    _Bool todo[2][ny];
    const int hx = nx / 2;

    args[0].r_todo = todo[0];
    args[0].nx = hx;

    args[1].image += hx;
    args[1].l_todo = todo[1];
    args[1].u_todo += hx;
    args[1].d_todo += hx;
    args[1].nx = nx-hx;

    int todo_i;
    int x0 = arg->x;
    if (x0 < hx) { // update left field
    memset(todo[0], 0, ny*sizeof(todo[0][0]));
    floodfill4_core(&args[0]);
    todo_i = 0;
    } else { // update right field
    memset(todo[1], 0, ny*sizeof(todo[0][0]));
    args[1].x = x0 - hx;
    floodfill4_core(&args[1]);
    todo_i = 1;
    }

    args[0].x = hx-1;
    args[1].x = 0;
    for (;;) {
    // look for contact points on destination edge
    _Bool *todo_src = todo[todo_i];
    Color *edge_dst = &arg->image[hx-todo_i];
    int y;
    for (y = 0; y < ny; edge_dst += arg->width, ++y) {
    if (todo_src[y] && *edge_dst == arg->old_color) // contact found
    break;
    }
    if (y == ny)
    break;

    todo_i = 1 - todo_i;
    memset(todo[todo_i], 0, ny*sizeof(todo[0][0]));
    do {
    args[todo_i].y = y;
    floodfill4_core(&args[todo_i]);
    edge_dst += arg->width;
    for (y = y+1; y < ny; edge_dst += arg->width, ++y) {
    if (todo_src[y] && *edge_dst == arg->old_color) // contact found
    break;
    }
    } while (y < ny);
    }
    } else { // ny >= nx
    // split horizontally
    _Bool todo[2][nx];
    const int hy = ny / 2;
    Color* edge = &arg->image[arg->width*hy];

    args[0].d_todo = todo[0];
    args[0].ny = hy;

    args[1].image = edge;
    args[1].u_todo = todo[1];
    args[1].l_todo += hy;
    args[1].r_todo += hy;
    args[1].ny = ny-hy;

    int todo_i;
    int y0 = arg->y;
    if (y0 < hy) { // update up field
    memset(todo[0], 0, nx*sizeof(todo[0][0]));
    floodfill4_core(&args[0]);
    todo_i = 0;
    } else { // update down field
    args[1].y = y0 - hy;
    memset(todo[1], 0, nx*sizeof(todo[0][0]));
    floodfill4_core(&args[1]);
    todo_i = 1;
    }

    args[0].y = hy-1;
    args[1].y = 0;
    for (;;) {
    // look for contact points on destination edge
    _Bool *todo_src = todo[todo_i];
    Color *edge_dst = todo_i ? edge - arg->width : edge;
    int x;
    for (x = 0; x < nx; ++x) {
    if (todo_src[x] && edge_dst[x] == arg->old_color) // contact found
    break;
    }
    if (x == nx)
    break;

    todo_i = 1 - todo_i;
    memset(todo[todo_i], 0, nx*sizeof(todo[0][0]));
    do {
    args[todo_i].x = x;
    floodfill4_core(&args[todo_i]);
    for (x = x+1; x < nx; ++x) {
    if (todo_src[x] && edge_dst[x] == arg->old_color) // contact found
    break;
    }
    } while (x < nx);
    }
    }
    }


    // return value
    // 0 - not expanded
    // 1 - expanded, no bounce back
    // 2 - expanded, possible bounce back
    static
    int floodfill4_expand(
    Color* pixels, // row or column
    ptrdiff_t incr, // distance between adjacent points of pixels
    int len,
    Color old_color,
    Color new_color,
    _Bool* src_todo,
    _Bool* dst_todo,
    _Bool first)
    {
    for (int i = 0; i < len; pixels += incr, ++i) {
    if (src_todo[i] && *pixels == old_color) {
    // contact found
    if (first)
    memset(dst_todo, 0, len*sizeof(*dst_todo));
    *pixels = new_color;
    dst_todo[i] = 1;
    Color* p = pixels - incr;
    int k;
    for (k = i-1; k >= 0 && *p == old_color; p -= incr, --k) {
    *p = new_color;
    dst_todo[k] = 1;
    }
    _Bool more = k != i-1;
    for (;;) {
    pixels += incr;
    for (i = i+1; i < len && *pixels == old_color; pixels += incr,
    ++i) {
    *pixels = new_color;
    dst_todo[i] = 1;
    more |= src_todo[i] ^ 1;
    }
    if (i >= len)
    break;
    pixels += incr;
    for (i = i+1; i < len && (!src_todo[i] || *pixels !=
    old_color); pixels += incr, ++i);
    if (i >= len)
    break;
    *pixels = new_color;
    dst_todo[i] = 1;
    Color* p = pixels - incr;
    for (k = i-1; *p == old_color; --k, p -= incr) {
    *p = new_color;
    dst_todo[k] = 1;
    }
    more |= k != i-1;
    }
    return more ? 2 : 1;
    }
    }
    return 0; // not expanded
    }

    // return value - more code
    static
    int floodfill4_expand_lr(struct floodfill4_state* s, int exp_x, _Bool* src_todo, _Bool* exp_todo, int lr)
    {
    // try to expand to the right or left
    const int ny = s->ny;
    int ret = floodfill4_expand(&s->image[exp_x], s->width, ny,
    s->old_color, s->new_color, src_todo, exp_todo, 1);
    if (!ret)
    return 0;

    int result = lr;
    while (ret == 2) {
    Color* p = &s->image[s->x];
    _Bool contact = 0;
    for (int y = 0; y < ny; p += s->width, ++y) {
    if (exp_todo[y] && *p == s->old_color) {
    if (!contact)
    memset(src_todo, 0, ny*sizeof(*src_todo));
    s->y = y;
    floodfill4_core(s);
    contact = 1;
    }
    }
    if (!contact)
    break;
    result = more_lr+more_ud;
    ret = floodfill4_expand(&s->image[exp_x], s->width, ny,
    s->old_color, s->new_color, src_todo, exp_todo, 0);
    }

    if ((s->u_todo[exp_x] = exp_todo[0])) result |= more_u;
    if ((s->d_todo[exp_x] = exp_todo[ny-1])) result |= more_d;
    memcpy(src_todo, exp_todo, ny*sizeof(*src_todo));
    return result;
    }

    // return value - more code
    static
    int floodfill4_expand_ud(struct floodfill4_state* s, int exp_y, _Bool* src_todo, _Bool* exp_todo, int ud)
    {
    // try to expand up or down
    const int nx = s->nx;
    int ret = floodfill4_expand(&s->image[s->width*exp_y], 1, nx,
    s->old_color, s->new_color, src_todo, exp_todo, 1);
    if (!ret)
    return 0;

    int result = ud;
    while (ret == 2) {
    Color* p = &s->image[s->width*s->y];
    _Bool contact = 0;
    for (int x = 0; x < nx; ++x) {
    if (exp_todo[x] && p[x] == s->old_color) {
    if (!contact)
    memset(src_todo, 0, nx*sizeof(*src_todo));
    s->x = x;
    floodfill4_core(s);
    contact = 1;
    }
    }
    if (!contact)
    break;
    result = more_lr+more_ud;
    ret = floodfill4_expand(&s->image[s->width*exp_y], 1, nx,
    s->old_color, s->new_color, src_todo, exp_todo, 0);
    }

    if ((s->l_todo[exp_y] = exp_todo[0])) result |= more_l;
    if ((s->r_todo[exp_y] = exp_todo[nx-1])) result |= more_r;
    memcpy(src_todo, exp_todo, nx*sizeof(*src_todo));
    return result;
    }
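
    For what it's worth, a minimal driver for the routine above could look
    like this (just a sketch; the image, its size and the colours are
    invented for the example, and it assumes the floodfill4() code above
    sits in the same translation unit):

    #include <stdio.h>

    int main(void)
    {
        enum { W = 8, H = 5 };
        Color img[H*W] = {0};           /* background colour 0 */

        /* draw a small box outline in colour 1 */
        for (int x = 2; x <= 5; ++x)
            img[1*W+x] = img[3*W+x] = 1;
        for (int y = 1; y <= 3; ++y)
            img[y*W+2] = img[y*W+5] = 1;

        /* fill the inside of the box: start at x=3, y=2, colour 0 -> 2 */
        floodfill4(img, W, H, 3, 2, 0, 2);

        /* print the result as digits */
        for (int y = 0; y < H; ++y) {
            for (int x = 0; x < W; ++x)
                putchar('0' + img[y*W+x]);
            putchar('\n');
        }
        return 0;
    }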








    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)