• Solving thundering Herd with glibc...

    From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 06:39:27 2025
    Well, I don't have any more time to mess around with this, but is Bonita right? Does glibc 100% solve _all_ thundering herd problems? I know
    about wait morphing; however, it is not a 100% solution.

    Thanks.

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 07:10:46 2025
    On 24.04.2025 at 22:39, Chris M. Thomasson wrote:
    Well, I don't have any more time to mess around with this, but is Bonita right? Does glibc 100% solve _all_ thundering herd problems? I know
    about wait morphing; however, it is not a 100% solution.

    With wait morphing thundering herd is impossible.


  • From Chris M. Thomasson@3:633/280.2 to All on Fri Apr 25 07:33:40 2025
    On 4/24/2025 2:10 PM, Bonita Montero wrote:
    On 24.04.2025 at 22:39, Chris M. Thomasson wrote:
    Well, I don't have any more time to mess around with this, but is
    Bonita right? Does glibc 100% solve _all_ thundering herd problems? I
    know about wait morphing; however, it is not a 100% solution.

    With wait morphing thundering herd is impossible.


    Saying "there's no thundering herd, ever!" because a controlled test didn't
    "show it" is like saying race conditions do not exist because your code "worked fine this time." Fair enough?

  • From Scott Lurndal@3:633/280.2 to All on Fri Apr 25 07:35:30 2025
    Reply-To: slp53@pacbell.net

    Bonita Montero <Bonita.Montero@gmail.com> writes:
    On 24.04.2025 at 22:39, Chris M. Thomasson wrote:
    Well, I don't have any more time to mess around with this, but is Bonita
    right? Does glibc 100% solve _all_ thundering herd problems? I know
    about wait morphing; however, it is not a 100% solution.

    With wait morphing thundering herd is impossible.

    Wait morphing was removed from glibc in 2016, if I recall correctly.

    In any case, a programmer should never assume that the
    run-time system will support wait morphing when writing
    portable code. Release the mutex before signaling the
    condition variable and avoid pthread_cond_broadcast unless
    absolutely necessary.
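
    A minimal C++ sketch of that advice, assuming std::condition_variable
    (which libstdc++ on glibc typically builds on pthread_cond_t). The Handoff
    type and run_handoff helper are hypothetical names made up for this
    illustration; they do not come from the thread. The producer leaves the
    mutex before notifying and wakes one waiter instead of broadcasting.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot handoff, used only for illustration.
struct Handoff {
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;  // predicate, guarded by mtx
    int value = 0;       // payload, guarded by mtx

    void publish(int v) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            value = v;
            ready = true;
        }                 // mutex is released here, before the notification
        cv.notify_one();  // wake a single waiter rather than all of them
    }

    int consume() {
        std::unique_lock<std::mutex> lock(mtx);
        // The predicate form re-checks `ready`, so spurious wakeups and a
        // notification that arrives before the wait are both handled.
        cv.wait(lock, [this] { return ready; });
        ready = false;
        return value;
    }
};

// Runs one consumer thread against one publish and returns what it received.
int run_handoff(int v) {
    Handoff h;
    int seen = -1;
    std::thread consumer([&] { seen = h.consume(); });
    h.publish(v);
    consumer.join();
    return seen;
}
```

    Because the waiter checks the predicate under the mutex, notifying after
    the unlock cannot lose a wakeup in this sketch. Whether it is also cheaper
    than notifying while locked depends on the condvar implementation.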

    --- MBSE BBS v1.1.1 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 16:28:08 2025
    On 24.04.2025 at 23:35, Scott Lurndal wrote:

    Wait morphing was removed from glibc in 2016, if I recall correctly.

    My first ping-pong code showed that there are no wakeups beyond one
    per loop iteration. Try to find a bug in it; there isn't one.

  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 18:37:04 2025
    On 24.04.2025 at 23:33, Chris M. Thomasson wrote:

    Saying "there's no thundering herd, ever!" because a controlled test didn't
    "show it" is like saying race conditions do not exist because your code "worked fine this time." Fair enough?

    Yes, a controlled test with 10,000 iterations.
    The code is correct and trivial, but too much for you.


  • From Bonita Montero@3:633/280.2 to All on Fri Apr 25 21:27:28 2025
    On 25.04.2025 at 10:37, Bonita Montero wrote:

    Yes, a controlled test with 10,000 iterations.
    The code is correct and trivial, but too much for you.

    A thundering herd problem with a condvar should occur if I notify more than
    one thread and then unlock the mutex, but it never actually happens.
    So the glibc condvar is optimal.


  • From Chris M. Thomasson@3:633/280.2 to All on Sat Apr 26 06:01:21 2025
    On 4/25/2025 4:27 AM, Bonita Montero wrote:
    On 25.04.2025 at 10:37, Bonita Montero wrote:

    Yes, a controlled test with 10,000 iterations.
    The code is correct and trivial, but too much for you.

    A thundering herd problem with a condvar should occur if I notify more than
    one thread and then unlock the mutex, but it never actually happens.
    So the glibc condvar is optimal.


    If you say so... Too busy right now. Perhaps sometime later on tonight.

  • From Bonita Montero@3:633/280.2 to All on Sat Apr 26 16:26:21 2025
    On 25.04.2025 at 22:01, Chris M. Thomasson wrote:

    If you say so... Too busy right now. Perhaps sometime later on tonight.


    If there were a thundering herd problem with glibc's condvar, it
    would show up very often with my code, since I wake 31 threads at once
    on my machine.


  • From Bonita Montero@3:633/280.2 to All on Sat Apr 26 17:25:22 2025
    On 26.04.2025 at 08:26, Bonita Montero wrote:
    On 25.04.2025 at 22:01, Chris M. Thomasson wrote:

    If you say so... Too busy right now. Perhaps sometime later on tonight.


    If there were a thundering herd problem with glibc's condvar, it
    would show up very often with my code, since I wake 31 threads at once
    on my machine.

    I just tried waking all 31 threads from outside the mutex, that is, after
    unlocking it, instead of from inside while holding it:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();
    }

    The result: about 7,500 context switches per thread, not about 3,000.

        10000 rounds,
        7498.06 context switches per thread

    So never signal a condvar to multiple threads from outside the mutex!
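
    For anyone who wants to repeat this kind of comparison, here is a rough,
    self-contained C++ sketch. It is not the benchmark from this post: the
    names HerdResult, process_context_switches and run_herd are hypothetical,
    a round counter stands in for signalled/ai/bs, and context switches are
    read process-wide through POSIX getrusage, so the coordinator thread is
    counted too. Treat any numbers it produces as machine-specific.

```cpp
#include <sys/resource.h>  // getrusage (POSIX)

#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct HerdResult {
    long wakeups;           // round wakeups observed by all clients together
    long context_switches;  // process-wide, voluntary plus involuntary
};

// Reads the context-switch counters the kernel keeps for this process.
static long process_context_switches() {
    struct rusage ru {};
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_nvcsw + ru.ru_nivcsw;
}

// Wakes `clients` threads once per round, either while holding the mutex
// or after releasing it, and reports what it cost.
HerdResult run_herd(int clients, int rounds, bool notify_while_locked) {
    std::mutex mtx;
    std::condition_variable cv, cv_done;
    unsigned long round = 0;  // bumped once per round, guarded by mtx
    int pending = 0;          // clients that still have to react
    bool stop = false;
    long wakeups = 0;

    auto client = [&] {
        unsigned long seen = 0;
        std::unique_lock<std::mutex> lock(mtx);
        for (;;) {
            cv.wait(lock, [&] { return round != seen; });
            seen = round;
            if (stop) return;
            ++wakeups;
            if (--pending == 0) cv_done.notify_one();  // last one reports back
        }
    };

    const long before = process_context_switches();
    std::vector<std::thread> threads;
    for (int i = 0; i < clients; ++i) threads.emplace_back(client);

    for (int r = 0; r < rounds; ++r) {
        std::unique_lock<std::mutex> lock(mtx);
        ++round;
        pending = clients;
        if (notify_while_locked) {
            cv.notify_all();  // "from inside"
        } else {
            lock.unlock();
            cv.notify_all();  // "from outside"
            lock.lock();
        }
        cv_done.wait(lock, [&] { return pending == 0; });
    }
    {
        std::lock_guard<std::mutex> lock(mtx);
        stop = true;
        ++round;  // makes the wait predicate true one last time
    }
    cv.notify_all();
    for (auto& t : threads) t.join();
    return {wakeups, process_context_switches() - before};
}
```

    Comparing run_herd(31, 10000, true).context_switches / 31.0 with the same
    call using false gives a per-thread figure for each mode. The sketch only
    guarantees that every client wakes once per round; it makes no claim about
    which mode comes out ahead.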

  • From Chris M. Thomasson@3:633/280.2 to All on Sun Apr 27 07:41:31 2025
    On 4/26/2025 12:25 AM, Bonita Montero wrote:
    On 26.04.2025 at 08:26, Bonita Montero wrote:
    On 25.04.2025 at 22:01, Chris M. Thomasson wrote:

    If you say so... Too busy right now. Perhaps sometime later on
    tonight.


    If there were a thundering herd problem with glibc's condvar, it
    would show up very often with my code, since I wake 31 threads at once
    on my machine.

    I just tried waking all 31 threads from outside the mutex, that is, after
    unlocking it, instead of from inside while holding it:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();
    }

    The result: about 7,500 context switches per thread, not about 3,000.

        10000 rounds,
        7498.06 context switches per thread

    So never signal a condvar to multiple threads from outside the mutex!

    So, do that. It's your software. Do what you like. This is a very old
    debate. Take your contrived test and just roll with it. Whatever.

  • From Bonita Montero@3:633/280.2 to All on Sun Apr 27 13:18:02 2025
    On 26.04.2025 at 23:41, Chris M. Thomasson wrote:

    On 4/26/2025 12:25 AM, Bonita Montero wrote:


    I just tried waking all 31 threads from outside the mutex, that is, after
    unlocking it, instead of from inside while holding it:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();
    }

    The result: about 7,500 context switches per thread, not about 3,000.

        10000 rounds,
        7498.06 context switches per thread

    So never signal a condvar to multiple threads from outside the mutex!

    So, do that. It's your software. Do what you like. This is a very old debate. Take your contrived test and just roll with it. Whatever.

    There's nothing to debate; I've measured it. If you wake a single
    thread, it doesn't matter whether you notify from inside or outside
    the mutex. If you wake multiple threads, notifying from inside is
    several times faster.


  • From Chris M. Thomasson@3:633/280.2 to All on Wed Apr 30 05:26:41 2025
    On 4/26/2025 8:18 PM, Bonita Montero wrote:
    On 26.04.2025 at 23:41, Chris M. Thomasson wrote:

    On 4/26/2025 12:25 AM, Bonita Montero wrote:


    I just tried waking all 31 threads from outside the mutex, that is, after
    unlocking it, instead of from inside while holding it:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();
    }

    The result: about 7,500 context switches per thread, not about 3,000.

        10000 rounds,
        7498.06 context switches per thread

    So never signal a condvar to multiple threads from outside the mutex!

    So, do that. It's your software. Do what you like. This is a very old
    debate. Take your contrived test and just roll with it. Whatever.

    There's nothing to debate; I've measured it. If you wake a single
    thread, it doesn't matter whether you notify from inside or outside
    the mutex. If you wake multiple threads, notifying from inside is
    several times faster.


    Yawn.
