Well, I don't have any more time to mess around with this, but is Bonita
right? Does glibc 100% solve _all_ thundering herd problems? I know
about wait morphing, but it is not a 100% solution.
On 24.04.2025 at 22:39, Chris M. Thomasson wrote:
> Well, I don't have any more time to mess around with this, but is
> Bonita right? Does glibc 100% solve _all_ thundering herd problems? I
> know about wait morphing, but it is not a 100% solution.

With wait morphing, thundering herd is impossible.
Bonita Montero wrote:
> With wait morphing, thundering herd is impossible.

Wait morphing was removed from glibc in 2016, if I recall correctly.
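For context, here is a minimal sketch of the pattern being debated (my
illustration, assuming plain std::condition_variable; it is not code from
any post in this thread). 31 waiters block on one condvar; notify_all()
releases them all at once, and without an optimization such as wait
morphing every waiter can be scheduled simultaneously, only to serialize
again on the mutex:

    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;

    void waiter()
    {
        std::unique_lock lock( mtx );
        // Herd point: after notify_all(), every waiter races to
        // re-acquire mtx here, and all but one block again.
        cv.wait( lock, []{ return ready; } );
    }

    int main()
    {
        std::vector<std::thread> threads;
        for( int i = 0; i < 31; ++i )
            threads.emplace_back( waiter );
        {
            std::lock_guard lock( mtx );
            ready = true;
        }
        cv.notify_all();
        for( auto &t : threads )
            t.join();
    }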
"there’s no thundering herd, ever!" because a controlled test didn't
"show it" is like saying race conditions do not exist because your code "worked fine this time."? Fair enough?
Yes, a controlled test with 10,000 iterations.
The code is correct and trivial, but too much for you.
On 25.04.2025 at 10:37, Bonita Montero wrote:
> Yes, a controlled test with 10,000 iterations.
> The code is correct and trivial, but too much for you.

A thundering herd problem with a condvar should occur if I notify more
than one thread and then unlock the mutex, but it never actually happens.
So the glibc condvar is optimal.
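One way to check such a claim yourself (my sketch; the thread's actual
harness is not shown) is to count per-thread context switches with
Linux's getrusage(), since "context switches per thread" is the metric
used later in the thread:

    #include <sys/resource.h>
    #include <cstdio>

    // Voluntary + involuntary context switches of the calling thread.
    // RUSAGE_THREAD is a Linux-specific extension (exposed via
    // _GNU_SOURCE, which g++ defines by default).
    static long contextSwitches()
    {
        rusage ru;
        getrusage( RUSAGE_THREAD, &ru );
        return ru.ru_nvcsw + ru.ru_nivcsw;
    }

    int main()
    {
        long before = contextSwitches();
        // ... run the condvar wait/notify loop under test here ...
        printf( "%ld context switches\n", contextSwitches() - before );
    }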
If you say so... Too busy right now. Perhaps sometime later tonight.
On 25.04.2025 at 22:01, Chris M. Thomasson wrote:
> If you say so... Too busy right now. Perhaps sometime later tonight.

If there were a thundering herd problem with glibc's condvar, it would
happen very often with my code, since I wake 31 threads at once on my
machine.
On 26.04.2025 at 08:26, Bonita Montero wrote:
> If there were a thundering herd problem with glibc's condvar, it would
> happen very often with my code, since I wake 31 threads at once on my
> machine.
I just tried to wake all 31 threads from outside, i.e. after unlocking
the mutex, not from inside while holding it:
    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        lock.unlock();
        // Mutex is no longer held: this is the "from outside" signalling.
        if( argc < 2 )
            cv.notify_all();
        else
            for( int c = nClients; c; cv.notify_one(), --c );
        bs.acquire();    // wait for the clients (bs is released by client code not shown)
    }
The result: roughly 7,500 context switches per thread, not 3,000:

    10000 rounds,
    7498.06 context switches per thread

So never signal a condvar to multiple threads from outside!
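For contrast, a sketch of the "from inside" variant (a fragment reusing
the names from the loop above; the surrounding setup is assumed): the
only change is issuing the notification before the unlock. Some
implementations can then move the waiters straight onto the mutex's wait
queue instead of waking them into a lock they cannot yet acquire:

    for( size_t r = N; r; --r )
    {
        unique_lock lock( mtx );
        signalled = nClients;
        ai.store( nClients, memory_order_relaxed );
        cv.notify_all();    // "from inside": mtx is still held here
        lock.unlock();
        bs.acquire();       // wait for the clients (released elsewhere)
    }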
On 4/26/2025 12:25 AM, Bonita Montero wrote:
> [...]
> So never signal a condvar to multiple threads from outside!
So, do that. It's your software; do what you like. This is a very old debate. Take your contrived test and just roll with it. Whatever.
On 26.04.2025 at 23:41, Chris M. Thomasson wrote:
> So, do that. It's your software; do what you like. This is a very old
> debate. Take your contrived test and just roll with it. Whatever.
There's nothing to debate; I've measured it. If you wake a single
thread, it doesn't matter whether you signal from inside or outside. If
you wake multiple threads, signalling them from inside is several times
faster.
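For anyone who wants to reproduce the comparison, a trivial timing
wrapper is enough (my sketch; the two lambda bodies are placeholders for
the "inside" and "outside" loops discussed above):

    #include <chrono>
    #include <cstdio>

    template<typename Fn>
    double secondsFor( Fn &&work )
    {
        auto start = std::chrono::steady_clock::now();
        work();
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count();
    }

    int main()
    {
        // Placeholders: substitute the real benchmark loops here.
        auto inside  = []{ /* notify_all() before unlock() */ };
        auto outside = []{ /* unlock(), then notify_all() */ };
        printf( "inside:  %f s\n", secondsFor( inside ) );
        printf( "outside: %f s\n", secondsFor( outside ) );
    }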
On 25/04/2025 09:37, Bonita Montero wrote:
> Yes, a controlled test with 10,000 iterations.
> The code is correct and trivial, but too much for you.

Once upon a time I put a race in a bit of code. It took us 3 years to
track down why our customers were reporting occasional faults :(

(It turned out the trick to reproduce it was a combination of lots of
CPU load and disk transfers not more than once every 30 seconds.)
On 14/05/2025 17:05, Vir Campestris wrote:
> Once upon a time I put a race in a bit of code. It took us 3 years to
> track down why our customers were reporting occasional faults :(
>
> (It turned out the trick to reproduce it was a combination of lots of
> CPU load and disk transfers not more than once every 30 seconds.)
It's always fun dealing with a bug that only triggers in rare timing
situations. We once had a mistake in a timing table in a program that
could sometimes result in intermittent faults in the system if
particular events occurred at the same time, on the 30th of September.
Finding an issue that occurred at most once per year was a challenge!
David Brown <david.brown@hesbynett.no> wrote at 06:49 this Thursday (GMT):
> It's always fun dealing with a bug that only triggers in rare timing
> situations. We once had a mistake in a timing table in a program that
> could sometimes result in intermittent faults in the system if
> particular events occurred at the same time, on the 30th of September.
> Finding an issue that occurred at most once per year was a challenge!
If you knew what date the issue was happening on, could you force the
system clock to be on that day?
On 16/05/2025 14:30, candycanearter07 wrote:
> If you knew what date the issue was happening on, could you force the
> system clock to be on that day?
Yes, once we figured out that the issue was date-dependent. For the
first few years, all we knew was that we were getting occasional rare
bug reports and no one saw the coincidence. (This was a program on DOS
- changing the system clock was easy.)