• Idea for spin-wait loops

    From Bonita Montero@3:633/280.2 to All on Sun Mar 24 03:53:40 2024
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.
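
    The semantics described above can be sketched in software. This is
    only a portable stand-in under stated assumptions: the helper name
    wait_while_equal is hypothetical, the deadline uses steady_clock
    rather than the timestamp counter, and the loop still polls (it
    yields between checks), whereas the proposed instruction would sleep
    until the cacheline is modified.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper, not a real instruction or library call.
// Waits while `word` equals `expected`.
// Returns true once `word` is observed to differ from `expected`,
// false if the deadline passes first.
inline bool wait_while_equal(const std::atomic<std::uint32_t>& word,
                             std::uint32_t expected,
                             std::chrono::steady_clock::time_point deadline)
{
    for (;;) {
        if (word.load(std::memory_order_acquire) != expected)
            return true;                       // value changed: wake up
        if (std::chrono::steady_clock::now() >= deadline)
            return false;                      // timeout expired
        std::this_thread::yield();             // software stand-in for the sleep state
    }
}
```

    A caller would re-check its lock or flag after either return value,
    since a change of the word does not by itself mean the wait
    condition is satisfied.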

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Sun Mar 24 07:52:49 2024
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

  • From Chris M. Thomasson@3:633/280.2 to All on Sun Mar 24 07:58:02 2024
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

  • From Bonita Montero@3:633/280.2 to All on Sun Mar 24 17:37:33 2024
    On 23.03.2024 at 21:52, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    Not all kinds of mutexes can be done with a futex.


  • From Bonita Montero@3:633/280.2 to All on Sun Mar 24 17:38:02 2024
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


  • From Chris M. Thomasson@3:633/280.2 to All on Mon Mar 25 06:33:42 2024
    On 3/23/2024 11:37 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:52, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    Not all kinds of mutexes can be done with a futex.


    Have you ever heard of an asymmetric mutex?

  • From Scott Lurndal@3:633/280.2 to All on Mon Mar 25 07:43:37 2024
    Reply-To: slp53@pacbell.net

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until

    A processor which doesn't own (or have a shared copy of) the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).

    This sounds like a solution to a problem that doesn't exist,
    and there would be no incentive for a processor designer
    to include the substantial additional complexity required
    to support your feature.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Bonita Montero@3:633/280.2 to All on Mon Mar 25 17:23:14 2024
    On 24.03.2024 at 21:43, Scott Lurndal wrote:

    This sounds like a solution to a problem that doesn't exist,
    and there would be no incentive for a processor designer
    to include the substantial additional complexity required
    to support your feature.

    MONITOR / MWAIT is nearly the same except for the timeout.

  • From Michael S@3:633/280.2 to All on Mon Mar 25 23:34:50 2024
    On Sun, 24 Mar 2024 20:43:37 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until

    A processor which doesn't own (or have a shared copy of) the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).


    It seems, I didn't understand the idea.
    Of course, the waiting thread/core has the word in question in its
    L1D cache when it enters the wait loop.
    Of course, it is awakened if/when the word is evicted from the cache
    for an unrelated reason, i.e. in practice because of a capacity conflict
    caused by activity of other threads that are running on the same
    core. There is nothing wrong with spurious awakenings as long as they
    are rare.

    This sounds like a solution to a problem that doesn't exist,
    and there would be no incentive for a processor designer
    to include the substantial additional complexity required
    to support your feature.

    The problem does exist, and the primitive proposed by Bonita is not new.
    It is a minor modification of Monitor/Mwait.
    For current Intel and AMD processors this sort of thing is
    relatively unattractive because, at 2 threads per core and with rather
    measurable throughput gains achieved by running 2 threads instead of
    one (for AMD up to 30%, for Intel a little less, but often measurable),
    each thread is a valuable resource, so you don't really want to keep it
    paused for too long. And the whole point of Bonita's amendment of the
    existing mechanism is that the software has more control over long waits.

    On IBM POWER and on a few of Sun/Oracle's chips there are up to 8 threads
    per core, so each thread is not that valuable. It means that a longer
    uninterrupted wait makes more sense and control of the duration of the
    timeout is more desirable. But maybe IBM's and Oracle's variants of
    MWAIT already have it?





  • From Michael S@3:633/280.2 to All on Tue Mar 26 04:11:22 2024
    On Mon, 25 Mar 2024 14:34:50 +0200
    Michael S <already5chosen@yahoo.com> wrote:

    On Sun, 24 Mar 2024 20:43:37 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until


    A processor which doesn't own (or have a shared copy of) the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).


    It seems, I didn't understand the idea.

    I meant to say 'you' instead of 'I'.


  • From Bonita Montero@3:633/280.2 to All on Tue Mar 26 04:53:52 2024
    On 25.03.2024 at 13:34, Michael S wrote:

    The problem does exist, and the primitive proposed by Bonita is not new.
    It is a minor modification of Monitor/Mwait.

    Functionally the modification is minor, but the effect would be
    major since the cache-interconnect traffic would be minimized.


  • From Chris M. Thomasson@3:633/280.2 to All on Tue Mar 26 13:48:27 2024
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout... You
    are referring to user space, right?

  • From Bonita Montero@3:633/280.2 to All on Tue Mar 26 21:12:07 2024
    On 26.03.2024 at 03:48, Chris M. Thomasson wrote:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout... You
    are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much less
    interconnect traffic compared to polling.
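
    The "limited spinning" pattern mentioned above can be sketched as
    follows. This is a minimal illustration, not glibc's actual code:
    the helper name lock_with_limited_spin and the spin budget are
    hypothetical, and the spin phase here is ordinary polling, which is
    the part a timed MWAIT would replace.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: try to take the mutex by polling for at most
// `max_spins` attempts, then fall back to a blocking lock().
// Returns true if the mutex was acquired during the spin phase,
// false if the caller had to block. The mutex is held on return
// in both cases.
inline bool lock_with_limited_spin(std::mutex& m, int max_spins)
{
    for (int i = 0; i < max_spins; ++i) {
        if (m.try_lock())
            return true;               // acquired while spinning
        std::this_thread::yield();     // brief back-off between attempts
    }
    m.lock();                          // spin budget exhausted: block
    return false;
}
```

    The caller releases the mutex with m.unlock() as usual; only the
    acquisition path differs from a plain lock().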


  • From Chris M. Thomasson@3:633/280.2 to All on Wed Mar 27 07:02:47 2024
    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    On 26.03.2024 at 03:48, Chris M. Thomasson wrote:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout... You
    are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much less
    interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

  • From Chris M. Thomasson@3:633/280.2 to All on Wed Mar 27 07:13:58 2024
    On 3/25/2024 10:53 AM, Bonita Montero wrote:
    On 25.03.2024 at 13:34, Michael S wrote:

    The problem does exist, and the primitive proposed by Bonita is not new.
    It is a minor modification of Monitor/Mwait.

    Functionally the modification is minor, but the effect would be
    major since the cache-interconnect traffic would be minimized.


    Ask over in comp.arch

  • From Bonita Montero@3:633/280.2 to All on Wed Mar 27 07:23:07 2024
    On 26.03.2024 at 21:02, Chris M. Thomasson wrote:
    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    On 26.03.2024 at 03:48, Chris M. Thomasson wrote:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout... You
    are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much less
    interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    MWAIT could replace polling / spinning on a mutex for a limited
    time if it had a timeout.


  • From Chris M. Thomasson@3:633/280.2 to All on Wed Mar 27 07:30:45 2024
    On 3/26/2024 1:23 PM, Bonita Montero wrote:
    On 26.03.2024 at 21:02, Chris M. Thomasson wrote:
    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    On 26.03.2024 at 03:48, Chris M. Thomasson wrote:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much less
    interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    MWAIT could replace polling / spinning on a mutex for a limited
    time if it had a timeout.


    So, you timeout, check some other stuff, then wait again. Still sounds
    like polling?

  • From Chris M. Thomasson@3:633/280.2 to All on Wed Mar 27 07:31:24 2024
    On 3/26/2024 1:30 PM, Chris M. Thomasson wrote:
    On 3/26/2024 1:23 PM, Bonita Montero wrote:
    On 26.03.2024 at 21:02, Chris M. Thomasson wrote:
    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    On 26.03.2024 at 03:48, Chris M. Thomasson wrote:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much less
    interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    MWAIT could replace polling / spinning on a mutex for a limited
    time if it had a timeout.


    So, you timeout, check some other stuff, then wait again. Still sounds
    like polling?

    Sounds like you want a hardware based futex.

  • From Bonita Montero@3:633/280.2 to All on Wed Mar 27 20:18:47 2024
    On 26.03.2024 at 21:30, Chris M. Thomasson wrote:

    So, you timeout, check some other stuff, then wait again.
    Still sounds like polling?

    The checks would only occur if the cacheline containing the
    word was actually modified.


  • From Michael S@3:633/280.2 to All on Thu Mar 28 02:09:57 2024
    On Tue, 26 Mar 2024 13:02:47 -0700
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> wrote:

    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    On 26.03.2024 at 03:48, Chris M. Thomasson wrote:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much less
    interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    I don't know what you mean by 'get around'.
    The main point of the original Monitor/MWAIT is to allow one SMT thread
    to poll a memory address in a way that consumes almost none of the core's
    execution resources, thus allowing the other SMT thread(s) of the
    same core to run faster. A sort of more intelligent PAUSE.
    In the absence of other SMT threads, the main advantage of a polling
    loop with Monitor/MWAIT vs a simple tight polling loop (STPL) is reduced
    power consumption.
    As far as cache coherence traffic (CCT) is concerned, a Monitor/MWAIT
    polling loop provides virtually no advantage relative to STPL. Both
    are quite efficient from the CCT perspective, at least as long as the
    programmer does not do anything stupid.
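
    One common reading of "not doing anything stupid" in a tight polling
    loop is to spin on a plain load and attempt the atomic write only
    when the lock looks free (test-and-test-and-set), so waiters do not
    keep pulling the cacheline into exclusive state. That reading is an
    assumption, not something stated above. A minimal sketch, with a
    hypothetical class name and yield() as a portable stand-in for PAUSE:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration of a test-and-test-and-set spinlock.
class TtasLock {
    std::atomic<bool> locked_{false};

public:
    bool try_lock()
    {
        // Cheap read first; attempt the write only if the lock looks free.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock()
    {
        while (!try_lock()) {
            // Read-only spin: no write to the line until it changes.
            while (locked_.load(std::memory_order_relaxed))
                std::this_thread::yield();   // stand-in for x86 PAUSE
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }
};
```

    The class has lock(), try_lock() and unlock(), so it can be used
    with std::lock_guard<TtasLock> like any other lockable type.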

    Later on Intel invented 'MWAIT for Power Management', which has slightly
    different objectives. But that is O.T.


  • From Chris M. Thomasson@3:633/280.2 to All on Thu Mar 28 06:58:50 2024
    On 3/27/2024 8:09 AM, Michael S wrote:
    On Tue, 26 Mar 2024 13:02:47 -0700
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> wrote:

    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    On 26.03.2024 at 03:48, Chris M. Thomasson wrote:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    On 23.03.2024 at 21:58, Chris M. Thomasson wrote:

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much less
    interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    I don't know what you mean by 'get around'.

    Turning a "hot" spin wait into a cooler one...

    ;^)


    The main point of the original Monitor/MWAIT is to allow one SMT thread
    to poll a memory address in a way that consumes almost none of the core's
    execution resources, thus allowing the other SMT thread(s) of the
    same core to run faster. A sort of more intelligent PAUSE.
    In the absence of other SMT threads, the main advantage of a polling
    loop with Monitor/MWAIT vs a simple tight polling loop (STPL) is reduced
    power consumption.
    As far as cache coherence traffic (CCT) is concerned, a Monitor/MWAIT
    polling loop provides virtually no advantage relative to STPL. Both
    are quite efficient from the CCT perspective, at least as long as the
    programmer does not do anything stupid.

    Later on Intel invented 'MWAIT for Power Management', which has slightly
    different objectives. But that is O.T.


    Indeed.
