• How to smartly deal with SIGSEGV (hazard pointers)

    From Bonita Montero@3:633/10 to All on Mon Mar 9 14:44:18 2026
    Under Windows you don't need hazard pointers because you have structured
    exception handling (EXCEPTION_ACCESS_VIOLATION). The same is possible under
    POSIX if you trap SIGSEGV and do a siglongjmp out of the handler.
    The problem is that signal handlers are process-global; a thread only has
    its own private signal mask.
    So I invented a little framework that handles signals with a thread-local
    signal handler. A global signal handler is installed by a static object
    before main(). When the corresponding signal (given as an int template
    parameter) is caught, it dispatches to the thread-local handler.
    Here's the source of my little framework (yes, it's a framework rather
    than a lib, since it has a callback):

    #pragma once
    #include <cstdlib>
    #include <variant>
    #include <unistd.h>
    #include <signal.h>
    #include <setjmp.h>

    template<int SigNo>
    struct xsignal
    {
        static_assert(SigNo == SIGILL || SigNo == SIGFPE || SigNo == SIGSEGV ||
            SigNo == SIGBUS || SigNo == SIGTRAP, "only synchronous signals");
        using handler_fn = void (*)( int );
        using siginfo_handler_fn = void (*)( int, siginfo_t *, void * );
        // if the variant holds the int alternative, action() calls no handler
        using handler_variant = std::variant<int, handler_fn, siginfo_handler_fn>;
        xsignal( handler_variant fn );
        ~xsignal();
        void handler( handler_variant fn ) noexcept;
        void params( const sigset_t &set, int flags );
    private:
        handler_variant m_handlerBefore;
        inline static thread_local handler_variant t_handler;
        inline static struct init
        {
            init();
            ~init();
            void dummy() {}
            struct sigaction m_saBefore;
        } g_init;
        static void action( int sig, siginfo_t *info, void *uContext ) noexcept;
    };

    template<int SigNo>
    xsignal<SigNo>::xsignal( handler_variant fn ) :
        m_handlerBefore( t_handler )
    {
        (void)g_init; // enforce instantiation of g_init
        t_handler = fn;
    }

    template<int SigNo>
    inline xsignal<SigNo>::~xsignal()
    {
        // restore the handler that was active for this thread before
        t_handler = m_handlerBefore;
    }

    template<int SigNo>
    void xsignal<SigNo>::handler( handler_variant fn ) noexcept
    {
        t_handler = fn;
    }

    template<int SigNo>
    void xsignal<SigNo>::params( const sigset_t &set, int flags )
    {
        struct sigaction sa;
        sa.sa_sigaction = action;
        sa.sa_mask = set;
        sa.sa_flags = flags | SA_SIGINFO;
        sigaction( SigNo, &sa, nullptr );
    }

    template<int SigNo>
    xsignal<SigNo>::init::init()
    {
        using namespace std;
        struct sigaction sa;
        //sigemptyset( &sa.sa_mask );
        sigfillset( &sa.sa_mask );
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = action;
        sigaction( SigNo, &sa, &m_saBefore );
    }

    template<int SigNo>
    xsignal<SigNo>::init::~init()
    {
        sigaction( SigNo, &m_saBefore, nullptr );
    }

    template<int SigNo>
    void xsignal<SigNo>::action( int sig, siginfo_t *info, void *uContext ) noexcept
    {
        // global handler: forward to whatever this thread has installed
        if( std::holds_alternative<handler_fn>( t_handler ) )
            std::get<handler_fn>( t_handler )( sig );
        else if( std::holds_alternative<siginfo_handler_fn>( t_handler ) )
            std::get<siginfo_handler_fn>( t_handler )( sig, info, uContext );
    }

    This is an example program:

    #include <iostream>
    #include "xsignal.hpp"

    using namespace std;

    volatile int *volatile p = nullptr;

    int main()
    {
        // sigsetjmp / siglongjmp take a sigjmp_buf, not a jmp_buf
        thread_local sigjmp_buf jb;
        xsignal<SIGSEGV> threadSegv( +[]( int sig )
            {
                static const char PrintThis[] = "caught!\n";
                // write() is async-signal-safe, unlike cout
                write( 1, PrintThis, sizeof PrintThis - 1 );
                siglongjmp( jb, 1 );
            } );
        if( int ret = sigsetjmp( jb, 1 ); !ret )
        {
            cout << "starting" << endl;
            *::p = 0;
        }
        else
            cout << "aborted" << endl;
    }

    --- PyGate Linux v1.5.12
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Chris M. Thomasson@3:633/10 to All on Mon Mar 9 15:36:05 2026
    On 3/9/2026 6:44 AM, Bonita Montero wrote:
    Under Windows you don't need hazard pointers because you have structured
    exception handling (EXCEPTION_ACCESS_VIOLATION). The same is possible under
    POSIX if you trap SIGSEGV and do a siglongjmp out of the handler.
    The problem is that signal handlers are process-global; a thread only has
    its own private signal mask.
    So I invented a little framework that handles signals with a thread-local
    signal handler. A global signal handler is installed by a static object
    before main(). When the corresponding signal (given as an int template
    parameter) is caught, it dispatches to the thread-local handler.
    Here's the source of my little framework (yes, it's a framework rather
    than a lib, since it has a callback):[...]

    Well, iirc SLIST uses SEH for its work when shit hits the fan on
    Windows. However, it's very specialized for the algorithm:

    https://learn.microsoft.com/en-us/windows/win32/sync/interlocked-singly-linked-lists

    I don't think it mentions it there, but SEH is used. I prefer proxy
    collectors over hazard pointers, but that's just me. Well, Joe knows.
    Hazard pointers basically need async membar behavior or else they eat
    your lunch with that damn #StoreLoad.

  • From Bonita Montero@3:633/10 to All on Tue Mar 10 09:56:19 2026
    On 09.03.2026 at 23:36, Chris M. Thomasson wrote:

    I don't think it mentions it there, but SEH is used. I prefer proxy collectors over hazard pointers, but that's just me. Well, Joe knows.
    Hazard pointers basically need async membar behavior or else they eat
    your lunch with that damn #StoreLoad.

    I've developed a lock-free stack with SEH. With one thread it is
    slower than an SLIST, with two threads it is faster, and with 32
    threads it is orders of magnitude faster:

    #pragma once
    #if defined(_WIN32)
        #include <Windows.h>
    #endif
    #include <atomic>
    #include <cstddef>

    template<typename T>
    struct lockfree_stack
    {
        struct node : T
        {
            using T::T;
        private:
            template<typename T2> // T2, so the outer T isn't shadowed
            friend struct lockfree_stack;
            node *m_next;
        };
        void push( node *nd ) noexcept;
        node *pop() noexcept;
    private:
        // pointer plus modification counter, swapped as one double-width unit
        struct head { node *ptr; size_t ctr; };
        static_assert(std::atomic_ref<head>::is_always_lock_free);
        alignas(2 * sizeof(size_t)) head m_head = { 0, 0 };
    };

    template<typename T>
    void lockfree_stack<T>::push( node *nd ) noexcept
    {
        using namespace std;
        atomic_ref ptr( m_head.ptr );
        atomic_ref ctr( m_head.ctr );
        head hdRef( ptr.load( memory_order_relaxed ), ctr.load( memory_order_relaxed ) );
        atomic_ref<head> aHead( m_head );
        // retry until the CAS succeeds; a failed CAS reloads hdRef
        do
            nd->m_next = (node *)hdRef.ptr;
        while( !aHead.compare_exchange_strong( hdRef, head( nd, hdRef.ctr + 1 ),
            memory_order_release, memory_order_relaxed ) );
    }

    template<typename T>
    auto lockfree_stack<T>::pop() noexcept -> node *
    {
        using namespace std;
        atomic_ref ptr( m_head.ptr );
        atomic_ref ctr( m_head.ctr );
        head hdRef( ptr.load( memory_order_relaxed ), ctr.load( memory_order_relaxed ) );
        atomic_ref<head> aHead( m_head );
        for( ; ; )
        {
            node *next;
            if( !hdRef.ptr )
                return nullptr;
            // the node may already have been freed and unmapped by another thread
            __try
            {
                next = hdRef.ptr->m_next;
            }
            __except( EXCEPTION_EXECUTE_HANDLER )
            {
                // the read faulted: reload the head and try again
                hdRef = head( ptr.load( memory_order_relaxed ), ctr.load(
                    memory_order_relaxed ) );
                continue;
            }
            if( aHead.compare_exchange_strong( hdRef, head( next, hdRef.ctr + 1 ),
                memory_order_acquire, memory_order_relaxed ) )
                return hdRef.ptr;
        }
    }

    In the next step I'll port this to POSIX with my scoped_signal.
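
    For comparison, a minimal sketch of the same counted-head CAS retry loop in a variant that needs neither SEH nor signals. This is not the code posted above: it swaps node pointers for 32-bit indices into a fixed pool, so a stale link can always be read without faulting, and it packs index and counter into one 64-bit word. The names index_stack, npos, pack and counter are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical index-based stack: the link array is never freed while the
// stack lives, so pop() may read the link of a node that was just popped.
class index_stack
{
public:
    static constexpr std::uint32_t npos = 0xFFFFFFFFu;   // "empty" marker

    explicit index_stack( std::uint32_t capacity ) : m_next( capacity ) {}

    void push( std::uint32_t idx ) noexcept
    {
        assert( idx < m_next.size() );
        std::uint64_t cur = m_head.load( std::memory_order_relaxed );
        std::uint64_t desired;
        do
        {
            // link the new node to the current top, then try to publish it
            m_next[idx].store( static_cast<std::uint32_t>( cur ), std::memory_order_relaxed );
            desired = pack( idx, counter( cur ) + 1 );
        }   // loop while the CAS FAILS; a failed CAS refreshes cur
        while( !m_head.compare_exchange_weak( cur, desired,
            std::memory_order_release, std::memory_order_relaxed ) );
    }

    std::uint32_t pop() noexcept
    {
        std::uint64_t cur = m_head.load( std::memory_order_acquire );
        for( ; ; )
        {
            std::uint32_t idx = static_cast<std::uint32_t>( cur );
            if( idx == npos )
                return npos;
            std::uint32_t next = m_next[idx].load( std::memory_order_relaxed );
            // the counter makes the CAS fail if the head changed in between,
            // even when the same index is on top again (ABA)
            if( m_head.compare_exchange_weak( cur, pack( next, counter( cur ) + 1 ),
                std::memory_order_acquire, std::memory_order_acquire ) )
                return idx;
        }
    }

private:
    static std::uint64_t pack( std::uint32_t idx, std::uint32_t ctr ) noexcept
    {
        return ( static_cast<std::uint64_t>( ctr ) << 32 ) | idx;
    }
    static std::uint32_t counter( std::uint64_t h ) noexcept
    {
        return static_cast<std::uint32_t>( h >> 32 );
    }
    std::vector<std::atomic<std::uint32_t>> m_next;      // per-node link
    std::atomic<std::uint64_t> m_head{ pack( npos, 0 ) };
};
```

    The trade-off in this sketch is a fixed capacity and a 32-bit counter in exchange for a single-width CAS and no fault handling; callers own the payload array that the indices refer to.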


  • From Bonita Montero@3:633/10 to All on Tue Mar 10 10:43:00 2026
    Now it works with POSIX and Win32:

    #pragma once
    #if defined(_WIN32)
        #include <Windows.h>
    #elif defined(__unix__)
        #include "signal_scope.hpp"
    #else
        #error "unsupported platform"
    #endif
    #include <atomic>
    #include <cstddef>
    #include <setjmp.h>

    template<typename T>
    struct lockfree_stack
    {
        struct node : T
        {
            using T::T;
        private:
            template<typename T2>
            friend struct lockfree_stack;
            node *m_next;
        };
        void push( node *nd ) noexcept;
        node *pop() noexcept;
    private:
        // pointer plus modification counter, swapped as one double-width unit
        struct head { node *ptr; size_t ctr; };
        alignas(2 * sizeof(size_t)) head m_head = { 0, 0 };
    };

    template<typename T>
    void lockfree_stack<T>::push( node *nd ) noexcept
    {
        using namespace std;
        atomic_ref ptr( m_head.ptr );
        atomic_ref ctr( m_head.ctr );
        head hdRef( ptr.load( memory_order_relaxed ), ctr.load( memory_order_relaxed ) );
        atomic_ref<head> aHead( m_head );
        // retry until the CAS succeeds; a failed CAS reloads hdRef
        do
            nd->m_next = (node *)hdRef.ptr;
        while( !aHead.compare_exchange_strong( hdRef, head( nd, hdRef.ctr + 1 ),
            memory_order_release, memory_order_relaxed ) );
    }

    template<typename T>
    auto lockfree_stack<T>::pop() noexcept -> node *
    {
        using namespace std;
        atomic_ref ptr( m_head.ptr );
        atomic_ref ctr( m_head.ctr );
        head hdRef( ptr.load( memory_order_relaxed ), ctr.load( memory_order_relaxed ) );
        atomic_ref<head> aHead( m_head );
        for( ; ; )
        {
            node *next;
            if( !hdRef.ptr )
                return nullptr;
    #if defined(_WIN32)
            __try
            {
                next = hdRef.ptr->m_next;
            }
            __except( EXCEPTION_EXECUTE_HANDLER )
            {
                hdRef = head( ptr.load( memory_order_relaxed ), ctr.load(
                    memory_order_relaxed ) );
                continue;
            }
    #elif defined(__unix__)
            {
                // sigsetjmp / siglongjmp take a sigjmp_buf, not a jmp_buf
                thread_local sigjmp_buf jb;
                signal_scope<SIGSEGV> sigScope( +[]( int ) { siglongjmp( jb, 1 );
                    return true; } );
                if( sigsetjmp( jb, 1 ) )
                {
                    // the read below faulted: reload the head as the SEH path
                    // does, otherwise the same dead pointer is read again
                    hdRef = head( ptr.load( memory_order_relaxed ), ctr.load(
                        memory_order_relaxed ) );
                    continue;
                }
                next = hdRef.ptr->m_next;
            }
    #endif
            if( aHead.compare_exchange_strong( hdRef, head( next, hdRef.ctr + 1 ),
                memory_order_acquire, memory_order_relaxed ) )
                return hdRef.ptr;
        }
    }


  • From Tristan Wibberley@3:633/10 to All on Tue Mar 10 13:38:27 2026
    On 10/03/2026 09:43, Bonita Montero wrote:
    On 09.03.2026 at 23:36, Chris M. Thomasson wrote:

    I don't think it mentions it there, but SEH is used. I prefer proxy
    collectors over hazard pointers, but that's just me. Well, Joe knows.
    Hazard pointers basically need async membar behavior or else they eat
    your lunch with that damn #StoreLoad.

    I've developed a lock-free stack with SEH. With one thread it is
    slower than an SLIST, with two threads it is faster, and with 32
    threads it is orders of magnitude faster:

    I searched for thread_local and didn't see it.

    thread_local gets basic structure into the linker and generalises to
    uctx's from older POSIX standards for those systems where uctx is
    green-like (fillet cut wrt. thread_local, eg linux) even if you have to
    exclude those systems that are fiber-like (steak-cut wrt. thread_local,
    as I understand it, eg bsd).

    Linker needs more intersections of such layout matters to get complete
    coverage of context variants.

    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


  • From Bonita Montero@3:633/10 to All on Tue Mar 10 16:50:18 2026
    On 10.03.2026 at 14:38, Tristan Wibberley wrote:

    I searched for thread_local and didn't see it.

    It isn't necessary for the Windows code. The Unix code needs thread-specific
    signal handling for SIGSEGV. This is done with scoped_signal, which has a
    global handler for the signal and forwards the signal to a thread-local
    function pointer.

    thread_local gets basic structure into the linker and generalises to
    uctx's from older POSIX standards for those systems where uctx is
    green-like (fillet cut wrt. thread_local, eg linux) even if you have to exclude those systems that are fiber-like (steak-cut wrt. thread_local,
    as I understand it, eg bsd).

    thread_local works as C++11 defined it.



  • From Tristan Wibberley@3:633/10 to All on Tue Mar 10 22:45:19 2026
    On 10/03/2026 15:50, Bonita Montero wrote:

    thread_local works as C++11 defined it.


    Depends on how standard your C++ is and whether your system defines a
    generalisation beyond the C++ standard. The charter of this group isn't
    restricted to ISO standards AFAIK, so you'll find reality creeping in.


    --
    Tristan Wibberley



  • From Chris M. Thomasson@3:633/10 to All on Tue Mar 10 16:59:58 2026
    On 3/10/2026 1:56 AM, Bonita Montero wrote:
    On 09.03.2026 at 23:36, Chris M. Thomasson wrote:

    I don't think it mentions it there, but SEH is used. I prefer proxy
    collectors over hazard pointers, but that's just me. Well, Joe knows.
    Hazard pointers basically need async membar behavior or else they eat
    your lunch with that damn #StoreLoad.

    I've developed a lock-free stack with SEH. With one thread it is
    slower than an SLIST, with two threads it is faster, and with 32
    threads it is orders of magnitude faster:

    I need some more time to really examine it. Seems to look okay for now.
    Well, I cannot notice anything obviously wrong so far.

    Notice where you align?

    alignas(2 * sizeof(size_t)) head m_head = { 0, 0 };

    Well, I don't think a SLIST anchor is automatically aligned. You should
    align it on a l2 cache line boundary and pad it up to a l2 cache line.

    https://learn.microsoft.com/en-us/windows/win32/sync/using-singly-linked-lists

    notice:

    typedef struct _PROGRAM_ITEM {
    SLIST_ENTRY ItemEntry;
    ULONG Signature;
    } PROGRAM_ITEM, *PPROGRAM_ITEM;

    ? Notice _aligned_malloc?

    Also, have you checked that your compare_exchange_strong is using
    CMPXCHG8B on 32 bit systems and CMPXCHG16B on 64 bit systems? That used
    to piss me off when it said not always lock-free.

    Little shits.

    PROGRAM_ITEM needs to be aligned and padded, and PSLIST_HEADER needs to
    be aligned and padded.

    Also, how are you testing it? If you never delete any nodes, your SEH is
    useless. Also, so, well, what memory allocator are you using?



  • From Chris M. Thomasson@3:633/10 to All on Tue Mar 10 17:27:21 2026
    On 3/9/2026 3:36 PM, Chris M. Thomasson wrote:
    On 3/9/2026 6:44 AM, Bonita Montero wrote:
    Under Windows you don't need hazard pointers because you have structured
    exception handling (EXCEPTION_ACCESS_VIOLATION). The same is possible under
    POSIX if you trap SIGSEGV and do a siglongjmp out of the handler.
    The problem is that signal handlers are process-global; a thread only has
    its own private signal mask.
    So I invented a little framework that handles signals with a thread-local
    signal handler. A global signal handler is installed by a static object
    before main(). When the corresponding signal (given as an int template
    parameter) is caught, it dispatches to the thread-local handler.
    Here's the source of my little framework (yes, it's a framework rather
    than a lib, since it has a callback):[...]

    Well, iirc SLIST uses SEH for its work when shit hits the fan on
    Windows. However, it's very specialized for the algorithm:

    https://learn.microsoft.com/en-us/windows/win32/sync/interlocked-singly-linked-lists

    I don't think it mentions it there, but SEH is used. I prefer proxy collectors over hazard pointers, but that's just me. Well, Joe knows.
    Hazard pointers basically need async membar behavior or else they eat
    your lunch with that damn #StoreLoad.

    Another reason that proxy collectors are useful... Say you wanted to
    iterate that lock-free stack. Not sure I would want SEH there. All the
    nodes should be alive. Using SEH for the pop is, iirc, what SLIST does,
    but, well...

  • From Chris M. Thomasson@3:633/10 to All on Tue Mar 10 18:24:48 2026
    On 3/10/2026 1:56 AM, Bonita Montero wrote:
    On 09.03.2026 at 23:36, Chris M. Thomasson wrote:

    I don't think it mentions it there, but SEH is used. I prefer proxy
    collectors over hazard pointers, but that's just me. Well, Joe knows.
    Hazard pointers basically need async membar behavior or else they eat
    your lunch with that damn #StoreLoad.

    I've developed a lock-free stack with SEH. With one thread it is
    slower than an SLIST, with two threads it is faster, and with 32
    threads it is orders of magnitude faster:



    Actually, you should keep a counter there just for testing/profiling.
    Bump it every time the exception handler triggers and does its thing.

    Might be interesting?
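
    A rough sketch of such a counter for the POSIX variant, assuming Linux, where reading a PROT_NONE page raises SIGSEGV. The names guarded_read, unreadable_page, g_faultCount, t_jb and t_armed are hypothetical and not part of the posted framework, and for brevity the handler is installed process-wide around each read instead of going through the thread-local dispatch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>

static std::atomic<std::size_t> g_faultCount{ 0 };      // times the recovery path ran
static thread_local sigjmp_buf t_jb;                     // per-thread jump target
static thread_local volatile sig_atomic_t t_armed = 0;   // inside a guarded read?

static void onSegv( int )
{
    if( !t_armed )
    {
        // a genuine crash elsewhere: restore the default action and return,
        // so the faulting instruction re-executes and terminates the process
        signal( SIGSEGV, SIG_DFL );
        return;
    }
    siglongjmp( t_jb, 1 );
}

// Reads *p into out; returns false and bumps the counter if the read faulted.
bool guarded_read( const volatile int *p, int &out )
{
    struct sigaction sa{}, before{};
    sa.sa_handler = onSegv;
    sigemptyset( &sa.sa_mask );
    sigaction( SIGSEGV, &sa, &before );
    volatile bool ok = false;
    t_armed = 1;
    if( sigsetjmp( t_jb, 1 ) == 0 )
    {
        out = *p;                                        // may fault
        ok = true;
    }
    else
        g_faultCount.fetch_add( 1, std::memory_order_relaxed );
    t_armed = 0;
    sigaction( SIGSEGV, &before, nullptr );
    return ok;
}

// Demo helper: maps one page that faults on any access.
void *unreadable_page()
{
    void *pg = mmap( nullptr, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    assert( pg != MAP_FAILED );
    return pg;
}
```

    Pointing guarded_read at the page from unreadable_page() is one way to force the recovery path in a test; the counter is incremented after the jump back, in normal context, so the handler itself stays minimal.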

  • From Bonita Montero@3:633/10 to All on Wed Mar 11 05:51:44 2026
    On 10.03.2026 at 23:45, Tristan Wibberley wrote:

    Depends on how standard your C++ is and whether your system defines a
    generalisation beyond the C++ standard. The charter of this group isn't
    restricted to ISO standards AFAIK, so you'll find reality creeping in.

    Which compiler is non-conforming in that sense ?

  • From Bonita Montero@3:633/10 to All on Wed Mar 11 06:00:09 2026
    On 11.03.2026 at 00:59, Chris M. Thomasson wrote:

    Notice where you align?

    alignas(2 * sizeof(size_t)) head m_head = { 0, 0 };

    Because atomic_ref makes a runtime-check about that.

    Well, I don't think a SLIST anchor is automatically aligned. ...

    On ARM it should.

    ? Notice _aligned_malloc?

    Not necessary here; the head is just a class.

    Also, have you checked that your compare_exchange_strong is using
    CMPXCHG8B on 32 bit systems and CMPXCHG16B on 64 bit systems? That
    used to piss me off when it said not always lock-free.

    Both are lock-free for sure.



  • From Chris M. Thomasson@3:633/10 to All on Wed Mar 11 12:28:23 2026
    On 3/10/2026 10:00 PM, Bonita Montero wrote:
    On 11.03.2026 at 00:59, Chris M. Thomasson wrote:

    Notice where you align?

    alignas(2 * sizeof(size_t)) head m_head = { 0, 0 };

    Because atomic_ref makes a runtime-check about that.

    Well, I don't think a SLIST anchor is automatically aligned. ...

    On ARM it should.

    Hopefully. ;^)


    ? Notice _aligned_malloc?

    Not necessary here; the head is just a class.

    Just to be tidy, so to speak. Pad _and_ align the head on a l2 cache
    line, even with SLIST_HEADER. Completely isolate it. Also, for your
    nodes, ditto.
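
    A small sketch of what "pad and align" can look like in C++, assuming a 64-byte cache line. The constant kCacheLine and the names isolated_head and on_distinct_lines are illustrative only, and head is a stand-in with the same shape as the stack's head.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. C++17 also offers
// std::hardware_destructive_interference_size in <new>.
constexpr std::size_t kCacheLine = 64;

struct head { void *ptr; std::size_t ctr; };   // stand-in for the stack's head

// alignas on the struct raises its alignment and also rounds sizeof up to a
// multiple of that alignment, so the tail padding comes with it.
struct alignas(kCacheLine) isolated_head
{
    head h{ nullptr, 0 };
};

static_assert( alignof(isolated_head) == kCacheLine, "starts on a line boundary" );
static_assert( sizeof(isolated_head) == kCacheLine, "fills the whole line" );

// True if the two objects cannot share a cache line.
inline bool on_distinct_lines( const isolated_head &a, const isolated_head &b ) noexcept
{
    auto line = []( const void *p )
        { return reinterpret_cast<std::uintptr_t>( p ) / kCacheLine; };
    return line( &a ) != line( &b );
}
```

    Whether the extra isolation pays off depends on what else lives next to the head; the sketch only shows that the language can express it without manual padding arrays.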


    Also, have you checked that your compare_exchange_strong is using
    CMPXCHG8B on 32 bit systems and CMPXCHG16B on 64 bit systems? That
    used to piss me off when it said not always lock-free.

    Both are lock-free for sure.

    Good! What does the lock-free check from C++ give you? A la
    is_always_lock_free? Some compilers scare me iirc with sometimes
    lock-free... Shit happens.
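
    One way to ask the compiler that question, as a hedged sketch. It uses std::atomic rather than the std::atomic_ref from the posted code so that it builds as C++17; std::atomic_ref has a trait of the same name in C++20. The names lock_free_report and query_lock_free are made up, and the answer for the pointer-plus-counter pair is implementation-specific: it may well be false even on hardware that has a double-width CAS.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct head { void *ptr; std::size_t ctr; };   // same shape as the stack's head

struct lock_free_report
{
    bool word;   // pointer-sized atomic
    bool pair;   // double-width atomic (CMPXCHG8B / CMPXCHG16B territory on x86)
};

// Compile-time answers only; nothing here executes an atomic operation.
constexpr lock_free_report query_lock_free() noexcept
{
    return { std::atomic<std::size_t>::is_always_lock_free,
             std::atomic<head>::is_always_lock_free };
}
```

    A disassembly remains the only way to see which instruction a given compare_exchange_strong actually compiles to; the trait only states what the library is willing to promise.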

    Also, for fun... Keep a counter in a "debug" or "profile" mode of your
    code. It simply counts how many times the SEH fired. You are making me
    reminisce about the old days... ;^)

  • From Bonita Montero@3:633/10 to All on Thu Mar 12 06:41:08 2026
    On 11.03.2026 at 20:28, Chris M. Thomasson wrote:

    Just to be tidy, so to speak. Pad _and_ align the head on a l2 cache
    line, even with SLIST_HEADER. Completely isolate it. Also, for your
    nodes, ditto.

    It's enough that it doesn't straddle two cache lines. On my Zen 4 CPU
    there's a performance penalty only if the word straddles a page boundary.

    Good! What does the lock-free check from C++ give you?

    I trust in cl, clang-cl, g++ and clang++.
    All do it correctly.

    Also, for fun... Keep a counter in a "debug" or "profile" mode of your
    code. It simply counts how many times the SEH fired. ...

    I didn't test that since the code is trivial.

  • From Chris M. Thomasson@3:633/10 to All on Wed Mar 11 23:30:20 2026
    On 3/11/2026 10:41 PM, Bonita Montero wrote:
    On 11.03.2026 at 20:28, Chris M. Thomasson wrote:

    Just to be tidy, so to speak. Pad _and_ align the head on a l2 cache
    line, even with SLIST_HEADER. Completely isolate it. Also, for your
    nodes, ditto.

    It's enough that it doesn't straddle two cache lines. On my Zen 4 CPU
    there's a performance penalty only if the word straddles a page boundary.

    Well, align it on a boundary and pad it up to a boundary. That way you
    know your head, or anchor, is isolated. Also, for your nodes... Each
    node should be its own l2 cacheline, properly aligned and padded. This
    goes for using SLIST_HEADER as well...


    Good! What does the lock-free check from C++ give you?

    I trust in cl, clang-cl, g++ and clang++.
    All do it correctly.

    Did you do a disassembly? On a 64 bit system it should boil down to
    LOCK CMPXCHG16B and report always lock free = true. :^)

    Also, for fun... Keep a counter in a "debug" or "profile" mode of your
    code. It simply counts how many times the SEH fired. ...

    I didn't test that since the code is trivial.

    You should do it. Does it get above zero in your tests?

  • From Bonita Montero@3:633/10 to All on Thu Mar 12 08:36:07 2026
    On 12.03.2026 at 07:30, Chris M. Thomasson wrote:

    Well, align it on a boundary and pad it up to a boundary.

    I align it and the padding comes automatically since it's always
    eight or sixteen bytes.

    That way you know your head, or anchor, is isolated. Also, for your nodes... Each
    node should be its own l2 cacheline, properly aligned and padded. This
    goes for using SLIST_HEADER as well...

    That's the job of whoever pushes or pops the nodes.
    With modern memory allocators (jemalloc, mimalloc, tcmalloc)
    the alignment comes automatically.

    Did you do a disassembly? On a 64 bit system it should boil down to
    LOCK CMPXCHG16B and report always lock free = true. :^)

    Yes, of course that works. The problem with atomic<int128_type> with
    MSVC was that 128 bit operations are available only with atomic_ref
    for backward compatibility reasons.

    You should do it. Does it get above zero in your tests?

    It's only one line of code with less than 20 characters.

  • From Bonita Montero@3:633/10 to All on Thu Mar 12 08:46:39 2026
    On 12.03.2026 at 08:36, Bonita Montero wrote:

    You should do it. Does it get above zero in your tests?

    It's only one line of code with less than 20 characters.

    It's only that line "next = hdRef.ptr->m_next;" which is
    encapsulated in a __try / __except.


  • From Chris M. Thomasson@3:633/10 to All on Thu Mar 12 16:05:00 2026
    On 3/12/2026 12:46 AM, Bonita Montero wrote:
    On 12.03.2026 at 08:36, Bonita Montero wrote:

    You should do it. Does it get above zero in your tests?

    It's only one line of code with less than 20 characters.

    It's only that line "next = hdRef.ptr->m_next;" which is
    encapsulated in a __try / __except.


    Right. Just count how many times it gets triggered? It should give you
    an insight...

  • From Bonita Montero@3:633/10 to All on Fri Mar 13 01:20:42 2026
    On 13.03.2026 at 00:05, Chris M. Thomasson wrote:

    Right. Just count how many times it gets triggered? It should give you
    an insight...

    It doesn't matter how often, since it's clear that this is a rare
    condition with today's memory allocators.

  • From Chris M. Thomasson@3:633/10 to All on Thu Mar 12 21:27:32 2026
    On 3/12/2026 5:20 PM, Bonita Montero wrote:
    On 13.03.2026 at 00:05, Chris M. Thomasson wrote:

    Right. Just count how many times it gets triggered? It should give you
    an insight...

    It doesn't matter how often, since it's clear that this is a rare
    condition with today's memory allocators.

    It should be rare, but... Well, add a counter anyway. Just to see how
    far it goes over zero... ;^)

  • From Bonita Montero@3:633/10 to All on Fri Mar 13 06:13:49 2026
    On 13.03.2026 at 05:27, Chris M. Thomasson wrote:

    It should be rare, but... Well, add a counter anyway.
    Just to see how far it goes over zero... ;^)

    You can't simulate that without writing a large program with
    a specific allocation scheme.

  • From Chris M. Thomasson@3:633/10 to All on Fri Mar 13 01:33:19 2026
    On 3/12/2026 10:13 PM, Bonita Montero wrote:
    On 13.03.2026 at 05:27, Chris M. Thomasson wrote:

    It should be rare, but... Well, add a counter anyway.
    Just to see how far it goes over zero... ;^)

    You can't simulate that without writing a large program with
    a specific allocation scheme.

    Well, you need to try? See what that counter gets to after say, a 5 hour
    run of your system.

  • From Bonita Montero@3:633/10 to All on Fri Mar 13 10:38:59 2026
    On 13.03.2026 at 09:33, Chris M. Thomasson wrote:
    On 3/12/2026 10:13 PM, Bonita Montero wrote:
    On 13.03.2026 at 05:27, Chris M. Thomasson wrote:

    It should be rare, but... Well, add a counter anyway.
    Just to see how far it goes over zero... ;^)

    You can't simulate that without writing a large program with
    a specific allocation scheme.

    Well, you need to try? See what that counter gets to after say, a 5 hour
    run of your system.

    It's very unlikely that the counter wraps around to the same value and
    the pointer matches as well. With a 32-bit counter that takes 2^32
    iterations, and the pointer must also have the same value. With 64 bits
    it's 2^64, i.e. about 1.8E19 rounds, so you can assume that this never
    happens in practice.
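
    To put those figures into time, here is a small worked calculation.
    The rate of 1e9 counter increments per second used in the note below
    is purely an assumption for illustration, and the function names are
    hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Seconds until a counter of the given width wraps around, at an assumed
// (hypothetical) rate of increments per second.
double seconds_to_wrap( int bits, double incrementsPerSecond )
{
    return std::ldexp( 1.0, bits ) / incrementsPerSecond;   // 2^bits / rate
}

// The same figure expressed in years of 365.25 days.
double years_to_wrap( int bits, double incrementsPerSecond )
{
    return seconds_to_wrap( bits, incrementsPerSecond ) / ( 365.25 * 24 * 3600 );
}
```

    At the assumed 1e9 increments per second a 32-bit counter wraps after
    roughly 4.3 seconds, a 64-bit counter after roughly 584 years. A wrap
    alone is not yet an ABA hit: the pointer has to match at that very
    moment as well.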

  • From Chris M. Thomasson@3:633/10 to All on Fri Mar 13 11:49:40 2026
    On 3/13/2026 2:38 AM, Bonita Montero wrote:
    On 13.03.2026 at 09:33, Chris M. Thomasson wrote:
    On 3/12/2026 10:13 PM, Bonita Montero wrote:
    On 13.03.2026 at 05:27, Chris M. Thomasson wrote:

    It should be rare, but... Well, add a counter anyway.
    Just to see how far it goes over zero... ;^)

    You can't simulate that without writing a large program with
    a specific allocation scheme.

    Well, you need to try? See what that counter gets to after say, a 5
    hour run of your system.

    It's very unlikely that the counter wraps around to the same value and
    the pointer matches as well. With a 32-bit counter that takes 2^32
    iterations, and the pointer must also have the same value. With 64 bits
    it's 2^64, i.e. about 1.8E19 rounds, so you can assume that this never
    happens in practice.

    It can occur. You can make a special setup to artificially trigger the
    SEH to fire?

  • From Chris M. Thomasson@3:633/10 to All on Fri Mar 13 11:50:35 2026
    On 3/13/2026 11:49 AM, Chris M. Thomasson wrote:
    On 3/13/2026 2:38 AM, Bonita Montero wrote:
    On 13.03.2026 at 09:33, Chris M. Thomasson wrote:
    On 3/12/2026 10:13 PM, Bonita Montero wrote:
    On 13.03.2026 at 05:27, Chris M. Thomasson wrote:

    It should be rare, but... Well, add a counter anyway.
    Just to see how far it goes over zero... ;^)

    You can't simulate that without writing a large program with
    a specific allocation scheme.

    Well, you need to try? See what that counter gets to after say, a 5
    hour run of your system.

    It's very unlikely that the counter wraps around to the same value and
    the pointer matches as well. With a 32-bit counter that takes 2^32
    iterations, and the pointer must also have the same value. With 64 bits
    it's 2^64, i.e. about 1.8E19 rounds, so you can assume that this never
    happens in practice.

    It can occur. You can make a special setup to artificially trigger the
    SEH to fire?

    Architect a special condition.

  • From Chris M. Thomasson@3:633/10 to All on Fri Mar 13 11:53:21 2026
    On 3/13/2026 11:50 AM, Chris M. Thomasson wrote:
    On 3/13/2026 11:49 AM, Chris M. Thomasson wrote:
    On 3/13/2026 2:38 AM, Bonita Montero wrote:
    On 13.03.2026 at 09:33, Chris M. Thomasson wrote:
    On 3/12/2026 10:13 PM, Bonita Montero wrote:
    On 13.03.2026 at 05:27, Chris M. Thomasson wrote:

    It should be rare, but... Well, add a counter anyway.
    Just to see how far it goes over zero... ;^)

    You can't simulate that without writing a large program with
    a specific allocation scheme.

    Well, you need to try? See what that counter gets to after say, a 5
    hour run of your system.

    It's very unlikely that the counter wraps around to the same value and
    the pointer matches as well. With a 32-bit counter that takes 2^32
    iterations, and the pointer must also have the same value. With 64 bits
    it's 2^64, i.e. about 1.8E19 rounds, so you can assume that this never
    happens in practice.

    It can occur. You can make a special setup to artificially trigger the
    SEH to fire?

    Architect a special condition.

    If the counter is non-zero, you know that SEH fired and corrected
    things... Fair enough?

  • From Bonita Montero@3:633/10 to All on Sun Mar 15 10:51:28 2026
    On 13.03.2026 at 19:49, Chris M. Thomasson wrote:

    On 3/13/2026 2:38 AM, Bonita Montero wrote:

    It's very unlikely that the counter wraps around to the same value and
    the pointer matches as well. With a 32-bit counter that takes 2^32
    iterations, and the pointer must also have the same value. With 64 bits
    it's 2^64, i.e. about 1.8E19 rounds, so you can assume that this never
    happens in practice.

    It can occur. You can make a special setup to artificially trigger
    the SEH to fire?

    It's obvious where and why the access violation may happen. It's easy
    to test that with a simple null-pointer assignment. Given that code, you
    can combine the next-pointer read with the trap-catching code. There's
    no need to test that: only a single CPU instruction can fail here.
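
    The combination described above might look roughly like this on a
    POSIX system. This is a sketch under assumptions, not the posted
    xsignal framework: it uses plain sigaction with sigsetjmp/siglongjmp,
    all names are hypothetical, and it assumes a Linux-like system where
    touching a PROT_NONE page raises SIGSEGV and where a thread_local may
    be read inside a handler for a synchronous signal:

```cpp
#include <cassert>
#include <optional>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>

struct node { node *next; };

// Per-thread jump target plus an "armed" flag, so that only a fault
// inside guarded_next() is recovered. (Hypothetical names.)
static thread_local sigjmp_buf t_jump;
static thread_local volatile sig_atomic_t t_armed = 0;

static void on_segv( int sig )
{
    if( t_armed )
        siglongjmp( t_jump, 1 );   // back into guarded_next()
    signal( sig, SIG_DFL );        // unrelated fault: let it crash normally
}

void install_segv_handler()
{
    struct sigaction sa = {};
    sa.sa_handler = on_segv;
    sigemptyset( &sa.sa_mask );
    sigaction( SIGSEGV, &sa, nullptr );
}

// Reads n->next; returns std::nullopt if that single load faulted.
std::optional<node *> guarded_next( node *n )
{
    if( sigsetjmp( t_jump, 1 ) != 0 )   // 1: also restore the signal mask
    {
        t_armed = 0;
        return std::nullopt;
    }
    t_armed = 1;
    node *volatile *slot = &n->next;    // volatile: keep the load in place
    node *next = *slot;
    t_armed = 0;
    return next;
}

// Test helper: a "node" on a page without access rights, standing in for
// memory that was handed back to the operating system.
node *inaccessible_node()
{
    void *p = mmap( nullptr, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    return p == MAP_FAILED ? nullptr : static_cast<node *>( p );
}
```

    A caller would treat std::nullopt like a failed compare-and-swap and
    retry from the current head.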


  • From Chris M. Thomasson@3:633/10 to All on Sun Mar 15 13:58:44 2026
    On 3/15/2026 2:51 AM, Bonita Montero wrote:
    On 13.03.2026 at 19:49, Chris M. Thomasson wrote:

    On 3/13/2026 2:38 AM, Bonita Montero wrote:

    It's very unlikely that the counter wraps around to the same value and
    the pointer matches as well. With a 32-bit counter that takes 2^32
    iterations, and the pointer must also have the same value. With 64 bits
    it's 2^64, i.e. about 1.8E19 rounds, so you can assume that this never
    happens in practice.

    It can occur. You can make a special setup to artificially trigger
    the SEH to fire?

    It's obvious where and why the access violation may happen. It's easy
    to test that with a simple null-pointer assignment. Given that code, you
    can combine the next-pointer read with the trap-catching code. There's
    no need to test that: only a single CPU instruction can fail here.


    See how many times it trips during an "intense simulation" for fun?
    Build a program, let it run for say, 12 hours. And see how many times
    the SEH fired. You are making me think about the good ol' days. Thanks.

  • From Tristan Wibberley@3:633/10 to All on Mon Mar 16 07:41:28 2026
    On 11/03/2026 04:51, Bonita Montero wrote:
    On 10.03.2026 at 23:45, Tristan Wibberley wrote:

    Depends on how standard your C++ is and whether your system defines a
    generalisation to beyond the C++ standard, the charter of this group
    isn't restricted to ISO standards AFAIK so you'll find reality
    creeping in.

    Which compiler is non-conforming in that sense ?


    I don't know post-C++11 so well, so I couldn't say whether a compiler is
    non-conforming in that respect. uctx facilities are platform library
    facilities on POSIX systems (I think it's POSIX pre-late-2000s).

    I am trusting descriptions I've received of the difference between Linux
    and BSD uctx; I recall testing the Linux uctx wrt thread_local and found
    it behaved as I expect and need but I ran out of time to experiment with
    the BSDs and Windows Fibers to check their behaviours for sure.

    BSDs were reputed to have one of the two behaviours and Linux the other.

    IIRC from some years ago the Linux behaviour is the intuitive one:
    thread_local follows the context of execution ie the traditional meaning
    of "thread" from ages before C++ began to be standardised. An apparent C
    stack and unwinding breadcrumb-trail evolving with function calls and
    returns. When you switch away from a uctx on one POSIX thread on Linux
    and switch back to it on another POSIX thread you get the same
    thread_local object.

    I understand that the BSD behaviour is such that the word "thread" in
    thread_local refers to the O/S task scheduler notion of a "POSIX thread"
    specifically and distinctively, so when you switch away from a uctx on
    one POSIX thread on a BSD and switch back to it on another POSIX thread
    you get different thread_local objects.

    I suppose the behaviours would be seen with GCC on both platforms.
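
    A starting point for such an experiment, assuming a system that still
    ships <ucontext.h> (getcontext, makecontext, swapcontext). The sketch
    only probes the simple case of a uctx entered and left on the same
    POSIX thread; moving a suspended uctx to another POSIX thread, which
    is where the two behaviours described above would differ, is not
    exercised here. All names are hypothetical:

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

thread_local int t_value = 0;            // the object whose identity is probed

static ucontext_t g_mainCtx, g_coCtx;    // hypothetical names
static int *g_seenInContext = nullptr;

static void context_body()
{
    g_seenInContext = &t_value;   // address as seen from inside the uctx
    t_value = 42;
}                                 // returning resumes uc_link (g_mainCtx)

// Returns true if the uctx saw the same thread_local object as its creator.
bool same_thread_local_in_context()
{
    std::vector<char> stack( 64 * 1024 );   // private stack for the uctx
    if( getcontext( &g_coCtx ) != 0 )
        return false;
    g_coCtx.uc_stack.ss_sp = stack.data();
    g_coCtx.uc_stack.ss_size = stack.size();
    g_coCtx.uc_link = &g_mainCtx;
    makecontext( &g_coCtx, context_body, 0 );
    if( swapcontext( &g_mainCtx, &g_coCtx ) != 0 )
        return false;
    return g_seenInContext == &t_value && t_value == 42;
}
```

    Extending the probe to resume the context from a second POSIX thread
    would be the actual test of the Linux versus BSD claim.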

    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


  • From Bonita Montero@3:633/10 to All on Mon Mar 16 09:06:54 2026
    On 15.03.2026 at 21:58, Chris M. Thomasson wrote:

    See how many times it trips during an "intense simulation" for fun?
    Build a program, let it run for say, 12 hours. And see how many times
    the SEH fired. You are making me think about the good ol' days. Thanks.

    It's not possible to simulate that in the same environment as a real
    application, since the allocation and deallocation behaviour of a real
    application is highly dependent on the surrounding code.


  • From Chris M. Thomasson@3:633/10 to All on Mon Mar 16 12:03:37 2026
    On 3/16/2026 1:06 AM, Bonita Montero wrote:
    On 15.03.2026 at 21:58, Chris M. Thomasson wrote:

    See how many times it trips during an "intense simulation" for fun?
    Build a program, let it run for say, 12 hours. And see how many times
    the SEH fired. You are making me think about the good ol' days. Thanks.

    It's not possible to simulate that in the same environment as a real
    application, since the allocation and deallocation behaviour of a real
    application is highly dependent on the surrounding code.


    Well, I have seen many programs over the years that call new for a node;
    push it; pop a node; read its data and delete it in a thread pool with
    producers and consumers.

    That is not efficient, but yet they were there.
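
    The pattern described here, one new per push and one delete per pop,
    can be sketched as follows. A mutex-protected stack is used as a
    stand-in so the example stays short; it is not the lock-free stack
    discussed in this thread, and the class name is hypothetical:

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <utility>

// Lock-based stand-in that shows the allocation pattern only:
// every push allocates a node, every pop frees one.
template<typename T>
class locked_stack
{
    struct node { T data; node *next; };
    std::mutex m_mtx;
    node *m_head = nullptr;
public:
    ~locked_stack() { while( pop() ) {} }    // free whatever is left

    void push( T value )
    {
        node *n = new node{ std::move( value ), nullptr };   // one allocation per item
        std::lock_guard<std::mutex> lock( m_mtx );
        n->next = m_head;
        m_head = n;
    }

    std::optional<T> pop()
    {
        node *n;
        {
            std::lock_guard<std::mutex> lock( m_mtx );
            n = m_head;
            if( !n )
                return std::nullopt;
            m_head = n->next;
        }
        std::optional<T> out( std::move( n->data ) );   // read its data ...
        delete n;                                       // ... and delete it
        return out;
    }
};
```

    Producers would call push() and consumers pop() from the pool's
    threads, so freed node addresses are handed out again quickly by the
    allocator.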

  • From Tristan Wibberley@3:633/10 to All on Tue Mar 17 00:04:25 2026
    On 16/03/2026 19:03, Chris M. Thomasson wrote:
    Well, I have seen many programs over the years that call new for a node;
    push it; pop a node; read its data and delete it in a thread pool with
    producers and consumers.

    That is not efficient, but yet they were there.

    overloaded operator new?

    --
    Tristan Wibberley

    The message body is Copyright (C) 2026 Tristan Wibberley except
    citations and quotations noted. All Rights Reserved except that you may,
    of course, cite it academically giving credit to me, distribute it
    verbatim as part of a usenet system or its archives, and use it to
    promote my greatness and general superiority without misrepresentation
    of my opinions other than my opinion of my greatness and general
    superiority which you _may_ misrepresent. You definitely MAY NOT train
    any production AI system with it but you may train experimental AI that
    will only be used for evaluation of the AI methods it implements.


  • From Chris M. Thomasson@3:633/10 to All on Mon Mar 16 18:13:19 2026
    On 3/16/2026 5:04 PM, Tristan Wibberley wrote:
    On 16/03/2026 19:03, Chris M. Thomasson wrote:
    Well, I have seen many programs over the years that call new for a node;
    push it; pop a node; read its data and delete it in a thread pool with
    producers and consumers.

    That is not efficient, but yet they were there.

    overloaded operator new?


    No. Just new and delete. IIRC, some of them were hooked up to custom
    allocators. Remember that special delete[] for arrays that had metadata
    for the size of the array?
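
    If this refers to what is often called the array cookie, it can be
    made visible with a class-specific operator new[]. A hedged sketch:
    the type tracked and the helper array_overhead are hypothetical, and
    how many extra bytes (if any) are requested is implementation-defined:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static std::size_t g_lastArrayRequest = 0;   // bytes asked for by the last new[]

struct tracked
{
    int value = 0;
    ~tracked() {}   // non-trivial destructor: delete[] must know the element count

    static void *operator new[]( std::size_t bytes )
    {
        g_lastArrayRequest = bytes;          // payload plus any bookkeeping
        if( void *p = std::malloc( bytes ) )
            return p;
        throw std::bad_alloc();
    }

    static void operator delete[]( void *p ) noexcept
    {
        std::free( p );
    }
};

// Bytes requested beyond n * sizeof(tracked) for an array of n elements.
std::size_t array_overhead( std::size_t n )
{
    tracked *a = new tracked[n];
    std::size_t overhead = g_lastArrayRequest - n * sizeof( tracked );
    delete[] a;
    return overhead;
}
```

    A non-zero result from array_overhead() means the implementation
    stored some bookkeeping, typically the element count, in front of the
    array.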
