Under Windows you won't need hazard pointers because you have structured
exception handling (EXCEPTION_ACCESS_VIOLATION). This is possible with
POSIX as well if you trap SIGSEGV and do a siglongjmp out of the handler.
A problem with that is that these signal handlers are process-global,
while a thread only has its own private signal mask.
So I invented a little framework to handle signals with a thread-local
signal handler. A global signal handler is established as a static
object before main(). If a matching signal (given as an int template
parameter) is caught, it dispatches to the thread-local handler.
Here's the source of my little framework (yes, it's rather a framework
and not a lib since it has a callback): [...]
On 09.03.2026 at 23:36, Chris M. Thomasson wrote:
I don't think it mentions it there, but SEH is used. I prefer proxy
collectors over hazard pointers, but that's just me. Well, Joe knows.
Hazard pointers basically need async membar behavior or else they eat
your lunch with that damn #StoreLoad.
I've developed a lock-free stack with SEH. With one thread it is
slower than a SLIST, with two threads it is faster, and with 32
threads it is orders of magnitude faster:
I searched for thread_local and didn't see it.
thread_local gets basic structure into the linker and generalises to
uctx's from older POSIX standards on systems where uctx is
green-thread-like (fillet-cut wrt. thread_local, e.g. Linux), even if
you have to exclude those systems that are fiber-like (steak-cut wrt.
thread_local, as I understand it, e.g. BSD).
thread_local works as C++11 defined it.
#pragma once
#if defined(_WIN32)
    #include <Windows.h>
#endif
#include <atomic>
#include <cstddef>

template<typename T>
struct lockfree_stack
{
    struct node : T
    {
        using T::T;
    private:
        template<typename U>            // U, not T: redeclaring T would shadow it
        friend struct lockfree_stack;
        node *m_next;
    };
    void push( node *nd ) noexcept;
    node *pop() noexcept;
private:
    struct head { node *ptr; std::size_t ctr; };
    static_assert(std::atomic_ref<head>::is_always_lock_free);
    alignas(2 * sizeof(std::size_t)) head m_head = { nullptr, 0 };
};

template<typename T>
void lockfree_stack<T>::push( node *nd ) noexcept
{
    using namespace std;
    atomic_ref ptr( m_head.ptr );
    atomic_ref ctr( m_head.ctr );
    head hdRef( ptr.load( memory_order_relaxed ),
        ctr.load( memory_order_relaxed ) );
    atomic_ref<head> aHead( m_head );
    do
        nd->m_next = hdRef.ptr;
    while( !aHead.compare_exchange_strong( hdRef, head( nd, hdRef.ctr + 1 ),
        memory_order_release, memory_order_relaxed ) );  // retry until the CAS succeeds
}

template<typename T>
auto lockfree_stack<T>::pop() noexcept -> node *
{
    using namespace std;
    atomic_ref ptr( m_head.ptr );
    atomic_ref ctr( m_head.ctr );
    head hdRef( ptr.load( memory_order_relaxed ),
        ctr.load( memory_order_relaxed ) );
    atomic_ref<head> aHead( m_head );
    for( ; ; )
    {
        node *next;
        if( !hdRef.ptr )
            return nullptr;
        __try
        {
            // may fault if another thread already popped and freed *hdRef.ptr
            next = hdRef.ptr->m_next;
        }
        __except( EXCEPTION_EXECUTE_HANDLER )
        {
            hdRef = head( ptr.load( memory_order_relaxed ),
                ctr.load( memory_order_relaxed ) );
            continue;
        }
        if( aHead.compare_exchange_strong( hdRef, head( next, hdRef.ctr + 1 ),
            memory_order_acquire, memory_order_relaxed ) )
            return hdRef.ptr;
    }
}
In the next step I'll port this to POSIX with my scoped_signal.
On 3/9/2026 6:44 AM, Bonita Montero wrote:
Well, IIRC SLIST uses SEH for its work when shit hits the fan on
Windows. However, it's very specialized for the algorithm:
https://learn.microsoft.com/en-us/windows/win32/sync/interlocked-singly-linked-lists
I don't think it mentions it there, but SEH is used. I prefer proxy collectors over hazard pointers, but that's just me. Well, Joe knows.
Hazard pointers basically need async membar behavior or else they eat
your lunch with that damn #StoreLoad.
It depends on how standard your C++ is and whether your system defines a
generalisation beyond the C++ standard; the charter of this group isn't
restricted to ISO standards AFAIK, so you'll find reality creeping in.
Notice where you align?
alignas(2 * sizeof(size_t)) head m_head = { 0, 0 };
Well, I don't think a SLIST anchor is automatically aligned. ...
Notice _aligned_malloc?
Also, have you checked that your compare_exchange_strong is using
CMPXCHG8B on 32 bit systems and CMPXCHG16B on 64 bit systems? That
used to piss me off when it said not always lock-free.
On 11.03.2026 at 00:59, Chris M. Thomasson wrote:
Notice where you align?
alignas(2 * sizeof(size_t)) head m_head = { 0, 0 };
Because atomic_ref does a runtime check on that alignment.
Well, I don't think a SLIST anchor is automatically aligned. ...
On ARM it should.
Notice _aligned_malloc?
Not necessary here; the head is just a class.
Also, have you checked that your compare_exchange_strong is using
CMPXCHG8B on 32 bit systems and CMPXCHG16B on 64 bit systems? That
used to piss me off when it said not always lock-free.
Both are lock-free for sure.
Just to be tidy, so to speak. Pad _and_ align the head on an L2 cache
line, even with SLIST_HEADER. Completely isolate it. Also, for your
nodes, ditto.
Good! What does the lock-free check from C++ give you?
Also, for fun... Keep a counter in a "debug" or "profile" mode of your
code. It simply counts how many times the SEH fired. ...
On 11.03.2026 at 20:28, Chris M. Thomasson wrote:
Just to be tidy, so to speak. Pad _and_ align the head on an L2 cache
line, even with SLIST_HEADER. Completely isolate it. Also, for your
nodes, ditto.
It's enough not to straddle two cache lines. With my Zen4 CPU there's
a performance penalty only if the word straddles a page boundary.
Good! What does the lock-free check from C++ give you?
I trust cl, clang-cl, g++ and clang++. All do it correctly.
Also, for fun... Keep a counter in a "debug" or "profile" mode of your
code. It simply counts how many times the SEH fired. ...
I didn't test that since the code is trivial.
Well, align it on a boundary and pad it up to a boundary.
That way you know your head, or anchor, is isolated. Also, for your
nodes... Each node should be its own L2 cache line, properly aligned
and padded. This goes for using SLIST_HEADER as well...
Did you do a disassembly? On a 64-bit system it should boil down to LOCK CMPXCHG16B and report always lock-free = true. :^)
You should do it. Does it get above zero in your tests?
On 12.03.2026 at 08:36, Bonita Montero wrote:
You should do it. Does it get above zero in your tests?
It's only one line of code with less than 20 characters.
It's only that line "next = hdRef.ptr->m_next;" which is
encapsulated in a __try / __except.
Right. Just count how many times it gets triggered? It should give you an insight...
On 13.03.2026 at 00:05, Chris M. Thomasson wrote:
Right. Just count how many times it gets triggered? It should give you
an insight...
No matter how often, since it's clear that this is a rare condition
with today's memory allocation.
It should be rare, but... Well, add a counter anyway.
Just to see how far it goes over zero... ;^)
On 13.03.2026 at 05:27, Chris M. Thomasson wrote:
It should be rare, but... Well, add a counter anyway.
Just to see how far it goes over zero... ;^)
You can't simulate that without writing a large program with
a specific allocation scheme.
On 3/12/2026 10:13 PM, Bonita Montero wrote:
You can't simulate that without writing a large program with
a specific allocation scheme.
Well, you need to try? See what that counter gets to after, say, a
5-hour run of your system.
On 13.03.2026 at 09:33, Chris M. Thomasson wrote:
Well, you need to try? See what that counter gets to after, say, a
5-hour run of your system.
It's very unlikely that the counter comes back to the same value and
the pointer too. With 32 bits it's 2^32 iterations, and the pointer
must also have the same value. With 64 bits it's 2^64, i.e. 1.8E19
rounds, so you can assume this never happens either.
On 3/13/2026 2:38 AM, Bonita Montero wrote:
It's very unlikely that the counter comes back to the same value and
the pointer too. With 32 bits it's 2^32 iterations, and the pointer
must also have the same value. With 64 bits it's 2^64, i.e. 1.8E19
rounds, so you can assume this never happens either.
It can occur. You can make a special setup to artificially trigger the
SEH to fire?
On 3/13/2026 11:49 AM, Chris M. Thomasson wrote:
It can occur. You can make a special setup to artificially trigger the
SEH to fire?
Architect a special condition.
On 13.03.2026 at 19:49, Chris M. Thomasson wrote:
It can occur. You can make a special setup to artificially trigger
the SEH to fire?
It's obvious where and why the access violation may happen. It's easy
to test that with a simple null-pointer dereference. Given that code,
you can combine the next-pointer read with the trap-catching code.
There's no need to test it; it's only one CPU instruction that may fail here.
On 10.03.2026 at 23:45, Tristan Wibberley wrote:
Depends on how standard your C++ is and whether your system defines a
generalisation to beyond the C++ standard, the charter of this group
isn't restricted to ISO standards AFAIK so you'll find reality
creeping in.
Which compiler is non-conforming in that sense?
See how many times it trips during an "intense simulation" for fun?
Build a program, let it run for say, 12 hours. And see how many times
the SEH fired. You are making me think about the good ol' days. Thanks.
On 15.03.2026 at 21:58, Chris M. Thomasson wrote:
See how many times it trips during an "intense simulation" for fun?
Build a program, let it run for say, 12 hours. And see how many times
the SEH fired. You are making me think about the good ol' days. Thanks.
It's not possible to simulate that in the same environment as a real
application, since the allocation and deallocation behaviour of a real
application is highly dependent on the surrounding code.
Well, I have seen many programs over the years that call new for a node,
push it, pop a node, read its data, and delete it, in a thread pool with
producers and consumers.
That is not efficient, but they were there nonetheless.
On 16/03/2026 19:03, Chris M. Thomasson wrote:
Well, I have seen many programs over the years that call new for a node;
push it; pop a node; read its data and delete it in a thread pool with
producers and consumers.
That is not efficient, but yet they were there.
overloaded operator new?