Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.
Also
being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.
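For illustration, a minimal sketch of that use (assuming a compiler with working C23 _BitInt support, such as recent gcc or clang on x86-64):

#include <stdio.h>

int main(void) {
    /* A value that cannot be represented in any standard 64-bit type. */
    unsigned _BitInt(128) a = 1;
    a <<= 100;
    /* Print as two 64-bit halves for portability. */
    printf("0x%016llx%016llx\n",
           (unsigned long long)(a >> 64),
           (unsigned long long)a);
    return 0;
}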
On 23/11/2025 13:32, Waldek Hebisch wrote:
Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.
The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.
Implementing such odd-size types on regular 8/16/32/64-bit hardware
is full of problems if you want to do it without padding (in order to
get the savings). Or even with padding (to get the desired overflow semantics).
Such as working out how pointers to them will work.
Also
being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.
Standard syntax I guess would be something like int128_t and
int256_t. Such wider integers tend to be powers of two.
But there are two problems with _BitInt:
* Any odd sizes are allowed, such as _BitInt(391)
* There appears to be no upper limit on size, so _BitInt(2997901) is
a valid type
So what is the result type of multiplying values of those two types?
Integer sizes greater than 1K or 2K bits should use an arbitrary
precision type (which is how large _BitInts will likely be
implemented anyway), where the precision is a runtime attribute.
bart <bc@freeuk.com> writes:
On 23/11/2025 13:32, Waldek Hebisch wrote:
Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.
The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.
What rationale are you referring to? There hasn't been an official ISO
C Rationale document since C99.
Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).
Such as working out how pointers to them will work.
Why would pointers to _BitInt types be a problem? A _BitInt object is
a fixed-size chunk of memory, similar to a struct object.
Also being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.
Standard syntax I guess would be something like int128_t and
int256_t. Such wider integers tend to be powers of two.
But there are two problems with _BitInt:
* Any odd sizes are allowed, such as _BitInt(391)
Why is that a problem? If you don't want odd-sized types, don't use them.
* There appears to be no upper limit on size, so _BitInt(2997901) is a
valid type
The upper limit is specified by the implementation as BITINT_MAXWIDTH, a macro defined in <limits.h>.
For gcc 15.2.0 on x86_64, BITINT_MAXWIDTH is 65535 (2**16-1).
For clang 21.1.5 it's 8388608 (2**23 bits, 1048576 bytes).
clang seems to have some problems with _BitInt(8388608). For example,
this program:
#include <limits.h>
_BitInt(BITINT_MAXWIDTH) n = 42;
int main(void) {
n *= n;
}
takes a *long* time to compile with clang. I believe it's generating
inline code to do the 8388608 by 8388608 bit multiplication.
So what is the result type of multiplying values of those two types?
_BitInt types are exempt from the integer promotion rules (so _BitInt(3) doesn't promote to int), but the usual arithmetic conversions apply.
If you multiply values of two _BitInt types, the result is the wider of
the two types.
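A quick check of those two rules (a sketch; sizeof does not evaluate its operand, so the potential signed overflow in a * a is not actually triggered here):

#include <stdio.h>

int main(void) {
    _BitInt(7)  a = 50;
    _BitInt(20) b = 1000;
    /* a * a has type _BitInt(7), not int: no promotion to int occurs. */
    /* a * b has type _BitInt(20), the wider of the two operand types. */
    printf("sizeof(a * a) = %zu\n", sizeof(a * a));
    printf("sizeof(a * b) = %zu\n", sizeof(a * b));
    return 0;
}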
On Sun, 23 Nov 2025 13:59:59 +0000
bart <bc@freeuk.com> wrote:
On 23/11/2025 13:32, Waldek Hebisch wrote:
Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.
The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.
Implementing such odd-size types on regular 8/16/32/64-bit hardware
is full of problems if you want to do it without padding (in order to
get the savings). Or even with padding (to get the desired overflow
semantics).
Such as working out how pointers to them will work.
Also
being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.
Standard syntax I guess would be something like int128_t and
int256_t. Such wider integers tend to be powers of two.
But there are two problems with _BitInt:
* Any odd sizes are allowed, such as _BitInt(391)
* There appears to be no upper limit on size, so _BitInt(2997901) is
a valid type
Upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets) appears
to be 2**16 or 2**16-1. I don't remember which one.
So what is the result type of multiplying values of those two types?
I think, traditional C rules for integer types apply here as well: type
of result is the same as type of wider operand. It is arithmetically unsatisfactory, but consistent with the rest of language.
And practically sufficient, because C programmers are already accustomed
to writing statements like:
uint64_t foo(uint32_t x, uint16_t y) { return (uint64_t)x*y; }
So it would be natural for them to write:
_BitInt(1536) foo(_BitInt(1024) x, _BitInt(512) y) {
return (_BitInt(1536))x*y;
}
Since the pattern is so common already, an optimizing compiler is likely to understand the meaning and generate only the necessary calculations.
Or, at least, not to generate too many unnecessary calculations.
Integer sizes greater than 1K or 2K bits should use an arbitrary
precision type (which is how large _BitInts will likely be
implemented anyway), where the precision is a runtime attribute.
I think, the Standard is written in such way that implementing _BitInt
as an arbitrary precision numbers, i.e. with number of bits held as part
of the data, is not allowed.
Of course, Language Support Library can be
(and hopefully is, at least for gcc; clang is messy a.t.m.) based on arbitrary precision core routines, but the API used by compiler should
be similar to GMP's mpn_xxx family of functions rather than GMP's
mpz_xxx family, i.e. # of bits as separate parameters from data arrays
rather than combined.
On 23/11/2025 16:06, Michael S wrote:
On Sun, 23 Nov 2025 13:59:59 +0000
bart <bc@freeuk.com> wrote:
So what is the result type of multiplying values of those two types?
I think, traditional C rules for integer types apply here as well: type
of result is the same as type of wider operand. It is arithmetically
unsatisfactory, but consistent with the rest of language.
There is one key difference between the _BitInt() types and other
integer types - with _BitInt(), there are no automatic promotions to
other integer types. Thus if you are using _BitInt() operands in an arithmetic expression, these are not promoted to "int" or "unsigned int" even if they are smaller (lower rank). If you mix _BitInt()'s of
different sizes, then the smaller one is first converted to the larger
type.
I think, the Standard is written in such way that implementing _BitInt
as an arbitrary precision numbers, i.e. with number of bits held as part
of the data, is not allowed.
Correct. _BitInt(N) is a signed integer type with precisely N value
bits. It can have padding bits if necessary (according to the target
ABI), but it can't have any other information.
Of course, Language Support Library can be
(and hopefully is, at least for gcc; clang is messy a.t.m.) based on
arbitrary precision core routines, but the API used by compiler should
be similar to GMP's mpn_xxx family of functions rather than GMP's
mpz_xxx family, i.e. # of bits as separate parameters from data arrays
rather than combined.
Yes, exactly. At the call site, the size of the _BitInt type is always
a known compile-time constant, so it can easily be passed on. Thus :
_BitInt(N) x;
_BitInt(M) y;
_BitInt(NM) z = x * y;
can be implemented as something like :
__bit_int_signed_mult(NM, (unsigned char *) &z,
N, (const unsigned char *) &x,
M, (const unsigned char *) &y);
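As a rough sketch of what such an mpn-style support routine might look like (the name and ABI here are invented, not the actual libgcc or compiler-rt interface; it does only unsigned addition and assumes the inputs' padding bits are already zero):

#include <stdint.h>
#include <stddef.h>

typedef uint64_t limb;
#define LIMB_BITS 64

/* z = x + y, truncated to zbits; all operands are little-endian limb
   arrays holding unsigned bit-precise integers of the given widths. */
static void ubitint_add(size_t zbits, limb *z,
                        size_t xbits, const limb *x,
                        size_t ybits, const limb *y)
{
    size_t zlimbs = (zbits + LIMB_BITS - 1) / LIMB_BITS;
    size_t xlimbs = (xbits + LIMB_BITS - 1) / LIMB_BITS;
    size_t ylimbs = (ybits + LIMB_BITS - 1) / LIMB_BITS;
    unsigned carry = 0;
    for (size_t i = 0; i < zlimbs; i++) {
        limb a = i < xlimbs ? x[i] : 0;
        limb b = i < ylimbs ? y[i] : 0;
        limb s = a + b;
        unsigned c1 = s < a;
        limb t = s + carry;
        unsigned c2 = t < s;
        z[i] = t;
        carry = c1 | c2;
    }
    if (zbits % LIMB_BITS)   /* clear the padding bits in the top limb */
        z[zlimbs - 1] &= (~(limb)0) >> (LIMB_BITS - zbits % LIMB_BITS);
}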
On 23/11/2025 22:38, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 23/11/2025 13:32, Waldek Hebisch wrote:
Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.
The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.
What rationale are you referring to? There hasn't been an official ISO
C Rationale document since C99.
See Introduction and Rationale here:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf
Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).
Such as working out how pointers to them will work.
Why would pointers to _BitInt types be a problem? A _BitInt object is
a fixed-size chunk of memory, similar to a struct object.
Saving memory was mentioned. To achieve that means having bitfields that
may not start at bit 0 of a byte, and may cross byte- or word-boundaries.
For example, an array of 1M 5-bit values would occupy 1M 8-bit bytes,
but storing packed values means it would use only 625K bytes.
Anyway, pointers to individual values, or to some arbitrary element or
slice of such an array, would need some extra info.
Also being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.
Standard syntax I guess would be something like int128_t and
int256_t. Such wider integers tend to be powers of two.
But there are two problems with _BitInt:
* Any odd sizes are allowed, such as _BitInt(391)
Why is that a problem? If you don't want odd-sized types, don't use
them.
It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary. And people WILL use those types because they are there, and likely they
will be inefficient.
What happens when a 391-bit type, even unsigned, overflows? These larger types are likely to use a multiple of 64-bits, and for 391 bits will
need 7 x 64 bits, of which the last word will have 57 bits of padding.
It's very messy.
Specifying a multiple of 64 bits is better; a power of two even better.
* There appears to be no upper limit on size, so _BitInt(2997901) is a
valid type
The upper limit is specified by the implementation as BITINT_MAXWIDTH, a
macro defined in <limits.h>.
For gcc 15.2.0 on x86_64, BITINT_MAXWIDTH is 65535 (2**16-1).
For clang 21.1.5 it's 8388608 (2**23 bits, 1048576 bytes).
clang seems to have some problems with _BitInt(8388608). For example,
this program:
#include <limits.h>
_BitInt(BITINT_MAXWIDTH) n = 42;
int main(void) {
n *= n;
}
takes a *long* time to compile with clang. I believe it's generating
inline code to do the 8388608 by 8388608 bit multiplication.
Now try it with two disparate sizes.
So what is the result type of multiplying values of those two types?
_BitInt types are exempt from the integer promotion rules (so _BitInt(3)
doesn't promote to int), but the usual arithmetic conversions apply.
If you multiply values of two _BitInt types, the result is the wider of
the two types.
So multiplying even two one-million-bit types could overflow!
Such limits for /fixed-width/ integers are ridiculous.
You might say this is no different from defining an array of exactly
123,456 elements. But the use-cases are very different.
I started going into details but I guess you don't care about such
matters or whether the feature makes much sense.
On 11/23/2025 7:59 AM, bart wrote:
On 23/11/2025 13:32, Waldek Hebisch wrote:
Philipp Klaus Krause <pkk@spth.de> wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is smallest multiple-of-8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.
The rationale mentions a use-case where there is a custom processor
that might actually have a 22-bit hardware type.
Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings). Or even with padding (to get the desired overflow
semantics).
Such as working out how pointers to them will work.
In BGBCC, for any size <= 256 bits, it is padded to the next power-of-2 size. Although if the size is NPOT, some extra handling exists to mask/extend it to the requested size.
In BGBCC, there is a hard limit of IIRC 16384 bits.
As an extension, it also allows for very large literals, though
currently literals larger than 128 bits can only use hexadecimal or
similar.
This is encoded via suffixes, eg:
I, L, LL, U, UI, UL, ULL: Normal 32/64 bit.
I128, UI128: 128-bit
I256, UI256: 256-bit
other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).
Larger decimal numbers could be supported, but for now I don't have a
strong need for decimal literals beyond 128 bits.
But my scripting language has an arbitrary-precision /decimal/
floating point type, which can also be used for pure integer
calculations.
It is an unnecessary complication. There will be a lot of extra rules
that may be partly 'implementation defined', so behaviour may vary.
And people WILL use those types because they are there, and likely
they will be inefficient.
What happens when a 391-bit type, even unsigned, overflows? These
larger types are likely to use a multiple of 64-bits, and for 391
bits will need 7 x 64 bits, of which the last word will have 57 bits
of padding. It's very messy.
Specifying a multiple of 64 bits is better; a power of two even
better.
On 24/11/2025 01:30, bart wrote:
Saving memory was mentioned. To achieve that means having bitfields
that may not start at bit 0 of a byte, and may cross byte- or word-
boundaries.
No, that is incorrect.
The proposal mentions saving /space/ as relevant in FPGAs - not saving /memory/.
The author's use-case here is in writing code that can be
compiled with a "normal" C compiler on a "normal" target, and also
compiled to FPGA /hardware/, with the same semantics. In hardware, a 5-
bit by 5-bit single-cycle multiplier is very much smaller than an 8-bit
by 8-bit multiplier, and orders of magnitude smaller than if the 5-bit integers are promoted to 32-bit before multiplying.
The proposal is not about saving /memory/. It specifically says that a _BitInt(N) has the same size and alignment as the smallest basic type
that can contain it, until you get to N greater than 64-bit, in which
they are contained in an array of int64_t. (The reality is a little
more formal, to handle targets that have other sizes of their basic types.)
So on a "normal" target, a _BitInt(3) is the same size and alignment as
a uint8_t, a _BitInt(35) is effectively contained in a uint64_t, and an array of 4 _BitInt(17) on a 32-bit system will take 16 bytes or 128
bits, not 68 bits.
As far as I can see, the C23 standard does not specify these details,
and leaves them up to the target ABI. But at the very least, they will always take an integer number of bytes - unsigned char. There can never
be any crossing of byte boundaries.
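One way to see what a given implementation actually chooses (the numbers are decided by the target ABI, not by C23 itself):

#include <stdio.h>

int main(void) {
    printf("_BitInt(3):  size %zu, align %zu\n", sizeof(_BitInt(3)),  alignof(_BitInt(3)));
    printf("_BitInt(17): size %zu, align %zu\n", sizeof(_BitInt(17)), alignof(_BitInt(17)));
    printf("_BitInt(35): size %zu, align %zu\n", sizeof(_BitInt(35)), alignof(_BitInt(35)));
    _BitInt(17) a[4];
    printf("4 x _BitInt(17): %zu bytes\n", sizeof a);
    return 0;
}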
Why? And why do you talk specifically about odd numbers? I can
understand your concern about packing arrays of _BitInts that are not multiples of 8, though I hope you now understand that it is not the
problem you thought it was. However, I see no reason to suppose that _BitInt(5) is any more or less "complicated" than _BitInt(6) just
because 5 is an odd number!
A major point of the _BitInt concept is to be able to specify and use integers of specific explicit sizes in a way that is as implementation independent as possible. Some aspects of the implementation cannot be avoided - such as the size of unsigned char and alignment and padding
for storage. But the behaviour of the types is entirely independent of
the implementation. There are no "extra rules" - neither for specific implementations, nor for specific sizes of _BitInt's.
Efficiency of implementation is, of course, up to the implementation.
But there is absolutely no reason to suppose that working with a _BitInt
of size up to the implementation's maximum integer type is going to be
less efficient than using other types and masking. For larger
_BitInt's, there are different possible implementation strategies with different pros and cons in regard to efficiency.
What happens when a 391-bit type, even unsigned, overflows? These
larger types are likely to use a multiple of 64-bits, and for 391 bits
will need 7 x 64 bits, of which the last word will have 57 bits of
padding. It's very messy.
It is not messy at all. Signed integer overflow is UB, unsigned integer overflow is wrapping. It's the same as always, and could not be
simpler, clearer or neater.
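For instance (a small sketch; only the unsigned case is shown, since the signed case would be undefined behaviour):

#include <stdio.h>

int main(void) {
    unsigned _BitInt(5) v = 31;    /* maximum value for 5 bits */
    v += 1;                        /* result is reduced modulo 2**5 */
    printf("%u\n", (unsigned)v);   /* prints 0 */
    return 0;
}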
Such limits for /fixed-width/ integers are ridiculous.
Um, I think you might want to re-read and re-phrase that. When you have fixed-width integers, you have a finite range.
On Mon, 24 Nov 2025 11:45:18 +0000
bart <bc@freeuk.com> wrote:
But my scripting language has an arbitrary-precision /decimal/
floating point type, which can also be used for pure integer
calculations.
Arbitrary-precision floating point? That sounds problematic, regardless
of base. Unless you don't use the word 'arbitrary' in the same sense
that it is used, for example, in GMP.
Gnu MPFR is very careful to never call itself "arbitrary-precision" in official docs.
On 24/11/2025 11:57, Michael S wrote:
On Mon, 24 Nov 2025 11:45:18 +0000
bart <bc@freeuk.com> wrote:
But my scripting language has an arbitrary-precision /decimal/
floating point type, which can also be used for pure integer
calculations.
Arbitrary-precision floating point? That sounds problematic,
regardless of base. Unless you don't use the word 'arbitrary' in
the same sense that it is used, for example, in GMP.
Gnu MPFR is very careful to never call itself "arbitrary-precision"
in official docs.
If you mean problems like repeated multiplies giving ever larger
numbers, then that will happen also with integers (or rationals).
If you mean the problems with a divide operation potentially carrying
on indefinitely, then a cap needs to be set on that.
I haven't attempted libraries for working out transcendental
functions; the problems there are in getting a particular precision
even if you know that in advance.
But for basic arithmetic, it works extremely well.
(While it is built-in to my scripting language, it was originally a standalone library and has been ported to C. See the bignum.c and
bignum.h files here:
https://github.com/sal55/langs/tree/master/bignum
You can try out division like this:
#include <stdio.h>
#include "bignum.h"
int main() {
Bignum a, b, c;
a = bn_makeint(1);
b = bn_makeint(7);
c = bn_init();
bn_div(c, a, b, 1000);
bn_println(c);
}
(Build as 'gcc prog.c bignum.c' etc.)
You can see that 'bn_div' needs a precision argument: this is the
number of significant decimal digits. Using 100M here produced 100
million digits and took about 6 seconds.)
David Brown <david.brown@hesbynett.no> writes:
[...]
Yes, exactly. At the call site, the size of the _BitInt type is
always a known compile-time constant, so it can easily be passed on.
Thus :
_BitInt(N) x;
_BitInt(M) y;
_BitInt(NM) z = x * y;
can be implemented as something like :
__bit_int_signed_mult(NM, (unsigned char *) &z,
N, (const unsigned char *) &x,
M, (const unsigned char *) &y);
That looks like it's supposed to avoid overflow (I'm assuming NM is N
+ M), but it wouldn't work. The type of a C expression is almost
always determined by the expression itself, regardless of the context
in which it appears. The type of x * y is _BitInt(max(N, M)), not _BitInt(N+M), so it can overflow even if the full result would fit
into z.
You can do this instead (not tested):
_BitInt(N) x;
_BitInt(M) y;
_BitInt(N+M) z = (_BitInt(N+M))x * y;
(I'm assuming N+M is sufficient, but I might have missed an off-by-one
error somewhere.)
On 24/11/2025 11:17, David Brown wrote:
On 24/11/2025 01:30, bart wrote:
Saving memory was mentioned. To achieve that means having bitfields
that may not start at bit 0 of a byte, and may cross byte- or word-
boundaries.
No, that is incorrect.
The proposal mentions saving /space/ as relevant in FPGAs - not
saving /memory/.
But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.
What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
could actually be practically implemented, with a few restrictions,
and could save a lot of memory.
In my 391-bit example, the top 7 bits will be within a 64-bit
word. What values will those extra 57 bits be?
Taking just those 7 bits by themselves, if the value is 1111111, that is:
00000000'00000000'00000000'00000000'00000000'00000000'00000000'01111111
and you do an arithmetic right shift, then you will get 0111111 not
1111111, since the hardware sign bit is bit 63 not bit 6. It needs
more work.
No, I stand by it. There are even different levels of ridiculousness: expecting a language to support a huge fixed integer type like
int1000000_t (when C only acquired 8/16/32/64-bit types in C99, and
those still aren't built-in).
And allowing random sizes such as int817838_t. (See, it seems much
sillier using this syntax!)
For such sizes it makes much more sense to acknowledge the existence
of arbitrary-precision support, so that the equivalents of
int1000000_t and int817838_t would be compatible types. Or you can
forget specific widths and just have the one bigint type.
There are two kinds of BitInts: those smaller than 64 bits; and those
larger than 64 bits, sometimes /much/ larger.
I had been responding to the claim that those smaller types save
memory, compared to using sizes 8/16/32 bits which are commonly
available and have better hardware support.
But if a _BitInt(17) is rounded up to 32 bits, there's not going to be
any saving!
On 24/11/2025 09:29, David Brown wrote:
On 23/11/2025 16:06, Michael S wrote:
On Sun, 23 Nov 2025 13:59:59 +0000
bart <bc@freeuk.com> wrote:
So what is the result type of multiplying values of those two types?
I think, traditional C rules for integer types apply here as well: type
of result is the same as type of wider operand. It is arithmetically
unsatisfactory, but consistent with the rest of language.
There is one key difference between the _BitInt() types and other
integer types - with _BitInt(), there are no automatic promotions to
other integer types. Thus if you are using _BitInt() operands in an
arithmetic expression, these are not promoted to "int" or "unsigned
int" even if they are smaller (lower rank). If you mix _BitInt()'s of
different sizes, then the smaller one is first converted to the larger
type.
I think, the Standard is written in such way that implementing _BitInt
as an arbitrary precision numbers, i.e. with number of bits held as part of the data, is not allowed.
Correct. _BitInt(N) is a signed integer type with precisely N value
bits. It can have padding bits if necessary (according to the target
ABI), but it can't have any other information.
Of course, Language Support Library can be
(and hopefully is, at least for gcc; clang is messy a.t.m.) based on
arbitrary precision core routines, but the API used by compiler should
be similar to GMP's mpn_xxx family of functions rather than GMP's
mpz_xxx family, i.e. # of bits as separate parameters from data arrays
rather than combined.
Yes, exactly. At the call site, the size of the _BitInt type is
always a known compile-time constant, so it can easily be passed on.
Thus :
_BitInt(N) x;
_BitInt(M) y;
_BitInt(NM) z = x * y;
So what is NM here; is it N*M (the potential maximum size of the
result), or max(N, M)?
It sounds like the max precision you get will be the latter.
can be implemented as something like :
__bit_int_signed_mult(NM, (unsigned char *) &z,
N, (const unsigned char *) &x,
M, (const unsigned char *) &y);
How would you write a generic user function that operates on any size BitInt? For example:
_BitInt(?) bi_square(_BitInt(?));
Even if you passed the size as a parameter, there would be a problem
with the BitInt type.
This assumes BitInts are passed and returned by value, but even using BitInt* wouldn't help.
This sets it apart from arrays, where you can also define very large,
fixed-size arrays, but can use a T(*)[] type to write generic functions
that take an additional length parameter.
This will be for a particular T, but for BitInt, T is also fixed; it
happens to be an implicit bit type.
On Mon, 24 Nov 2025 12:17:58 +0100
David Brown <david.brown@hesbynett.no> wrote:
The proposal is not about saving /memory/. It specifically says that
a _BitInt(N) has the same size and alignment as the smallest basic
type that can contain it, until you get to N greater than 64-bit, in
which they are contained in an array of int64_t. (The reality is a
little more formal, to handle targets that have other sizes of their
basic types.)
That is a bit unfortunate.
Compiler support for arrays of 17- to 24-bit numbers packed as 3 octets
per item would have been handy. And not hard at all for a compiler to
implement, at least on architectures that have proper support for
unaligned access, like x86, POWER, Arm and RISC-V.
I certainly have real-world applications that use packed arrays like
that. They could have been written in a cleaner and less error-prone
way if such a feature were available.
I suppose, packed numeric arrays with 5, 6 or 7 octets per item are also
used by some people, although they are probably less common than my
case.
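For reference, the kind of hand-written packed access being described might look like this (helper names invented; little-endian byte order assumed):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Packed array of 24-bit unsigned values, 3 octets per element. */
static uint32_t get_u24(const unsigned char *base, size_t i) {
    uint32_t v = 0;
    memcpy(&v, base + 3 * i, 3);
    return v;                          /* 0 .. 0xFFFFFF */
}

static void put_u24(unsigned char *base, size_t i, uint32_t v) {
    memcpy(base + 3 * i, &v, 3);
}

int main(void) {
    unsigned char buf[3 * 4] = {0};    /* room for four packed elements */
    put_u24(buf, 2, 0xABCDEF);
    printf("%06X\n", (unsigned)get_u24(buf, 2));   /* ABCDEF */
    return 0;
}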
bart <bc@freeuk.com> writes:
[...]
There are two kinds of BitInts: those smaller than 64 bits; and those
larger than 64 bits, sometimes /much/ larger.
As far as I know, the standard makes no such distinction.
I had been responding to the claim that those smaller types save
memory, compared to using sizes 8/16/32 bits which are commonly
available and have better hardware support.
I don't recall any such claim. Do you have a citation (other than
the FPGA-specific wording in N2709)?
On 22.10.25 at 14:45, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do. Also
being able to use bit-fields wider than int.
Saving memory for two reasons:
* On small embedded systems where there is very little memory
* For code that needs to be very fast on big systems to make data
structures fit into cache
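For instance, C23 lets a bit-field have a bit-precise type, which is what makes fields wider than int possible (the struct below is a made-up sketch; the exact packing and padding remain implementation-defined):

#include <stdio.h>

struct log_record {
    unsigned _BitInt(40) timestamp : 40;   /* a bit-field wider than int */
    unsigned _BitInt(12) sensor_id : 12;
    unsigned _BitInt(12) reading   : 12;
};

int main(void) {
    /* 64 bits of declared fields; how tightly they pack is up to the ABI. */
    printf("sizeof(struct log_record) = %zu\n", sizeof(struct log_record));
    return 0;
}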
bart <bc@freeuk.com> writes:
What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
could actually be practically implemented, with a few restrictions,
and could save a lot of memory.
No, they couldn't. Array indexing is defined in terms of pointer
arithmetic, and you can't have a pointer to something smaller than one
byte.
On 24/11/2025 11:17, David Brown wrote:
On 24/11/2025 01:30, bart wrote:
Saving memory was mentioned. To achieve that means having bitfields
that may not start at bit 0 of a byte, and may cross byte- or word-
boundaries.
No, that is incorrect.
The proposal mentions saving /space/ as relevant in FPGAs - not saving
/memory/.
But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.
That's not going to happen if they are simply rounded up to the next power-of-two type.
If the purpose is, say, a 17-bit type that wraps past values of 131071,
then that sounds like a lot of extra code needed, for something that
does not sound that useful. Why modulo 2**17; why not 100,000? Or any
value more relevant to the task.
The authors use-case here is in writing code that can be compiled
with a "normal" C compiler on a "normal" target, and also compiled to
FPGA /hardware/, with the same semantics. In hardware, a 5- bit by
5-bit single-cycle multiplier is very much smaller than an 8-bit by
8-bit multiplier, and orders of magnitude smaller than if the 5-bit
integers are promoted to 32-bit before multiplying.
The proposal is not about saving /memory/. It specifically says that
a _BitInt(N) has the same size and alignment as the smallest basic
type that can contain it, until you get to N greater than 64-bit, in
which they are contained in an array of int64_t. (The reality is a
little more formal, to handle targets that have other sizes of their
basic types.)
So on a "normal" target, a _BitInt(3) is the same size and alignment
as a uint8_t, a _BitInt(35) is effectively contained in a uint64_t,
and an array of 4 _BitInt(17) on a 32-bit system will take 16 bytes or
128 bits, not 68 bits.
As far as I can see, the C23 standard does not specify these details,
and leaves them up to the target ABI. But at the very least, they
will always take an integer number of bytes - unsigned char. There
can never be any crossing of byte boundaries.
What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These could actually be practically implemented, with a few restrictions, and could
save a lot of memory.
Why? And why do you talk specifically about odd numbers? I can
understand your concern about packing arrays of _BitInts that are not
multiples of 8, though I hope you now understand that it is not the
problem you thought it was. However, I see no reason to suppose that
_BitInt(5) is any more or less "complicated" than _BitInt(6) just
because 5 is an odd number!
I mean odd compared with powers-of-two, or multiples of 8.
A major point of the _BitInt concept is to be able to specify and use
integers of specific explicit sizes in a way that is as implementation
independent as possible. Some aspects of the implementation cannot be
avoided - such as the size of unsigned char and alignment and padding
for storage. But the behaviour of the types is entirely independent
of the implementation. There are no "extra rules" - neither for
specific implementations, nor for specific sizes of _BitInt's.
Efficiency of implementation is, of course, up to the implementation.
But there is absolutely no reason to suppose that working with a
_BitInt of size up to the implementation's maximum integer type is
going to be less efficient than using other types and masking. For
larger _BitInt's, there are different possible implementation
strategies with different pros and cons in regard to efficiency.
What happens when a 391-bit type, even unsigned, overflows? These
larger types are likely to use a multiple of 64-bits, and for 391
bits will need 7 x 64 bits, of which the last word will have 57 bits
of padding. It's very messy.
It is not messy at all. Signed integer overflow is UB, unsigned
integer overflow is wrapping. It's the same as always, and could not
be simpler, clearer or neater.
In my 391-bit example, the top 7 bits will be within a 64-bit word. What values will those extra 57 bits be?
Taking just those 7 bits by themselves, if the value is 1111111, that is:
00000000'00000000'00000000'00000000'00000000'00000000'00000000'01111111
and you do an arithmetic right shift, then you will get 0111111 not
1111111, since the hardware sign bit is bit 63 not bit 6. It needs more work.
Such limits for /fixed-width/ integers are ridiculous.
Um, I think you might want to re-read and re-phrase that. When you
have fixed-width integers, you have a finite range.
No, I stand by it. There are even different levels of ridiculousness: expecting a language to support a huge fixed integer type like
int1000000_t (when C only acquired 8/16/32/64-bit types in C99, and
those still aren't built-in).
And allowing random sizes such as int817838_t. (See, it seems much
sillier using this syntax!)
For such sizes it makes much more sense to acknowledge the existence of arbitrary-precision support, so that the equivalents of int1000000_t and int817838_t would be compatible types. Or you can forget specific widths
and just have the one bigint type.
(I use such types, but within a library, and there are ways to cap
the precision.)
On Mon, 24 Nov 2025 12:56:58 +0000
bart <bc@freeuk.com> wrote:
On 24/11/2025 11:57, Michael S wrote:
On Mon, 24 Nov 2025 11:45:18 +0000
bart <bc@freeuk.com> wrote:
But my scripting language has an arbitrary-precision /decimal/
floating point type, which can also be used for pure integer
calculations.
Arbitrary-precision floating point? That sounds problematic,
regardless of base. Unless you don't use the word 'arbitrary' in
the same sense that it is used, for example, in GMP.
Gnu MPFR is very careful to never call itself "arbitrary-precision"
in official docs.
If you mean problems like repeated multiplies giving ever larger
numbers, then that will happen also with integers (or rationals).
If you mean the problems with a divide operation potentially carrying
on indefinitely, then a cap needs to be set on that.
Yes, that is what I meant.
BGB <cr88192@gmail.com> writes:
[...]
In BGBCC, there is a hard limit of IIRC 16384 bits.
As an extension, it also allows for very large literals, though
currently literals larger than 128 bits can only use hexadecimal or
similar.
This is encoded via suffixes, eg:
I, L, LL, U, UI, UL, ULL: Normal 32/64 bit.
I128, UI128: 128-bit
I256, UI256: 256-bit
other odd sizes map to _BitInt or _UBitInt (unsigned _BitInt).
In C23, an integer constant with a "wb" or "WB" suffix is of type
_BitInt(n). One with a "wbu" suffix is of type unsigned _BitInt(n).
The value of n is the smallest that can accommodate the value of the
constant.
[...]
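A quick illustration (the sizes printed are implementation-defined; the widths in the comments follow the rule described above):

#include <stdio.h>

int main(void) {
    /* 300wb  has type _BitInt(10): nine value bits for 300, plus a sign bit. */
    /* 300uwb has type unsigned _BitInt(9).                                   */
    printf("%zu %zu\n", sizeof(300wb), sizeof(300uwb));
    return 0;
}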
On 24/11/2025 13:31, bart wrote:
That's all up to the implementation.
You are worrying about completely negligible things here.
And allowing random sizes such as int817838_t. (See, it seems much
sillier using this syntax!)
I had taken your "ridiculous" comment to be part of your complaint that "multiplying even two one-million-bit types could overflow". But if those statements are independent, then only the first is silly - of course arithmetic on any finite-sized type can overflow unless specifically
limited (such as by wrapping behaviour for unsigned types). I agree
that huge fixed-size integer types are not useful, though I am not sure where the ideal limit lies.
On 24/11/2025 13:35, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
[...]
There are two kinds of BitInts: those smaller than 64 bits; and those
larger than 64 bits, sometimes /much/ larger.
As far as I know, the standard makes no such distinction.
*I* am making the distinction. From an implementation point of view (and assuming 64-bit hardware), they are quite different.
And that leads to different kinds of language features.
If the possibilities above 64 bits were less ambitious (say i128 and
i256), then the concept might be stretched to cover both. But not when
you can also have i1234567.
It would be like having a GETBITS macro, which is not limited to a 1- to 63-
bit bitfield of a u64 value, but could return a slice of an arbitrarily large array.
I had been responding to the claim that those smaller types save
memory, compared to using sizes 8/16/32 bits which are commonly
available and have better hardware support.
I don't recall any such claim. Do you have a citation (other than
the FPGA-specific wording in N2709)?
This is where it came up in this thread:
On 23/11/2025 11:46, Philipp Klaus Krause wrote:
On 22.10.25 at 14:45, Thiago Adams wrote:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do. Also being able to use bit-fields wider than int.
Saving memory for two reasons:
* On small embedded systems where there is very little memory
* For code that needs to be very fast on big systems to make data structures fit into cache
Although this doesn't go as far as using odd bit-sizes: it would mean
using sizes like 24, 40, 48, and 56 bits instead of 32 or 64 bits.
The savings would be sparse.
On 24/11/2025 14:41, David Brown wrote:
On 24/11/2025 13:31, bart wrote:
That's all up to the implementation.
You are worrying about completely negligible things here.
Is it that negligible? That's easy to say when you're not doing the implementing!
However it may impact on the size and performance of code.
And allowing random sizes such as int817838_t. (See, it seems much
sillier using this syntax!)
I had taken your "ridiculous" comment to be part of your complaint
that "multiplying even two one-million-bit types could overflow". But
those statements are independent, then only the first is silly - of
course arithmetic on any finite sized type can overflow unless
specifically limited (such as by wrapping behaviour for unsigned
types). I agree that huge fixed-size integer types are not useful,
though I am not sure where the ideal limit lies.
You don't think it strange that C doesn't even have a 128-bit type yet
(it only barely has width-specific 64-bit ones).
There is just the poor gnu extension where 128-bit integers didn't have
a literal form, and there was no way to print such values.
But now there is this huge leap, not only to 128/256/512/1024 bits, but
to conceivably millions, plus the ability to specify any weird type you like, like 182 bits (eg. somebody makes a typo for _BitInt(128), but
they silently get a viable type that happens to be a little less efficient!).
So, 20 years of having 64-bit processors with little or no support for
even double-word types, and now there is this explosion in capabilities.
Or, are literals and print facilities for these new types still missing?
Personally I think they should have got the basics right first, like a decent 128-bit type, proper literals, and ways to print.
This looks like VLAs all over again (eg. is '_BitInt(1000000) A'
allocated on the stack?). A poorly suited, hard-to-implement feature.
On 24/11/2025 19:35, bart wrote:
There is just the poor gnu extension where 128-bit integers didn't
have a literal form, and there was no way to print such values.
How many times have you felt the need to write a 128-bit literal? And
how many times has that literal been in decimal
(it's not difficult to
put together a 128-bit value from two 64-bit values)? You really are
making a mountain out of a molehill here.
But now there is this huge leap, not only to 128/256/512/1024 bits,
but to conceivably millions, plus the ability to specify any weird
type you like, like 182 bits (eg. somebody makes a typo for
_BitInt(128), but they silently get a viable type that happens to be a
little less efficient!).
And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,
On 24/11/2025 13:33, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
What about arrays of _BitInt(1), _BitInt(2) and _BitInt(4)? These
could actually be practically implemented, with a few restrictions,
and could save a lot of memory.
No, they couldn't. Array indexing is defined in terms of pointer
arithmetic, and you can't have a pointer to something smaller than one
byte.
The restrictions I mentioned were to do with pointers to individual bits.
It is possible that operations such as:
x = A[i]
A[i] = x
can be well defined when A is an array of 1/2/4-bit values, even if
expressed like this:
*(A + i)
But this would have to be indivisible when A is such an array: only
the whole thing is valid, not (A + i) by itself, or A by itself; you'd
need &A.
This would need a small tweak to the language, but that is nothing
compared to supporting (i3783467 * i999 / i3) >> i17.
But I write a script in my dynamic language,[...]
C can only get down to that u8 figure (100MB) using its 'char'
type. Even 'bool' doesn't make it smaller (presumably for the reasons
you mentioned).
You are forced to emulate such arrays in user-code using shifts and masks.
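For example, a packed array of 2-bit values done by hand (helper names invented):

#include <stdint.h>
#include <stdio.h>

static unsigned get2(const uint8_t *a, size_t i) {
    return (a[i / 4] >> (2 * (i % 4))) & 0x3u;
}

static void set2(uint8_t *a, size_t i, unsigned v) {
    size_t byte = i / 4;
    unsigned shift = (unsigned)(2 * (i % 4));
    a[byte] = (uint8_t)((a[byte] & ~(0x3u << shift)) | ((v & 0x3u) << shift));
}

int main(void) {
    uint8_t packed[25] = {0};           /* 100 two-bit values in 25 bytes */
    set2(packed, 99, 3);
    printf("%u\n", get2(packed, 99));   /* prints 3 */
    return 0;
}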
On 11/24/2025 8:21 AM, bart wrote:
On 24/11/2025 13:35, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
[...]
There are two kinds of BitInts: those smaller than 64 bits; and those
larger than 64 bits, sometimes /much/ larger.
As far as I know, the standard makes no such distinction.
*I* am making the distinction. From an implementation point of view
(and assuming 64-bit hardware), they are quite different.
And that leads to different kinds of language features.
As noted, as I understand it there is no reason for the storage to be
smaller than the next power-of-2 size.
On 24/11/2025 12:17, bart wrote: [...]
On 24/11/2025 09:29, David Brown wrote:
So if you want the full range of values of x and y to be usable here,
then NM would have to be N * M. But you would also need a cast, such
as "_BitInt(NM) z = (_BitInt(NM)) x * y;", just as you do if you want
to multiply two 32-bit ints as a 64-bit operation.
Alternatively, you might know more about the values that might be in x
and y, and have a smaller NM (though you still need a cast if it is
greater than both N and M). Or you might be using unsigned types and
want the wrapping / masking behaviour.
The point was not what size NM is, but that it is known to the
compiler at the time of writing the expression.
It sounds like the max precision you get will be the latter.
can be implemented as something like :
__bit_int_signed_mult(NM, (unsigned char *) &z,
N, (const unsigned char *) &x,
M, (const unsigned char *) &y);
How would you write a generic user function that operates on any
size BitInt? For example:
_BitInt(?) bi_square(_BitInt(?));
You can't. _BitInt(N) and _BitInt(M) are distinct types, for
differing N and M. You can't write a generic user function in C that implements "T foo(T)" where T can be "int", "short", "long int", or
other types. C simply does not have type-generic functions.
You /can/ write generic macros that handle different _BitInt types,
but that would quickly get painful given that you'd need a case for
each size of _BitInt you wanted for the _Generic macro.
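A sketch of what that looks like for just three widths (every additional width needs its own function and its own _Generic case, which is where it gets painful):

#include <stdio.h>

static _BitInt(16) sq16(_BitInt(16) x) { return x * x; }  /* may overflow: UB for signed */
static _BitInt(32) sq32(_BitInt(32) x) { return x * x; }
static _BitInt(64) sq64(_BitInt(64) x) { return x * x; }

#define bi_square(x) _Generic((x),  \
    _BitInt(16): sq16,              \
    _BitInt(32): sq32,              \
    _BitInt(64): sq64)(x)

int main(void) {
    _BitInt(32) v = 1000;
    printf("%d\n", (int)bi_square(v));   /* prints 1000000 */
    return 0;
}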
This assumes BitInts are passed and returned by value, but even
using BitInt* wouldn't help.
Yes, they are passed around as values - they are integer types and are
passed around like other integer types. (Implementations may use
stack blocks and pointers for passing the values around if they are
too big for registers, just as implementations can do with any value
type. That's an implementation detail - logically, they are passed and returned as values.)
On 24/11/2025 20:26, David Brown wrote: [...]
And this huge leap also lets you have 128-bit, 256-bit, 512-bit,
etc.,
And 821 bits. This is what I don't get. Why is THAT so important?
Why couldn't 128/256/etc have been added first, and then those funny
ones if the demand was still there?
If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
set of types by a few more entries on the right, say 'u128 u256 u512',
would anyone have been clamouring for types like 'u1187'? I doubt it.
For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full
range-based types like Ada, or not at all.
BGB <cr88192@gmail.com> writes:
On 11/24/2025 8:21 AM, bart wrote:
On 24/11/2025 13:35, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
[...]
There are two kinds of BitInts: those smaller than 64 bits; and those larger than 64 bits, sometimes /much/ larger.
As far as I know, the standard makes no such distinction.
*I* am making the distinction. From an implementation point of view
(and assuming 64-bit hardware), they are quite different.
And that leads to different kinds of language features.
As noted, as I understand it there is no reason for the storage to be
smaller than the next power-of-2 size.
Really?
Rounding up to 8, 16, 32, or the next multiple of 64 bits seems
reasonable. Rounding 1025 bits up to 2048 does not (and is not what
the current gcc and llvm/clang implementations do).
What advantage does rounding 1025 up to 2048 give you over rounding
it up to 1088 (17*64)? It seems to me that the only real difference
is in how many times a loop has to iterate.
My understanding is that power-of-two sizes lose their advantages
beyond about 64 or 128 bits. Am I mistaken?
[...]
David Brown <david.brown@hesbynett.no> writes:
On 24/11/2025 12:17, bart wrote:[...]
On 24/11/2025 09:29, David Brown wrote:
So if you want the full range of values of x and y to be usable here,
then NM would have to be N * M. But you would also need a cast, such
as "_BitInt(NM) z = (_BitInt(NM)) x * y;", just as you do if you want
to multiply two 32-bit ints as a 64-bit operation.
N + M, not N * M.
Alternatively, you might know more about the values that might be in x
and y, and have a smaller NM (though you still need a cast if it is
greater than both N and M). Or you might be using unsigned types and
want the wrapping / masking behaviour.
The point was not what size NM is, but that it is known to the
compiler at the time of writing the expression.
It sounds like the max precision you get will be the latter.
can be implemented as something like:
    __bit_int_signed_mult(NM, (unsigned char *) &z,
                          N, (const unsigned char *) &x,
                          M, (const unsigned char *) &y);
How would you write a generic user function that operates on any
size BitInt? For example:
    _BitInt(?) bi_square(_BitInt(?));
You can't. _BitInt(N) and _BitInt(M) are distinct types, for
differing N and M. You can't write a generic user function in C that
implements "T foo(T)" where T can be "int", "short", "long int", or
other types. C simply does not have type-generic functions.
Sort of. C23 defines the term "generic function" (N3220 7.26.5.1,
string search functions). For example, strchr() can take a const char*
argument and return a const char* result, or it can take a char*
argument and return a char* result. (C++ does this by having two
overloaded strchr() functions.)
These "generic functions" are (almost certainly) implemented as macros
that use _Generic. If you bypass the macro definition, you get the
function that can take a const char* and return a char*.
So C doesn't have type-generic functions, but it does have features that
let you implement things that act like type-generic functions.
You /can/ write generic macros that handle different _BitInt types,
but that would quickly get painful given that you'd need a case for
each size of _BitInt you wanted for the _Generic macro.
Indeed. A _Generic selection that handles all the ordinary non-extended integer types needs to handle 12 cases if I'm counting correctly, which
is feasible. But the addition of bit-precise types adds
BITINT_MAXWIDTH*2-1 new distinct predefined types, and a generic
selection would need one case for each.
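As an illustration of how that enumeration looks in practice, here is a
minimal sketch (not from any post above; the helper names are invented,
and a C23 compiler with _BitInt support is assumed). Unsigned widths are
used so the squaring wraps rather than overflowing:
```
static unsigned _BitInt(12) bi_square_12(unsigned _BitInt(12) x) { return x * x; }
static unsigned _BitInt(24) bi_square_24(unsigned _BitInt(24) x) { return x * x; }
static unsigned _BitInt(48) bi_square_48(unsigned _BitInt(48) x) { return x * x; }

/* One association per supported width - the part that becomes painful
   as the number of widths grows. */
#define bi_square(x) _Generic((x),              \
        unsigned _BitInt(12): bi_square_12,     \
        unsigned _BitInt(24): bi_square_24,     \
        unsigned _BitInt(48): bi_square_48)(x)
```
Every width you want to accept needs its own helper and its own
association; nothing in the language generates them for you.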
However, you could have a function that takes a void*, a size, and a
width as arguments and operates on a _BitInt(?) or unsigned _BitInt(?)
type. In fact, gcc has internal functions like that for multiplication
and division. (You mentioned something like that in text that I've
snipped.)
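A sketch of that style of interface (a hypothetical function, not gcc's
internal one; it assumes the implementation stores _BitInt values
little-endian with no padding among the low 'width' value bits, which
holds for current gcc and clang on x86-64 but is not guaranteed by the
standard):
```
#include <limits.h>
#include <stddef.h>

/* Count the set value bits of any unsigned _BitInt, given its address,
   its storage size and its width in bits. */
unsigned bi_popcount(const void *p, size_t size_bytes, unsigned width_bits)
{
    const unsigned char *bytes = p;
    unsigned count = 0;
    for (unsigned bit = 0; bit < width_bits; bit++)
        if (bytes[bit / CHAR_BIT] & (1u << (bit % CHAR_BIT)))
            count++;
    (void)size_bytes;     /* available for bounds checking if wanted */
    return count;
}

/* Usage:
       unsigned _BitInt(200) v = 12345uwb;
       unsigned n = bi_popcount(&v, sizeof v, 200);
*/
```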
[...]
This assumes BitInts are passed and returned by value, but even
using BitInt* wouldn't help.
Yes, they are passed around as values - they are integer types and are
passed around like other integer types. (Implementations may use
stack blocks and pointers for passing the values around if they are
too big for registers, just as implementations can do with any value
type. That's an implementation detail - logically, they are passed and
returned as values.)
Yes, and in general a _BitInt argument has to be copied to the
corresponding parameter, since a change to the parameter can't affect
the value of the argument.
But passing huge _BitInts by value is no more problematic than passing
huge structs by value.
bart <bc@freeuk.com> writes:
On 24/11/2025 14:41, David Brown wrote:
On 24/11/2025 13:31, bart wrote:
That's all up to the implementation.
You are worrying about completely negligible things here.
Is it that negligible? That's easy to say when you're not doing the
implementing! However it may impact on the size and performance of
code.
You're right, it's easy to say when I'm not doing the implementing.
Which I'm not.
The maintainers of gcc and llvm/clang have done that for me, so I don't
have to worry about it.
Are you planning to implement bit-precise integer types yourself? I
don't think you've said so in this thread. If you are, you have at
least two existing implementations you can look at for ideas.
Here's an idea. Rather than asserting that _BitInt(1'000'000)
is silly and obviously useless, try *asking* how it's useful.
I personally don't know what I'd do with a million-bit integer,
but maybe somebody out there has a valid use for it. Meanwhile,
its existence doesn't bother me.
My guess is that once you've implemented integers wider than 128
or 256 bits, million-bit integers aren't much extra effort.
No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
and played with 1/2/4 bits, but my view is that above this range,
using exact bit-sizes is the wrong way to go.
While for odd sizes up to 64 bits, bitfields are more apt than
employing the type system.
On Tue, 25 Nov 2025 11:38:32 +0000
bart <bc@freeuk.com> wrote:
No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
and played with 1/2/4 bits, but my view is that above this range,
using exact bit-sizes is the wrong way to go.
Either that or manifestation of your NIH syndrome.
Which explanation do you consider more likely?
While for odd sizes up to 64 bits, bitfields are more apt than
employing the type system.
int sign_extend12(unsigned x)
{
return (_BitInt(12))x;
}
Nice, is not it?
On 25/11/2025 12:12, Michael S wrote:
On Tue, 25 Nov 2025 11:38:32 +0000
bart <bc@freeuk.com> wrote:
No, apart from the usual set of 8/16/32/64 bits. I've done 128
bits, and played with 1/2/4 bits, but my view is that above this
range, using exact bit-sizes is the wrong way to go.
Either that or manifestation of your NIH syndrome.
Which explanation do you consider more likely?
I can invent anything I like. I've looked at such things many times,
and came to the conclusion that using types is the wrong approach,
certainly for this level of language.
(Yes, long ago I allowed type denotations such as:
int*N a a has N bytes or N*8 bits (from Fortran)
int:N b b has N bits
Then I realised I was never going to use anything other than some power-of-two size of 8 bits or more, for discrete variables.)
While for odd sizes up to 64 bits, bitfields are more apt than
employing the type system.
int sign_extend12(unsigned x)
{
return (_BitInt(12))x;
}
Nice, is not it?
By 'bitfields' I mean bitfields within structs, but also bitfield
operators which work on any integer values.
Bitfields are nearly always unsigned in my projects, so I don't have
an exact equivalent to this example.
But a solution not using types would look like this:
y := x.[0..11] # get first 12 bits
y := x.[12..23] # next 12 bits
x.[24..35] := y # set next 12 bits (x, y are 64 bits!)
y := x.[0..i] # get first i+1 bits
To optionally interpret a bitfield extraction as signed, I'd need to
think up some way of denoting that. For bitfield insertion it doesn't matter.
Your example is interesting but rather limited; while it does deal
with a signed field:
* That field can only start at bit zero, without extra manipulations
* The size is fixed at 12 (if you decide to change the field size, or
you want it as a constant parameter somewhere, it starts getting
awkward)
* If you are dealing with a range of bitfield sizes, you will need a
dedicated function (a sketch of one follows after this list), or
somehow enumerate all possibilities using _Generic.
* It's not clear how bitfield insertion would work, whether you'd
still employ a _BitInt type, and/or just revert to those shifts and
masks.
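For reference, a dedicated function of the kind mentioned in the list
above might look like this (a sketch only; the name and the 64-bit
limits are choices made here, not anything from the posts):
```
#include <stdint.h>

/* Extract bits [lo, lo+width) of x and sign-extend the result.
   Assumes 1 <= width <= 63 and lo + width <= 64. */
static inline int64_t extract_signed(uint64_t x, unsigned lo, unsigned width)
{
    uint64_t sign  = UINT64_C(1) << (width - 1);
    uint64_t field = (x >> lo) & ((sign << 1) - 1);   /* keep 'width' bits */
    if (field & sign)                                 /* negative: subtract 2^width */
        return (int64_t)(field - sign) - (int64_t)sign;
    return (int64_t)field;
}

/* extract_signed(raw, 0, 12) matches the earlier sign_extend12 example. */
```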
On Tue, 25 Nov 2025 14:57:17 +0000
bart <bc@freeuk.com> wrote:
On 25/11/2025 12:12, Michael S wrote:
On Tue, 25 Nov 2025 11:38:32 +0000
bart <bc@freeuk.com> wrote:
No, apart from the usual set of 8/16/32/64 bits. I've done 128
bits, and played with 1/2/4 bits, but my view is that above this
range, using exact bit-sizes is the wrong way to go.
Either that or manifestation of your NIH syndrome.
Which explanation do you consider more likely?
I can invent anything I like. I've looked at such things many times,
and came to the conclusion that using types is the wrong approach,
certainly for this level of language.
(Yes, long ago I allowed type denotations such as:
int*N a a has N bytes or N*8 bits (from Fortran)
int:N b b has N bits
Then I realised I was never going to use anything other than some
power-of-two size of 8 bits or more, for discrete variables.)
While for odd sizes up to 64 bits, bitfields are more apt than
employing the type system.
int sign_extend12(unsigned x)
{
return (_BitInt(12))x;
}
Nice, is not it?
By 'bitfields' I mean bitfields within structs, but also bitfield
operators which work on any integer values.
Bitfields are nearly always unsigned in my projects, so I don't have
an exact equivalent to this example.
But a solution not using types would look like this:
y := x.[0..11] # get first 12 bits
y := x.[12..23] # next 12 bits
x.[24..35] := y # set next 12 bits (x, y are 64 bits!)
y := x.[0..i] # get first i+1 bits
To optionally interpret a bitfield extraction as signed, I'd need to
think up some way of denoting that. For bitfield insertion it doesn't
matter.
Your example is interesting but rather limited; while it does deal
with a signed field:
* That field can only start at bit zero, without extra manipulations
* The size is fixed at 12 (if you decide to change the field size, or
you want it as a constant parameter somewhere, it starts getting
awkward)
* If you are dealing with a range of bitfield sizes, you will need a
dedicated function, or somehow enumerate all possibilities using
_Generic.
* It's not clear how bitfield insertion would work, whether you'd
still employ a _BitInt type, and/or just revert to those shifts and
masks.
My example is from the real world: dealing with A-to-D converters. I need
sign extension of that sort quite often.
* I don't recollect needing to sign-extend a field that does not start
at offset zero.
Same for your other points - I don't recollect that I needed something
like that sufficiently often to ... well... recollect.
On 24/11/2025 20:26, David Brown wrote:
On 24/11/2025 19:35, bart wrote:
There is just the poor gnu extension where 128-bit integers didn't
have a literal form, and there was no way to print such values.
How many times have you felt the need to write a 128-bit literal? And
how many times has that literal been in decimal
I don't think there were hex literals either.
(it's not difficult to put together a 128-bit value from two 64-bit
values)? You really are making a mountain out of a molehill here.
Well, it seems that such literals now exist (with 'wb' suffix). So I
guess somebody other than you decided that feature WAS worth adding!
But you can't as yet print out such values; I guess you can't 'scanf'
them either. These are necessary to perform I/O on such data from/to
text files.
I must say you have a very laid-back attitude to language design:
"Let's add this 128-bit type, but let's not bother providing a way to
enter such values, or add any facilities to print them out. How often
would somebody need to do that anyway? But if they really /have/ to,
then there are plenty of hoops they can jump through to achieve it!"
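For what it's worth, those hoops look something like this (a sketch
assuming gcc/clang's unsigned __int128 extension; decimal output would
need repeated division, which is why it is more of a nuisance):
```
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

typedef unsigned __int128 u128;

static u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;      /* build the value from two halves */
}

static void print_u128_hex(u128 v)
{
    uint64_t hi = (uint64_t)(v >> 64);
    uint64_t lo = (uint64_t)v;
    if (hi)
        printf("0x%" PRIx64 "%016" PRIx64 "\n", hi, lo);
    else
        printf("0x%" PRIx64 "\n", lo);
}
```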
(In my implementation of 128-bit types, from 2021, I allowed full 128-
bit decimal, hex and binary literals, and they could be printed in any
base.
But they weren't used enough and were dropped, in favour of an unlimited precision type in my other language.
One interesting use-case for literals was short-strings; 128 bits allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I think C is still stuck at one, or 4 if you're lucky.)
But now there is this huge leap, not only to 128/256/512/1024 bits,
but to conceivably millions, plus the ability to specify any weird
type you like, like 182 bits (eg. somebody makes a typo for
_BitInt(128), but they silently get a viable type that happens to be
a little less efficient!).
And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,
And 821 bits. This is what I don't get. Why is THAT so important?
Why couldn't 128/256/etc have been added first, and then those funny
ones if the demand was still there?
If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
set of types by a few more entries on the right, say 'u128 u256 u512',
would anyone have been clamouring for types like 'u1187'? I doubt it.
For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-based types like Ada, or not at all.
On Tue, 25 Nov 2025 11:38:32 +0000
bart <bc@freeuk.com> wrote:
No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
and played with 1/2/4 bits, but my view is that above this range,
using exact bit-sizes is the wrong way to go.
Either that or manifestation of your NIH syndrome.
Which explanation do you consider more likely?
While for odd sizes up to 64 bits, bitfields are more apt than
employing the type system.
int sign_extend12(unsigned x)
{
return (_BitInt(12))x;
}
Nice, is not it?
Doing the same with bit fields is possible, but less obvious and less
convenient. Also it can potentially play havoc with a compiler that takes
strict aliasing rules more seriously than they deserve.
int sign_extend12(unsigned x)
{
struct bar {
signed a: 12;
};
return ((struct bar*)&x)->a;
}
Doing the same with shifts is almost as convenient as with _BitInt and
it works great on all popular compilers, but according to the wording of
the C Standard it is not fully portable (the conversion and the right
shift of a negative value are implementation-defined).
int sign_extend12(unsigned x)
{
return (int32_t)((uint32_t)x << 20) >> 20;
}
Doing the same with shifts is almost as convenient as with _BitInt and
it works great on all popular compilers, but according to the wording of
the C Standard it is not fully portable (the conversion and the right
shift of a negative value are implementation-defined).
int sign_extend12(unsigned x)
{
return (int32_t)((uint32_t)x << 20) >> 20;
}
But the _BitInt version is definitely neater. I can see myself using _BitInt(12) and similar sizes for things like values read from[...]
hardware sensors of different resolutions.
(The code for all three is the same with gcc on x86 or arm64 -
unfortunately, gcc does not yet support _BitInt on many targets.)
On 24/11/2025 23:27, bart wrote:
One interesting use-case for literals was short-strings; 128 bits
allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I
think C is still stuck at one, or 4 if you're lucky.)
I have no idea or opinion on why /you/ might want 128-bit or larger
integer types. I believe there is very little use for "normal" numbers
- things you might want to write as literals, calculate with, and read
or write - that won't fit perfectly well within 64 bit types, and would
not be better served by arbitrary sized integers.
Arbitrary sized
integers are a very different kettle of fish from large fixed-size
integers, and are not something that would fit in the C language - they
need a library.
I can tell you why /I/ might find larger integer types useful. They
include :
* 128-bit for IPv6 address. These use a variety of styles for input and display, and thus would use specialised routines, not simple literals or printf-style IO.
* Big units for passing data around with larger memory transfers, using
SIMD registers. IO is irrelevant here.
* Cryptography. IO is irrelevant here. But a variety of sizes are
useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048, 3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm just
thinking of DES, 3DES, AES, SHA, ECC and RSA.
Smaller sizes can be useful for holding RGB pixel values, audio data, etc.
And 821 bits. This is what I don't get. Why is THAT so important?
Why couldn't 128/256/etc have been added first, and then those funny
ones if the demand was still there?
The folks behind the proposal provided both. The fact that you can
write _BitInt(821) does not in any way hinder use of _BitInt(256). I
really don't get your problem here.
If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
set of types by a few more entries on the right, say 'u128 u256 u512',
would anyone have been clamouring for types like 'u1187'? I doubt it.
/You/ might not have wanted them, but other people would.
For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-
based types like Ada, or not at all.
Fortunately for the C world, you are not on the C committee - it doesn't matter if you can't see beyond the end of your nose.
On 25/11/2025 20:25, David Brown wrote:[...]
Arbitrary sized integers are a very different kettle of fish from
large fixed-size integers, and are not something that would fit in
the C language - they need a library.
Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that for multiplying two arbitrary-precision ints that happen to be 100,000
bits.
So, a better fit for a struct then? Here I'm curious as to what
BitInt(128) brings to the table.
That _BitInt() defaults to a signed integer (two's complement?), even
for very large sizes suggests that /numeric/ applications are a
primary use.
OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
signed integer. It might only have two values of 0 and -1; does
nobody want that particular combination?
At least, I've been able to add to my collection of C types that
represent an 8-bit byte:
signed char
unsigned char
int8_t
uint8_t
_BitInt(8)
unsigned _BitInt(8)
The last two are apparently incompatible with the char versions.
bart <bc@freeuk.com> writes:
On 25/11/2025 20:25, David Brown wrote:[...]
Arbitrary sized integers are a very different kettle of fish from
large fixed-size integers, and are not something that would fit in
the C language - they need a library.
Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that for
multiplying two arbitrary-precision ints that happen to be 100,000
bits.
It's not about the code that implements multiplication. In gcc, that's
done by calling a built-in function that can operate on arbitrary data widths.
Think about memory management.
Perhaps a future standard will provide a more flexible flavor of
_BitInt. It might allow the n in _BitInt(n) to be non-constant, or
empty, or "*", to denote an arbitrary-precision integer. But it's
hard to see how that could be done without adding other fundamental
features to the language. And a lot of people's response would be
that if you want C++, you know where to find it.
Similarly, C99 added complex types as a built-in language feature.
C++ added complex types as a template class, because C++ has language features that support that kind of thing, including user-defined
literals.
If you can think of a way to add arbitrary-precision integers to C
without other radical changes to the language, let us know.
It could also be nice to be able to write code that deals with
multiple widths of _BitInt types, as we can do for arrays even
without VLAs. But C's treatment of arrays is messy, and I'm not
sure duplicating that mess for _BitInt types would be a great idea.
And I wouldn't want to lose the ability to pass _BitInt values
to functions.
[...]
So, a better fit for a struct then? Here I'm curious as to what
BitInt(128) brings to the table.
It brings a 128-bit integer type with constants and straightforward assignment, comparison, and arithmetic operators.
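A small sketch of what that looks like in C23 source (assuming a
compiler such as gcc 14+ with _BitInt(128) support; the names are
illustrative):
```
typedef unsigned _BitInt(128) u128;

/* A 128-bit constant written directly, using the new uwb suffix. */
static const u128 poly = 0xF123456789ABCDEF0123456789ABCDEFuwb;

u128 mix(u128 a, u128 b)
{
    return (a ^ poly) + b;   /* ordinary operators; unsigned wraps mod 2^128 */
}
```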
[...]
That _BitInt() defaults to a signed integer (two's complement?), even
for very large sizes suggests that /numeric/ applications are a
primary use.
Yes, C23 requires two's-complement for signed integers. (It mandates two's-complement representation, not wraparound behavior; signed
overflow is still UB).
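A tiny illustration of that distinction (C23 with _BitInt support
assumed; names invented here):
```
void overflow_demo(void)
{
    unsigned _BitInt(8) u = 255uwb;
    u += 1uwb;        /* wraps to 0: defined for unsigned _BitInt */

    _BitInt(8) s = 127wb;
    /* s += 1wb; */   /* would overflow signed _BitInt(8): undefined behaviour */
    (void)u; (void)s;
}
```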
At least, I've been able to add to my collection of C types that
represent an 8-bit byte:
signed char
unsigned char
int8_t
uint8_t
_BitInt(8)
unsigned _BitInt(8)
The last two are apparently incompatible with the char versions.
You forgot plain char,
int_least8_t, and uint_least8_t.
On 25/11/2025 23:20, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 25/11/2025 20:25, David Brown wrote:[...]
Arbitrary sized integers are a very different kettle of fish from
large fixed-size integers, and are not something that would fit in
the C language - they need a library.
Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that for
multiplying two arbitrary-precision ints that happen to be 100,000
bits.
It's not about the code that implements multiplication. In gcc, that's
done by calling a built-in function that can operate on arbitrary data
widths. Think about memory management.
Well, I was responding to a suggestion that BitInt support didn't need
a library.
But memory management is a good point. Actual, variable-sized bigints
would be awkward in C if you want to use them in ordinary expressions.
Although managing large fixed-sized types, which may also involve intermediate, transient values, can have their own problems.
I think I would have responded better to BitInt if presented as a
'bit-set', effectively a fixed-size bit-array, but passed by
value. This is something that I'd considered myself at one time.
Those would have logical operators, access to individual bits, but not arithmetic nor shifts, and no notion of two's complement. (In my implementation, they could also have been initialised like Pascal
bitsets.)
More significantly, an unbounded version could be passed by reference,
with an accompanying length (I could also use slices that have the
length) as happens with arrays in C.
It could also be nice to be able to write code that deals with
multiple widths of _BitInt types, as we can do for arrays even
without VLAs. But C's treatment of arrays is messy, and I'm not
sure duplicating that mess for _BitInt types would be a great idea.
And I wouldn't want to lose the ability to pass _BitInt values
to functions.
[...]
So, a better fit for a struct then? Here I'm curious as to what
BitInt(128) brings to the table.
It brings a 128-bit integer type with constants and straightforward
assignment, comparison, and arithmetic operators.
I was commenting on the ipv6 example, where structs give you that
already, except arithmetic which makes little sense.
At least, I've been able to add to my collection of C types that
represent an 8-bit byte:
signed char
unsigned char
int8_t
uint8_t
_BitInt(8)
unsigned _BitInt(8)
The last two are apparently incompatible with the char versions.
You forgot plain char,
I had char but took it out, as it's an outlier.
int_least8_t, and uint_least8_t.
And 'fast' versions? I still don't know what any of these mean! No
other languages seem to have bothered.
bart <bc@freeuk.com> writes:[...]
OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
signed integer. It might only have two values of 0 and -1; does
nobody want that particular combination?
I don't know. The language allows 1-bit signed bit-fields, so
_BitInt(1) would make some sense, but the language requires N to
be at least 1 for unsigned _BitInt and 2 for signed _BitInt.
It doesn't bother me too much, since I'm unlikely to have a
use for signed _BitInt(1). But it's an arbitrary restriction.
On 25/11/2025 20:25, David Brown wrote:
On 24/11/2025 23:27, bart wrote:
One interesting use-case for literals was short-strings; 128 bits
allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I
think C is still stuck at one, or 4 if you're lucky.)
I have no idea or opinion on why /you/ might want 128-bit or larger
integer types. I believe there is very little use for "normal"
numbers - things you might want to write as literals, calculate with,
and read or write - that won't fit perfectly well within 64 bit types,
and would not be better served by arbitrary sized integers.
Arbitrary sized integers are a very different kettle of fish from
large fixed-size integers, and are not something that would fit in the
C language - they need a library.
Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that for multiplying two arbitrary-precision ints that happen to be 100,000 bits.
Maybe the latter is autoranging, and might give a 200,000-bit result.
Presumably the former doesn't use inline code, so it would be surprising
if each distinct size of BitInt had dedicated sets of routines for this.
So it sounds like they have to use a generic library anyway.
And sure enough, gcc-generated code contains stuff like this:
mov r8, rcx
mov edx, 50000 # (BitInt(50000)
mov rcx, rax
call __mulbitint3
So, BitInts are different in that they /don't/ need a library?
I can tell you why /I/ might find larger integer types useful. They
include :
* 128-bit for IPv6 address. These use a variety of styles for input
and display, and thus would use specialised routines, not simple
literals or printf-style IO.
So, a better fit for a struct then? Here I'm curious as to what
BitInt(128) brings to the table.
* Big units for passing data around with larger memory transfers,
using SIMD registers. IO is irrelevant here.
Structs and arrays again spring to mind if you just want an anonymous
data block. (I wonder why it has to be bit-precise for byte-addressed memory?)
* Cryptography. IO is irrelevant here. But a variety of sizes are
useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
just thinking of DES, 3DES, AES, SHA, ECC and RSA.
And I'm again curious as to what /non-numeric/ use a 200,000-bit BitInt might be put to, that is not better served by an array or struct.
Maybe bit-sets? But there are no special features for accessing
individual bits.
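For illustration, the usual shift-and-mask idioms do work at any width
even without dedicated bit-access operators; a sketch, with invented
names, assuming a C23 compiler with _BitInt support:
```
typedef unsigned _BitInt(821) bitset821;

static bool bs_test(bitset821 s, unsigned i)       { return (s >> i) & 1uwb; }
static bitset821 bs_set(bitset821 s, unsigned i)   { return s | ((bitset821)1 << i); }
static bitset821 bs_clear(bitset821 s, unsigned i) { return s & ~((bitset821)1 << i); }
```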
That _BitInt() defaults to a signed integer (two's complement?), even for
very large sizes suggests that /numeric/ applications are a primary use.
Smaller sizes can be useful for holding RGB pixel values, audio data,
etc.
Except that these are probably rounded up to the next power-of-two size.
So the benefit is minimal, unless you can do something with those padding bits.
And 821 bits. This is what I don't get. Why is THAT so important?
Why couldn't 128/256/etc have been added first, and then those funny
ones if the demand was still there?
The folks behind the proposal provided both. The fact that you can
write _BitInt(821) does not in any way hinder use of _BitInt(256). I
really don't get your problem here.
You've heard of 'code smell'? Well, this is the same, but for features.
I've been doing this stuff long enough to recognise when a feature is over-elaborate, over-specified and over-flexible. You need to know the minimum you can get away with, not the maximum!
Let me guess, some committee members have been looking too long at how
C++ does things? That language is utterly incapable of creating anything small and simple.
If the proposal had instead been simply to extend the 'u8 u16 u32
u64' set of types by a few more entries on the right, say 'u128 u256
u512', would anyone have been clamouring for types like 'u1187'? I
doubt it.
/You/ might not have wanted them, but other people would.
OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
signed integer. It might only have two values of 0 and -1; does
nobody want that particular combination?
For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-
based types like Ada, or not at all.
Fortunately for the C world, you are not on the C committee - it
doesn't matter if you can't see beyond the end of your nose.
Maybe unfortunately. C used to be a fairly simple language with a lot of baggage; now it's a much heftier one with a lot of baggage!
At least, I've been able to add to my collection of C types that
represent an 8-bit byte:
signed char
unsigned char
int8_t
uint8_t
_BitInt(8)
unsigned _BitInt(8)
The last two are apparently incompatible with the char versions.
(The only ADCs I've used were 4-bit (homemade)
and 8-bit, both giving
unsigned data in parallel, used for frame-grabbing video circuits so
read directly into memory rather than via an explicit memory- or
port-read instruction.)
* I don't recollect needing to sign-extend a field that does not
start at offset zero.
So what's in the rest of the 32-bit field, garbage?
Same for your other points - I don't recollect that I needed
something like that sufficiently often to ... well... recollect.
Yours is one of a thousand possible applications. Everyone will have different needs. Maybe someone else will have a 16 or 32-bit value
with assorted bitfields of different widths.
Then maybe C bitfields could be used, but a bigger problem with those
is poor control over layout, which is anyway implementation-defined.
(Mine of course don't have that problem!)
BTW, clang has had this feature (originally called _ExtInt rather than _BitInt) since 2019. Here's the git log entry. The committer is one
of the authors of the N2021 paper, so the similarities are
unsurprising.
```
commit 61ba1481e200b5b35baa81ffcff81acb678e8508
Author: Erich Keane <erich.keane@intel.com>
Date: 2019-12-24 07:28:40 -0800
Implement _ExtInt as an extended int type specifier.
Introduction/Motivation:
LLVM-IR supports integers of non-power-of-2 bitwidth, in the iN
syntax. Integers of non-power-of-two aren't particularly interesting
or useful on most hardware, so much so that no language in Clang has
been motivated to expose it before.
However, in the case of FPGA hardware normal integer types where
the full bitwidth isn't used, is extremely wasteful and has severe
performance/space concerns. Because of this, Intel has
introduced this functionality in the High Level Synthesis compiler[0]
under the name "Arbitrary Precision Integer" (ap_int for short).
This has been extremely useful and effective for our users,
permitting them to optimize their storage and operation space on an architecture where both can be extremely expensive.
We are proposing upstreaming a more palatable version of this to
the community, in the form of this proposal and accompanying patch.
We are proposing the syntax _ExtInt(N). We intend to propose this to
the WG14 committee[1], and the underscore-capital seems like the
active direction for a WG14 paper's acceptance. An alternative that
Richard Smith suggested on the initial review was __int(N), however
we believe that is much less acceptable by WG14. We considered _Int,
however _Int is used as an identifier in libstdc++ and there is no
good way to fall back to an identifier (since _Int(5) is
indistinguishable from an unnamed initializer of a template type
named _Int).
[0]https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html)
[1]http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf
Differential Revision: https://reviews.llvm.org/D73967
```
[...]
David Brown <david.brown@hesbynett.no> writes:
[...]
But the _BitInt version is definitely neater. I can see myself
using _BitInt(12) and similar sizes for things like values read from hardware sensors of different resolutions.
(The code for all three is the same with gcc on x86 or arm64 - unfortunately, gcc does not yet support _BitInt on many targets.)[...]
Is support for _BitInt limited by target or by version?
It looks like _BitInt support was introduced in gcc 14.1.0. You might
have older versions of gcc on other platforms.
On 25/11/2025 22:58, bart wrote:
On 25/11/2025 20:25, David Brown wrote:
On 24/11/2025 23:27, bart wrote:
One interesting use-case for literals was short-strings; 128 bits
allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'.
I think C is still stuck at one, or 4 if you're lucky.)
I have no idea or opinion on why /you/ might want 128-bit or larger
integer types. I believe there is very little use for "normal"
numbers - things you might want to write as literals, calculate with,
and read or write - that won't fit perfectly well within 64 bit
types, and would not be better served by arbitrary sized integers.
Arbitrary sized integers are a very different kettle of fish from
large fixed-size integers, and are not something that would fit in
the C language - they need a library.
Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that for
multiplying two arbitrary-precision ints that happen to be 100,000 bits.
You are looking at things in completely the wrong way.
Long before you start thinking of how to implement operations, think
about what the types are at a fundamental level.
A fixed-size integer is a value type of fixed, compile-time size. It is passed around as a value. Local instances can be put on a stack with compile-time fixed offsets (and thus using [sp + N] access modes in an implementation). The type has a single simple and obvious (albeit
slightly implementation-dependent) bit representation. A _BitInt(32)
will be identical at the low level to an int32_t. Bigger _BitInt types
are just the same, only bigger. There is no difference in concept, or representation, whether the type is 32-bit or 32 million bits.
An arbitrary sized integer is a dynamic type with variable size. The
base object will hold information about pointers to data, sizes for that stored data - including both how much is in use, and how much is
available. There are endless ways to make such types - you can support multiple allocation parts, or use a single contiguous allocation. You
can store the data in binary, or some kind of packed decimal, or other formats. Passing them around might mean just passing around the base object, but sometimes you need to make deep copies. Operations might
lead to heap memory allocations or deallocations.
They are so /totally/ different that any similarities in the way you do
a particular arithmetic operation are completely incidental.
Structs and arrays again spring to mind if you just want an anonymous
data block. (I wonder why it has to be bit-precise for byte-addressed
memory?)
If I have a processor that has 256-bit vector registers, then moving
data by loading and storing 256-bit blocks is going to be more efficient than doing a loop of 16 byte moves. Today, I would use uint64_t for the task, as the biggest type available. Why does it have to be bit-
precise? It must be bit-precise because I would want to move 256 bits -
not 255 bits or 257 bits.
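A sketch of that kind of block move (assuming a compiler that supports
unsigned _BitInt(256) and suitably aligned buffers; whether it actually
lowers to wide vector loads and stores is a quality-of-implementation
matter, not a guarantee):
```
#include <stddef.h>

typedef unsigned _BitInt(256) block256;

void copy_blocks(void *restrict dst, const void *restrict src, size_t nblocks)
{
    block256 *d = dst;
    const block256 *s = src;
    for (size_t i = 0; i < nblocks; i++)
        d[i] = s[i];           /* one 256-bit load/store per iteration */
}
```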
* Cryptography. IO is irrelevant here. But a variety of sizes are
useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
just thinking of DES, 3DES, AES, SHA, ECC and RSA.
And I'm again curious as to what /non-numeric/ use a 200,000-bit
BitInt might be put to, that is not better served by an array or struct.
I don't have a use for a 200,000 bit integer type at the moment. But I cannot imagine any reason why the language specifications should have arbitrary limits. Are you suggesting that the C standards should say "You
can have _BitInt's up to 8096 because someone found a use for them, but
you can't have size 8097 and above - and 200,000 is right out - because someone else can't imagine they are useful" ?
An implementation can - indeed, must - set a limit to the sizes it
supports. Implementations can have many reasons to do so. Some implementations might have quite low limits (the size of "long long int"
is the minimum allowed for conformance), but then that implementation
might not be so useful to some people.
Maybe bit-sets? But there are no special features for accessing
individual bits.
That _BitInt() defaults to a signed integer (two's complement?), even
for very large sizes suggests that /numeric/ applications are a
primary use.
Obviously the C standards should have made "_BitInt" signed up to size
73 bits, and unsigned from then on. That would have been /so/ much
clearer and simpler for everyone.
Smaller sizes can be useful for holding RGB pixel values, audio data,
etc.
Except that these are probably rounded up to the next power-of-two
size. So the benefit is minimal, unless you can do something with those padding bits.
I write C code. I want my C code to be clear and represent what I am handling, and then let the compiler do its job of generating efficient results. So if I am dealing with data that is 24-bit signed integer
data, then _BitInt(24) (especially with a typedef name) is more accurate source code than "int" or "int32_t".
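Something like this, presumably (a sketch with invented names, not taken
from David's code):
```
#include <stddef.h>

typedef _BitInt(24) audio_sample24;    /* 24-bit signed PCM sample */

/* Halve the level of a buffer of samples. */
void attenuate(audio_sample24 *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] / 2;
}
```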
You've heard of 'code smell'? Well, this is the same, but for features.
Your nose is blocked. Or to be more accurate, you are so obsessed with
the idea that your own language is "perfect" that you simply cannot
accept that other languages might have good features that your language
does not, or that other programmers might want features that your
language does not have.
I've been doing this stuff long enough to recognise when a feature is
over-elaborate, over-specified and over-flexible. You need to know the
minimum you can get away with, not the maximum!
NIH syndrome combined with megalomania. Other people do this stuff
better than you.
Let me guess, some committee members have been looking too long at how
C++ does things? That language is utterly incapable of creating
anything small and simple.
And yet C and C++ programmers outnumber programmers of Bart's own
language by millions. No language - except for yours, of course - is perfect. But it seems C and C++ are both pretty good for getting the
job done.
On 25/11/2025 23:20, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 25/11/2025 20:25, David Brown wrote:[...]
Arbitrary sized integers are a very different kettle of fish from
large fixed-size integers, and are not something that would fit in
the C language - they need a library.
Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that for
multiplying two abitrary-precision ints that happen to be 100,000
bits.
It's not about the code that implements multiplication. In gcc, that's
done by calling a built-in function that can operate on arbitrary data
widths.
Think about memory management.
Well, I was responding to a suggestion that BitInt support didn't need a library.
But memory management is a good point. Actual, variable-sized bigints
would be awkward in C if you want to use them in ordinary expressions.
Although managing large fixed-sized types, which may also involve intermediate, transient values, can have their own problems.
Perhaps a future standard will provide a more flexible flavor of
_BitInt. It might allow the n in _BitInt(n) to be non-constant, or
empty, or "*", to denote an arbitrary-precision integer. But it's
hard to see how that could be done without adding other fundamental
features to the language. And a lot of people's response would be
that if you want C++, you know where to find it.
I think I would have responded better to BitInt if presented as a
'bit-set', effectively a fixed-size bit-array, but passed by value.
This is something that I'd considered myself at one time.
Those would have logical operators, access to individual bits, but not arithmetic nor shifts, and no notion of two's complement. (In my implementation, they could also have been initialised like Pascal bitsets.)
More significantly, an unbounded version could be passed by reference,
with an accompanying length (I could also use slices that have the
length) as happens with arrays in C.
Similarly, C99 added complex types as a built-in language feature.
C++ added complex types as a template class, because C++ has language
features that support that kind of thing, including user-defined
literals.
If you can think of a way to add arbitrary-precision integers to C
without other radical changes to the language, let us know.
I have considered adding my actual arbitrary precision library to my
systems language. It would have been superficial (such types would not be
use than function calls.
Some degree of automatic memory management would have been needed (initialise locals on function entry, free on exit, deal with intermediates), but not on the C++ scale due to the restrictions.
But I rejected that as being too high-level a feature, and my use-cases
more suitable for a scripting language.
It could also be nice to be able to write code that deals with
multiple widths of _BitInt types, as we can do for arrays even
without VLAs. But C's treatment of arrays is messy, and I'm not
sure duplicating that mess for _BitInt types would be a great idea.
And I wouldn't want to lose the ability to pass _BitInt values
to functions.
[...]
So, a better fit for a struct then? Here I'm curious as to what
BitInt(128) brings to the table.
It brings a 128-bit integer type with constants and straightforward
assignment, comparison, and arithmetic operators.
I was commenting on the ipv6 example, where structs give you that
already, except arithmetic which makes little sense.
[...]
That _BitInt() defaults to a signed integer (two's complement?), even
for very large sizes suggests that /numeric/ applications are a
primary use.
Yes, C23 requires two's-complement for signed integers. (It mandates
two's-complement representation, not wraparound behavior; signed
overflow is still UB).
Even though it will now likely be under software control? OK.
At least, I've been able to add to my collection of C types that
represent an 8-bit byte:
signed char
unsigned char
int8_t
uint8_t
_BitInt(8)
unsigned _BitInt(8)
The last two are apparently incompatible with the char versions.
You forgot plain char,
I had char but took it out, as it's an outlier.
int_least8_t, and uint_least8_t.
And 'fast' versions? I still don't know what any of these mean! No other languages seem to have bothered.
David Brown <david.brown@hesbynett.no> writes:
[...]
But the _BitInt version is definitely neater. I can see myself using[...]
_BitInt(12) and similar sizes for things like values read from
hardware sensors of different resolutions.
(The code for all three is the same with gcc on x86 or arm64 -
unfortunately, gcc does not yet support _BitInt on many targets.)
Is support for _BitInt limited by target or by version?
It looks like _BitInt support was introduced in gcc 14.1.0. You might
have older versions of gcc on other platforms.
On Tue, 25 Nov 2025 18:33:30 +0000
bart <bc@freeuk.com> wrote:
(The only ADCs I've used were 4-bit (homemade)
Why am I not surprised? ;-)
and 8-bit, both giving
unsigned data in parallel, used for frame-grabbing video circuits so
read directly into memory rather than via an explicit memory- or
port-read instruction.)
ADC technology is improving at a decent rate.
Recently we used a converter with a successive-approximation
architecture that delivers better SNR than most delta-sigma
converters of just a few years ago, without suffering from all the
disadvantages of delta-sigma. Almost 18 true bits at 2 MSPS.
https://www.analog.com/en/products/ad4030-24.html
I did not say that. (You really need to get a better understanding
of basic logic.) I said that arbitrary sized integers need a library
- I did not say that fixed-sized integers do not need a library.
Perhaps more clearly, arbitrary sized integers need a user-visible
library in C. They need functions to allocate, deallocate, and copy
the integers, as well as converting to and from normal integers, at a
bare minimum.
On 26/11/2025 09:12, Michael S wrote:
On Tue, 25 Nov 2025 18:33:30 +0000
bart <bc@freeuk.com> wrote:
(The only ADCs I've used were 4-bit (homemade)
Why am I not surprised? ;-)
and 8-bit, both giving
unsigned data in parallel, used for frame-grabbing video circuits
so read directly into memory rather than via an explicit memory- or
port-read instruction.)
ADC technology is improving at a decent rate.
Recently we used a converter with a successive-approximation
architecture that delivers better SNR than most delta-sigma
converters of just a few years ago, without suffering from all the
disadvantages of delta-sigma. Almost 18 true bits at 2 MSPS.
https://www.analog.com/en/products/ad4030-24.html
That's interesting; my 4-bit circuit also worked at 2M samples per
second (128 samples every 52us), and probably would have worked much
higher if I'd had the memory to store the results.
This was in 1981.
On Tue, 25 Nov 2025 13:42:37 -0800
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
But the _BitInt version is definitely neater. I can see myself[...]
using _BitInt(12) and similar sizes for things like values read from
hardware sensors of different resolutions.
(The code for all three is the same with gcc on x86 or arm64 -
unfortunately, gcc does not yet support _BitInt on many targets.)
Is support for _BitInt limited by target or by version?
It looks like _BitInt support was introduced in gcc 14.1.0. You might
have older versions of gcc on other platforms.
The most recent version of arm-none-eabi-gcc in my distribution of
choice (msys2) is 13.3.0.
I am too lazy to compile arm-none-eabi-gcc from source. Would rather
wait.
I suppose, David is like me in that regard, except that he probably
uses even more conservative distribution.
On 26/11/2025 07:55, David Brown wrote:
On 25/11/2025 22:58, bart wrote:
On 25/11/2025 20:25, David Brown wrote:
On 24/11/2025 23:27, bart wrote:
One interesting use-case for literals was short-strings; 128 bits
allowed character literals up to 16 characters: 'ABCDEFGHIJKLMNOP'. I
think C is still stuck at one, or 4 if you're lucky.)
I have no idea or opinion on why /you/ might want 128-bit or larger
integer types. I believe there is very little use for "normal"
numbers - things you might want to write as literals, calculate
with, and read or write - that won't fit perfectly well within 64
bit types, and would not be better served by arbitrary sized integers.
Arbitrary sized integers are a very different kettle of fish from
large fixed-size integers, and are not something that would fit in
the C language - they need a library.
Really? I wouldn't have thought there was any appreciable difference
between the code for multiplying two 100,000-bit BitInts, and that
for multiplying two arbitrary-precision ints that happen to be 100,000
bits.
You are looking at things in completely the wrong way.
Long before you start thinking of how to implement operations, think
about what the types are at a fundamental level.
A fixed-size integer is a value type of fixed, compile-time size. It
is passed around as a value. Local instances can be put on a stack
with compile-time fixed offsets (and thus using [sp + N] access modes
in an implementation). The type has a single simple and obvious
(albeit slightly implementation-dependent) bit representation. A
_BitInt(32) will be identical at the low level to an int32_t. Bigger
_BitInt types are just the same, only bigger. There is no difference
in concept, or representation, whether the type is 32-bit or 32
million bits.
An arbitrary sized integer is a dynamic type with variable size. The
base object will hold information about pointers to data, sizes for
that stored data - including both how much is in use, and how much is
available. There are endless ways to make such types - you can
support multiple allocation parts, or use a single contiguous
allocation. You can store the data in binary, or some kind of packed
decimal, or other formats. Passing them around might mean just
passing around the base object, but sometimes you need to make deep
copies. Operations might lead to heap memory allocations or
deallocations.
They are so /totally/ different that any similarities in the way you
do a particular arithmetic operation are completely incidental.
But BitInts /will/ need runtime library support?
I've acknowledged in my last post that arbitrary precision would have
memory management issues, /if/ you wanted to add them to the language in such a way that, if variables 'a b c d' had such a type, you can write:
a = b + c * d;
This is not what I had in mind; such arithmetic would use explicit
function calls with explicit management of intermediates (like GMP).
So from this point of view, fixed-size BitInts are better, but also a
higher level ability than I would have considered added to the language.
Even if BitInts were restricted to saner and smaller sizes, I'd consider actual arithmetic on anything from 128 bits up to a few K bits and beyond a specialist, niche application.
But logic operations (== & | ^) on unsigned BitInts are more reasonable (because they implement some features of bit-sets).
For arithmetic on considerably larger numbers, I still think arbitrary precision is the best bet.
Structs and arrays again spring to mind if you just want an anonymous
data block. (I wonder why it has to be bit-precise for byte-addressed
memory?)
If I have a processor that has 256-bit vector registers, then moving
data by loading and storing 256-bit blocks is going to be more
efficient than doing a loop of 16 byte moves. Today, I would use
uint64_t for the task, as the biggest type available. Why does it
have to be bit- precise? It must be bit-precise because I would want
to move 256 bits - not 255 bits or 257 bits.
By bit-precise I mean being able to specify 255 and 257 bits! Memory is usually expressed in bytes or words, not bits.
* Cryptography. IO is irrelevant here. But a variety of sizes are
useful including 56, 80, 112, 128, 168, 192, 384, 512, 521, 2048,
3072, 4096, 7680, 8096 bits. There may be more common sizes - I'm
just thinking of DES, 3DES, AES, SHA, ECC and RSA.
And I'm again curious as to what /non-numeric/ use a 200,000-bit
BitInt might be put to, that is not better served by an array or struct.
I don't have a use for a 200,000 bit integer type at the moment. But
I cannot imagine any reason why the language specifications should
have arbitrary limits. Are you suggesting that the C standards should
say "You can have _BitInt's up to 8096 because someone found a use for
them, but you can't have size 8097 and above - and 200,000 is right
out - because someone else can't imagine they are useful" ?
And yet, integer widths have been roughly capped at double a machine
word size for decades - until 64 bits came along and then few even
bothered with double-width.
Nobody thought how easy it would be to just have an integer of whatever
size you like - you just generate whatever code is necessary to make it happen. We could have had BitInts on 32- and even 16-bit machines if
only somebody had thought of it!
An implementation can - indeed, must - set a limit to the sizes it
supports. Implementations can have many reasons to do so. Some
implementations might have quite low limits (the size of "long long
int" is the minimum allowed for conformance), but then that
implementation might not be so useful to some people.
Maybe bit-sets? But there are no special features for accessing
individual bits.
That _BitInt() defaults to a signed integer (two's complement?), even
for very large sizes suggests that /numeric/ applications are a
primary use.
Obviously the C standards should have made "_BitInt" signed up to size
73 bits, and unsigned from then on. That would have been /so/ much
clearer and simpler for everyone.
Or unsigned could have been the default.
Smaller sizes can be useful for holding RGB pixel values, audio
data, etc.
Except that these are probably rounded up to the next power-of-two
size. So the benefit is minimal, unless you can do something with those padding bits.
I write C code. I want my C code to be clear and represent what I am
handling, and then let the compiler do its job of generating efficient
results. So if I am dealing with data that is 24-bit signed integer
data, then _BitInt(24) (especially with a typedef name) is more
accurate source code than "int" or "int32_t".
Suddenly everybody is dealing with signed values of 12 and 24 bits!
I actually had exactly that feature:
int*3 a # from 1980s; a 3-byte or 24-bit signed type
int:24 b # from 1990s; a 24-bit signed type
Or at least, I had the syntax. Those odd values would have been
rejected, as I didn't have support for them, or a way to emulate them
(which is what BitInt(24) appears to do).
So I got rid of the feature and ended up with int32 and then i32. (I
think Zig allows types like i24 and i123456, presumably built upon
LLVM's integer types which go up to 2**23 or 2**24 bits.)
You've heard of 'code smell'? Well, this is the same, but for features.
Your nose is blocked. Or to be more accurate, you are so obsessed
with the idea that your own language is "perfect" that you simply
cannot accept that other languages might have good features that your
language does not, or that other programmers might want features that
your language does not have.
I've been doing this stuff long enough to recognise when a feature is
over-elaborate, over-specified and over-flexible. You need to know
the minimum you can get away with, not the maximum!
NIH syndrome combined with megalomania. Other people do this stuff
better than you.
I've noticed that other languages tend to go overboard with things, and
now it's happening to C.
I made a decision to keep my systems language at a certain level
regarding such things as the type system, while having lots of
convenient micro-features:
print int@(x+y).[52..62]
This type-puns a float64 r-value expression into an int, and extracts
that bitfield (which is the unsigned exponent field when float64 uses IEEE 754).
I'd be interested to see how you can do this better, using general
language features (adding a dedicated .exponent property to floats would
be cheating!).
Let me guess, some committee members have been looking too long at
how C++ does things? That language is utterly incapable of creating
anything small and simple.
And yet C and C++ programmers outnumber programmers of Bart's own
language by millions. No language - except for yours, of course - is
perfect. But it seems C and C++ are both pretty good for getting the
job done.
My systems language DOES have lots of very nice micro-features compared
to C. And usually they are presented in a tidy fashion. I don't think there's any argument about that. (Look at C's ugly X-macros for example.)
My language is not perfect; a big thing it's missing is Pascal-style enumeration types that are type-safe, that would detect a lot of errors.
But as a systems language, it is much more enticing than C.
(Today I need to start porting a 20Kloc application in my language, to
C; proper C not machine transpiling. I'm not looking forward to all that typing!)
On 26/11/2025 13:05, bart wrote:
On 26/11/2025 07:55, David Brown wrote:
NIH syndrome combined with megalomania. Other people do this stuff
better than you.
I made a decision to keep my systems language at a certain level
regarding such things as the type system, while having lots of
convenient micro-features:
print int@(x+y).[52..62]
This type-puns a float64 r-value expression into an int, and extracts
that bitfield (which is the unsigned exponent field when float64 uses
IEEE 754).
I'd be interested to see how you can do this better, using general
language features (adding a dedicated .exponent property to floats
would be cheating!).
What an absurd thing to ask for.
You have a special feature in your
language for writing obscure things that are rarely if ever useful in
normal coding.
Of course you can write the same effect in C, in a
simple function a few lines long.
And that's the way it should be -
obscure things should not take up cognitive space that makes common
things harder.
But as a systems language, it is much more enticing than C.
And that is presumably why it is so much more popular than C.
On 26/11/2025 14:49, David Brown wrote:
On 26/11/2025 13:05, bart wrote:
On 26/11/2025 07:55, David Brown wrote:
NIH syndrome combined with megalomania. Other people do this stuff
better than you.
I made a decision to keep my systems language at a certain level
regarding such things as the type system, while having lots of
convenient micro-features:
print int@(x+y).[52..62]
This type-puns a float64 r-value expression into an int, and extracts
that bitfield (which is the unsigned exponent field when float64 uses
IEEE 754).
I'd be interested to see how you can do this better, using general
language features (adding a dedicated .exponent property to floats
would be cheating!).
What an absurd thing to ask for.
You said, "Other people do this stuff better than you". Presumably,
devising language features. So I gave an example of a small task, and
asked which features those people would devise, or what solution they
would use.
You have a special feature in your language for writing obscure
things that are rarely if ever useful in normal coding.
Yes, I call them 'micro-features'.
The examples showed rvalue type-punning and bitfield extraction, which
were recent examples in this thread.
In C, the solution for my example might look like this:
double temp = x+y;
printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);
Rather more fiddly and error prone, and it needs an auxiliary statement
that makes it awkward to embed into an expression. (I also had to think twice about that format code.)
BTW here is how my C transpiler translated it, so it /can/ be done
without explicit temporaries:
mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x + y)),(i64)52,(i64)62),NULL);
Of course you can write the same effect in C, in a simple function a
few lines long.
Yes, everyone can invent their own solutions. (I've just taken that a
few steps further with an entire language.)
And that's the way it should be - obscure things should not take up
cognitive space that makes common things harder.
But _BitInt(12) was also used as an example of saving a few lines of
code or having to write a function or macro (there, to sign-extend the
low-N bits of an integer value, when N is known at compile-time).
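As a concrete sketch of that use (mine, not from any post here; it assumes a C23 compiler such as a recent gcc or clang):
    #include <stdio.h>

    /* C23: keep the low 12 bits of v and interpret them as a signed
       12-bit value.  Strictly, conversion of an out-of-range value to a
       signed type is implementation-defined, but gcc and clang define it
       as modulo reduction, which is what makes this idiom work. */
    static int sext12_bitint(unsigned v) {
        return (_BitInt(12))v;
    }

    /* Pre-C23 equivalent: mask and sign-extend the low 12 bits by hand. */
    static int sext12_manual(unsigned v) {
        v &= 0xFFFu;
        return (v & 0x800u) ? (int)v - 0x1000 : (int)v;
    }

    int main(void) {
        printf("%d %d\n", sext12_bitint(0xABCu), sext12_manual(0xABCu));  /* -1348 -1348 */
    }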
But as a systems language, it is much more enticing than C.
And that is presumably why it is so much more popular than C.
If it was generally available then I think quite a few would prefer it.
As it is I enjoy the benefits myself.
On 26/11/2025 16:44, bart wrote:
The "other people" I referred to are the folks behind the C language,
not me.
In C, the solution for my example might look like this:
double temp = x+y;
printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);
No, that's not how a C solution would work. People who know C would
know that. As a challenge for you, see if you can spot your mistake.
(And of course if anyone wanted to do this stuff in real code, they'd
wrap things in a static inline "bit_range_extract" function.)
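For concreteness, such a helper might look something like this; this is just an illustrative sketch, the name is simply the one suggested above, and nothing here is anyone's actual code:
    #include <stdint.h>

    /* Extract bits lo..hi (inclusive) of x, shifted down to bit 0.
       Assumes 0 <= lo <= hi <= 63. */
    static inline uint64_t bit_range_extract(uint64_t x, unsigned lo, unsigned hi) {
        unsigned width = hi - lo + 1u;
        uint64_t mask = (width >= 64u) ? UINT64_MAX : (((uint64_t)1 << width) - 1u);
        return (x >> lo) & mask;
    }
With the exponent example, the call would be bit_range_extract(bits, 52, 62), where bits is the type-punned representation of the double.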
Rather more fiddly and error prone, and it needs an auxiliary
statement that makes it awkward to embed into an expression. (I also
had to think twice about that format code.)
BTW here is how my C transpiler translated it, so it /can/ be done
without explicit temporaries:
mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x
+ y)),(i64)52,(i64)62),NULL);
Avoiding explicit temporaries is not a goal to aspire to - unless you
are trying to squeeze performance from a poorly optimising compiler.
No, what was shown was how _BitInt(12) could let people write clearer C
code than C without _BitInt. There was no comparison to other languages
or other features.
But as a systems language, it is much more enticing than C.
And that is presumably why it is so much more popular than C.
If it was generally available then I think quite a few would prefer it.
Sure. Keep telling yourself that.
As it is I enjoy the benefits myself.
That I /do/ believe - and I genuinely think it is great that you enjoy it.
On 26/11/2025 16:37, David Brown wrote:
On 26/11/2025 16:44, bart wrote:
The "other people" I referred to are the folks behind the C language,
not me.
OK. The people who chose to make 'break' do two jobs, unfortunately in
parts of the language that can overlap in use; those people! (I guess
you mean the more recent lot.)
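A small illustration of the overlap (my own example, not from the thread): inside a switch that sits in a loop, break leaves only the switch, so leaving the loop from within a case needs a flag or a goto.
    #include <stdio.h>

    int main(void) {
        for (int i = 0; i < 10; i++) {
            switch (getchar()) {
            case 'q':
            case EOF:
                goto done;   /* 'break' here would only leave the switch,
                                not the for loop, so a goto or a flag is needed */
            default:
                break;       /* leaves the switch; the loop carries on */
            }
        }
    done:
        return 0;
    }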
In C, the solution for my example might look like this:
double temp = x+y;
printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);
No, that's not how a C solution would work. People who know C would
know that. As a challenge for you, see if you can spot your mistake.
This was my point. (Although I can't see the problem, making it even
more pertinent.)
(And of course if anyone wanted to do this stuff in real code, they'd
wrap things in a static inline "bit_range_extract" function.)
Also my point: everyone will invent their own incompatible solutions for this fundamental stuff.
You forgot about the type-punning part, which I guess needs yet another inlined function.
Rather more fiddly and error prone, and it needs an auxiliary
statement that makes it awkward to embed into an expression. (I also
had to think twice about that format code.)
BTW here is how my C transpiler translated it, so it /can/ be done
without explicit temporaries:
mminc$m_print_u64(msysc$m_getdotslice((i64)msysc$m_tp_r64toi64((x +
y)),(i64)52,(i64)62),NULL);
Avoiding explicit temporaries is not a goal to aspire to - unless you
are trying to squeeze performance from a poorly optimising compiler.
The memory temp involved a declaration which needs to exist outside of
the expression in standard C. While type-punning in C either means
writing to a union, or using & and applying a cast.
(My type-punning works on rvalues and will work on values in registers.)
No, what was shown was how _BitInt(12) could let people write clearer
C code than C without _BitInt. There was no comparison to other
languages or other features.
But when it came to my example, it could trivially be done with inline functions, just like this could.
But as a systems language, it is much more enticing than C.
And that is presumably why it is so much more popular than C.
If it was generally available then I think quite a few would prefer it.
Sure. Keep telling yourself that.
Well, it would be a minority. Grown-up languages with decent syntax
exist such as Ada and Fortran; those are not that popular. People prefer brace-based languages such as C, Java, Go, Zig, Rust.
Anything without braces isn't taken as seriously, eg. scripting languages.
As it is I enjoy the benefits myself.
That I /do/ believe - and I genuinely think it is great that you enjoy
it.
I've had several opportunities to retire my language and switch to C.
Each time, I rejected that and chose to persevere with mine, despite
the extra problems of working with a language used by only one person on
the planet.
At first because I genuinely considered it better, and now because I enjoy working at it and with it. Using C feels like driving a Model T.
On 26/11/2025 19:42, bart wrote:
On 26/11/2025 16:37, David Brown wrote:
On 26/11/2025 16:44, bart wrote:
The "other people" I referred to are the folks behind the C language,
not me.
OK. The people who chose to make 'break' do two jobs, unfortunately in
parts of the language that can overlap in use; those people! (I guess
you mean the more recent lot.)
In C, the solution for my example might look like this:
double temp = x+y;
printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);
No, that's not how a C solution would work. People who know C would
know that. As a challenge for you, see if you can spot your mistake.
This was my point. (Although I can't see the problem, making it even
more pertinent.)
So you can claim to have a "better" solution than C, without knowing how
to write it correctly in C?
(And of course if anyone wanted to do this stuff in real code, they'd
wrap things in a static inline "bit_range_extract" function.)
Also my point: everyone will invent their own incompatible solutions
for this fundamental stuff.
It is not remotely fundamental. Extracting groups of bits from the representation of a type, especially a floating point type, is a niche operation.
(It can be an important operation - such as for software
floating point routines.
But the people who write those are few, and
they know what they are doing.)
"Type punning" refers to using a union to access or reinterpret the underlying bit representation. Using references and a cast to do so is
UB,
except when using pointers to character types. Neither involves
actually putting data into memory or the stack unless you are using a compiler that can't optimise well - and then it is just a matter of less efficient generated code.
Anything without braces isn't taken as seriously, eg. scripting
languages.
What a /very/ strange way to distinguish or classify languages.
And
what a bizarre way to generalise what people think, as though all programmers share the same opinions.
On 25/11/2025 02:03, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 24/11/2025 14:41, David Brown wrote:
On 24/11/2025 13:31, bart wrote:
That's all up to the implementation.
You are worrying about completely negligible things here.
Is it that negligible? That's easy to say when you're not doing the
implementing! However it may impact on the size and performance of
code.
You're right, it's easy to say when I'm not doing the implementing.
Which I'm not.
The maintainers of gcc and llvm/clang have done that for me, so I don't
have to worry about it.
Are you planning to implement bit-precise integer types yourself? I
don't think you've said so in this thread. If you are, you have at
least two existing implementations you can look at for ideas.
No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits, and played with 1/2/4 bits, but my view is that above this range, using
exact bit-sizes is the wrong way to go.
While for odd sizes up to 64 bits, bitfields are more apt than employing
the type system.
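Side by side, and assuming a C23 compiler, the two approaches being contrasted look roughly like this (my sketch; note the bit-field version exists only as a struct member and its layout is implementation-defined):
    /* C23: a 22-bit signed integer as a first-class type. */
    _BitInt(22) a;

    /* Classic C: a 22-bit field, but only as a member of a struct;
       its placement within the struct is implementation-defined, and
       before C23 a bit-field could not be wider than an int. */
    struct S {
        signed int b : 22;
    };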
Here's an idea. Rather than asserting that _BitInt(1'000'000)
is silly and obviously useless, try *asking* how it's useful.
I personally don't know what I'd do with a million-bit integer,
but maybe somebody out there has a valid use for it. Meanwhile,
its existence doesn't bother me.
Again, my view is that types like _BitInt(123456) (could they have made
it any more fiddly to type?!) are the same mistake that early Pascal made with arrays.
It is common that an N-array of T and an M-array of T are not
compatible, but usually there are ways to deal generically with both.
My guess is that once you've implemented integers wider than 128
or 256 bits, million-bit integers aren't much extra effort.
I've implemented 128-bit arithmetic, and have seen some scary-looking C
code that implemented 256-bit arithmetic. Neither of those would scale
to N-bits where N can be arbitrary large /and/ might not be a multiple
of either 64 or 8.
You would need pretty much the same algorithms as used for arbitrary precision. Those usually require N to be some multiple of 'limb' size.
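To make the 'limb' point concrete, here is a minimal sketch (mine, not from the thread) of the usual approach: the wide value is an array of 64-bit limbs, the carry is propagated limb by limb, and an N that is not a multiple of the limb size just means masking the top limb afterwards.
    #include <stddef.h>
    #include <stdint.h>

    /* Add two wide unsigned integers stored as arrays of 64-bit limbs,
       least-significant limb first; returns the carry out of the top limb. */
    static unsigned add_wide(uint64_t *r, const uint64_t *a,
                             const uint64_t *b, size_t nlimbs) {
        unsigned carry = 0;
        for (size_t i = 0; i < nlimbs; i++) {
            uint64_t s = a[i] + carry;
            unsigned c1 = (s < carry);        /* overflow from adding the old carry */
            r[i] = s + b[i];
            unsigned c2 = (r[i] < s);         /* overflow from adding b[i] */
            carry = c1 | c2;                  /* at most one of c1, c2 can be set */
        }
        return carry;
    }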
On 11/25/2025 5:38 AM, bart wrote:
On 25/11/2025 02:03, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 24/11/2025 14:41, David Brown wrote:
On 24/11/2025 13:31, bart wrote:
That's all up to the implementation.
You are worrying about completely negligible things here.
Is it that negligible? That's easy to say when you're not doing the
implementing! However it may impact on the size and performance of
code.
You're right, it's easy to say when I'm not doing the implementing.
Which I'm not.
The maintainers of gcc and llvm/clang have done that for me, so I don't
have to worry about it.
Are you planning to implement bit-precise integer types yourself? I
don't think you've said so in this thread. If you are, you have at
least two existing implementations you can look at for ideas.
No, apart from the usual set of 8/16/32/64 bits. I've done 128 bits,
and played with 1/2/4 bits, but my view is that above this range,
using exact bit-sizes is the wrong way to go.
On normal PC's, it is meh.
On FPGA's, more so the whole HLS (High Level Synthesis) thing, it is
much more significant.
Also it is a bridge that allows sensibly mapping some Verilog semantics
onto C, which can in turn be made more efficient than "ye olde shifts
and masking". This is partly the case because the compiler has more
freedom to either use specific CPU features, or to implement the
constructs in ways that are more efficient but would impose too much
mental computational burden on normal programmers (such as shifts being relative to other shifts, and/or where the most efficient masking
strategy depends on the width of the type being masked, etc).
Though, granted, bolting a bunch of Verilog stuff onto C is also
nonstandard (and goes well beyond the scope of _BitInt). But, a lot of
it is stuff that wouldn't really make sense at all in C in the absence
of exact-width integers.
Though, the other parts of Verilog don't map over quite so easily...
always @(posedge clock)
...
... yeah ...
Ironically, had started looking into adding Verilog support to my
compiler (at the time hoping maybe to be able to implement something
that was less of a pain to debug on than Verilator), most I got here was
the idea that modules would be mapped onto classes and so each module
could be implemented as a class instance, with an internal run/step
method which would check variables and fire off any "always" blocks when appropriate.
The effort kinda stalled out at this stage though (and motivation
lessened when I actually found some of the bugs I had been looking for).
Some other functionality had ended up mapped onto C, some features (ironically) being useful in this C land, and others not so much.
Well, maybe some people could cheer for things like "casez()" or "__switchz()":
__switchz(val[15:0])
{
case 0bZZZZ_ZZZZ_ZZZZ_ZZZ0u16: ... matches everything with LSB clear
case 0bZZZZ_ZZZZ_ZZZZ_ZZ01u16: ... matches with LSB's as 01
case 0bZZZZ_ZZZZ_ZZZZ_Z011u16: ...
case 0b1111_ZZZZ_ZZZZ_0111u16: ... matches 0111 and MSBs set to 1s.
}
Where, 0bZZZZ_ZZZZ_ZZZZ_Z011u16 is a C syntax analog of 16'bZZZZ_ZZZZ_ZZZZ_Z011 (and in this case my compiler allows for either
_ or single quotes).
Though, implementing this in a way that is efficient is a harder problem (much more complicated than a normal "switch()").
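To illustrate why it is harder than a plain switch (my sketch, not BGB's actual lowering): each casez pattern reduces to a mask of the cared-about bits plus a comparison value, so the straightforward translation is a chain of masked comparisons rather than a dense jump table.
    #include <stdint.h>

    /* Each 'Z' bit in a pattern becomes a 0 in the mask (don't care);
       every other bit is compared against the pattern's value. */
    static int classify(uint16_t val) {
        if ((val & 0x0001u) == 0x0000u) return 0;  /* ZZZZ_ZZZZ_ZZZZ_ZZZ0 */
        if ((val & 0x0003u) == 0x0001u) return 1;  /* ZZZZ_ZZZZ_ZZZZ_ZZ01 */
        if ((val & 0x0007u) == 0x0003u) return 2;  /* ZZZZ_ZZZZ_ZZZZ_Z011 */
        if ((val & 0xF00Fu) == 0xF007u) return 3;  /* 1111_ZZZZ_ZZZZ_0111 */
        return -1;
    }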
Though, had I gotten this part implemented, would still have also needed:
A high performance emulator (now partly written, but, would likely need
a full JIT compiler rather than a call-threading interpreter);
A better/more usable debugger (*).
*: My existing "jx2vm" emulator mostly dumps stuff if the emulator
exits, and has an integrated GDB style debugger, this still leaves
something to be desired.
So, more likely the desired debugger would likely be built on "x3vm",
but have not yet done so.
Also, the compiler needs to produce more complete debuginfo. As-is, it is outputting symbol maps (in nm notation, similar to that typically used
by the Linux kernel), with line numbers in a slightly nonstandard way,
and a small amount of STABS. Maybe weak, but currently the most
reachable strategy (contrast, GCC would typically put the debug info
inside the binary, either as STABS or DWARF depending on target, ...).
The debuginfo is still very incomplete, and I am also lacking a good debugger here.
I had considered the possibility of going to a binary format for the map files to save space, but for now they are still ASCII based (well, or
the possible lazier option of internally generating the map in ASCII
format, but then dumping it in gzip format or similar ".map.gz"; would
need to decompress them when loaded, but would leave an easy option for
a user to get back to an ASCII map file as needed). Including STABS
would add considerable bulk even vs just a normal symbol listing.
I have my own reasons for not wanting to put debuginfo inside the
binaries themselves. MSVC is kinda similar, just uses ".PDB" files instead.
While for odd sizes up to 64 bits, bitfields are more apt than
employing the type system.
This is missing the point of the purpose of _BitInt...
bart <bc@freeuk.com> wrote:
And yet, integer widths have been roughly capped at double a machine
word size for decades - until 64 bits came along and then few even
bothered with double-width.
Nobody thought how easy it would be to just have an integer of whatever
size you like - you just generate whatever code is necessary to make it
happen. We could have had BitInts on 32- and even 16-bit machines if
only somebody had thought of it!
PL/I had things like 'fixed binary(23)' (that is, the ability to
specify a bit size) around 1965, but that stopped at the machine
word length. Pascal had range types, but similarly stopped
at integer size.
GNU Pascal allowed specifying size in
bits and going up to twice the machine word (that was a limitation
imposed by the gcc backend).
And yes, such types could have been added much earlier and it
is a shame that they are only being added now.
Part of the reason may be that in the nineties usage of
lower-level languages other than C went down. C was
traditionally quite minimal and was reluctant to
introduce new features.
On Tue, 25 Nov 2025 18:33:30 +0000...
bart <bc@freeuk.com> wrote:
Then maybe C bitfields could be used, but a bigger problem with those
is poor control over layout, which is anyway implementation-defined.
(Mine of course don't have that problem!)
According to the language of The Standard, it's not 'poor control'.
As far as standard requirements go, there is *no* control over the layout of
bit fields.
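A small example of what that means in practice (mine, not from the post): both the allocation order of bit-fields within a storage unit and whether a field may straddle a unit boundary are implementation-defined, so a layout like the one below can legitimately differ between compilers and ABIs.
    /* Whether 'op' ends up in the low 6 bits or the high 6 bits of the
       containing unit is implementation-defined, as is whether 'imm'
       may straddle a storage-unit boundary. */
    struct insn {
        unsigned int op  : 6;
        unsigned int reg : 5;
        unsigned int imm : 21;
    };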
On 26/11/2025 20:43, David Brown wrote:
On 26/11/2025 19:42, bart wrote:
On 26/11/2025 16:37, David Brown wrote:
On 26/11/2025 16:44, bart wrote:
The "other people" I referred to are the folks behind the C
language, not me.
OK. The people who chose to make 'break' do two jobs, unfortunately
in parts of the language that can overlap in use; those people! (I
guess you mean the more recent lot.)
In C, the solution for my example might look like this:
double temp = x+y;
printf("%llu", ((*(uint64_t*)&temp)>>52) & 2047);
No, that's not how a C solution would work. People who know C would
know that. As a challenge for you, see if you can spot your mistake.
This was my point. (Although I can't see the problem, making it even
more pertinent.)
So you can claim to have a "better" solution than C, without knowing
how to write it correctly in C?
(And of course if anyone wanted to do this stuff in real code,
they'd wrap things in a static inline "bit_range_extract" function.)
Also my point: everyone will invent their own incompatible solutions
for this fundamental stuff.
It is not remotely fundamental. Extracting groups of bits from the
representation of a type, especially a floating point type, is a niche
operation.
A bit like that BitInt(12) example then?
This is about a lower-level systems language working with primitive
machine types, and having access to the underlying bits of those types.
How much more fundamental can you get?
C provides only basic bitwise operators, and you have to do some bit-fiddling, while trying to avoid UB, in order to extract or inject individual bits or bitfields.
I provide direct indexing ops to get or set any bit or bitfield, which
is actually a great core feature to have, but for some reason you want
to downplay it.
You might just admit for once that it is quite neat.
(It can be an important operation - such as for software floating
point routines.
That particular task can be important for lots of reasons.
But the people who write those are few, and they know what they are
doing.)
And I don't? I used to write FP emulation routines...
"Type punning" refers to using a union to access or reinterpret the
underlying bit representation. Using references and a cast to do so
is UB,
In C maybe, using your favoured compilers.
In my implementations of C,
and in my languages, it is well defined, especially as it is
type-punning a 64-bit quantity to another 64-bit quantity.
(This is a great thing about creating your own implementations: you get
to say what is UB, and it will be UB for genuine reasons, not artificial ones
maintained so that C compilers can be one-up on each other.
As it is, somebody using C as an intermediate language can have a
situation where something is well-defined in their source language,
known to be well-defined on their platforms of interest, but inbetween,
C says otherwise.)
Note that in the original example in my language, no references are used (the code just copies a FP register to a GPR without conversion).
except when using pointers to character types. Neither involves
actually putting data into memory or the stack unless you are using a
compiler that can't optimise well - and then it is just a matter of
less efficient generated code.
OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?
Anything without braces isn't taken as seriously, eg. scripting
languages.
What a /very/ strange way to distinguish or classify languages.
It's an observation. Which languages that call themselves 'systems languages' these days don't use braces?
And what a bizarre way to generalise what people think, as though
all programmers share the same opinions.
You're welcome to do your own survey.
On 26/11/2025 23:19, bart wrote:
What I don't like about your bit extraction operations is that you have
an operator syntax for a fairly obscure and rarely used operation.
"bit_range_extract" standard library function would make more sense to
me, though I think shifting and masking works well enough for the few situations where you need it. A syntax that looks very much like array access is not going to be helpful to people looking at the code - for general-purpose languages, most programmers will never see or use bit ranges.
How much more fundamental can you get?
It is not fundamental for a low-level systems language.
But the people who write those are few, and they know what they are
doing.)
And I don't? I used to write FP emulation routines...
The thing you always seem to forget, is that your languages are written
for /you/ - no one else. It doesn't make a difference whether something
is added /to/ the language or written in code /for/ the language. You
think other languages are missing critical features simply because there
is a thing that /you/ want to do that you added to your own language.
And you think other languages are overly complex or bloated because they have features that you don't want to use.
Imagine asking the regulars in this group what features or changes they would like C to have in order to make C "perfect" for their uses,
regardless of everyone else, all existing code, all existing tools. We could all fill pages with ideas. And if those were all added to C, the result would be a language that made C++ look as easy as Logo, while
being riddled with inconsistencies and contradictions.
As it is, somebody using C as an intermediate language can have a
situation where something is well-defined in their source language,
known to be well-defined on their platforms of interest, but
inbetween, C says otherwise.)
You've never really understood how languages are defined, have you? With your own languages and tools, you don't have to - there is no need for standards, specifications, or anything like that. You can just make up
what suits you at the time. The language is "defined" by what the implementation does. That's been very convenient for you, but it has
left you with serious misconceptions about how non-personal languages work.
OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?
As you know, you use a union. So just to please you, here is your bit extraction - written as a one-line function (split over two lines for Usenet) because you seem to think that kind of thing is important :
uint64_t get_exponent(double x) {
return ((union { double d; uint64_t u;}) { x }.u >> 52)
& ((1ull << (62 - 52 + 1)) - 1);
}
That compiles (with gcc on x86-64) to :
movq rax, xmm0
shr rax, 52
and eax, 2047
ret
There's nothing in C that suggests this must be put in memory or do
anything more than this.
bart <bc@freeuk.com> wrote:
This is about a lower-level systems language working with primitive
machine types, and having access to the underlying bits of those types.
How much more fundamental can you get?
C provides only basic bitwise operators, and you have to do some
bit-fiddling, while trying to avoid UB, in order to extract or inject
individual bits or bitfields.
I provide direct indexing ops to get or set any bit or bitfield, which
is actually a great core feature to have, but for some reason you want
to downplay it.
You might just admit for once that it is quite neat.
Yes, it is neat.
OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?
#include <stdint.h>
#include <string.h>
uint64_t
d_to_u(double d) {
uint64_t tmp;
memcpy(&tmp, &d, sizeof(tmp));
return tmp;
}
int
f_exp(double d) {
return (d_to_u(d)>>52)&2047;
}
Using 'gcc -O' I get the following assembly (only code, without
unimportant directives/labels):
d_to_u:
movq %xmm0, %rax
ret
f_exp:
movq %xmm0, %rax
shrq $52, %rax
andl $2047, %eax
ret
As you can see 'd_to_u' is single computational instruction,
you can not do better given that floating point registers
are distinct from integer registers. And 'f_exp' looks
optimal assuming lack of "bit extract" or "extract exponent"
instructions.
Note that you can put both functions above in a header file,
so once you have written few lines above you can use them
in all your C code. Of course, efficiency depends on
compiler optimization.
On 27/11/2025 10:43, David Brown wrote:
On 26/11/2025 23:19, bart wrote:
What I don't like about your bit extraction operations is that you
have an operator syntax for a fairly obscure and rarely used operation.
So shift and masking operations in C are obscure?!
A
"bit_range_extract" standard library function would make more sense to
me, though I think shifting and masking works well enough for the few
situations where you need it. A syntax that looks very much like
array access is not going to be helpful to people looking at the code
- for general-purpose languages, most programmers will never see or
use bit ranges.
The syntax actually comes from DEC Algol60 IIRC. It was used to access individual characters of a string, normally an indivisible type in that language, and I applied the same concept to bits of an integer.
How much more fundamental can you get?
It is not fundamental for a low-level systems language.
So bits are not fundamental either! But then, it has taken until C23 to standardise binary literals, and a printf format code for binary
output (%b) has also only just arrived with C23.
But the people who write those are few, and they know what they
are doing.)
And I don't? I used to write FP emulation routines...
The thing you always seem to forget, is that your languages are
written for /you/ - no one else. It doesn't make a difference whether
something is added /to/ the language or written in code /for/ the
language. You think other languages are missing critical features
simply because there is a thing that /you/ want to do that you added
to your own language. And you think other languages are overly complex
or bloated because they have features that you don't want to use.
They frequently have advanced features while ignoring the basics.
Imagine asking the regulars in this group what features or changes
they would like C to have in order to make C "perfect" for their uses,
regardless of everyone else, all existing code, all existing tools.
We could all fill pages with ideas. And if those were all added to C,
the result would be a language that made C++ look as easy as Logo,
while being riddled with inconsistencies and contradictions.
Yes, that's the trick. That's why a lot of features I've played with
have disappeared, while some have proved indispensable.
As it is, somebody using C as an intermediate language can have a
situation where something is well-defined in their source language,
known to be well-defined on their platforms of interest, but
inbetween, C says otherwise.)
You've never really understood how languages are defined, have you?
With your own languages and tools, you don't have to - there is no
need for standards, specifications, or anything like that. You can
just make up what suits you at the time. The language is "defined" by
what the implementation does. That's been very convenient for you,
but it has left you with serious misconceptions about how non-personal
languages work.
Here's a program in a very simple language, where all variables have
i64 type:
c = a + b
Here, the author has decreed that any overflow in this addition will
wrap (any overflow bits above 64 are lost). If directly compiled to x64
code it might use this (here 'a b c' are aliases for the registers where they reside):
mov c, a
add c, b
Or on ARM64:
add c, a, b
Now, the author decides to use intermediate C (for portability, for optimisations etc), and will generate perhaps:
int64_t a, b, c;
...
c = a + b;
But here, if a + b happens to overflow, it is UB, and for no good
reason. You have to fix it. This is where it can be harder to generate
HLL code than assembly!
*Now* do you understand? This is nothing to do with me or my personal languages, it is a problem for every language that transpiles to C,
where there is a mismatch between the sets of behaviour considered UB in each.
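For what it's worth, the usual standard-C workaround a code generator has to emit here is something like the sketch below (mine): do the addition in the corresponding unsigned type, where wraparound is defined, and convert back; the conversion of an out-of-range value back to a signed type is implementation-defined rather than undefined, and gcc and clang both document it as modulo reduction.
    #include <stdint.h>

    /* Wrapping 64-bit addition without signed-overflow UB: unsigned
       arithmetic wraps by definition; the conversion back to int64_t
       is implementation-defined, not undefined, and is modulo 2^64
       on gcc and clang. */
    static inline int64_t add_wrap64(int64_t a, int64_t b) {
        return (int64_t)((uint64_t)a + (uint64_t)b);
    }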
OK, so how would you do a 'reinterpret' cast in C, of a value like
'x+y'?
As you know, you use a union. So just to please you, here is your bit
extraction - written as a one-line function (split over two lines for
Usenet) because you seem to think that kind of thing is important :
uint64_t get_exponent(double x) {
return ((union { double d; uint64_t u;}) { x }.u >> 52)
& ((1ull << (62 - 52 + 1)) - 1);
}
That compiles (with gcc on x86-64) to :
movq rax, xmm0
shr rax, 52
and eax, 2047
ret
There's nothing in C that suggests this must be put in memory or do
anything more than this.
(This only seems to work with gcc. Clang and MSVS don't like it.)
On 27/11/2025 02:32, Waldek Hebisch wrote:
bart <bc@freeuk.com> wrote:
This is about a lower-level systems language working with primitive
machine types, and having access to the underlying bits of those types.
How much more fundamental can you get?
C provides only basic bitwise operators, and you have to do some
bit-fiddling, while trying to avoid UB, in order to extract or inject
individual bits or bitfields.
I provide direct indexing ops to get or set any bit or bitfield, which
is actually a great core feature to have, but for some reason you want
to downplay it.
You might just admit for once that it is quite neat.
Yes, it is neat.
Hmm, perhaps you're being sincere, perhaps not ...
OK, so how would you do a 'reinterpret' cast in C, of a value like 'x+y'?
#include <stdint.h>
#include <string.h>
uint64_t
d_to_u(double d) {
uint64_t tmp;
memcpy(&tmp, &d, sizeof(tmp));
return tmp;
}
int
f_exp(double d) {
return (d_to_u(d)>>52)&2047;
}
Using 'gcc -O' I get the following assembly (only code, without
unimportant directives/labels):
d_to_u:
movq %xmm0, %rax
ret
f_exp:
movq %xmm0, %rax
shrq $52, %rax
andl $2047, %eax
ret
As you can see 'd_to_u' is single computational instruction,
you can not do better given that floating point registers
are distinct from integer registers. And 'f_exp' looks
optimal assuming lack of "bit extract" or "extract exponent"
instructions.
Note that you can put both functions above in a header file,
so once you have written few lines above you can use them
in all your C code. Of course, efficiency depends on
compiler optimization.
Yes (that's something I can't rely on).
These examples are interesting: with a HLL you normally express yourself
in a clear manner, and it is the compiler's job to generate the
complicated code required to implement what you mean.
Here it seems to be the other way around: it is the programmer who writes
the convoluted code, and the compiler turns that into short, clear instructions! Which unfortunately no one will see.
If I use your functions like this:
a = f_exp(x + y);
then once the x+y result is in a register, gcc-O2 generates this inline
code for the extraction:
movq rax, xmm0
shr rax, 52
and eax, 2047
If I express it in my language:
a := int@(x + y).[52..62]
then my non-optimising compiler generates this (D0 is rax):
movq D0, XMM4
shr D0, 52
and D0, 2047
So such features have definite advantages, in being able to express
intent directly, and to make it easier for a simple compiler to know
that intent and help it generate reasonable code without lots of
analysis or needing function inlining.
BTW, your example explicitly writes to memory; David Brown posted a
version that, as far as I could see, didn't. Unless a compound literal is designed to be built in memory? However, that version only seemed to work with one compiler.
On 27/11/2025 13:20, bart wrote:
On 27/11/2025 10:43, David Brown wrote:
On 26/11/2025 23:19, bart wrote:
What I don't like about your bit extraction operations is that you
have an operator syntax for a fairly obscure and rarely used
operation.
So shift and masking operations in C are obscure?!
Both shift operators and bitwise operators have lots of other uses.
When you are designing a programming language, you first provide
general features that can be used for multiple purposes. You only
implement specialised features if the need arises - it is too
cumbersome, or error-prone, or inefficient, or laborious to use the
general features.
In some areas of C usage, shifts and masks - and bitfield extraction
- turn up quite a bit. But it seems the C operators work fine for
the task. It would not exactly be difficult to add a standard "bit_range_extract" function to the C standard library, yet no one
has felt it to be worth the effort over the last 50 years. Perhaps
it is not as essential or fundamental as you think? Or perhaps C's
current features do the job well enough that there's no need for
anything else?
A "bit_range_extract" standard library function would make more
sense to me, though I think shifting and masking works well enough
for the few situations where you need it. A syntax that looks
very much like array access is not going to be helpful to people
looking at the code
- for general-purpose languages, most programmers will never see
or use bit ranges.
The syntax actually comes from DEC Algol60 IIRC. It was used to
access individual characters of a string, normally an indivisible
type in that language, and I applied the same concept to bits of an integer.
I don't care if you found the syntax on the back of a cornflakes
packet. The origin is not relevant.
How much more fundamental can you get?
It is not fundamental for a low-level systems language.
So bits are not fundamental either! But then, it has taken until
C23 to standardise binary literals, and there is still no format
code for binary output.
Very few programmers are at all interested in bits. A "double" holds
a floating point value, not a pattern of bits. You are thinking on a
level of abstraction that is not realistic for most programming tasks.
But the people who write those are few, and they know what
they are doing.)
And I don't? I used to write FP emulation routines...
The thing you always seem to forget, is that your languages are
written for /you/ - no one else. It doesn't make a difference
whether something is added /to/ the language or written in code
/for/ the language. You think other languages are missing
critical features simply because there is a thing that /you/ want
to do that you added to your own language. And you think other
languages are overly complex or bloated because they have features
that you don't want to use.
They frequently have advanced features while ignoring the basics.
No - they frequently have features that /you/ call "advanced" because
you don't need or want them, and they ignore things that /you/ call
"basics" because you /do/ need or want them. It's all about /you/.
Imagine asking the regulars in this group what features or changes
they would like C to have in order to make C "perfect" for their
uses, regardless of everyone else, all existing code, all existing
tools. We could all fill pages with ideas. And if those were all
added to C, the result would be a language that made C++ look as
easy as Logo, while being riddled with inconsistencies and
contradictions.
Yes, that's the trick. That's why a lot of features I've played
with have disappeared, while some have proved indispensable.
As it is, somebody using C as an intermediate language can have a
situation where something is well-defined in their source
language, known to be well-defined on their platforms of
interest, but inbetween, C says otherwise.)
You've never really understood how languages are defined, have
you? With your own languages and tools, you don't have to - there
is no need for standards, specifications, or anything like that.
You can just make up what suits you at the time. The language is
"defined" by what the implementation does. That's been very
convenient for you, but it has left you with serious
misconceptions about how non-personal languages work.
Here's a program in a very simple language, where all variables
have i64 type:
    c = a + b
Here, the author has decreed that any overflow in this addition
will wrap (any overflow bits above 64 are lost). If directly
compiled to x64 code it might use this (here 'a b c' are aliases
for the registers where they reside):
    mov c, a
    add c, b
Or on ARM64:
    add c, a, b
Now, the author decides to use intermediate C (for portability, for optimisations etc), and will generate perhaps:
    int64_t a, b, c;
    ...
    c = a + b;
But here, if a + b happens to overflow, it is UB, and for no good
reason. You have to fix it. This is where it can be harder to
generate HLL code than assembly!
You are talking nonsense.
Either a + b results in the correct answer, or it does not. Any sane
person reads that as "a plus b" - mathematically adding two integers
to get their sum. That's what the programmer wants, and that's what
they ask for. And any sane programmer expects the language to give
the correct result within its limitations, but does not expect it to
do magic. Expecting to form a sum that is greater than 2 ^ 63 and
somehow produce the "correct" result is a total misunderstanding of mathematics and programming - any primary school kid will tell you
that using the fingers of one hand, you can't add 3 and 4. They will
/not/ tell you that it's fine to add them on one hand because 3 + 4
is actually equal to 2.
*Now* do you understand? This is nothing to do with me or my
personal languages, it is a problem for every language that
transpiles to C, where there is a mismatch between the sets of
behaviour considered UB in each.
I understand that simple maths and common sense is beyond you. I
understand that you think mathematics should be defined in terms of accidental byproducts of the way hardware logic designs happen to be implemented.
OK, so how would you do a 'reinterpret' cast in C, of a value
like 'x+y'?
As you know, you use a union.? So just to please you, here is your
bit extraction - written as a one-line function (split over two
lines for Usenet) because you seem to think that kind of thing is
important :
uint64_t get_exponent(double x) {
    return ((union { double d; uint64_t u;}) { x }.u >> 52)
           & ((1ull << (62 - 52 + 1)) - 1);
}
That compiles (with gcc on x86-64) to :
     movq rax, xmm0
     shr rax, 52
     and eax, 2047
     ret
There's nothing in C that suggests this must be put in memory or
do anything more than this.
(This only seems to work with gcc. Clang and MSVS don't like it.)
I think you are mistaken. clang is fine with it. It is standard
C99, so any decent C compiler from the last 25 years will handle it
fine. MS gave up on bothering to make C compilers before the turn of
the century (they make a reasonable enough C++ compiler). Even your
hero tcc is fine with it (though on my attempts, it produces rubbish
code - maybe it needs different flags for optimisation). The C code
is not made invalid by the existence of C90-only compilers.
On 27/11/2025 13:20, bart wrote:
In some areas of C usage, shifts and masks - and bitfield extraction -
turn up quite a bit. But it seems the C operators work fine for the
task. It would not exactly be difficult to add a standard "bit_range_extract" function to the C standard library, yet no one has
felt it to be worth the effort over the last 50 years.
The syntax actually comes from DEC Algol60 IIRC. It was used to access
individual characters of a string, normally an indivisible type in
that language, and I applied the same concept to bits of an integer.
I don't care if you found the syntax on the back of a cornflakes packet.
The origin is not relevant.
How much more fundamental can you get?
It is not fundamental for a low-level systems language.
So bits are not fundamental either! But then, it has taken until C23
to standardise binary literals, and there is still no format code for
binary output.
Very few programmers are at all interested in bits. A "double" holds a
floating point value, not a pattern of bits. You are thinking on a
level of abstraction that is not realistic for most programming tasks.
They frequently have advanced features while ignoring the basics.
No - they frequently have features that /you/ call "advanced" because
you don't need or want them, and they ignore things that /you/ call
"basics" because you /do/ need or want them. It's all about /you/.
You are talking nonsense.
I understand that simple maths and common sense is beyond you.
uint64_t get_exponent(double x) {
return ((union { double d; uint64_t u;}) { x }.u >> 52)
& ((1ull << (62 - 52 + 1)) - 1);
}
That compiles (with gcc on x86-64) to :
movq rax, xmm0
shr rax, 52
and eax, 2047
ret
There's nothing in C that suggests this must be put in memory or do
anything more than this.
(This only seems to work with gcc. Clang and MSVS don't like it.)
I think you are mistaken. clang is fine with it. It is standard C99,
so any decent C compiler from the last 25 years will handle it fine. MS gave up on bothering to make C compilers before the turn of the century (they make a reasonable enough C++ compiler). Even your hero tcc is
fine with it (though on my attempts, it produces rubbish code - maybe it needs different flags for optimisation). The C code is not made invalid
by the existence of C90-only compilers.
Well, let's stick with C. Here are some features I use, and the C equivalents (A has whatever type is needed):
M C
-------------------------------------------------------------
[snip]
A.odd A & 1, or A % 1
On 2025-11-27, bart <bc@freeuk.com> wrote:
Well, let's stick with C. Here are some features I use, and the C
equivalents (A has whatever type is needed):
M C
-------------------------------------------------------------
[snip]
A.odd A & 1, or A % 1
"A % 1" ?
On Thu, 27 Nov 2025 14:02:38 +0100
David Brown <david.brown@hesbynett.no> wrote:
MSVC compilers compile your code and produce a correct result, but the
code looks less nice:
0000000000000000 <get_exponent>:
0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
b: 48 c1 e8 34 shr $0x34,%rax
f: 25 ff 07 00 00 and $0x7ff,%eax
14: c3 ret
Although on old AMD processors it is likely faster than the nicer code
generated by gcc and clang. On newer processors the gcc code is likely a bit better, but the difference is unlikely to be detected by simple
measurements.
Also MSVC compiler does not like your style and produces following
warning:
dave_b.c(5): warning C4116: unnamed type definition in parentheses
BTW, I don't like your style either. My preferred code would look
very similar to the code of Waldek Hebisch, except that I'd declare
d_to_u() static.
I don't like the union trick. Not just in this particular context, but
in general. memcpy() is much cleaner at expressing the programmer's intentions.
On 27/11/2025 15:02, Michael S wrote:
On Thu, 27 Nov 2025 14:02:38 +0100
David Brown <david.brown@hesbynett.no> wrote:
MSVC compilers compile your code and produce correct result, but the
code
looks less nice:
0000000000000000 <get_exponent>:
0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
b: 48 c1 e8 34 shr $0x34,%rax
f: 25 ff 07 00 00 and $0x7ff,%eax
14: c3 ret
Although on old AMD processors it is likely faster than nicer code generated by gcc and clang. On newer processor gcc code is likely a
bit better, but the difference is unlikely to be detected by simple measurements.
I think it is unlikely that this version - moving from xmm0 to rax
via memory instead of directly - is faster on any processor. But I
fully agree that it is unlikely to be a measurable difference in
practice.
On 27/11/2025 10:43, David Brown wrote:[...]
uint64_t get_exponent(double x) {
return ((union { double d; uint64_t u;}) { x }.u >> 52)
& ((1ull << (62 - 52 + 1)) - 1);
}
That compiles (with gcc on x86-64) to :
movq rax, xmm0
shr rax, 52
and eax, 2047
ret
There's nothing in C that suggests this must be put in memory or do
anything more than this.
(This only seems to work with gcc. Clang and MSVS don't like it.)
bart <bc@freeuk.com> writes:
On 27/11/2025 10:43, David Brown wrote:[...]
uint64_t get_exponent(double x) {
return ((union { double d; uint64_t u;}) { x }.u >> 52)
& ((1ull << (62 - 52 + 1)) - 1);
}
That compiles (with gcc on x86-64) to :
movq rax, xmm0
shr rax, 52
and eax, 2047
ret
There's nothing in C that suggests this must be put in memory or do
anything more than this.
(This only seems to work with gcc. Clang and MSVS don't like it.)
How exactly did clang and msvs express their dislike? What versions are
you using?
On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
If your problem is that you're using older compilers that don't support compound literals, it would have saved some time if you had said so.
On 27/11/2025 23:59, Keith Thompson wrote:[...]
bart <bc@freeuk.com> writes:
On 27/11/2025 10:43, David Brown wrote:[...]
uint64_t get_exponent(double x) {
return ((union { double d; uint64_t u;}) { x }.u >> 52)
& ((1ull << (62 - 52 + 1)) - 1);
}
How exactly did clang and msvs express their dislike? What versions
are
you using?
On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
If your problem is that you're using older compilers that don't
support
compound literals, it would have saved some time if you had said so.
I said in a followup that I'd been using a C++ compiler by mistake
(this was on Godbolt).
That gcc's C++ compiler accepted the code wasn't helpful.
bart <bc@freeuk.com> writes:
On 27/11/2025 23:59, Keith Thompson wrote:[...]
bart <bc@freeuk.com> writes:
On 27/11/2025 10:43, David Brown wrote:[...]
uint64_t get_exponent(double x) {
return ((union { double d; uint64_t u;}) { x }.u >> 52)
& ((1ull << (62 - 52 + 1)) - 1);
}
How exactly did clang and msvs express their dislike? What versions
are
you using?
On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
If your problem is that you're using older compilers that don't
support
compound literals, it would have saved some time if you had said so.
Can you *please* do something about the way your newsreader
(apparently Mozilla Thunderbird) mangles quoted text? That first
quoted line, starting with "> How exactly", would have been just
74 columns, but your newsreader folded it, making it more difficult
to read. It also deletes blank lines between paragraphs.
I don't recall similar problems from other Thunderbird users.
On 27/11/2025 17:38, Ike Naar wrote:
On 2025-11-27, bart <bc@freeuk.com> wrote:
Well, let's stick with C. Here are some features I use, and the C
equivalents (A has whatever type is needed):
M C
-------------------------------------------------------------
[snip]
A.odd A & 1, or A % 1
"A % 1" ?
I guess A % 2 then.
Note my remark about error proneness later on.
On 28/11/2025 00:39, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 27/11/2025 23:59, Keith Thompson wrote:[...]
bart <bc@freeuk.com> writes:
On 27/11/2025 10:43, David Brown wrote:[...]
uint64_t get_exponent(double x) {
return ((union { double d; uint64_t u;}) { x }.u >> 52)
& ((1ull << (62 - 52 + 1)) - 1);
}
How exactly did clang and msvs express their dislike? What versions are
you using?
On my systems, it works correctly with gcc 13.3.0, clang 18.1.3,
tcc 0.9.27, Microsoft Visual Studio 2022 17.14.20.
If your problem is that you're using older compilers that don't support
compound literals, it would have saved some time if you had said so.
Can you *please* do something about the way your newsreader
(apparently Mozilla Thunderbird) mangles quoted text? That first
quoted line, starting with "> How exactly", would have been just
74 columns, but your newsreader folded it, making it more difficult
to read. It also deletes blank lines between paragraphs.
I don't recall similar problems from other Thunderbird users.
I don't see anything amiss with quoted content in my own posts. My
last post looks like this to me:
https://github.com/sal55/langs/blob/master/tbird.png
In any case, I've no idea how to fix the problem, assuming it is at my end.
On Thu, 27 Nov 2025 21:15:53 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 27/11/2025 15:02, Michael S wrote:
On Thu, 27 Nov 2025 14:02:38 +0100
David Brown <david.brown@hesbynett.no> wrote:
MSVC compilers compile your code and produce correct result, but the
code
looks less nice:
0000000000000000 <get_exponent>:
0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
b: 48 c1 e8 34 shr $0x34,%rax
f: 25 ff 07 00 00 and $0x7ff,%eax
14: c3 ret
Although on old AMD processors it is likely faster than nicer code
generated by gcc and clang. On newer processor gcc code is likely a
bit better, but the difference is unlikely to be detected by simple
measurements.
I think it is unlikely that this version - moving from xmm0 to rax
via memory instead of directly - is faster on any processor. But I
fully agree that it is unlikely to be a measurable difference in
practice.
I wonder, how do you have the nerve "to think" about things that you have absolutely no idea about?
Instead of "thinking" you could just as well open Optimization
Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
Agner Fog's instruction tables. Move from XMM to GPR on these
processors is very slow: 8 clocks on BD, 7 on BbC.
BTW, AMD K8 has the opposite problem. Move from XMM to GPR is reasonably fast, but move from GPR to XMM is painfully slow.
On the other hand, moves "via memory" are reasonably fast on these
CPUs (except maybe Bobcat? I am not sure about it), because the data
does not really travel through memory or through the cache. Load-store forwarding picks the data directly from the store queue.
On 27/11/2025 13:02, David Brown wrote:
On 27/11/2025 13:20, bart wrote:
(This only seems to work with gcc. Clang and MSVS don't like it.)
I think you are mistaken. clang is fine with it. It is standard C99,
so any decent C compiler from the last 25 years will handle it fine.
MS gave up on bothering to make C compilers before the turn of the
century (they make a reasonable enough C++ compiler). Even your hero
tcc is fine with it (though on my attempts, it produces rubbish code -
maybe it needs different flags for optimisation). The C code is not
made invalid by the existence of C90-only compilers.
I was mistaken. I used godbolt.org but it was set to C++. Presumably gcc
has some C++ extensions that make it valid.
On 27/11/2025 23:15, Michael S wrote:
On Thu, 27 Nov 2025 21:15:53 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 27/11/2025 15:02, Michael S wrote:
On Thu, 27 Nov 2025 14:02:38 +0100
David Brown <david.brown@hesbynett.no> wrote:
MSVC compilers compile your code and produce correct result, but
the code
looks less nice:
0000000000000000 <get_exponent>:
0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
b: 48 c1 e8 34 shr $0x34,%rax
f: 25 ff 07 00 00 and $0x7ff,%eax
14: c3 ret
Although on old AMD processors it is likely faster than nicer code
generated by gcc and clang. On newer processor gcc code is likely
a bit better, but the difference is unlikely to be detected by
simple measurements.
I think it is unlikely that this version - moving from xmm0 to rax
via memory instead of directly - is faster on any processor. But I
fully agree that it is unlikely to be a measurable difference in
practice.
I wonder, how do you have a nerve "to think" about things that you
have absolutely no idea about?
I think about many things - and these are things I /do/ know about.
But I don't know all the details, and am happy to be corrected and
learn more.
Instead of "thinking" you could just as well open Optimization
Reference manuals of AMD Bulldozer family or of Bobcat. Or to read
Agner Fog's instruction tables. Move from XMM to GPR on these
processors is very slow: 8 clocks on BD, 7 on BbC.
Okay. But storing data to memory from xmm0 is also going to be slow,
and loading it to rax from memory is going to be slow. I am not an
expert at the x86 world or reading Fog's tables, but it looks to me
that on a Bulldozer, storing from xmm0 to memory has a latency of 6
cycles and reading the memory into rax has a latency of 4 cycles.
That adds up to more than the 8 cycles for the direct register
transfer, and I expect (but do not claim to know for sure!) that the dependency limits the scope for pipeline overlap - decode and address calculations can be done, but the data can't be fetched until the
previous store is complete.
So all in all, my estimate was, I think, quite reasonable. There may
be unusual circumstances on particular cores if the instruction
scheduling and pipelining, combined with the stack engine, make that
sequence faster than the single register move.
I've now had a short look at the relevant table from Fog's site. My conclusion from that is that the register move - though surprisingly
slow - is probably marginally faster than passing it through memory.
Perhaps if I spend enough time studying the details, I might find out
more and discover that I was wrong. But that would be an
extraordinary effort to learn about a meaningless little detail of a long-gone processor.
I am also fairly confident that the function as a whole will be
faster with the register move since you will get better overlap and superscaling with the call and return sequence when the instructions
in the middle don't access the stack.
Of curiosity, I compiled the code with gcc and "-march=bdver1", which
I believe is the correct flag for that processor. It generated the
register move version, but with a "vmovq" instruction instead of
"movq". I don't know if there is any difference there - x86
instruction naming seems to have a certain degree of variance.
(gcc's models of scheduling, pipelining and timing for processors is
far from perfect, but the gcc folks do study Agner Fog's publications
as well as having contributors from AMD and Intel.)
More interesting, however, was that with "-march=bdver2" (up to
bdver4) gcc changed the "shr / and" sequence to a single "bextr"
instruction. I didn't see that on other -march choices. It seems
the two instruction shift-and-mask is faster than a single bit
extract instruction on most x86 processors.
All in all, it is a lesson on how small details of architectures can
make a difference.
BTW, AMD K8 has the opposite problem. Move from XMM to GPR is
reasonably fast, but move from GPR to XMM is painfully slow.
On the other hand, moves "via memory" are reasonably fast on these
CPUs (except, may be, Bobcat? I am not sure about it), because data
does not really travels through memory or through cache. Load-store forwarding picks the data directly from the store queue.
Yes, and there can be even more specialised short-cuts for stack data.
On Fri, 28 Nov 2025 09:46:56 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 27/11/2025 23:15, Michael S wrote:
On Thu, 27 Nov 2025 21:15:53 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 27/11/2025 15:02, Michael S wrote:
On Thu, 27 Nov 2025 14:02:38 +0100
David Brown <david.brown@hesbynett.no> wrote:
MSVC compilers compile your code and produce correct result, but
the code
looks less nice:
0000000000000000 <get_exponent>:
0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
b: 48 c1 e8 34 shr $0x34,%rax
f: 25 ff 07 00 00 and $0x7ff,%eax
14: c3 ret
Although on old AMD processors it is likely faster than nicer code
generated by gcc and clang. On newer processor gcc code is likely
a bit better, but the difference is unlikely to be detected by
simple measurements.
I think it is unlikely that this version - moving from xmm0 to rax
via memory instead of directly - is faster on any processor. But I
fully agree that it is unlikely to be a measurable difference in
practice.
I wonder how you have the nerve "to think" about things that you
have absolutely no idea about?
I think about many things - and these are things I /do/ know about.
But I don't know all the details, and am happy to be corrected and
learn more.
Instead of "thinking" you could just as well open Optimization
Reference manuals of the AMD Bulldozer family or of Bobcat. Or read
Agner Fog's instruction tables. Move from XMM to GPR on these
processors is very slow: 8 clocks on BD, 7 on BbC.
Okay. But storing data to memory from xmm0 is also going to be slow,
and loading it to rax from memory is going to be slow. I am not an
expert at the x86 world or reading Fog's tables, but it looks to me
that on a Bulldozer, storing from xmm0 to memory has a latency of 6
cycles and reading the memory into rax has a latency of 4 cycles.
That adds up to more than the 8 cycles for the direct register
transfer, and I expect (but do not claim to know for sure!) that the
dependency limits the scope for pipeline overlap - decode and address
calculations can be done, but the data can't be fetched until the
previous store is complete.
So all in all, my estimate was, I think, quite reasonable. There may
be unusual circumstances on particular cores if the instruction
scheduling and pipelining, combined with the stack engine, make that
sequence faster than the single register move.
It seems you are correct in this particular case.
Latency tables, especially those that are measured by software rather
than supplied by the designer, are problematic in the case of moves
between registers of different types, memory stores of all types and
even memory loads, with the exception of memory loads into a GPR.
Agner explains why they are problematic in the preface to his tables.
In short, there is no direct way to measure these things in isolation,
so one has to measure the latency of a sequence of instructions and
then apply either guesswork or the manufacturer's docs to somehow
divide the combined latency into individual parts.
So, the best way is to go by the vendor's recommendations in the
Optimization Reference Manual.
There are no relevant recommendations for K8, unfortunately. I suspect
that all methods are slow here.
For Bobcat, there should be recommendations, but I don't have them and
am too lazy to look for them.
For Family 10h (Barcelona and derivatives):
"When moving data from a GPR to an MMX or XMM register, use separate
store and load instructions to move the data first from the source
register to a temporary location in memory and then from memory into
the destination register, taking the memory latency into account when scheduling both stages of the load-store sequence.
When moving data from an MMX or XMM register to a general-purpose
register, use the MOVD instruction.
Whenever possible, use loads and stores of the same data length. (See
5.3, "Store-to-Load Forwarding Restrictions" on page 74 for more information.)"
For Family 15h (Bulldozer and derivatives):
"When moving data from a GPR to an XMM register, use separate store and
load instructions to move the data first from the source register to a temporary location in memory and then from memory into the destination register, taking the memory latency into account when scheduling both
stages of the load-store sequence.
When moving data from an XMM register to a general-purpose register,
use the VMOVD instruction.
Whenever possible, use loads and stores of the same data length. (See
6.3, "Store-to-Load Forwarding Restrictions" on page 98 for more information.)"
So, for both families, the vendor recommends a register move in the
direction from SIMD to GPR and a store/load sequence in the direction
from GPR to SIMD.
The suspect point here is the specific mention of the VEX-encoded form
(VMOVD) in the case of BD. It can mean that the "legacy" (SSE-encoded)
form is slower, or it can mean nothing. I suspect the latter.
I've now had a short look at the relevant table from Fog's site. My
conclusion from that is that the register move - though surprisingly
slow - is probably marginally faster than passing it through memory.
Perhaps if I spend enough time studying the details, I might find out
more and discover that I was wrong. But that would be an
extraordinary effort to learn about a meaningless little detail of a
long-gone processor.
I am also fairly confident that the function as a whole will be
faster with the register move since you will get better overlap and
superscaling with the call and return sequence when the instructions
in the middle don't access the stack.
Out of curiosity, I compiled the code with gcc and "-march=bdver1", which
I believe is the correct flag for that processor. It generated the
register move version, but with a "vmovq" instruction instead of
"movq". I don't know if there is any difference there - x86
instruction naming seems to have a certain degree of variance.
(gcc's models of scheduling, pipelining and timing for processors are
far from perfect, but the gcc folks do study Agner Fog's publications
as well as having contributors from AMD and Intel.)
More interesting, however, was that with "-march=bdver2" (up to
bdver4) gcc changed the "shr / and" sequence to a single "bextr"
instruction. I didn't see that on other -march choices. It seems
the two-instruction shift-and-mask is faster than a single bit
extract instruction on most x86 processors.
All in all, it is a lesson on how small details of architectures can
make a difference.
Zen3 has its own can of worms in the area of moving data between
GPRs and SIMD. The issues here are more subtle than those mentioned
above, and unfortunately almost completely undocumented in the
manuals. And although the issues are subtle, the performance impact can
be very significant.
I encountered these things when implementing alternative
(to those currently in use by gcc) IEEE binary128 arithmetic routines.
My conclusion was that the designers of the binary128 ABI in general,
and of the ABI of the support routines in particular, made a serious
mistake by treating binary128 (a.k.a. __float128, a.k.a. _Float128,
a.k.a. 'long double' on ARM64) as a "floating-point" type that is
passed around in XMM registers (or Neon registers on ARM64). Both
passing it in a pair of GPRs and passing it via memory would be
significantly faster on AMD processors and detectably faster on Intel
processors.
BTW, AMD K8 has the opposite problem. Move from XMM to GPR is
reasonably fast, but move from GPR to XMM is painfully slow.
On the other hand, moves "via memory" are reasonably fast on these
CPUs (except, maybe, Bobcat? I am not sure about it), because the data
does not really travel through memory or through the cache. Load-store
forwarding picks the data directly from the store queue.
Yes, and there can be even more specialised short-cuts for stack data.
On 11/27/25 18:59, bart wrote:
On 27/11/2025 17:38, Ike Naar wrote:
On 2025-11-27, bart <bc@freeuk.com> wrote:
Well, let's stick with C. Here are some features I use, and the C
equivalents (A has whatever type is needed):
M C
-------------------------------------------------------------
[snip]
A.odd A & 1, or A % 1
"A % 1" ?
I guess A % 2 then.
You guess? - LOL - okay. :-)
Note my remark about error proneness later on.
Higher level abstractions (usually found in higher level languages)
are always less error prone than low-level (or composed) constructs.
"C" is inherently and by design a comparably low-level language, so
I wonder what you are complaining about here. (You won't change that.)
'even' and 'odd' are higher level abstractions than bit-operations,
and they are also _special cases_ (nonetheless useful; I like them,
and I appreciate if they are present in any language). The general
case of the terms like "odd" and "even" is defined mathematically,
though;
so the natural way of describing them would (IMO) rather be
based on 'x mod 2 = 1' and 'x mod 2 = 0' respectively. (So the "C"
syntax with '%' is probably more "appropriate". Mileages may vary.)
You can of course add as many commodity features to "your language"
as you like. I seem to recall that one of the design principles of
"C" was to not add too many keywords. (Not sure whether 'A.odd' is
a function or keyword above [in "your language"].)
You made a mistake above (or just a typo), never mind. I suppose it
stems from your primary "thinking in bits". - This is not meant to
be offensive. - Back in university days (I still remember!) I made
a similar typo but vice versa; I wanted to express "div 2" in some
assembler language and accidentally wrote "shift-right 2", the same
type of typo but the other way round. I *knew*, and didn't "guess",
though, that "shift-right 1" would have been correct. ;-)
PS: BTW, I was always wondering why Pascal and Algol 68 supported
'odd' but not 'even'! - In the documents of the Genie compiler we
can read: "This is a relic of times long past.", but beyond that
it doesn't explain why it's a "relic". I can only guess that it's,
as a special case, considered just unnecessary in the presence of
the modulus operator.
I can believe that. If you have to implement floating point routines
in general integer hardware (and I expect that is the case for most
of your implementation here) then I would think it is better to start
and end with the data in GPR's. On some targets, moving data into
and out of floating point or vector registers is efficient enough
that those registers can effectively be used as caches, but it sounds
like that is not the case here.
On 28/11/2025 02:33, Janis Papanagnou wrote:
On 11/27/25 18:59, bart wrote:
On 27/11/2025 17:38, Ike Naar wrote:
On 2025-11-27, bart <bc@freeuk.com> wrote:
Well, let's stick with C. Here are some features I use, and the C
equivalents (A has whatever type is needed):
M C
-------------------------------------------------------------
[snip]
A.odd A & 1, or A % 1
"A % 1" ?
I guess A % 2 then.
You guess? - LOL - okay. :-)
Note my remark about error proneness later on.
Higher level abstractions (usually found in higher level languages)
are always less error prone than low-level (or composed) constructs.
"C" is inherently and by design a comparably low-level language, so
I wonder what you are complaining about here. (You won't change that.)
So is mine. But it has many more 'commodity' features that make life simpler. Plus a generally cleaner syntax to make it clearer.
On Fri, 28 Nov 2025 12:45:58 +0100
David Brown <david.brown@hesbynett.no> wrote:
I can believe that. If you have to implement floating point routines
in general integer hardware (and I expect that is the case for most
of your implementation here) then I would think it is better to start
and end with the data in GPR's. On some targets, moving data into
and out of floating point or vector registers is efficient enough
that those registers can effectively be used as caches, but it sounds
like that is not the case here.
On Windows the problem is only that of moving data between various types of registers.
On SysV things are worse: there is also the problem of the absence of
callee-saved FP/SIMD registers. In theory, the problem could have been
solved by defining a specialized ABI for the support routines (__addtf3,
__subtf3, __multf3, etc.), but that was not done either.
I think that it all comes from the old mental model of soft floating
point routines being very slow; so slow that ABI impedance mismatches
are lost in the noise. But in the specific case of binary128 on modern
CPUs, it's simply not true - the arithmetic itself is quite fast, so the
ABI mismatches are significant.
On 2025-11-27, bart <bc@freeuk.com> wrote:
Well, let's stick with C. Here are some features I use, and the C
equivalents (A has whatever type is needed):
M C
-------------------------------------------------------------
[snip]
A.odd A & 1, or A % 1
"A % 1" ?
On 27/11/2025 15:02, Michael S wrote:
On Thu, 27 Nov 2025 14:02:38 +0100
David Brown <david.brown@hesbynett.no> wrote:
MSVC compilers compile your code and produce a correct result, but the
code looks less nice:
0000000000000000 <get_exponent>:
0: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp)
6: 48 8b 44 24 08 mov 0x8(%rsp),%rax
b: 48 c1 e8 34 shr $0x34,%rax
f: 25 ff 07 00 00 and $0x7ff,%eax
14: c3 ret
Although on old AMD processors it is likely faster than the nicer code
generated by gcc and clang. On newer processors the gcc code is likely a bit
better, but the difference is unlikely to be detected by simple
measurements.
I think it is unlikely that this version - moving from xmm0 to rax via memory instead of directly - is faster on any processor. But I fully
agree that it is unlikely to be a measurable difference in practice.
Also, the MSVC compiler does not like your style and produces the following
warning:
dave_b.c(5): warning C4116: unnamed type definition in parentheses
Warnings are a matter of taste. There's nothing wrong with my code, but
it may be against some code styles.
BTW, I don't like your style either. My preferred code will look
very similar to the code of Waldek Hebisch except that I'd declare
d_to_u() static.
I don't like the union trick. Not just in this particular context, but
generally. memcpy() is much cleaner in expressing the programmer's intentions.
I particularly don't like using unions in compound literals like this
either - it was just to make a compact demonstration. I'd write real
code in more re-usable bits with static inline functions.
I disagree, however, that memcpy() shows intent better. The intention
is not to copy it to memory - the intention is to access the underlying
bit representation as a different type. A type-punning union is at
least as clear, if not clearer, for that purpose (IMHO - and judgements of
style and clarity are very much a matter of opinion).
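As a hedged illustration of the two styles being compared (a sketch, not the code posted in the thread; it assumes a 64-bit IEEE-754 double, and the helper names are made up):

    /* Extract the 11-bit exponent field of a double (bits 52..62),
       which is what the disassembly quoted earlier computes. */
    #include <stdint.h>
    #include <string.h>

    static inline uint64_t d_to_u_memcpy(double d) {
        uint64_t u;
        memcpy(&u, &d, sizeof u);   /* copy the object representation */
        return u;
    }

    static inline uint64_t d_to_u_union(double d) {
        union { double d; uint64_t u; } pun = { .d = d };
        return pun.u;               /* type punning through a union */
    }

    unsigned get_exponent(double d) {
        return (unsigned)((d_to_u_memcpy(d) >> 52) & 0x7FF);
    }

Either helper expresses "reinterpret these bits"; which reads more clearly is, as said above, a matter of taste.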
But for
me, -O2 is generally the sweet spot. I have no real interest in using a compiler that doesn't do decent optimisation - if I am happy with slow
code, I'll use Python.
On 28/11/2025 10:41, David Brown wrote:
But for me, -O2 is generally the sweet spot. I have no real
interest in using a compiler that doesn't do decent optimisation - if
I am happy with slow code, I'll use Python.
That's like saying that if you can't go at 100mph, you're happy to walk!
There's no compromise at all?
I've taken a task (decode JPEG) which uses the same algorithm across
three languages, and applied it to the same input. These are the
runtimes, expressed in relative MPH:
Drive 1 mile:
  gcc -O3    C        108 mph                    33s
  gcc -O2    C        100 mph                    36s
  mm         M         77 mph  (my lang)         47s
  bcc        C         55 mph  (my product)   1m 05s
  tcc        C         25 mph                 2m 24s
  CPython    Python   0.8 mph              1h 15m 00s
Actually, forget walking: you'd rather crawl on your hands and knees!
(The figure for PyPy for this task, which has lots of long loops to get stuck into, is 19 mph, but the speedup is generally unpredictable.)
It is also possible to compose values of _UBitInt and similar, say:
_UBitInt(24) rgb24;
_UBitInt(16) rgb5;
rgb5=(_UBitInt(16)) { 0b0u1, rgb24[23:19], rgb24[15:11], rgb24[7:3] };
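For comparison, a hedged sketch of the same RGB888-to-RGB555 packing in standard C (the bracketed bit-slice notation above is not C; the function name here is made up and the field positions follow the example):

    #include <stdint.h>

    static inline uint16_t rgb888_to_rgb555(uint32_t rgb24) {
        uint16_t r = (rgb24 >> 19) & 0x1F;   /* bits 23..19 */
        uint16_t g = (rgb24 >> 11) & 0x1F;   /* bits 15..11 */
        uint16_t b = (rgb24 >>  3) & 0x1F;   /* bits  7..3  */
        return (uint16_t)((r << 10) | (g << 5) | b);   /* top bit stays 0 */
    }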
On 28/11/2025 02:33, Janis Papanagnou wrote:[...]
You can of course add as many commodity features to "your language"
as you like. I seem to recall that one of the design principles of
"C" was to not add too many keywords. (Not sure whether 'A.odd' is
a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a
top-level user identifier, or a member name. With extra effort, it
could be used for both, but that needs some special syntax, such as
Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
PS: BTW, I was always wondering why Pascal and Algol 68 supported
'odd' but not 'even'! - In the documents of the Genie compiler we
can read: "This is a relic of times long past.", but beyond that
it doesn't explain why it's a "relic". I can only guess that it's,
as a special case, considered just unnecessary in the presence of
the modulus operator.
Maybe because you can trivially define 'even' as 'not odd'.
bart <bc@freeuk.com> writes:
On 28/11/2025 02:33, Janis Papanagnou wrote:[...]
You can of course add as many commodity features to "your language"
as you like. I seem to recall that one of the design principles of
"C" was to not add too many keywords. (Not sure whether 'A.odd' is
a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a
top-level user identifier, or a member name. With extra effort, it
could be used for both, but that needs some special syntax, such as
Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
<OT>In Pascal, "odd" is not a reserved word. It's the name of a
predefined function.</OT>
On 28/11/2025 02:33, Janis Papanagnou wrote:
so the natural way of describing them would (IMO) rather be
based on 'x mod 2 = 1' and 'x mod 2 = 0' respectively. (So the "C"
syntax with '%' is probably more "appropriate". Mileages may vary.)
I've made the mistake with % 1 more than once.
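A small illustrative sketch (not from any post here) of why the shortcuts are easy to get wrong in C: "x % 1" is always 0, and even "x % 2 == 1" misfires for negative values, since -3 % 2 is -1 in C:

    #include <stdbool.h>

    static inline bool is_odd(int x)  { return x % 2 != 0; }  /* sign-safe */
    static inline bool is_even(int x) { return x % 2 == 0; }
    /* (x & 1) also tests oddness; C23 guarantees two's complement. */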
You can of course add as many commodity features to "your language"
as you like. I seem to recall that one of the design principles of
"C" was to not add too many keywords. (Not sure whether 'A.odd' is
a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a top-
level user identifier, or a member name. With extra effort, it could be
used for both, but that needs some special syntax, such as Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
PS: BTW, I was always wondering why Pascal and Algol 68 supported
'odd' but not 'even'! - In the documents of the Genie compiler we
can read: "This is a relic of times long past.", but beyond that
it doesn't explain why it's a "relic". I can only guess that it's,
as a special case, considered just unnecessary in the presence of
the modulus operator.
Maybe because you can trivially define 'even' as 'not odd'.
...
That was basically also the background of my explanation; to my
knowledge "C" didn't want to introduce too many reserved words
that as a consequence then cannot be used as "language entity"
names (like variables, function names, etc.) any more. - That's
why introducing simple high-level functions unnecessarily may be
deprecated.
On 28/11/2025 23:23, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 02:33, Janis Papanagnou wrote:[...]
<OT>In Pascal, "odd" is not a reserved word. It's the name of aYou can of course add as many commodity features to "your language"
as you like. I seem to recall that one of the design principles of
"C" was to not add too many keywords. (Not sure whether 'A.odd' is
a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a
top-level user identifier, or a member name. With extra effort, it
could be used for both, but that needs some special syntax, such as
Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
predefined function.</OT>
So what's a 'reserved word' then? To me it is something not available
as a user-identifier because it has a special meaning in the language,
which may be that of a predefined function among other things.
bart <bc@freeuk.com> writes:
On 28/11/2025 23:23, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 02:33, Janis Papanagnou wrote:[...]
<OT>In Pascal, "odd" is not a reserved word. It's the name of aYou can of course add as many commodity features to "your language"
as you like. I seem to recall that one of the design principles of
"C" was to not add too many keywords. (Not sure whether 'A.odd' is
a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a
top-level user identifier, or a member name. With extra effort, it
could be used for both, but that needs some special syntax, such as
Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
predefined function.</OT>
So what's a 'reserved word' then? To me it is something not available
as a user-identifier because it has a special meaning in the language,
which may be that of a predefined function among other things.
Right. The name "odd" is available as a user-defined identifier.
If you define something named "odd" in Pascal, it hides the
predefined function of that name.
You can think of Pascal's predefined functions as being declared
in an outer scope, surrounding the main program.
On 29/11/2025 03.26, Janis Papanagnou wrote:
...
That was basically also the background of my explanation; to my
knowledge "C" didn't want to introduce too many reserved words
that as a consequence then cannot be used as "language entity"
names (like variables, function names, etc.) any more. - That's
why introducing simple high-level functions unnecessarily may be
deprecated.
Please ignore the last sentence. - I was speaking about reserved
words or keywords and not about function names in the context of
the paragraph. - So it depends in what way you introduce elements
like 'odd'. As a "C" function it wouldn't matter much. In case of
"your language" - where you say it's a keyword! - it would matter,
though!
On 29/11/2025 03:38, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 23:23, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 02:33, Janis Papanagnou wrote:[...]
<OT>In Pascal, "odd" is not a reserved word. It's the name of aYou can of course add as many commodity features to "your language" >>>>>> as you like. I seem to recall that one of the design principles of >>>>>> "C" was to not add too many keywords. (Not sure whether 'A.odd' is >>>>>> a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a
top-level user identifier, or a member name. With extra effort, it
could be used for both, but that needs some special syntax, such as
Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
predefined function.</OT>
So what's a 'reserved word' then? To me it is something not available
as a user-identifier because it has a special meaning in the language,
which may be that of a predefined function among other things.
Right. The name "odd" is available as a user-defined identifier.
If you define something named "odd" in Pascal, it hides the
predefined function of that name.
I did test it with a toy Pascal compiler I have. Defining 'odd' as a variable didn't work, but that was for other reasons.
You can think of Pascal's predefined functions as being declared
in an outer scope, surrounding the main program.
I took 'predefined functions' to mean 'built-in functions' (effectively, operators with function-like syntax), that cannot be overridden.
So 'odd' is not a reserved word in Pascal; I was mistaken.
(My opinion is that being able to shadow fundamental language features
is undesirable. Being able to reuse them as user identifiers is another matter, but that would involve tricks with syntax or context to avoid ambiguity.)
On 29/11/2025 12:24, bart wrote:
On 29/11/2025 03:38, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 23:23, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 02:33, Janis Papanagnou wrote:[...]
<OT>In Pascal, "odd" is not a reserved word. It's the name of aYou can of course add as many commodity features to "your language" >>>>>>> as you like. I seem to recall that one of the design principles of >>>>>>> "C" was to not add too many keywords. (Not sure whether 'A.odd' is >>>>>>> a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a
top-level user identifier, or a member name. With extra effort, it
could be used for both, but that needs some special syntax, such as
Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
predefined function.</OT>
So what's a 'reserved word' then? To me it is something not available
as a user-identifier because it has a special meaning in the language,
which may be that of a predefined function among other things.
Right. The name "odd" is available as a user-defined identifier.
If you define something named "odd" in Pascal, it hides the
predefined function of that name.
I did test it with a toy Pascal compiler I have. Defining 'odd' as a
variable didn't work, but that was for other reasons.
You can think of Pascal's predefined functions as being declared
in an outer scope, surrounding the main program.
I took 'predefined functions' to mean 'built-in
functions' (effectively, operators with function-like syntax), that
cannot be overridden.
So 'odd' is not a reserved word in Pascal; I was mistaken.
(My opinion is that being able to shadow fundamental language features
is undesirable. Being able to reuse them as user identifiers is
another matter, but that would involve tricks with syntax or context
to avoid ambiguity.)
The issue is where you draw the line of what is a "fundamental language feature", and what is not. For Pascal, "begin" is a fundamental
language feature, part of the syntax. "odd" is not fundamental - it's
just a function in Pascal's equivalent of the C standard library. So
no tricks or special syntax (like "stropping") are needed to re-use the identifier for other purposes.
I agree that using words that are "fundamental" is not good. But if a language provides built-in functions in a global namespace, then it is a serious limitation if these cannot be shadowed or overridden.
Basically,
it means that you are always at risk of conflicts with existing code if later language versions add new functions. So if someone wrote Pascal
code with a local variable called "even", and a later version introduced
a built-in function "even", then it is critical that this is an
overrideable or shadowable (if that is a real word!) identifier.
That's why C is very conservative about adding new keywords, and uses reserved namespaces for the purpose - thus C99 added "_Bool", not
"bool", to avoid conflict with existing code. Only now, over two
decades later, did the committee feel that uses of the identifier "bool" other than as a macro for _Bool (usually via <stdbool.h>) are so rare
that C23 could finally have "bool" as a keyword for the type. And they still have challenges with good names for standard library functions -
now in C23, many new ones have names with a "stdc_" prefix.
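A minimal illustration of the reserved-namespace approach described above (assuming a C99-or-later compiler; in C23 the header is essentially redundant):

    #include <stdbool.h>   /* pre-C23: defines bool as a macro for _Bool */

    _Bool legacy_flag = 1;    /* spelling valid since C99 */
    bool  modern_flag = true; /* a macro before C23, a keyword from C23 on */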
On 28/11/2025 23:23, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 02:33, Janis Papanagnou wrote:[...]
You can of course add as many commodity features to "your language"
as you like. I seem to recall that one of the design principles of
"C" was to not add too many keywords. (Not sure whether 'A.odd' is
a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a
top-level user identifier, or a member name. With extra effort, it
could be used for both, but that needs some special syntax, such as
Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
<OT>In Pascal, "odd" is not a reserved word. It's the name of a
predefined function.</OT>
So what's a 'reserved word' then? To me it is something not available as
a user-identifier because it has a special meaning in the language,
which may be that of a predefined function among other things.
On 29/11/2025 13:45, David Brown wrote:
On 29/11/2025 12:24, bart wrote:
On 29/11/2025 03:38, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 23:23, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 28/11/2025 02:33, Janis Papanagnou wrote:[...]
<OT>In Pascal, "odd" is not a reserved word. It's the name of aYou can of course add as many commodity features to "your language" >>>>>>>> as you like. I seem to recall that one of the design principles of >>>>>>>> "C" was to not add too many keywords. (Not sure whether 'A.odd' is >>>>>>>> a function or keyword above [in "your language"].)
It is a reserved word, which means it can't be used as either a
top-level user identifier, or a member name. With extra effort, it
could be used for both, but that needs some special syntax, such as
Ada-style "A'odd"; I've never got around to it.
In Pascal (where I copied it from) it is a reserved word.
predefined function.</OT>
So what's a 'reserved word' then? To me it is something not available
as a user-identifier because it has a special meaning in the language,
which may be that of a predefined function among other things.
Right. The name "odd" is available as a user-defined identifier.
If you define something named "odd" in Pascal, it hides the
predefined function of that name.
I did test it with a toy Pascal compiler I have. Defining 'odd' as a
variable didn't work, but that was for other reasons.
You can think of Pascal's predefined functions as being declared
in an outer scope, surrounding the main program.
I took 'predefined functions' to mean 'built-in
functions' (effectively, operators with function-like syntax), that
cannot be overridden.
So 'odd' is not a reserved word in Pascal; I was mistaken.
(My opinion is that being able to shadow fundamental language
features is undesirable. Being able to reuse them as user identifiers
is another matter, but that would involve tricks with syntax or
context to avoid ambiguity.)
The issue is where you draw the line of what is a "fundamental
language feature", and what is not. For Pascal, "begin" is a
fundamental language feature, part of the syntax. "odd" is not
fundamental - it's just a function in the Pascal's equivalent of the C
standard library. So no tricks or special syntax (like "stropping")
are needed to re-use the identifier for other purposes.
I agree that using words that are "fundamental" is not good. But if a
language provides built-in functions in a global namespace, then it is
a serious limitation if these cannot be shadowed or overridden.
I see it as an advantage. I can do this in Python:
len = 42
print(len("abc"))
Now len() no longer works as expected. In Algol 68 you can do this:
OP + = (INT a, b)INT: a - b;
print(2 + 3)
This prints -1. (Or, more subtly, you can redefine the precedence of '+'
to be on the other side of '*'.)
With different scopes in effect, different parts of a program can see different versions of what many might not realise are user-overrideable features.
Basically, it means that you are always at risk of conflicts with
existing code if later language versions add new functions. So if
someone wrote Pascal code with a local variable called "even", and a
later version introduced a built-in function "even", then it is
critical that this is an overrideable or shadowable (if that is a real
word!) identifier.
If 'even' is implemented as though it were defined via a user
function, then shadowing is normally allowed (unless the language likes
to warn you about such things).
But if it's a true built-in that just happens to use function syntax,
then the user sees an error. Then they can choose to update their
codebase, when possible.
For somebody else's code, or for legacy code, that becomes harder, and
the implementation may need to strictly enforce language versions so
that such new features are disabled via the build info.
Alternatively, such built-ins can use special syntax so they would never
clash with user-identifiers.
Note that such clashes can also occur when mixing libraries: maybe both
library A and library B export 'even', even when everything is defined
in user code.
Then the language had better have a namespace feature to disambiguate
(which I have, but C doesn't).
That's why C is very conservative about adding new keywords, and uses
reserved namespaces for the purpose - thus C99 added "_Bool", not
"bool", to avoid conflict with existing code. Only now, over two
decades later, did the committee feel that uses of the identifier
"bool" other than as a typedef for _Bool (usually via <stdbool.h>) are
so rare that C23 could finally have "bool" as a keyword for the type.
And they still have challenges with good names for standard library
functions - now in C23, many new ones have names with a "stdc_" prefix.
On 28/11/2025 14:33, Michael S wrote:
On Fri, 28 Nov 2025 12:45:58 +0100
David Brown <david.brown@hesbynett.no> wrote:
I can believe that. If you have to implement floating point
routines in general integer hardware (and I expect that is the
case for most of your implementation here) then I would think it
is better to start and end with the data in GPR's. On some
targets, moving data into and out of floating point or vector
registers is efficient enough that those registers can effectively
be used as caches, but it sounds like that is not the case here.
On Windows the problem is only that of moving data between various types
of registers.
On SysV things are worse: there is also the problem of the absence of
callee-saved FP/SIMD registers. In theory, the problem could have
been solved by defining specialized ABI for support routines
(__addtf3, __subtf3, __multf3, etc...), but that was not done
either.
I think that it all comes from the old mental model of soft
floating point routines being very slow; so slow that ABI impedance
mismatches are lost in the noise. But in the specific case of binary128
on modern CPUs, it's simply not true - the arithmetic itself is quite
fast, so the ABI mismatches are significant.
My only real experience with software floating point (using it, not
writing it) is on systems where they are either slow (like 32-bit
Cortex-M ARMs), or /very/ slow (like an 8-bit AVR). A little
inefficiency in the main ABI's is, as you say, just noise in these
cases.
But in those systems, the floating point arithmetic routines were
part of the compiler support library. Functions there don't have to
abide by the platform ABI - they can use different registers
according to what suits best. Were you working on a library that
integrates into the compiler, or was it more "user level" (like a C++ "binary128" class with operator overloads)?
ABI's are obviously useful for standardisation and intermixing of
code from different tools. But they can also be a pain, especially
when they are old and outdated or designed to be efficient on
different processors or with different kinds of code. I am finding
the EABI for 32-bit ARM to be a serious performance drain for some
kinds of work. It doesn't support passing anything bigger than
32-bit in registers, except for "long long int" and "unsigned long
long int". It has the same restriction on return values. That means
if you have something like a C++ optional<uint32_t> type, or
equivalent struct in C, it's all passed back and forth on the stack.
And unlike the AMD processors you mention, on a Cortex-M core that is
a lot slower!
Michael S <already5chosen@yahoo.com> wrote:
Zen3 has its own can of worms in the area of moving data between
GPRs and SIMD. The issues here are more subtle than those mentioned
above, and unfortunately almost completely undocumented in the
manuals. And although the issues are subtle, the performance impact can
be very significant.
I encountered these things when implementing alternative
(to those currently in use by gcc) IEEE binary128 arithmetic
routines. My conclusion was that the designers of the binary128 ABI in
general, and of the ABI of the support routines in particular, made a
serious mistake by treating binary128 (a.k.a. __float128, a.k.a.
_Float128, a.k.a. 'long double' on ARM64) as a "floating-point" type
that is passed around in XMM registers (or Neon registers on ARM64).
Both passing it in a pair of GPRs and passing it via memory would be
significantly faster on AMD processors and detectably faster on Intel
processors.
If they want to handle that type in hardware on some future model,
then the ABI must use a floating-point (that is, XMM) register.
Upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets) appears
to be 2**16 or 2**16-1. I don't remember which one.
On 24/11/2025 11:17, David Brown wrote:
On 24/11/2025 01:30, bart wrote:
Saving memory was mentioned. To achieve that means having bitfields
that may not start at bit 0 of a byte, and may cross byte- or word-
boundaries.
No, that is incorrect.
The proposal mentions saving /space/ as relevant in FPGAs - not
saving /memory/.
But I was responding to a suggestion here that one use of _BitInts - presumably for ordinary hardware - was to save memory.
That's not going to happen if they are simply rounded up to the next power-of-two type.
I had been responding to the claim that those smaller types save
memory, compared to using sizes 8/16/32 bits which are commonly
available and have better hardware support.
I don't recall any such claim. Do you have a citation (other than
the FPGA-specific wording in N2709)?
This is where it came up in this thread:
On 23/11/2025 11:46, Philipp Klaus Krause wrote:
Am 22.10.25 um 14:45 schrieb Thiago Adams:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do. Also being able to use bit-fields wider than int.
Saving memory for two reasons:
* On small embedded systems where there is very little memory
* For code that needs to be very fast on big systems to make data structures fit into cache
Although this doesn't go as far as using odd bit-sizes: it would mean
using sizes like 24, 40, 48, and 56 bits instead of 32 or 64 bits.
The savings would be sparse.
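Whether such in-between sizes save anything is implementation-defined; a hedged way to check on a given C23 compiler (struct name and members here are just for illustration) is simply to print the sizes:

    #include <stdio.h>

    struct sample {
        unsigned _BitInt(24) a;   /* may occupy 3 or 4 bytes */
        unsigned _BitInt(40) b;   /* may occupy 5 or 8 bytes */
    };

    int main(void) {
        printf("sizeof(unsigned _BitInt(24)) = %zu\n", sizeof(unsigned _BitInt(24)));
        printf("sizeof(unsigned _BitInt(40)) = %zu\n", sizeof(unsigned _BitInt(40)));
        printf("sizeof(struct sample)        = %zu\n", sizeof(struct sample));
        return 0;
    }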
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
bart <bc@freeuk.com> writes:[...]
[...]OK, so why are you not allowed to have _BitInt(1)? That is, a 1-bit
signed integer. It might only have two values of 0 and -1; does
nobody want that particular combination?
I don't know. The language allows 1-bit signed bit-fields, so
_BitInt(1) would make some sense, but the language requires N to
be at least 1 for unsigned _BitInt and 2 for signed _BitInt.
It doesn't bother me too much, since I'm unlikely to have a
use for signed _BitInt(1). But it's an arbitrary restriction.
I just learned that there's a proposal to allow _BitInt(1) in C2y.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3699.pdf
The current restriction apparently was for historical reasons.
Prior to C23, C didn't require two's complement for signed types,
and signed _BitInt(1) doesn't make much sense for one's complement
or sign-and-magnitude (it could only hold +0 and -0).
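A tiny hedged example of the current rule (C23 as published; requires a compiler with _BitInt support):

    unsigned _BitInt(1) flag = 1;   /* allowed: values 0 and 1 */
    /* _BitInt(1) s = 0;               constraint violation: signed needs N >= 2 */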
bart <bc@freeuk.com> wrote:
On 24/11/2025 20:26, David Brown wrote:
On 24/11/2025 19:35, bart wrote:
But now there is this huge leap, not only to 128/256/512/1024 bits,
but to conceivably millions, plus the ability to specify any weird
type you like, like 182 bits (eg. somebody makes a typo for
_BitInt(128), but they silently get a viable type that happens to be a
little less efficient!).
And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,
And 821 bits. This is what I don't get. Why is THAT so important?
Why couldn't 128/256/etc have been added first, and then those funny
ones if the demand was still there?
If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
set of types by a few more entries on the right, say 'u128 u256 u512',
would anyone have been clamouring for types like 'u1187'? I doubt it.
For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-based
types like Ada, or not at all.
First, _BitInt(821) (and _BitInt(1187)) are really unimportant. You
simply get them as a byproduct of general rules.
Second, C has a standard process and each proposal must go through
this process. It is much easier to standardize one simple general
feature (like _BitInt) than a bunch of separate proposals.
Also, note that 64 bits is the maximum guaranteed by the standard.
If the standard mandated the presence of, say, 'int128_t', that could
cause opposition from some influential parties (like a rich company
in north-west USA). Optional 'int128_t' and 'uint128_t' add little
value.
'_BitInt' is a simple interface; for example, 'intnnn_t' for
some 'nnn' would add multiple slots in the compiler symbol table
or need multiple entries in header files. '_BitInt' is a single
identifier.
_BitInt(8) makes a lot of sense for 8-bit processors. The
requirement for numbers of bits not divisible by 8 came from the
requirement of portability to FPGAs, where hardware may use
odd widths.
Even if you limit attention to mainstream hardware, 32-bit
machines are still popular enough that C must support
them. So specifying the size as a multiple of 64 bits would
be unacceptable. Specifying the size as a multiple of 32 bits
would be unnatural too. You may think that specifying the
size in bytes is natural, but clearly, for efficiency, an
implementation may wish to round up the size. So, why not
specify the size with the maximal possible resolution, that is,
in bits?
BTW: you complained that C has many identifiers naming
integer types. Now C has a uniform way to form names of
integer types that covers most needs for fixed-size
integer types (sometimes you may wish for promotion to
int, so 'int8_t', 'int16_t' and possibly 'int32_t'
and unsigned variants may still be preferable). So now
you complain that this scheme is too flexible and
C should use separate names...
Am 23.11.25 um 16:06 schrieb Michael S:
Upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets) appears
to be 2**16 or 2**16-1. I don't remember which one.
This is comp.lang.c, not comp.lang.c++. There still are implementations
of C other than GCC and clang. E.g. SDCC has a limit of 64.
Philipp
But does SDCC support any non-8 bit processor? I hope that gcc 8-bit
targets will also use the minimal number of bytes.
Note: in C2023, the predefined Identifiers section says: "Any other predefined macro names: shall begin with a leading underscore followed
by an uppercase letter; or, a second underscore...". For earlier
versions of the standard, user code should avoid using such identifiers because they were reserved for all purposes, but that's no longer the
case. Now, they should be avoided because they may be pre-defined by the implementation, which means that any attempt to use them might have unpredictable results.
On 29/11/2025 20:24, Waldek Hebisch wrote:
bart <bc@freeuk.com> wrote:
On 24/11/2025 20:26, David Brown wrote:
On 24/11/2025 19:35, bart wrote:
But now there is this huge leap, not only to 128/256/512/1024 bits,
but to conceivably millions, plus the ability to specify any weird
type you like, like 182 bits (eg. somebody makes a typo for
_BitInt(128), but they silently get a viable type that happens to be a
little less efficient!).
And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,
And 821 bits. This is what I don't get. Why is THAT so important?
Why couldn't 128/256/etc have been added first, and then those funny
ones if the demand was still there?
If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
set of types by a few more entries on the right, say 'u128 u256 u512',
would anyone have been clamouring for types like 'u1187'? I doubt it.
For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-based
types like Ada, or not at all.
First, _BitInt(821) (and _BitInt(1187)) are really unimportant. You
simply get them as a byproduct of general rules.
That they are allowed is the problem. People use them and expect the
compiler to waste its time generating bit-precise code.
You can have general _BitInt(N) syntax and have constraints on the
values of N, not just an upper limit.
_BitInt(8) makes a lot of sense for 8-bit processors. The
requirement for numbers of bits not divisible by 8 came from the
requirement of portability to FPGAs, where hardware may use
odd widths.
Wouldn't 'char' have a different width there anyway? Or can it be even
odder where char is 7 bits and int is 19?
Apparently _BitInt(8) is incompatible with int8_t.
Am 23.11.25 um 16:06 schrieb Michael S:
Upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets) appears
to be 2**16 or 2**16-1. I don't remember which one.
This is comp.lang.c, not comp.lang.c++. There still are implementations
of C other than GCC and clang. E.g. SDCC has a limit of 64.
bart <bc@freeuk.com> writes:
On 29/11/2025 20:24, Waldek Hebisch wrote:
bart <bc@freeuk.com> wrote:
On 24/11/2025 20:26, David Brown wrote:
On 24/11/2025 19:35, bart wrote:
But now there is this huge leap, not only to 128/256/512/1024 bits,
but to conceivably millions, plus the ability to specify any weird
type you like, like 182 bits (eg. somebody makes a typo for
_BitInt(128), but they silently get a viable type that happens to be a
little less efficient!).
And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,
And 821 bits. This is what I don't get. Why is THAT so important?
Why couldn't 128/256/etc have been added first, and then those funny
ones if the demand was still there?
If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
set of types by a few more entries on the right, say 'u128 u256 u512',
would anyone have been clamouring for types like 'u1187'? I doubt it.
For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-based
types like Ada, or not at all.
First, _BitInt(821) (and _BitInt(1187)) are really unimportant. You
simply get them as a byproduct of general rules.
That they are allowed is the problem. People use them and expect the
compiler to waste its time generating bit-precise code.
You are literally the only person I've seen complain about it. And you
can avoid any such problem by not using unusual sizes in your code.
You want to impose your arbitrary restrictions on the rest of us.
Do you even use _BitInt types?
Oh no, I can type (n + 1187), and it will yield the sum of n and 1187.
Why would anyone want to add 1187 to an integer? The language should be changed (made more complicated) to forbid operations that don't make
obvious sense!!
You can have general _BitInt(N) syntax and have constraints on the
values of N, not just an upper limit.
No you can't, because the language does not allow the arbitrary
restrictions you want. If an implementer finds _BitInt(1187)
too difficult, they can set BITINT_MAXWIDTH to 64.
One more time: Both gcc and llvm/clang have already implemented
bit-precise types, with very large values of BITINT_MAXWIDTH.
What actual problems has this fact caused for you, other than giving
you something to complain about?
[...]
_BitInt(8) makes a lot of sense for 8-bit processors. The
requirement for numbers of bits not divisible by 8 came from the
requirement of portability to FPGAs, where hardware may use
odd widths.
Wouldn't 'char' have a different width there anyway? Or can it be even
odder where char is 7 bits and int is 19?
char is at least 8 bits wide, and the size of int must be a multiple of CHAR_BIT (though its width needn't be if there are padding bits).
I don't know about C implementations for FPGAs, but I presume they
still obey the rules of the language.
[...]
Apparently _BitInt(8) is incompatible with int8_t.
Yes, it is. char, signed char, and unsigned char are also incompatible
with each other. How is that a problem?
They're both scalar types, so
they're implicitly converted when needed.
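A hedged sketch of what "incompatible but implicitly convertible" means in practice (assumes a C23 compiler; the function is just for illustration):

    #include <stdint.h>

    void demo(void) {
        _BitInt(8) b = 42;
        int8_t     i = b;   /* fine: implicit conversion between scalar types */
        b = i;              /* fine the other way too */

        int8_t *pi = &i;
        /* _BitInt(8) *pb = pi;   error: the pointer types are incompatible */
        (void)pi;
        (void)b;
    }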
[...]
store 1 into a signed integer type was confusing to users, and those
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]
Note: in C2023, the predefined Identifiers section says: "Any other
predefined macro names: shall begin with a leading underscore followed
by an uppercase letter; or, a second underscore...". For earlier
versions of the standard, user code should avoid using such identifiers
because they were reserved for all purposes, but that's no longer the
case. Now, they should be avoided because they may be pre-defined by the
implementation, which means that any attempt to use them might have
unpredictable results.
That's in the "Predefined macro names" section (N3220 6.10.10.1).
The "Predefine identifiers" section (6.4.3.2) documents __func__.
There are no other predefined identifiers.
On 30/11/2025 00:46, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
You can have general _BitInt(N) syntax and have constraints on the
values of N, not just an upper limit.
No you can't, because the language does not allow the arbitrary
restrictions you want. If an implementer finds _BitInt(1187)
too difficult, they can set BITINT_MAXWIDTH to 64.
One more time: Both gcc and llvm/clang have already implemented
bit-precise types, with very large values of BITINT_MAXWIDTH.
What actual problems has this fact caused for you, other than giving
you something to complain about?
What problem would there be if BitInt sizes above the machine word sizes
had to be multiples of the word sizes?
In what way would it inconvenience /you/?
I just don't like unnecessarily flexible, lax or over-ambitious
features in a language. I think that is as much poor design as
underspecifying.
So I'm interested in what that one extra bit in a million buys you. Or
that one bit fewer.
[...]
On 30/11/2025 00:46, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
On 29/11/2025 20:24, Waldek Hebisch wrote:
bart <bc@freeuk.com> wrote:
On 24/11/2025 20:26, David Brown wrote:
On 24/11/2025 19:35, bart wrote:
But now there is this huge leap, not only to 128/256/512/1024 bits,
but to conceivably millions, plus the ability to specify any weird
type you like, like 182 bits (eg. somebody makes a typo for
_BitInt(128), but they silently get a viable type that happens to be a
little less efficient!).
And this huge leap also lets you have 128-bit, 256-bit, 512-bit, etc.,
And 821 bits. This is what I don't get. Why is THAT so important?
Why couldn't 128/256/etc have been added first, and then those funny
ones if the demand was still there?
If the proposal had instead been simply to extend the 'u8 u16 u32 u64'
set of types by a few more entries on the right, say 'u128 u256 u512',
would anyone have been clamouring for types like 'u1187'? I doubt it.
For sub-64-bit types on conventional hardware, I simply can't see the
point, not if they are rounded up anyway. Either have full range-based
types like Ada, or not at all.
First, _BitInt(821) (and _BitInt(1187)) are really unimportant. You
simply get them as a byproduct of general rules.
That they are allowed is the problem. People use them and expect the
compiler to waste its time generating bit-precise code.
You are literally the only person I've seen complain about it. And
you can avoid any such problem by not using unusual sizes in your
code.
You want to impose your arbitrary restrictions on the rest of us.
Do you even use _BitInt types?
Oh no, I can type (n + 1187), and it will yield the sum of n and
1187. Why would anyone want to add 1187 to an integer? The language
should be changed (made more complicated) to forbid operations that
don't make obvious sense!!
You seem to be mixing up values and types. Or are you arguing for there
to be nearly as many integer types as there are possible values?
Everyone in this group seems obsessed with not having any limitations
at all in the language.
For example, gcc allows identifiers up to 4 billion characters long,
or something (I think I've tested it with three 1-billion-character variables.)
There was a discussion here about it. Of course, even
million-character names would be totally impractical to work with. I'd
have trouble with 256 characters (my own cap).
The rationale for BitInts seems to be heading the same way. The work
for billion-character variables has already 'been done'. That doesn't
mean they are sensible or practical or efficient!
You can have general _BitInt(N) syntax and have constraints on the
values of N, not just an upper limit.
No you can't, because the language does not allow the arbitrary
restrictions you want. If an implementer finds _BitInt(1187)
too difficult, they can set BITINT_MAXWIDTH to 64.
One more time: Both gcc and llvm/clang have already implemented
bit-precise types, with very large values of BITINT_MAXWIDTH.
What actual problems has this fact caused for you, other than giving
you something to complain about?
What problem would there be if BitInt sizes above the machine word
sizes had to be multiples of the word sizes?
In what way would it inconvenience /you/?
I just don't like unnecessarily flexible, lax or over-ambitious
features in a language. I think that is as much poor design as
underspecifying.
So I'm interested in what that one extra bit in a million buys you. Or
that one bit fewer.
Apparently _BitInt(8) is incompatible with int8_t.
Yes, it is. char, signed char, and unsigned char are also
incompatible with each other. How is that a problem?
Signed and unsigned char have ranges of -128..+127 and 0..255
respectively when they are 8 bits wide; they cannot be compatible.
But BitInt(8) also has a -128..+127 range, yet it is not compatible
with signed char or int8_t.
Why not? Under what circumstances would somebody choose BitInt(8) over
those alternatives, and why?
When 'char' is signed, that means that a signed 8-bit type on PCs can be
chosen from amongst four incompatible types!
Am 29.11.25 um 23:41 schrieb Waldek Hebisch:
But does SDCC support any non-8 bit processor? I hope that gcc 8-bit
targets will also use the minimal number of bytes.
I don't really know. I know most of the architectures targeted by
SDCC, but what is a "non-8 bit processor"?
Philipp
Philipp Klaus Krause <pkk@spth.de> wrote:
Am 24.11.25 um 15:21 schrieb bart:
I had been responding to the claim that those smaller types save
memory, compared to using sizes 8/16/32 bits which are commonly
available and have better hardware support.
I don't recall any such claim. Do you have a citation (other than
the FPGA-specific wording in N2709)?
This is where it came up in this thread:
On 23/11/2025 11:46, Philipp Klaus Krause wrote:
Am 22.10.25 um 14:45 schrieb Thiago Adams:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will
do. Also being able to use bit-fields wider than int.
Saving memory for two reasons:
* On small embedded systems where there is very little memory
* For code that needs to be very fast on big systems to make
data structures fit into cache
Although this doesn't go as far as using odd bit-sizes: it would
mean using sizes like 24, 40, 48, and 56 bits instead of 32 or 64
bits.
The savings would be sparse.
"On small embedded systems" - those tend to be 8-bit systems, so
compilers targeting them would only round up to a multiple of 8, i.e.
a BitInt(40) is exactly 5 bytes. Also "bit-fields wider than int" -
for bit-fields it can indeed make sense to have a width that is not
a multiple of 8, if the remaining bits of the last byte can be used
for other purposes.
I think it is better to say "8-bit systems". People here wrote
that the RPi Pico with its 256 kB RAM and megabytes of flash is small.
I have a CH32V003, a 32-bit MCU which has 2 kB RAM and 16 kB flash;
I would call it small. The MSP430 is 16-bit, and was available with
some tiny RAM and 2 kB flash. I would say that most embedded
systems (counting projects, not the number of chips/subsystems
manufactured) are bigger. Clearly 8-bit MCUs are used in some
high-volume projects, but now one can get relatively small
32-bit MCUs, and various statistics indicate that 32-bit
MCUs get more use than 8-bit ones. So the claim that "small
embedded systems tend to be 8-bit systems" is debatable.
Philipp Klaus Krause <pkk@spth.de> writes:
Am 23.11.25 um 16:06 schrieb Michael S:
Upper limit is implementation-defined.
On both existing implementations the limit (on 64-bit targets)
appears to be 2**16 or 2**16-1. I don't remember which one.
Recent versions of gcc have BITINT_MAXWIDTH == 65535.
llvm/clang has BITINT_MAXWIDTH == 8388608 (2**23) (and some serious performance problems with multiplication and division for large
_BitInt types).
This is comp.lang.c, not comp.lang.c++. There still are
implementations of C other than GCC and clang. E.g. SDCC has a
limit of 64.
I didn't see any references to C++ in the parent article. But it's interesting that SDCC supports _BitInt.
The list of supported targets on SDCC front page:
* Intel MCS51 based microprocessors
* Maxim (formerly Dallas) DS80C390 variants
* Freescale (formerly Motorola) HC08 based
* Zilog Z80 based MCUs
* Padauk (pdk14, pdk15)
* STMicroelectronics STM8
* MOS 6502 and WDC 65C02
Work is in progress:
* Rabbit 4000, 5000, 6000
* Padauk pdk13
and the f8 and f8l
Unmaintained:
* Microchip PIC16 and PIC18
I know nothing about Rabbit and Padauk. The rest of architectures in
the list are '8-bit processors'.
Now, if you ask me, I don't understand why Waldek Hebisch considers the
difference between 8-bit and [byte-addressable] 16-bit targets
important. As far as the size of relevant C types goes, they look the same:
char - 8 bits
int - 16 bits
long - 32 bits
There is possibly a difference in the size of 'short', but I don't
understand why it matters.
Philipp Klaus Krause <pkk@spth.de> wrote:
Am 29.11.25 um 23:41 schrieb Waldek Hebisch:
But does SDCC support any non-8-bit processor? I hope that gcc 8-bit
targets will also use the minimal number of bytes.
I don't really know. I know most of the architectures targeted by SDCC,
but what is a "non-8 bit processor"?
A processor which is not an 8-bit processor.
On 29/11/2025 20:24, Waldek Hebisch wrote:
First, _BitInt(821) (and _BitInt(1187)) are really unimportant. You
simply get them as a byproduct of the general rules.
That they are allowed is the problem. People use them and expect the compiler to waste its time generating bit-precise code.
I fail to see the difficulty for the implementer.
For arithmetic ops, _BitInt(1187) is almost the same as _BitInt(1216).
You just add one 'AND by constant' operation applied to the MS word at the
very end. You only have to do it for the unsigned variant, since for the
signed variant overflow is undefined anyway. So, for signed, you can do
nothing, or you can do the same as for unsigned if you feel that it's
simpler.
The same goes for left shift.
For right shift and for logical ops, _BitInt(1187) is exactly the same
as _BitInt(1216).
So what is all the fuss about?
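To make that point concrete, here is a minimal sketch (purely illustrative, not taken from gcc or clang) of how an implementation might emulate an unsigned _BitInt(1187) on 64-bit limbs; the only work beyond a plain 1216-bit addition is one AND on the most significant limb:

#include <stdint.h>

/* Hypothetical layout: 1187 bits in 19 limbs of 64 bits; the top limb
   holds 1187 - 18*64 = 35 value bits, the rest is padding. */
#define BITS  1187
#define LIMBS ((BITS + 63) / 64)                              /* 19 */
#define TOP_MASK ((UINT64_C(1) << (BITS - 64 * (LIMBS - 1))) - 1)

typedef struct { uint64_t limb[LIMBS]; } ubig;

static ubig ubig_add(ubig a, ubig b)
{
    ubig r;
    unsigned carry = 0;
    for (int i = 0; i < LIMBS; i++) {
        uint64_t s = a.limb[i] + b.limb[i];
        unsigned c1 = s < a.limb[i];        /* carry out of the raw sum */
        r.limb[i] = s + carry;
        carry = c1 | (r.limb[i] < s);
    }
    /* The one extra step compared with _BitInt(1216): reduce modulo
       2**1187 by clearing the padding bits of the top limb. */
    r.limb[LIMBS - 1] &= TOP_MASK;
    return r;
}

As noted above, right shifts and the bitwise logical ops would need no such mask at all.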
David Brown <david.brown@hesbynett.no> wrote:
_BitInt's are not arrays, they are scalars - they are integer types.
There is no concept of a type "_BitInt" - they always have compile-time
fixed sizes, such as "_BitInt(12)". So the idea of passing around
generic _BitInt's makes no more sense than passing around any other kind
of generic integer types. (Of course you can have an array of _BitInt's
of any given size.)
There are languages which can pass around generic types, but C is not one
of them. So the idea of passing around generic _BitInts makes sense,
but it is not included in C.
Am 30.11.25 um 11:22 schrieb Michael S:
I fail to see the difficulty for the implementer.
For arithmetic ops, _BitInt(1187) is almost the same as _BitInt(1216).
You just add one 'AND by constant' operation applied to the MS word at the
very end. You only have to do it for the unsigned variant, since for the
signed variant overflow is undefined anyway. So, for signed, you can do
nothing, or you can do the same as for unsigned if you feel that it's
simpler.
The same goes for left shift.
For right shift and for logical ops, _BitInt(1187) is exactly the same
as _BitInt(1216).
So what is all the fuss about?
I see two implementation strategies:
* Just ignore the values of the padding bits. You don't need to AND or
fix up anything after arithmetic operations. Makes arithmetic as fast as
possible. But you need special handling at comparisons and casts.
* Always keep the padding bits in line with the value, i.e. AND after
arithmetic operations for unsigned, copy the value of the sign bit for
signed. Extra effort at arithmetic operations, but no extra effort at
casts and comparisons.
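A small sketch of where the two strategies differ, using a hypothetical unsigned _BitInt(22) stored in one 32-bit word (the names and layout here are assumptions, not any particular compiler's ABI):

#include <stdbool.h>
#include <stdint.h>

#define WIDTH 22
#define MASK  ((UINT32_C(1) << WIDTH) - 1)   /* low 22 bits hold the value */

/* Strategy 1: padding bits may be garbage, so comparisons must mask. */
static bool eq_lazy(uint32_t a, uint32_t b)
{
    return (a & MASK) == (b & MASK);
}

/* Strategy 2: every operation re-canonicalises the padding bits ... */
static uint32_t add_canonical(uint32_t a, uint32_t b)
{
    return (a + b) & MASK;                   /* extra AND here */
}

/* ... so comparisons need no extra work. */
static bool eq_canonical(uint32_t a, uint32_t b)
{
    return a == b;
}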
Philipp Klaus Krause <pkk@spth.de> wrote:
Am 24.11.25 um 13:31 schrieb bart:
On 24/11/2025 11:17, David Brown wrote:
On 24/11/2025 01:30, bart wrote:
Saving memory was mentioned. To achieve that means having bitfields
that may not start at bit 0 of a byte, and may cross byte- or word-
boundaries.
No, that is incorrect.
The proposal mentions saving /space/ as relevant in FPGAs - not
saving / memory/.
But I was responding to a suggestion here that one use of _BitInts -
presumably for ordinary hardware - was to save memory.
That's not going to happen if they are simply rounded up to the next
power-of-two type.
SDCC has no padding bytes - a _BitInt(N) uses (N+7)/8 bytes. And for
SDCC, _BitInt is currently the only way to get addressable integers that
are not a "power-of-two type".
But does SDCC support any non-8-bit processor? I hope that gcc 8-bit
targets will also use the minimal number of bytes.
On Sat, 29 Nov 2025 22:58:26 +0000
bart <bc@freeuk.com> wrote:
On 29/11/2025 20:24, Waldek Hebisch wrote:
First, _BitInt(821) (and _BitInt(1187)) are really unimportant. You
simply get them as a byproduct of the general rules.
That they are allowed is the problem. People use them and expect the
compiler to waste its time generating bit-precise code.
I fail to see the difficulty for the implementer.
For arithmetic ops, _BitInt(1187) is almost the same as _BitInt(1216).
You just add one 'AND by constant' operation applied to the MS word at the
very end. You only have to do it for the unsigned variant, since for the
signed variant overflow is undefined anyway. So, for signed, you can do
nothing, or you can do the same as for unsigned if you feel that it's
simpler.
The same goes for left shift.
For right shift and for logical ops, _BitInt(1187) is exactly the same
as _BitInt(1216).
So what is all the fuss about?
I see two implementation strategies:
* Just ignore the values of the padding bits. You don't need to AND or
fix up anything after arithmetic operations. Makes arithmetic as fast as
possible. But you need special handling at comparisons and casts.
* Always keep the padding bits in line with the value, i.e. AND after
arithmetic operations for unsigned, copy the value of the sign bit for
signed. Extra effort at arithmetic operations, but no extra effort at
casts and comparisons.
That sounds about right. It's much the same as the implementation of
_Bool. You either ignore the padding bits while doing the calculations
and filter them out when they later get in the way, or you keep them
neat and consistent (signed or unsigned extended, as appropriate) during calculations and it's all fine for other operations. I have no idea
what might be the most efficient choice overall - it could vary by application, but I expect implementations to have one fixed strategy.
On 2025-11-30 03:30:42, bart wrote:
On 30/11/2025 00:46, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
You can have general _BitInt(N) syntax and have constraints on the
values of N, not just an upper limit.
No you can't, because the language does not allow the arbitrary
restrictions you want. If an implementer finds _BitInt(1187)
too difficult, they can set BITINT_MAXWIDTH to 64.
One more time: Both gcc and llvm/clang have already implemented
bit-precise types, with very large values of BITINT_MAXWIDTH.
What actual problems has this fact caused for you, other than giving
you something to complain about?
What problem would there be if BitInt sizes above the machine word
sizes had to be multiples of the word sizes?
Not sure whether you're talking at cross-purposes. (I haven't
followed all postings or details.)
If, on a 64-bit-word system, _BitInt(80) would produce an error
(because it "had to be multiples of the word sizes", as you say),
that would obviously be a problem, don't you think?
In what way would it inconvenience /you/?
If I couldn't define _BitInt(80) I'd indeed consider that a problem.
Or any other constant value that stems from my application domain.
I just don't like unnecessarily flexible, lax or over-ambitious
features in a language. I think that is as much poor design as
underspecifying.
You mean you have problems with, say, "char a[81]" as well?
So I'm interested in what that one extra bit in a million buys you. Or
that one bit fewer.
It's not about extra bits. It's about making it possible to define
what your tasks (that are to be implemented) ask for. (In my book.
YMMV.)
Am 30.11.25 um 10:05 schrieb Michael S:
* Zilog Z80 based MCUs
This one gets complicated. The original Z80 had a 4-bit ALU, but is
widely considered 8-bit, and I'd agree.
It has an 8-bit data bus, a 16-
bit address bus. Most instructions operate on 8-bit data, but there are
some that operate on 16 bits.
Am 30.11.25 um 12:28 schrieb David Brown:
I see two implementation strategies:
* Just ignore the values of the padding bits. You don't need to and
or anything after arithmetic operations. Makes arithmetic as fast as
possible. But you need special handling at comparisons and casts.
* Always keep the padding bits in line with the value, i.e. and after
arithemetic operations for unsigned, copy value of sign bit for
signed. Extra effort at arithmetic operations, but no extra effort at
casts and comparisons.
That sounds about right. It's much the same as the implementation of
_Bool. You either ignore the padding bits while doing the
calculations and filter them out when they later get in the way, or
you keep them neat and consistent (signed or unsigned extended, as
appropriate) during calculations and it's all fine for other
operations. I have no idea what might be the most efficient choice
overall - it could vary by application, but I expect implementations
to have one fixed strategy.
_Bool is a bit different, since it promotes to int, so we don't really
have arithmetic directly on _Bool. I can definitely see an
implementation going one way for _BitInt, and the other for _Bool.
On 30/11/2025 09:51, Philipp Klaus Krause wrote:
Am 30.11.25 um 10:05 schrieb Michael S:
* Zilog Z80 based MCUs
This one gets complicated. The original Z80 had a 4-bit ALU, but is
widely considered 8-bit, and I'd agree.
That's news to me. Are you thinking of the 4040 as the original? Z80 was
a souped-up version of 8080: a superset with better technical specs.
Am 30.11.25 um 14:10 schrieb bart:
On 30/11/2025 09:51, Philipp Klaus Krause wrote:
Am 30.11.25 um 10:05 schrieb Michael S:
* Zilog Z80 based MCUs
This one gets complicated. The original Z80 had a 4-bit ALU, but is
widely considered 8-bit, and I'd agree.
That's news to me. Are you thinking of the 4040 as the original? Z80
was a souped-up version of 8080: a superset with better technical specs.
Both the 4004 and the Z80 were designed by Masatoshi Shima. See this interview for details on the Z80 (he does call the Z80 an "8-bit microprocessor", just a few sentences before mentioning its 4-bit ALU):
https://archive.computerhistory.org/resources/text/Oral_History/Zilog_Z80/102658073.05.01.pdf
On 30/11/2025 14:26, Philipp Klaus Krause wrote:
Am 30.11.25 um 14:10 schrieb bart:
On 30/11/2025 09:51, Philipp Klaus Krause wrote:
Am 30.11.25 um 10:05 schrieb Michael S:
* Zilog Z80 based MCUs
This one gets complicated. The original Z80 had a 4-bit ALU, but is
widely considered 8-bit, and I'd agree.
That's news to me. Are you thinking of the 4040 as the original? Z80
was a souped-up version of 8080: a superset with better technical specs.
Both the 4004 and the Z80 were designed by Masatoshi Shima. See this
interview for details on the Z80 (he does call the Z80 an "8-bit
microprocessor", just a few sentences before mentioning its 4-bit ALU):
https://archive.computerhistory.org/resources/text/Oral_History/Zilog_Z80/102658073.05.01.pdf
OK, so the Z80 has a '4-bit pipelined' ALU. It's explained in more
detail here:
https://www.righto.com/2013/09/the-z-80-has-4-bit-alu-heres-how-it.html
(It doesn't say why; presumably it uses fewer on-chip resources, or to
make a point of difference from the 8080.)
But that appears to be an implementation detail that is transparent to
the user.
Since it uses 8-bit registers, 8-bit instructions, and has an 8-bit
databus, I think it can pass for an 8-bit CPU!
On 30/11/2025 04:31, Janis Papanagnou wrote:
On 2025-11-30 03:30:42, bart wrote:
On 30/11/2025 00:46, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
You can have general _BitInt(N) syntax and have constraints on the
values of N, not just an upper limit.
No you can't, because the language does not allow the arbitrary
restrictions you want. If an implementer finds _BitInt(1187)
too difficult, they can set BITINT_MAXWIDTH to 64.
One more time: Both gcc and llvm/clang have already implemented
bit-precise types, with very large values of BITINT_MAXWIDTH.
What actual problems has this fact caused for you, other than giving
you something to complain about?
What problem would there be if BitInt sizes above the machine word
sizes had to be multiples of the word sizes?
Not sure whether you're talking at cross-purposes. (I haven't
followed all postings or details.)
If, on a 64-bit-word system, _BitInt(80) would produce an error
(because it "had to be multiples of the word sizes", as you say),
that would obviously be a problem, don't you think?
How did you represent an 80-bit type up to now? How much of a problem
has it actually been?
In what way would it inconvenience /you/?
If I couldn't define _BitInt(80) I'd indeed consider that a problem.
Or any other constant value that stems from my application domain.
Which domain is that, is it representing, as an integer, the 80-bit
float type from an x87 FPU?
So I'm interested in what that one extra bit in a million buys you.
Or that one bit fewer.
It's not about extra bits. It's about making it possible to define
what your tasks (that are to be implemented) ask for. (In my book.
YMMV.)
Yet over the past decades nobody has been screaming because they
couldn't have a 31-bit or 65-bit numeric type. But now suddenly EVERYONE wants to be able to do that, and on huge numbers!
I think there's more psychology at play here than anything else.
What actual problems you have (or could imagine) with that?
On 2025-11-30 13:51:22, bart wrote:
How did you represent an 80-bit type up to now? How much of a problem
has it actually been?
I had to use workarounds and/or more effort to achieve what I intended.
For the implementation I have to think in terms of technical entities unnecessarily, and not in [more natural] application domain entities.
Which domain is that, is it representing, as an integer, the 80-bit
float type from an x87 FPU?
That was an arbitrary number.
(Other posters provided extensive lists
of other numbers already, based on application domain requirements.)
I could provide even more "crude" numbers, say, like _BitInt(17), to
operate on coefficients of generator polynomials, just for example.
The point is not to discuss any specific number. The point is that
they are bound to the application domain and express a sensible
counterpart (not an artificial one, defined by technical CPU word sizes)
of the [application domain] items you work on.
Programming is about implementing solutions to application problems.
It is not primarily about focusing on what the hardware looks like.
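To give one concrete, hypothetical shape to the _BitInt(17) example above: CAN FD uses a 17-bit CRC, and with a bit-precise type the shifted-out bit is discarded by the type itself rather than by an explicit mask. The polynomial value and the update routine below are a sketch of mine, not something from the post:

/* Sketch of a bitwise CRC-17 update using a 17-bit bit-precise type.
   0x1685B is the CRC-17 polynomial used by CAN FD (to the best of my
   knowledge); the function processes one input bit, MSB first. */
typedef unsigned _BitInt(17) crc17;

static crc17 crc17_step(crc17 crc, _Bool in)
{
    const crc17 poly = 0x1685B;
    _Bool msb = (crc >> 16) & 1;
    crc <<= 1;                 /* bit 16 falls off: arithmetic is mod 2**17 */
    if (msb ^ in)
        crc ^= poly;
    return crc;
}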
The fact that you obviously don't see that, and also don't understand
it after having it pointed out, in addition to your squirming to *not*
*answer* the repeatedly formulated question of what your actual problem
with it is,
Yet over the past decades nobody has been screaming because they
couldn't have a 31-bit or 65-bit numeric type. But now suddenly
EVERYONE wants to be able to do that, and on huge numbers!
You are again talking nonsense and exposing your insincere moves
to stupidly exaggerate ("screaming") and unreasonably generalize
("EVERYONE"). - Yet, you prove again that you are not willing to
be a serious discussion partner.
bart <bc@freeuk.com> wrote:
Yet what I said is pretty much true. Nobody cared about _BitInt until they
became aware of it, and now it's a must-have.
Well, you were told many times that regulars here know the deficiencies
of C. "Nobody cared about _BitInt" in the sense that before _BitInt
people would say "this cannot be expressed directly in C, you need
such and such a workaround". People did not loudly complain,
knowing that complaints would achieve nothing. But, say, when doing
language comparisons they could note that C lacked such a feature.
There is also a psychological phenomenon: computers even in crude
form are quite useful, so people were willing to jump through hoops to
use them. But when a better/easier approach is available, people
very strongly resist going back to the old ways. So, once we got _BitInt,
you will not be able to take it back.
Michael S <already5chosen@yahoo.com> writes:
[...]
Now, if you ask me, I don't understand why Waldek Hebisch considers
difference between 8-bit and [byte-addressable] 16-bit targets
important. As far as size of relevant C types goes, they look the same:
char - 8 bits
int - 16 bit
long - 32 bits
There is possibly difference in the size of 'short', but I don't
understand why it matters.
Given 16-bit int, short is almost certain to be 16 bits as well.
char is required to be at least 8 bits, short and int at least 16, and
long at least 32 (and long long at least 64).
Or is 8-bit short used in some non-conforming mode?
On 30/11/2025 17:17, Janis Papanagnou wrote:
On 2025-11-30 13:51:22, bart wrote:
I've said many times that it's a poorly designed feature.
Read the
thread, as I'm not going to repeat things.
Yet over the past decades nobody has been screaming because they
couldn't have a 31-bit or 65-bit numeric type. But now suddenly
EVERYONE wants to be able to do that, and on huge numbers!
You are again talking nonsense and exposing your insincere moves
to stupidly exaggerate ("screaming") and unreasonably generalize
("EVERYONE"). - Yet, you prove again that you are not willing to
be a serious discussion partner.
Yet what I said is pretty much true. Nobody cared about _BitInt until they
became aware of it, and now it's a must-have.
On 30/11/2025 18:55, bart wrote:
On 30/11/2025 17:17, Janis Papanagnou wrote:
On 2025-11-30 13:51:22, bart wrote:
<snip>
I've said many times that it's a poorly designed feature.
You have said continuously that you think everything about C, along with everything you believe to be related to C (however tenuous the
connection may be in reality) is poorly designed.
It seems that absolutely everything that you did not design personally, is poorly
designed in your eyes. I'm not even sure you think your own languages
are well designed, given the number of times you've found new features
or limitations that you didn't know they had.
Perhaps you just have a different scale of what is "poor design" and
"good design". Or maybe you don't understand that things don't have to
be perfect to be good enough in practice.
You certainly don't seem to
understand that when there is more than one person involved - and for C, there are millions involved - compromises are inevitable, and elegance
of design must bow to compatibility requirements.
Read the thread, as I'm not going to repeat things.
Is that a promise?
The answer, it seems, is that many people do think they can use _BitInt
to make their code better in some way. It doesn't matter if one person thinks _BitInt(128) will be useful, while another thinks _BitInt(12) is something they'd use.
It doesn't matter if they will use them for FPGA
programming, small-systems embedded programming, cryptography, neater bitfield structs, or whatever. And most importantly, it does not matter
in the slightest if someone does /not/ want to use a particular size of _BitInt.
On 01/12/2025 02:32, Keith Thompson wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Now, if you ask me, I don't understand why Waldek Hebisch considers
difference between 8-bit and [byte-addressable] 16-bit targets
important. As far as size of relevant C types goes, they look the same:
char - 8 bits
int - 16 bit
long - 32 bits
There is possibly difference in the size of 'short', but I don't
understand why it matters.
Given 16-bit int, short is almost certain to be 16 bits as well.
char is required to be at least 8 bits, short and int at least 16, and
long at least 32 (and long long at least 64).
Or is 8-bit short used in some non-conforming mode?
Some C compilers for 8-bit devices have non-conforming modes with 8-bit
int. (I've seen one that, by default, had 16-bit int but did not
promote 8-bit types to int for arithmetic. That caused some subtle
problems for us.) I don't know if SDCC has such a mode (avr-gcc does).
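As a hedged illustration of the kind of subtle problem meant here (the code and numbers are illustrative, not from the post): with the standard promotions, 8-bit operands are widened to int before the arithmetic, so intermediate results do not wrap at 8 bits, while a non-promoting 8-bit mode would wrap them.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 200, b = 100;

    /* Conforming C: a and b promote to int, so a + b is 300. */
    if (a + b > 250)
        puts("conforming: 300 > 250, branch taken");

    /* A non-promoting 8-bit mode would effectively compute this instead: */
    uint8_t sum8 = (uint8_t)(a + b);        /* 300 mod 256 = 44 */
    printf("8-bit wrapped sum: %u\n", (unsigned)sum8);
    return 0;
}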
On 26/11/2025 19:42, bart wrote:
On 26/11/2025 16:37, David Brown wrote:
On 26/11/2025 16:44, bart wrote:
Well, it would be a minority. Grown-up languages with decent syntax
exist such as Ada and Fortran; those are not that popular. People
prefer brace-based languages such as C, Java, Go, Zig, Rust.
Anything without braces isn't taken as seriously, eg. scripting
languages.
What a /very/ strange way to distinguish or classify languages. And
what a bizarre way to generalise what people think, as though all programmers share the same opinions.
On 01/12/2025 10:33, David Brown wrote:
On 30/11/2025 18:55, bart wrote:
On 30/11/2025 17:17, Janis Papanagnou wrote:
On 2025-11-30 13:51:22, bart wrote:
<snip>
I've said many times that it's a poorly designed feature.
You have said continuously that you think everything about C, along
with everything you believe to be related to C (however tenuous the
connection may be in reality) is poorly designed.
With C it is 75% about syntax. Some of it is about type systems, but
mostly it is similar to what I have. The new _BitInt (which as you know
I don't much like) makes the divergence greater.
It seems that
absolutely everything that you did not design personally, is poorly
designed in your eyes. I'm not even sure you think your own languages
are well designed, given the number of times you've found new features
or limitations that you didn't know they had.
Yeah, I'm a bit of a perfectionist. So what?
Perhaps you just have a different scale of what is "poor design" and
"good design". Or maybe you don't understand that things don't have
to be perfect to be good enough in practice.
And my own designs are also full of compromises. One big one is that the
systems language knows its place; I keep it simple and don't try to
make it one or two levels higher. Other modern 'systems' languages are
much higher level, but also harder to use, more complicated, and less
efficient to process.
You certainly don't seem to understand that when there is more than
one person involved - and for C, there are millions involved -
compromises are inevitable, and elegance of design must bow to
compatibility requirements.
Read the thread, as I'm not going to repeat things.
Is that a promise?
Have a look at the first post I made in the thread. I've no idea how to
do links, so a copy of it is pasted below.
My other posts are mostly defending that view.
The answer, it seems, is that many people do think they can use
_BitInt to make their code better in some way. It doesn't matter if
one person thinks _BitInt(128) will be useful, while another thinks
_BitInt(12) is something they'd use.
How much of this feature came about because of LLVM's support for
integer types up to 2**23 or 2**24 /bits/? I thought /that/ was crass.
It doesn't matter if they will use them for FPGA programming,
small-systems embedded programming, cryptography, neater bitfield
structs, or whatever. And most importantly, it does not matter in the
slightest if someone does /not/ want to use a particular size of _BitInt.
That's like saying we should all be using C++ compilers for C programs,
and just ignore all the features we don't want.
So why /do/ C-only compilers still exist?
=============================================================================
bart:
On 23/11/2025 13:32, Waldek Hebisch wrote:
Philipp Klaus Krause <pkk@spth.de> wrote:
Am 22.10.25 um 14:45 schrieb Thiago Adams:
Is anyone using or planning to use this new C23 feature?
What could be the motivation?
Saving memory by using the smallest multiple-of-8 N that will do.
IIUC nothing in the standard says that it is the smallest multiple of 8.
Using gcc-15.1 on AMD-64 I get 'sizeof(_BitInt(22))' equal to 4,
while the number could fit in 3 bytes.
The rationale mentions a use-case where there is a custom processor that
might actually have a 22-bit hardware type.
Implementing such odd-size types on regular 8/16/32/64-bit hardware is
full of problems if you want to do it without padding (in order to get
the savings), or even with padding (to get the desired overflow semantics).
Such as working out how pointers to them will work.
Also
being able to use bit-fields wider than int.
For me the main gain is reasonably standard syntax for integers bigger
than 64 bits.
Standard syntax I guess would be something like int128_t and int256_t.
Such wider integers tend to be powers of two.
But there are two problems with _BitInt:
* Any odd sizes are allowed, such as _BitInt(391)
* There appears to be no upper limit on size, so _BitInt(2997901) is a
valid type
So what is the result type of multiplying values of those two types?
Integer sizes greater than 1K or 2K bits should use an arbitrary
precision type (which is how large _BitInts will likely be implemented anyway), where the precision is a runtime attribute.
On Wed, 26 Nov 2025 21:43:59 +0100
David Brown <david.brown@hesbynett.no> wrote:
On 26/11/2025 19:42, bart wrote:
On 26/11/2025 16:37, David Brown wrote:
On 26/11/2025 16:44, bart wrote:
Well, it would be a minority. Grown-up languages with decent syntax
exist such as Ada and Fortran; those are not that popular. People
prefer brace-based languages such as C, Java, Go, Zig, Rust.
Anything without braces isn't taken as seriously, eg. scripting
languages.
What a /very/ strange way to distinguish or classify languages. And
what a bizarre way to generalise what people think, as though all
programmers share the same opinions.
I think that Bart is spot on.
Curly languages are much more likely to be widely accepted than others.
The difference between Bart and me is that I like it.
I strongly prefer VHDL over Verilog, but that's due to semantics and
despite too wordy syntax of the former.
On 01/12/2025 07:36, David Brown wrote:
On 01/12/2025 02:32, Keith Thompson wrote:
Michael S <already5chosen@yahoo.com> writes:
[...]
Now, if you ask me, I don't understand why Waldek Hebisch considers the
difference between 8-bit and [byte-addressable] 16-bit targets
important. As far as the size of relevant C types goes, they look the same:
char - 8 bits
int - 16 bits
long - 32 bits
There is possibly a difference in the size of 'short', but I don't
understand why it matters.
Given 16-bit int, short is almost certain to be 16 bits as well.
char is required to be at least 8 bits, short and int at least 16, and
long at least 32 (and long long at least 64).
Or is 8-bit short used in some non-conforming mode?
Some C compilers for 8-bit devices have non-conforming modes with
8-bit int. (I've seen one that, by default, had 16-bit int but did
not promote 8-bit types to int for arithmetic. That caused some
subtle problems for us.) I don't know if SDCC has such a mode
(avr-gcc does).
That sounds sensible to me. It's how my language for Z80 worked (and
that carried on into x86 until I introduced promotions).
If performing arithmetic on two 8-bit variables, on a machine with poor 16-bit support, you don't want the inefficiency of promoting both to
16-bit (needing extra instructions), doing the operation at 16 bits
(which may need extra instructions), and then probably discarding the
high byte anyway.
There were some issues with that which you had to be aware of:
byte a := 255
print a + 1
This would show 0 not 256.
I'd be happy if C did not have integer promotions.
bart <bc@freeuk.com> wrote:
On 01/12/2025 00:08, Waldek Hebisch wrote:
bart <bc@freeuk.com> wrote:
Yet what I said is pretty much true. Nobody cared about _BitInt until they
became aware of it, and now it's a must-have.
Well, you were told many times that regulars here know the deficiencies
of C. "Nobody cared about _BitInt" in the sense that before _BitInt
people would say "this cannot be expressed directly in C, you need
such and such a workaround". People did not loudly complain,
knowing that complaints would achieve nothing. But, say, when doing
language comparisons they could note that C lacked such a feature.
There is also a psychological phenomenon: computers even in crude
form are quite useful, so people were willing to jump through hoops to
use them. But when a better/easier approach is available, people
very strongly resist going back to the old ways. So, once we got _BitInt,
you will not be able to take it back.
I've been claiming that _BitInt was a poor fit for a language at the
level of C which lacks some more fundamental features.
But I think I was wrong: the way _BitInt has been devised and presented
is actually completely in line with the haphazard way C has evolved up
to now.
I made the mistake in this thread of thinking that people cared about
measured language design; obviously if they're using C, they don't.
unsigned char* p;
uint8_t* q; // only exists when stdint.h used
unsigned _BitInt(8)* r;
char* s;
p and q are probably compatible. p and r are not; q and r are not. s is
incompatible with p, q, r even if it is unsigned.
Do you understand that uint8_t and _BitInt(8) are different types?
And the difference is not an accident, but they have different
properties (uint8_t in expressions promotes to int, _BitInt(8)
is not subject to this promotion).
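A small self-contained example of that difference (an illustration, not from the post): with uint8_t the operands promote to int and the product does not wrap, whereas with unsigned _BitInt(8) there is no promotion and the multiplication is done modulo 2**8.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 200, b = 200;
    unsigned _BitInt(8) x = 200, y = 200;

    /* a and b promote to int: 200 * 200 = 40000, no wrap-around. */
    printf("uint8_t product: %d\n", a * b);           /* 40000 */

    /* No promotion for bit-precise types: the product has type
       unsigned _BitInt(8), i.e. 40000 mod 256 = 64. */
    printf("_BitInt(8) product: %d\n", (int)(x * y)); /* 64 */
    return 0;
}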
On 01/12/2025 04:10, Waldek Hebisch wrote:
bart <bc@freeuk.com> wrote:
On 01/12/2025 00:08, Waldek Hebisch wrote:
bart <bc@freeuk.com> wrote:
Yet what I said is pretty much true. Nobody cared about _BitInt until they
became aware of it, and now it's a must-have.
Well, you were told many times that regulars here know the deficiencies
of C. "Nobody cared about _BitInt" in the sense that before _BitInt
people would say "this cannot be expressed directly in C, you need
such and such a workaround". People did not loudly complain,
knowing that complaints would achieve nothing. But, say, when doing
language comparisons they could note that C lacked such a feature.
There is also a psychological phenomenon: computers even in crude
form are quite useful, so people were willing to jump through hoops to
use them. But when a better/easier approach is available, people
very strongly resist going back to the old ways. So, once we got _BitInt,
you will not be able to take it back.
I've been claiming that _BitInt was a poor fit for a language at the
level of C which lacks some more fundamental features.
But I think I was wrong: the way _BitInt has been devised and presented
is actually completely in line with the haphazard way C has evolved up
to now.
I made the mistake in this thread of thinking that people cared about
measured language design; obviously if they're using C, they don't.
unsigned char* p;
uint8_t* q; // only exists when stdint.h used
unsigned _BitInt(8)* r;
char* s;
p and q are probably compatible. p and r are not; q and r are not. s is
incompatible with p, q, r even if it is unsigned.
Do you understand that uint8_t and _BitInt(8) are different types?
Well, apparently they aren't. It's not immediately obvious why, but as I explained above, I realise this is entirely in keeping with how C works.
And the difference is not an accident, but they have different
properties (uint8_t in expressions promotes to int, _BitInt(8)
is not subject to this promotion).
This is another little rule that is not obvious, and out of keeping with
how other types work. Yet add _BitInt(8) to _BitInt(16), and one side
/is/ promoted.
My example was just to highlight the plethora of type denotations that
exist, even for the same machine type. The rules for type-compatibility
and promotions (and the ugly syntax) are just icing on top.
This ungainly way to evolve a language is how C works (just look at all
the things wrong with how stdint.h types were handled).
The following table for example shows the rules for mixed sign
arithmetic: S means the result (32 or 64 bits) has signed type, and u
means it is unsigned:
u8 u16 u32 u64 i8 i16 i32 i64
u8 S S u u S S S S
u16 S S u u S S S S
u32 u u u u u u u S
u64 u u u u u u u u
i8 S S u u S S S S
i16 S S u u S S S S
i32 S S u u S S S S
i64 S S S u S S S S
But of course, every C programmer knows this and doesn't need such a chart!
Here, i8 + u8 gives a signed result; but 'unsigned _BitInt(8) +
_BitInt(8)' apparently gives an unsigned result (tested using _Generic).
So another plus point for staying close to the C spirit!
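For what it's worth, here is a self-contained version of that _Generic check (a reconstruction; bart's issigned() macro appears further down the thread):

#include <stdio.h>

#define KIND(x) _Generic((x),                           \
    int:                 "signed int",                  \
    unsigned int:        "unsigned int",                \
    _BitInt(8):          "signed _BitInt(8)",           \
    unsigned _BitInt(8): "unsigned _BitInt(8)",         \
    default:             "something else")

int main(void)
{
    signed char i8 = -2;
    unsigned char u8 = 13;
    _BitInt(8) b8 = -2;
    unsigned _BitInt(8) ub8 = 13;

    /* i8 and u8 both promote to int, so the sum is signed. */
    printf("i8 + u8  -> %s\n", KIND(i8 + u8));
    /* No promotion for bit-precise operands; the signed one converts to
       unsigned of the same width, so the sum is unsigned _BitInt(8). */
    printf("b8 + ub8 -> %s\n", KIND(b8 + ub8));
    return 0;
}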
On 01/12/2025 13:37, David Brown wrote:
I'd be happy if C did not have integer promotions.
Well, now it doesn't with _BitInt types. So this stores 0 in 'a', not 256:
int a;
unsigned _BitInt(8) b = 255, c = 1;
a = b + c;
(I am not entirely sure, but I think it is standards-conforming for an
implementation to have BITINT_MAXWIDTH set to 64 and support all
_BitInts up to size 64, and then also support _BitInts that are
multiples of 64 thereafter. Use of a _BitInt greater than BITINT_MAXWIDTH
is UB in the standard - so an implementation can choose to give that a
defined behaviour for specific sizes.)
On 01/12/2025 15:41, bart wrote:
No, _BitInts never use integer promotion. Perhaps you mean that they
are included in the rules for "usual arithmetic conversions"? These
are different from the "integer promotion" rules. One would think that
someone who claims to have implemented a C compiler would be familiar
with the types of implicit conversions required by the language.
My example was just to highlight the plethora of type denotations that
exist, even for the same machine type. The rules for type-
compatibility and promotions (and the ugly syntax) is just icing on top.
C is not an abstraction for a processor. It is a programming language.
It does not differentiate between types nearly as much as I would like,
but it still does so more than an untyped language like assembly.
This ungainly way to evolve a language is how C works (just look at
all the things wrong with how stdint.h types were handled).
The following table for example shows the rules for mixed sign
arithmetic: S means the result (32 or 64 bits) has signed type, and u
means it is unsigned:
u8 u16 u32 u64 i8 i16 i32 i64
C programmers know that C does not have types of these names.
And I
expect most C programmers figure things out using the very simple and consistent rules of the language.
Here, i8 + u8 gives a signed result; but 'unsigned _BitInt(8 ) +
_Bitint(8)' apparently gives an unsigned result (tested using _Generic).
Or you could learn the very simple rules, and then you would know
without testing.
On 01/12/2025 15:24, David Brown wrote:
On 01/12/2025 15:41, bart wrote:
No, _BitInt's never use integer promotion. Perhaps you mean that they
are included in the rules for "usual arithmetic conversions" ? These
are different from the "integer promotion" rules. One would think
that someone who claims to have implemented a C compiler would be
familiar with the types of implicit conversions required by the language.
I implement C via an IL. The IL doesn't use any automatic promotions.
There is only one instruction WIDEN to zero- or sign-extend any value.
So I think in those terms.
My example was just to highlight the plethora of type denotations
that exist, even for the same machine type. The rules for type-
compatibility and promotions (and the ugly syntax) are just icing on top.
C is not an abstraction for a processor. It is a programming
language. It does not differentiate between types nearly as much as I
would like,
It seems to be doing a good job!
but it still does so more than an untyped language like assembly.
There is type-specific stuff going on, but it's done via the choices of instruction.
My IL supports the usual set of 8/16/32/64-bit-based types, and type-
info is a separate attribute for each instruction.
There is no direct provision for sub-byte/sub-word types, and the only
type bigger than a word (putting aside a reserved set of vector types),
is an anonymous data-block type, specified as so many bytes in size.
So you can see how an arbitrary [unsigned] _BitInt(N) bit-precise type
would be a poor fit, and a bit of a nightmare to implement on top. It's unnatural.
The IL is designed to be one abstraction level higher than typical
machine architectures, and you don't really want HLL-specific features
in it.
This is related to my remark about LLVM and its building-in of such
types. But LLVM is some 3-4 magnitudes bigger in scale than what I do,
and famously slow and cumbersome in operation.
This ungainly way to evolve a language is how C works (just look at
all the things wrong with how stdint.h types were handled).
The following table for example shows the rules for mixed sign
arithmetic: S means the result (32 or 64 bits) has signed type, and u
means it is unsigned:
u8 u16 u32 u64 i8 i16 i32 i64
C programmers know that C does not have types of these names.
Unfortunately, if I'd used 'unsigned long long int' and so on, the chart
becomes impossibly large.
While with 'uint64_t' etc., such types don't exist at all in C, unless a
particular header is used (and they might still result in line-wrapping).
Fortunately I'm 100% certain that no one reading this is scratching
their heads about what those types could possibly mean.
And I expect most C programmers figure things out using the very
simple and consistent rules of the language.
The chart demonstrates a number of inconsistencies.
Here, i8 + u8 gives a signed result; but 'unsigned _BitInt(8) +
_BitInt(8)' apparently gives an unsigned result (tested using _Generic).
Or you could learn the very simple rules, and then you would know
without testing.
So you're not commenting on the fact that mixed 8-bit arithmetic has opposite signedness between existing types and _BitInt types.
(My language also has rules, but the equivalent chart is simpler: every entry has 'S' except for u64/u64 (and I'm working on that!).
While with my decimal big-number library, all numbers are signed so the issue doesn't come up at all.)
David Brown <david.brown@hesbynett.no> writes:
[...]
(I am not entirely sure, but I think it is standards-conforming for an
implementation to have BITINT_MAXWIDTH set to 64 and support all
_BitInts up to size 64, and then also support _BitInts that are
multiples of 64 thereafter. Use of a _BitInt greater than BITINT_MAXWIDTH
is UB in the standard - so an implementation can choose to give that a
defined behaviour for specific sizes.)
No, _BitInt(N) where N > BITINT_MAXWIDTH is a constraint violation.
N3220 6.7.3.1p2 ("Constraints") :
The parenthesized constant expression that follows the _BitInt
keyword shall be an integer constant expression N that specifies
the width (6.2.6.2) of the type. The value of N for unsigned
_BitInt shall be greater than or equal to 1. The value of N
for _BitInt shall be greater than or equal to 2. The value of
N shall be less than or equal to the value of BITINT_MAXWIDTH
(see 5.2.5.3.2).
As I mentioned before, there's a proposal for C2y to allow
signed _BitInt(1).
Of course an implementation could do what you suggest as an extension.
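For code that wants a wide accumulator but must cope with small implementation limits such as SDCC's 64, one option (a sketch of mine, not a recommendation from the thread) is to test BITINT_MAXWIDTH from <limits.h> at preprocessing time:

#include <limits.h>

/* Pick a 256-bit accumulator when the implementation supports it,
   otherwise fall back to a narrower standard type. */
#if defined(BITINT_MAXWIDTH) && BITINT_MAXWIDTH >= 256
typedef unsigned _BitInt(256) acc_t;
#else
typedef unsigned long long acc_t;   /* reduced range, but portable */
#endif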
On 01/12/2025 18:19, bart wrote:
If you are implementing only a partial sort-of-C compiler, with a view
to acting as a limited tool for a specific close subset of C, then it's
fine to change or skip rules. Perhaps your tool is never called upon to
do arithmetic on types that are not of size "int" or above, or perhaps
you actively decide to have different rules. That's okay - but it would
be misleading to call it a "C compiler".
but it still does so more than an untyped language like assembly.
There is type-specific stuff going on, but it's done via the choices
of instruction.
We are talking about C, here in a C newsgroup.
The thing you have to remember about implementations, is that nobody
really cares how hard it is to implement a particular C feature. The C standards folk care that it is /possible/ to implement it, not if it
happens to be easy or difficult for any particular compiler (especially
an obscure private sort-of-C compiler that is only used by one person).
They do provide some escape hatches for implementers who feel a
particular feature is too difficult to make for the amount of use it
would get from their users - a number of C features are optional, and
for _BitInt, you can limit the size to just 64 bits.
This is related to my remark about LLVM and its building-in of such
types. But LLVM is some 3-4 magnitudes bigger in scale than what I do,
and famously slow and cumbersome in operation.
The _BitInt feature is not designed around LLVM. You are mistaken in
believing so. It took inspiration from clang's _ExtInt feature and how
it could be used, not how it was implemented.
But if you want to have types like that in C, they are called "uint8_t",
etc. You know this. You only use these silly abbreviations for provocation,
because your own (irrelevant) language uses them, and somehow feel smug
about it all.
They exist in the C standard regardless of any headers, because the C standard - the definition of the C language - is independent of any
program.
Fortunately I'm 100% certain that no one reading this is scratching
their heads about what those types could possible mean.
And I expect most C programmers figure things out using the very
simple and consistent rules of the language.
The chart demonstrates a number of inconsistencies.
No, it does not - at least, not inconsistencies in C. It may
demonstrate inconsistencies in your understanding and misunderstanding
of the language.
No. As I said, the rules are clear and simple. Anyone so incapable of learning that they have trouble with them, is likely to have a great
deal of difficulty doing much programming.
(And again, I point out that I do not think the C rules here are the
best options for a programming language. But that is my personal
opinion, and it does not affect my ability to understand the rules and
write correct C code using them.)
(My language also has rules, but the equivalent chart is simpler:
every entry has 'S' except for u64/u64 (and I'm working on that!).
That would be an alternative set of rules that would also be easy to
learn. And it would be, IMHO, at least as bad as C's choices. There is
no set of rules that can handle mixed signed arithmetic well and
implicitly without expanding type sizes. Your language is different
from C's, not better.
On 01/12/2025 17:56, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
(I am not entirely sure, but I think it is standards-conforming for an
implementation to have BITINT_MAXWIDTH set to 64 and support all
_BitInts up to size 64, and then also support _BitInts that are
multiples of 64 thereafter. Use of a _BitInt greater than BITINT_MAXWIDTH
is UB in the standard - so an implementation can choose to give that a
defined behaviour for specific sizes.)
No, _BitInt(N) where N > BITINT_MAXWIDTH is a constraint violation.
N3220 6.7.3.1p2 ("Constraints") :
The parenthesized constant expression that follows the _BitInt
keyword shall be an integer constant expression N that specifies
the width (6.2.6.2) of the type. The value of N for unsigned
_BitInt shall be greater than or equal to 1. The value of N
for _BitInt shall be greater than or equal to 2. The value of
N shall be less than or equal to the value of BITINT_MAXWIDTH
(see 5.2.5.3.2).
As I mentioned before, there's a proposal for C2y to allow
signed _BitInt(1).
Of course an implementation could do what you suggest as an
extension.
Yes, of course - violating a constraint is UB, but it also requires a diagnostic.
So while an implementation could implement a limited
selection of _BitInt's larger than BITINT_MAXWIDTH, if it is to
conform to the C standards it would have to at least give a warning
message when you used these _BitInt's. As an extension (perhaps
enabled by a flag), it could then suppress such diagnostics.
bart <bc@freeuk.com> writes:
The following table for example shows the rules for mixed sign
arithmetic: S means the result (32 or 64 bits) has signed type, and u
means it is unsigned:
u8 u16 u32 u64 i8 i16 i32 i64
u8 S S u u S S S S
u16 S S u u S S S S
u32 u u u u u u u S
u64 u u u u u u u u
i8 S S u u S S S S
i16 S S u u S S S S
i32 S S u u S S S S
i64 S S S u S S S S
But of course, every C programmer knows this and doesn't need such a chart!
I'm not going to take the time to confirm that your chart is correct.
It assumes that int is 32 bits; an implementation with 16-bit or
64-bit int would require a different chart.
But the fact that you were able to generate the chart means that
*you already understand the rules*. You just choose to express
those rules in a way that's more confusing.
The use of curly braces vs. begin/end is IMHO trivial. [...]
Someone who dislikes C for whatever reasons will probably dislike
most other languages that use curly braces, and not necessarily
because of that one syntactic detail.
On 01/12/2025 20:34, Keith Thompson wrote:
[...]
I'm not going to take the time to confirm that your chart is
correct.
It assumes that int is 32 bits; an implementation with 16-bit or
64-bit int would require a different chart.
But the fact that you were able to generate the chart means that
*you already understand the rules*. You just choose to express
those rules in a way that's more confusing.
That doesn't follow; the chart was created by a C program, by
submitting 64 combinations of typed variables (eg. issigned(x * y)) to
the macro below, compiled with gcc.
My own C compiler produces a quite different chart, but I'm not
interested at this point in rewriting the front-end, considering the
many other ways it doesn't conform.
Fortunately this doesn't seem to affect too many things.
As example, the above says that i8 * u32 (or u32 * i8) is unsigned; my
chart says it's signed. The difference can be demonstrated here:
signed char a = -2;
unsigned int b = 13;
printf("%f\n", (double)(a*b));
The output is:
gcc 4294967270.000000
bcc -26.000000
I don't know about you, but to me, my result looks a lot more
intuitive! So I also didn't /want/ to change it to something I didn't
agree with.
-------------------------------------------------
#define issigned(x) _Generic((x),\
int8_t: "S",\
int16_t: "S",\
int32_t: "S",\
int64_t: "S",\
uint8_t: "u",\
uint16_t: "u",\
uint32_t: "u",\
uint64_t: "u",\
default: "other")
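For completeness, a minimal driver for that macro (an addition of mine; it needs <stdint.h> for the fixed-width names and assumes a platform where int is 32 bits, as the chart does):

#include <stdint.h>
#include <stdio.h>

#define issigned(x) _Generic((x),            \
    int8_t:   "S", int16_t:  "S",            \
    int32_t:  "S", int64_t:  "S",            \
    uint8_t:  "u", uint16_t: "u",            \
    uint32_t: "u", uint64_t: "u",            \
    default:  "other")

int main(void)
{
    int8_t  i8  = 0; uint8_t  u8  = 0;
    int32_t i32 = 0; uint32_t u32 = 0;

    printf("i8 * u8   -> %s\n", issigned(i8 * u8));   /* "S": both promote to int */
    printf("i8 * u32  -> %s\n", issigned(i8 * u32));  /* "u": int converts to unsigned int */
    printf("i32 * u32 -> %s\n", issigned(i32 * u32)); /* "u" */
    return 0;
}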
I see nothing in the standard that prevents '_BitInt(64)' and 'long'
from being the same type. And AFAICS the promotion rules are the only
thing which prevents 'uint8_t' and '_BitInt(8)' from being the same type.
Maybe I missed something, but I have read the posts that appeared here
and I saw nothing indicating otherwise.
On 2025-12-01 21:06:13, Keith Thompson wrote:
The use of curly braces vs. begin/end is IMHO trivial. [...]
Someone who dislikes C for whatever reasons will probably dislike
most other languages that use curly braces, and not necessarily
because of that one syntactic detail.
There may also be just simple practical real-life facts that
influence the preferences of languages with curly braces (or
brackets). I want to remind that keyboards from other domains
may not have the simple access to the [ ] { } characters! On
my US keyboard [ and ] are adjacent and directly accessible,
and { and } are on the same keys reachable simply with 'Shift'.
That's extremely convenient if you're programming C-like syntax!
Though on my German keyboard these characters are placed on the
top numbers row in one line, ordered as { [ ] }, and reachable
only through the 'Alt Gr' key. This is really a pain to type.
For _very common characters_ in a fairly common and rich family
of programming languages it's an issue [in such non-US domains].
On 01/12/2025 23:59, Janis Papanagnou wrote:
On 2025-12-01 21:06:13, Keith Thompson wrote:
The use of curly braces vs. begin/end is IMHO trivial. [...]
Someone who dislikes C for whatever reasons will probably dislike
most other languages that use curly braces, and not necessarily
because of that one syntactic detail.
There may also be just simple practical real-life facts that
influence the preferences of languages with curly braces (or
brackets). I want to remind that keyboards from other domains
may not have the simple access to the [ ] { } characters! On
my US keyboard [ and ] are adjacent and directly accessible,
and { and } are on the same keys reachable simply with 'Shift'.
That's extremely convenient if you're programming C-like syntax!
Though on my German keyboard these characters are placed on the
top numbers row in one line, ordered as { [ ] }, and reachable
only through the 'Alt Gr' key. This is really a pain to type.
For _very common characters_ in a fairly common and rich family
of programming languages it's an issue [in such non-US domains].
My Norwegian keyboard needs AltGr for {[]}, but I don't find it a burden
- it's habit, I suppose.
But in days gone by if anyone ever needed to use trigraphs for C programming, then I am sure they would happily switch to a word-based language given half a chance. I find "{ }" nicer than "begin end", but
I'd pick "begin end" over "??< ??>" any day!
On 02/12/2025 07:31, David Brown wrote:
On 01/12/2025 23:59, Janis Papanagnou wrote:
On 2025-12-01 21:06:13, Keith Thompson wrote:
The use of curly braces vs. begin/end is IMHO trivial. [...]
Someone who dislikes C for whatever reasons will probably dislike
most other languages that use curly braces, and not necessarily
because of that one syntactic detail.
There may also be just simple practical real-life facts that
influence the preferences of languages with curly braces (or
brackets). I want to remind that keyboards from other domains
may not have the simple access to the [ ] { } characters! On
my US keyboard [ and ] are adjacent and directly accessible,
and { and } are on the same keys reachable simply with 'Shift'.
That's extremely convenient if you're programming C-like syntax!
Though on my German keyboard these characters are placed on the
top numbers row in one line, ordered as { [ ] }, and reachable
only through the 'Alt Gr' key. This is really a pain to type.
For _very common characters_ in a fairly common and rich family
of programming languages it's an issue [in such non-US domains].
My Norwegian keyboard needs AltGr for {[]}, but I don't find it a
burden - it's habit, I suppose.
But in days gone by if anyone ever needed to use trigraphs for C
programming, then I am sure they would happily switch to a word-based
language given half a chance. I find "{ }" nicer than "begin end",
but I'd pick "begin end" over "??< ??>" any day!
So, given:
if .. then begin ... end else begin ... end
(where ... represents multiple statements), even I would see braces in a
more favourable light. I wonder why it took some years for language
designers to realise you could simply have:
if .. then ... else ... end
Unfortunately that didn't really work for braces:
if (..) ... else ... }
On 2025-12-02 08:31:53, David Brown wrote:
On 01/12/2025 23:59, Janis Papanagnou wrote:
On 2025-12-01 21:06:13, Keith Thompson wrote:
The use of curly braces vs. begin/end is IMHO trivial. [...]
Someone who dislikes C for whatever reasons will probably dislike
most other languages that use curly braces, and not necessarily
because of that one syntactic detail.
There may also be just simple practical real-life facts that
influence the preferences of languages with curly braces (or
brackets). I want to remind that keyboards from other domains
may not have the simple access to the [ ] { } characters! On
my US keyboard [ and ] are adjacent and directly accessible,
and { and } are on the same keys reachable simply with 'Shift'.
That's extremely convenient if you're programming C-like syntax!
Though on my German keyboard these characters are placed on the
top numbers row in one line, ordered as { [ ] }, and reachable
only through the 'Alt Gr' key. This is really a pain to type.
For _very common characters_ in a fairly common and rich family
of programming languages it's an issue [in such non-US domains].
My Norwegian keyboard needs AltGr for {[]}, but I don't find it a
burden - it's habit, I suppose.
Well, given that I have been using these keyboards for decades, I'm
(sort of) "used" to that layout. Nonetheless I do feel its
"complexity" as a burden; these _standard characters_ are far off
(upper row), non-adjacent (with room for typos), and the key needed
to access them is available only on the right side (as opposed to
the Shift or Ctrl keys, or the useless "Windows" key). It's also a
practical burden; my fingers get [literally] twisted when typing,
and they are physically strained; at the moment I'm suffering from
aching sinews. Typing ergonomics are severely reduced when using
these characters. For me that really is (and always was) a concrete
burden, not only a little nuisance.
But in days gone by if anyone ever needed to use trigraphs for C
programming, then I am sure they would happily switch to a
word-based language given half a chance. I find "{ }" nicer than
"begin end", but I'd pick "begin end" over "??< ??>" any day!
I've never even considered using trigraphs.
Janis
I have never used a Western European keyboard, so I probably don't
understand something very basic.
Suppose you use a Greek/Latin keyboard (or InScript/Latin, or
Cyrillic/Latin, or Hebrew/Latin, Arabic/Latin, Thai/Latin,
Vietnamese/Latin, etc ...). When you write code, you just switch the
keyboard layout from Greek to English (US) or English (UK). It's easy.
Tens of millions of programmers do it all the time, instinctively.
Why can't Western Europeans do exactly the same? Just because their
native scripts are also Latin-based? To me that does not sound like
a meaningful reason :(
On 01.12.25 at 21:42, Keith Thompson wrote:
But given that several compilers have already implemented bit-precise
integer types *without* either allowing N > BITINT_MAXWIDTH or
disallowing "odd" values, I don't think it will be an issue in
practice.
There are extensions regarding _BitInt in existing implementations:
SDCC allows signed _BitInt(1), and allows the use of _BitInt as
underlying type for enum.
In --std-c23 mode, SDCC will emit a warning when encountering those,
but the semantics are of course the same as in --std-sdcc23 mode.
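A short sketch of the two extensions described above - based only on
the description in this post, not on SDCC documentation; the
identifiers are invented, and a strictly conforming C23 compiler is
expected to diagnose both declarations:

    /* signed _BitInt(1): one bit, so the only values are 0 and -1.
       Standard C23 requires N >= 2 for signed _BitInt.            */
    _BitInt(1) parity = 0;

    /* _BitInt as the fixed underlying type of an enum.  The
       "enum tag : type" syntax is standard C23; using a bit-precise
       type there is the SDCC extension.                           */
    enum direction : unsigned _BitInt(2) { NORTH, EAST, SOUTH, WEST };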
Philipp Klaus Krause <pkk@spth.de> writes:
On 02.12.25 at 08:31, David Brown wrote:
But in days gone by if anyone ever needed to use trigraphs for C
programming, then I am sure they would happily switch to a
word-based language given half a chance. I find "{ }" nicer than
"begin end", but I'd pick "begin end" over "??< ??>" any day!
AFAIK, there never was a real user of trigraphs (unless you count
compiler test suites). AFAIK for all real-world use digraphs were
sufficient.
There have been actual uses of trigraphs. Richard Heathfield posted
this on this newsgroup in 2010 :
Yes, they are still needed, for example in some mainframe
environments. They make the code look astoundingly ugly, but
they do at least make it work. It is not uncommon for "normal"
C code to be written and tested on PCs, then run through
a conversion program to replace monographs with trigraphs
where required before transfer to the mainframe for final
testing. That way, you get the readability where it matters,
and the usability where /that/ matters.
But trigraphs have been removed in C23.
On 02/12/2025 23:33, Keith Thompson wrote: [...]
But trigraphs have been removed in C23.
Then so, in some mainframe environments, have curly braces. I suppose
their fix will be to not adopt C23.
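For anyone who has never seen them, here is the same toy function
written three ways - an illustrative sketch only; the function names
are invented, and the trigraph version needs a compiler that still
performs trigraph replacement (for example gcc in a pre-C23 strict
standards mode, or with -trigraphs):

    #include <stddef.h>

    /* Ordinary punctuation. */
    int sum(const int *a, size_t n) {
        int s = 0;
        for (size_t i = 0; i < n; i++) {
            s += a[i];
        }
        return s;
    }

    /* Digraphs: <% %> for { }, <: :> for [ ].  Still valid in C23. */
    int sum_digraphs(const int *a, size_t n) <%
        int s = 0;
        for (size_t i = 0; i < n; i++) <%
            s += a<:i:>;
        %>
        return s;
    %>

    /* Trigraphs: ??< ??> for { }, ??( ??) for [ ].  Removed in C23. */
    int sum_trigraphs(const int *a, size_t n) ??<
        int s = 0;
        for (size_t i = 0; i < n; i++) ??<
            s += a??(i??);
        ??>
        return s;
    ??>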
David Brown <david.brown@hesbynett.no> writes:...
On 01/12/2025 17:56, Keith Thompson wrote:
No, _BitInt(N) where N > BITINT_MAXWIDTH is a constraint violation.
N3220 6.7.3.1p2 ("Constraints") :
The parenthesized constant expression that follows the _BitInt
keyword shall be an integer constant expression N that specifies
the width (6.2.6.2) of the type. The value of N for unsigned
_BitInt shall be greater than or equal to 1. The value of N
for _BitInt shall be greater than or equal to 2. The value of
N shall be less than or equal to the value of BITINT_MAXWIDTH
(see 5.2.5.3.2).
As I mentioned before, there's a proposal for C2y to allow
signed _BitInt(1).
Of course an implementation could do what you suggest as an
extension.
Yes, of course - violating a constraint is UB, but it also requires a
diagnostic.
I'd place a very different emphasis on that. Violating a constraint
requires a diagnostic, which needn't necessarily be fatal. If an implementation chooses to accept the code anyway, the resulting
behavior is probably undefined (though the standard doesn't say
so explicitly).
On 2025-12-01 15:42, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:...
On 01/12/2025 17:56, Keith Thompson wrote:
No, _BitInt(N) where N > BITINT_MAXWIDTH is a constraint violation.
N3220 6.7.3.1p2 ("Constraints") :
The parenthesized constant expression that follows the _BitInt
keyword shall be an integer constant expression N that specifies
the width (6.2.6.2) of the type. The value of N for unsigned
_BitInt shall be greater than or equal to 1. The value of N
for _BitInt shall be greater than or equal to 2. The value of
N shall be less than or equal to the value of BITINT_MAXWIDTH
(see 5.2.5.3.2).
As I mentioned before, there's a proposal for C2y to allow
signed _BitInt(1).
Of course an implementation could do what you suggest as an
extension.
Yes, of course - violating a constraint is UB, but it also requires a
diagnostic.
I'd place a very different emphasis on that. Violating a constraint
requires a diagnostic, which needn't necessarily be fatal. If an
implementation chooses to accept the code anyway, the resulting
behavior is probably undefined (though the standard doesn't say
so explicitly).
Undefined behavior is indicated in only three ways:
"If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is
undefined. Undefined behavior is otherwise indicated in this document
by the words "undefined behavior" or by the omission of any explicit
definition of behavior." (4p2).
Which of those three methods do you think applies? This "shall" occurs
inside a constraint. There's no explicit statement that it is undefined
behavior. There is an explicit definition for the behavior, provided by
what the standard says about _BitInt outside of this constraint.
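As a concrete illustration of the constraint being argued about - a
sketch only; the typedef names and the width 256 are arbitrary, and
BITINT_MAXWIDTH comes from <limits.h> in C23:

    #include <limits.h>   /* C23: defines BITINT_MAXWIDTH */

    /* Within the limit: fine on any implementation whose
       BITINT_MAXWIDTH is at least 256 (the minimum the standard
       guarantees is only ULLONG_WIDTH, i.e. at least 64).         */
    #if BITINT_MAXWIDTH >= 256
    typedef unsigned _BitInt(256) u256;
    #endif

    /* Over the limit: N > BITINT_MAXWIDTH violates the constraint
       in 6.7.3.1p2, so a diagnostic is required; whether the
       compiler then rejects the program or accepts it with its own
       semantics is the question discussed above.                  */
    /* typedef unsigned _BitInt(BITINT_MAXWIDTH + 1) too_wide; */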
bart <bc@freeuk.com> writes: [...]
On 28/11/2025 00:39, Keith Thompson wrote:
Can you *please* do something about the way your newsreader
(apparently Mozilla Thunderbird) mangles quoted text? That first
quoted line, starting with "> How exactly", would have been just
74 columns, but your newsreader folded it, making it more difficult
to read. It also deletes blank lines between paragraphs.
I don't recall similar problems from other Thunderbird users.
I don't see anything amiss with quoted content in my own posts. My
last post looks like this to me:
https://github.com/sal55/langs/blob/master/tbird.png
In any case, I've no idea how to fix the problem, assuming it is at my end.
My apologies, the problem doesn't appear to be on your end.