• Re: A Famous Security Bug

    From Chris M. Thomasson@3:633/280.2 to All on Fri Mar 22 06:37:58 2024
    On 3/21/2024 10:41 AM, Kaz Kylheku wrote:
    On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
    On 20/03/2024 19:54, Kaz Kylheku wrote:
    On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be identified under
    which that would have a problem executing, like MAX being in excess
    of available automatic storage.

    If the /*...*/ comment represents the elision of some security sensitive code, where the memset is intended to obliterate secret information,
    of course, that obliteration is not required to work.

    After the memset, the buffer has no next use, so all the assignments performed by memset to the bytes of buffer are dead assignments that can be elided.

    To securely clear memory, you have to use a function for that purpose
    that is not susceptible to optimization.

    If you're not doing anything stupid, like link time optimization, an
    external function in another translation unit (a function that the
    compiler doesn't recognize as being an alias or wrapper for memset)
    ought to suffice.

    Using LTO is not "stupid". Relying on people /not/ using LTO, or not
    using other valid optimisations, is "stupid".

    LTO is a nonconforming optimization. It destroys the concept that
    when a translation unit is translated, the semantic analysis is
    complete, such that the only remaining activity is resolution of
    external references (linkage), and that the semantic analysis of one translation unit does not use information about another translation
    unit.
    [...]

    Side note:

    Actually, way back (pre C/C++11), I was worried about LTO messing up my custom, highly sensitive sync code.

    https://web.archive.org/web/20070509044340/http://appcore.home.comcast.net/

    Notice the externally assembled functions comment?

    "All of its “critical-sequences” are contained in externally assembled functions ( read all ) in order to prevent a rouge C compiler from
    reordering anything that would corrupt the data-structure. The queue
    allocates its nodes from a three-level cache"

    If a damn "rogue" compiler can mess with my custom ASM then things are
    going to be broken...

    ;^)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Fri Mar 22 06:42:29 2024
    Anton Shepelev <anton.txt@gmail.moc> writes:
    Kaz Kylheku to Stefan Ram:

    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be
    identified under which that would have a problem
    executing, like MAX being in excess of available automatic
    storage.

    If the /*...*/ comment represents the elision of some
    security sensitive code, where the memset is intended to
    obliterate secret information, of course, that
    obliteration is not required to work.

    After the memset, the buffer has no next use, so all
    the assignments performed by memset to the bytes of buffer
    are dead assignments that can be elided.

    To securely clear memory, you have to use a function for
    that purpose that is not susceptible to optimization.

    I think this behavior (of a C compiler) is rather stupid. In a
    low-level imperative language, the compiled program shall
    do whatever the programmer commands it to do. If he
    commands it to clear the buffer, it shall clear the buffer.
    This optimisation is too high-level, too counter-intuitive,
    even deceitful. The optimiser is free to perform the task
    in the fastest manner possible, but it shall not ignore the
    programmer's order to zero-fill the buffer, especially
    without emitting a warning about (potentially!) redundant
    code, which it is the programmer's responsibility to confirm
    and remove.

    Redundant code shall be dealt with in the source, rather than
    in the executable.

    Then C is not what you call a "low-level imperative language", and none
    of your "shall"s apply to the language defined by the ISO C standard.

    C programs define behavior, not generated machine code. If an
    implementation can implement the behavior of a C program with a memset()
    call without invoking memset(), it's free to do so.

    The programmer's intent in the code snippet that started this thread was
    for the memset call to erase sensitive data in memory that, after the
    function returns, is not part of any object. C (prior to C23) doesn't
    provide a way to do that. Any program whose observable behavior differs depending on whether that memory was cleared has undefined behavior.

    Judicious use of "volatile" should avoid the problem.

    5.1.2.3p6:

    The least requirements on a conforming implementation are:

    - Volatile accesses to objects are evaluated strictly according
    to the rules of the abstract machine.

    - At program termination, all data written into files shall be
    identical to the result that execution of the program according
    to the abstract semantics would have produced.

    - The input and output dynamics of interactive devices shall take
    place as specified in 7.23.3. The intent of these requirements
    is that unbuffered or line-buffered output appear as soon as
    possible, to ensure that prompting messages appear prior to a
    program waiting for input.

    This is the *observable behavior* of the program.

    If you think that's stupid, I won't try to change your mind, but the
    meaning is clear.
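
    For illustration only (this sketch is mine, not from the post): one
    portable way to get guaranteed writes is to make the stores volatile
    accesses, since those are part of the observable behavior defined in
    5.1.2.3p6. The helper name secure_clear is hypothetical; C23's
    memset_explicit and the optional Annex K memset_s address the same
    need where they are available.

    #include <stddef.h>

    /* Clear n bytes through a volatile-qualified pointer; each store is
       a volatile access, so a conforming compiler may not elide it. */
    void secure_clear(void *p, size_t n)
    {
        volatile unsigned char *vp = p;
        while (n--)
            *vp++ = 0;
    }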

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Fri Mar 22 07:21:13 2024
    Reply-To: slp53@pacbell.net

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    "All of its “critical-sequences” are contained in externally assembled >functions ( read all ) in order to prevent a rouge C compiler from

    As opposed to a viridian C compiler?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Fri Mar 22 07:21:14 2024
    On 2024-03-21, Anton Shepelev <anton.txt@gmail.moc> wrote:
    Kaz Kylheku to Stefan Ram:

    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be
    identified under which that would have a problem
    executing, like MAX being in excess of available automatic
    storage.

    If the /*...*/ comment represents the elision of some
    security sensitive code, where the memset is intended to
    obliterate secret information, of course, that
    obliteration is not required to work.

    After the memset, the buffer has no next use, so all
    the assignments performed by memset to the bytes of buffer
    are dead assignments that can be elided.

    To securely clear memory, you have to use a function for
    that purpose that is not susceptible to optimization.

    I think this behavior (of a C compiler) is rather stupid. In a
    low-level imperative language, the compiled program shall
    do whatever the programmer commands it to do. If he
    commands it to clear the buffer, it shall clear the buffer.
    This optimisation is too high-level, too counter-intuitive,
    even deceitful. The optimiser is free to perform the task
    in the fastest manner possible, but it shall not ignore the
    programmer's order to zero-fill the buffer, especially
    without emitting a warning about (potentially!) redundant
    code, which it is the programmer's responsibility to confirm
    and remove.

    If C compilers warned about every piece of dead code that is eliminated,
    you'd be up to your ears in diagnostics all day.

    Then there is the question of how to silence them on a case-by-case
    basis: I did want /that/ code to be eliminated, but not that one.

    If you do want the code deleted, that doesn't always mean
    you can do it yourself. What gets eliminated can be target
    dependent:

    switch (sizeof (long)) {
    case 4: ...
    case 8: ...
    }
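
    As a hypothetical, self-contained version of that point (the function
    name is mine): which case below is dead code depends on the target, so
    the "redundant" branch cannot simply be deleted from portable source.

    #include <stdio.h>

    static const char *long_width(void)
    {
        switch (sizeof (long)) {         /* a compile-time constant */
        case 4:  return "32-bit long";   /* dead on LP64 targets  */
        case 8:  return "64-bit long";   /* dead on ILP32 targets */
        default: return "unusual long";
        }
    }

    int main(void)
    {
        printf("%s\n", long_width());
        return 0;
    }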

    Eliminating dead stores is a very basic dataflow-driven optimization.

    Because memset is part of the C language, the compiler knows
    exactly what effect it has (that it's equivalent to setting
    all the bytes to zero, like a sequence of assignments).

    If you don't want a call to be optimized away, call your
    own function in another translation unit. (And don't turn
    on nonconforming cross-translation-unit optimizations.)
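
    A minimal sketch of that arrangement (file and function names are mine;
    whether it is actually guaranteed is exactly what the rest of this
    thread argues about, and LTO can defeat it):

    /* wipe.c -- separate translation unit */
    #include <string.h>

    void wipe(void *p, size_t n)
    {
        memset(p, 0, n);   /* p points to storage this TU knows nothing
                              about, so the stores cannot be proven dead
                              here */
    }

    /* caller.c -- compiled without cross-TU information, the compiler
       has to emit the call to wipe() */
    #include <stddef.h>

    void wipe(void *p, size_t n);

    void f(void)
    {
        char buffer[64];
        /* ... put something sensitive in buffer ... */
        wipe(buffer, sizeof buffer);
    }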

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Fri Mar 22 07:46:18 2024
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
    On 20/03/2024 19:54, Kaz Kylheku wrote:
    On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be identified under
    which that would have a problem executing, like MAX being in excess
    of available automatic storage.

    If the /*...*/ comment represents the elision of some security sensitive code, where the memset is intended to obliterate secret information,
    of course, that obliteration is not required to work.

    After the memset, the buffer has no next use, so all the assignments performed by memset to the bytes of buffer are dead assignments that can be elided.

    To securely clear memory, you have to use a function for that purpose
    that is not susceptible to optimization.

    If you're not doing anything stupid, like link time optimization, an
    external function in another translation unit (a function that the
    compiler doesn't recognize as being an alias or wrapper for memset)
    ought to suffice.

    Using LTO is not "stupid". Relying on people /not/ using LTO, or not
    using other valid optimisations, is "stupid".

    LTO is a nonconforming optimization. It destroys the concept that
    when a translation unit is translated, the semantic analysis is
    complete, such that the only remaining activity is resolution of
    external references (linkage), and that the semantic analysis of one translation unit does not use information about another translation
    unit.

    This has not yet changed in last April's N3096 draft, where
    translation phases 7 and 8 are:

    7. White-space characters separating tokens are no longer significant.
    Each preprocessing token is converted into a token. The resulting
    tokens are syntactically and semantically analyzed and translated
    as a translation unit.

    8. All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    and before that, the Program Structure section says:

    The separate translation units of a program communicate by (for
    example) calls to functions whose identifiers have external linkage,
    manipulation of objects whose identifiers have external linkage, or
    manipulation of data files. Translation units may be separately
    translated and then later linked to produce an executable program.

    LTO deviates from the model that translation units are separate,
    and the conceptual steps of phases 7 and 8.
    [...]

    Link time optimization is as valid as cross-function optimization *as
    long as* it doesn't change the defined behavior of the program.

    Say I have a call to foo in main, and the definition of foo is in
    another translation unit. In the absence of LTO, the compiler will have
    to generate a call to foo. If LTO is able to determine that foo doesn't
    do anything, it can remove the code for the function call, and the
    resulting behavior of the linked program is unchanged.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Fri Mar 22 08:31:26 2024
    On 3/21/2024 1:21 PM, Scott Lurndal wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    "All of its “critical-sequences” are contained in externally assembled >> functions ( read all ) in order to prevent a rouge C compiler from

    As opposed to a viridian C compiler?

    I was worried about "overly aggressive" LTO messing around with my ASM.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Fri Mar 22 11:38:58 2024
    On 3/21/2024 4:19 PM, Scott Lurndal wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/21/2024 1:21 PM, Scott Lurndal wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    "All of its “critical-sequences” are contained in externally assembled >>>> functions ( read all ) in order to prevent a rouge C compiler from

    As opposed to a viridian C compiler?

    I was worried about "overly aggressive" LTO messing around with my ASM.

    And you missed the oblique reference to the misspelling of 'rogue' as 'rouge'.

    Yup! I sure did. I have red on my face!

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri Mar 22 23:51:53 2024
    On 21/03/2024 18:41, Kaz Kylheku wrote:
    On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
    On 20/03/2024 19:54, Kaz Kylheku wrote:
    On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be identified under
    which that would have a problem executing, like MAX being in excess
    of available automatic storage.

    If the /*...*/ comment represents the elision of some security sensitive code, where the memset is intended to obliterate secret information,
    of course, that obliteration is not required to work.

    After the memset, the buffer has no next use, so all the assignments performed by memset to the bytes of buffer are dead assignments that can be elided.

    To securely clear memory, you have to use a function for that purpose
    that is not susceptible to optimization.

    If you're not doing anything stupid, like link time optimization, an
    external function in another translation unit (a function that the
    compiler doesn't recognize as being an alias or wrapper for memset)
    ought to suffice.

    Using LTO is not "stupid". Relying on people /not/ using LTO, or not
    using other valid optimisations, is "stupid".

    LTO is a nonconforming optimization.

    Really? That is news to me, and I suspect to the folks at gcc and
    clang/llvm that developed LTO for these compilers. (I have worked with embedded compilers that have had LTO-type optimisations for decades, but
    these are not often concerned with the minutiae of the standards.)

    It destroys the concept that
    when a translation unit is translated, the semantic analysis is
    complete, such that the only remaining activity is resolution of
    external references (linkage), and that the semantic analysis of one translation unit does not use information about another translation
    unit.

    Where is it described in the C standards that semantic information from
    one translation unit cannot be used (for optimisation, for static error checking, for other analysis or any other purposes) in another
    translation unit?

    What makes you think that LTO, as implemented in compilers like gcc and clang/llvm, does not generate code according to the "as if" rules? (That
    is, they can generate code that is more optimal, but produces the same observable effects "as if" they were strict dumb translators of the functioning of the C abstract machine.)

    I believe there is very little where the behaviour of a C program is
    different if parts of the code are in one translation unit, or if they
    are in several. There are utilities that merge multiple C files into
    single C files (for easier deployment, or for better optimisation).
    They have to take into account renaming static objects and functions to file-local names, and remove duplicate type definitions, but as long as certain reasonable rules are followed by the programmer, it all goes
    fine. (You could, I suppose, hit complications if you relied on
    compatibility of struct or union types across translation units where
    the identifiers were different and they are compatible across TU's but
    not within TU's, according to the 6.2.7p1 rules. But that would be
    unlikely, and I expect LTO compilers to handle those cases.)


    This has not yet changed in last April's N3096 draft, where
    translation phases 7 and 8 are:

    7. White-space characters separating tokens are no longer significant.
    Each preprocessing token is converted into a token. The resulting
    tokens are syntactically and semantically analyzed and translated
    as a translation unit.

    8. All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    and before that, the Program Structure section says:

    The separate translation units of a program communicate by (for
    example) calls to functions whose identifiers have external linkage,
    manipulation of objects whose identifiers have external linkage, or
    manipulation of data files. Translation units may be separately
    translated and then later linked to produce an executable program.


    All of that is irrelevant. It says nothing against sharing other
    information.

    LTO deviates from the model that translation units are separate,
    and the conceptual steps of phases 7 and 8.

    No, it does not. These paragraphs are requirements, not limitations.


    The translation unit that is prepared for LTO is not fully cooked. You
    have no idea what its code will turn into when the interrupted
    compilation is resumed during linkage, under the influence of other translation units it is combined with.

    You have as much and as little idea of what the generated code will be
    as you always do during compilation. Compilers can do all kinds of manipulations of the source code you write - as long as the observable behaviour of the program is the same as a dumb translation. They can,
    and do, use all kinds of inter-procedural optimisations for inlining
    code, outlining it, breaking functions into pieces, cloning them, using constant propagation, and so on. LTO lets them do this across
    translation units.


    So in fact, the language allows us to take it for granted that, given

    my_memset(array, 0, sizeof(array)); }

    at the end of a function, and my_memset is an external definition
    provided by another translation unit, the call may not be elided.


    No, the C language standards make no such guarantee.

    The one who may be acting recklessly is he who turns on nonconforming optimizations that are not documented as supported by the code base.

    Another example would be something like gcc's -ffast-math.

    That is /completely/ different. That option is clearly documented as potentially violating some of the rules of the ISO C standards. This is
    why it is not enabled by default or by any common optimisation levels
    (except "-Ofast", which is also documented as potentially violating standards).

    You wouldn't unleash that on numerical code written by experts,
    and expect the same correct results.


    I would not expect identical results to floating point calculations, no.

    Depending on the code in question, I would still expect correct results.
    I use "-ffast-math" in all my code in order to get correct results a
    good deal faster (for my targets, and my type of code) than I would get without it.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Mar 23 00:38:17 2024
    On 21/03/2024 21:21, Kaz Kylheku wrote:

    Eliminating dead stores is a very basic dataflow-driven optimization.

    Because memset is part of the C language, the compiler knows
    exactly what effect it has (that it's equivalent to setting
    all the bytes to zero, like a sequence of assignments).


    Yes.

    If you don't want a call to be optimized away, call your
    own function in another translation unit.

    No.

    There are several ways that guarantee your code will carry out the
    writes here (though none that guarantee the secret data is not also
    stored elsewhere). Using a function in a different TU is not one of
    these techniques. You do people a disfavour by recommending it.
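
    One widely used technique of that kind (this sketch and the variable
    name are mine, not from the post) is to call memset through a volatile
    function pointer: reading the pointer is a volatile access, so the
    compiler cannot assume the call still resolves to memset and cannot
    treat the stores as dead.

    #include <string.h>

    static void *(* volatile memset_vp)(void *, int, size_t) = memset;

    void erase(void *p, size_t n)
    {
        memset_vp(p, 0, n);
    }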

    (And don't turn
    on nonconforming cross-translation-unit optimizations.)


    If I knew of any non-conforming cross-translation-unit optimisations in
    a compiler, I would avoid using them until the compiler vendor had fixed
    the bug in question.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sat Mar 23 01:50:09 2024
    Anton Shepelev <anton.txt@gmail.moc> writes:

    Kaz Kylheku to Stefan Ram:

    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be
    identified under which that would have a problem
    executing, like MAX being in excess of available automatic
    storage.

    If the /*...*/ comment represents the elision of some
    security sensitive code, where the memset is intended to
    obliterate secret information, of course, that
    obliteration is not required to work.

    After the memset, the buffer has no next use, so all
    the assignments performed by memset to the bytes of buffer
    are dead assignments that can be elided.

    To securely clear memory, you have to use a function for
    that purpose that is not susceptible to optimization.

    I think this behavior (of a C compiler) is rather stupid. In a
    low-level imperative language, the compiled program shall
    do whatever the programmer commands it to do. If he
    commands it to clear the buffer, it shall clear the buffer.
    This optimisation is too high-level, too counter-intuitive,
    even deceitful. The optimiser is free to perform the task
    in the fastest manner possible, but it shall not ignore the
    programmer's order to zero-fill the buffer, especially
    without emitting a warning about (potentially!) redundant
    code, which it is the programmer's responsibility to confirm
    and remove.

    Redundant code shall be dealt with in the source, rather than
    in the executable.

    I have a couple of reactions.

    One is that the ship has sailed. Somewhere between 35 and 40
    years ago the people who wrote the C standard decided on a
    semantic model that allows optimizations like this, and that is
    not going to change. Certainly there are people who would prefer
    to think of C as being a "low-level imperative language" like
    what you describe, but the C community as a whole has accepted
    the view taken by the authors of the C standard.

    The second reaction is that, to be somewhat blunt, what is being
    suggested is naive. I expect you do not yet appreciate the
    ramifications of what you are suggesting. It is hard, indeed I
    would say very hard, to define a semantic model that faithfully
    represents the behaviors you would like to impose. You might
    want to look into that, and what has been tried previously along
    these lines, before pursuing this advocacy any further.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Mar 23 02:33:03 2024
    On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
    On 21/03/2024 21:21, Kaz Kylheku wrote:

    Eliminating dead stores is a very basic dataflow-driven optimization.

    Because memset is part of the C language, the compiler knows
    exactly what effect it has (that it's equivalent to setting
    all the bytes to zero, like a sequence of assignments).


    Yes.

    If you don't want a call to be optimized away, call your
    own function in another translation unit.

    No.

    There are several ways that guarantee your code will carry out the
    writes here (though none that guarantee the secret data is not also
    stored elsewhere). Using a function in a different TU is not one of
    these techniques. You do people a disfavour by recommending it.

    It demonstrably is.

    (And don't turn
    on nonconforming cross-translation-unit optimizations.)


    If I knew of any non-conforming cross-translation-unit optimisations in
    a compiler, I would avoid using them until the compiler vendor had fixed
    the bug in question.

    They are not fixable. Translation units are separate, subject
    to separate semantic analysis, which is settled prior to linkage.

    The semantic analysis of one translation unit must be carried out in the absence of any information about what is in another translation unit.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Mar 23 02:50:00 2024
    On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
    On 20/03/2024 19:54, Kaz Kylheku wrote:
    On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be identified under
    which that would have a problem executing, like MAX being in excess
    of available automatic storage.

    If the /*...*/ comment represents the elision of some security sensitive code, where the memset is intended to obliterate secret information,
    of course, that obliteration is not required to work.

    After the memset, the buffer has no next use, so all the assignments performed by memset to the bytes of buffer are dead assignments that can be elided.

    To securely clear memory, you have to use a function for that purpose
    that is not susceptible to optimization.

    If you're not doing anything stupid, like link time optimization, an
    external function in another translation unit (a function that the
    compiler doesn't recognize as being an alias or wrapper for memset)
    ought to suffice.

    Using LTO is not "stupid". Relying on people /not/ using LTO, or not
    using other valid optimisations, is "stupid".

    LTO is a nonconforming optimization. It destroys the concept that
    when a translation unit is translated, the semantic analysis is
    complete, such that the only remaining activity is resolution of
    external references (linkage), and that the semantic analysis of one
    translation unit does not use information about another translation
    unit.

    This has not yet changed in last April's N3096 draft, where
    translation phases 7 and 8 are:

    7. White-space characters separating tokens are no longer significant.
    Each preprocessing token is converted into a token. The resulting
    tokens are syntactically and semantically analyzed and translated
    as a translation unit.

    8. All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    and before that, the Program Structure section says:

    The separate translation units of a program communicate by (for
    example) calls to functions whose identifiers have external linkage,
    manipulation of objects whose identifiers have external linkage, or
    manipulation of data files. Translation units may be separately
    translated and then later linked to produce an executable program.

    LTO deviates from the model that translation units are separate,
    and the conceptual steps of phases 7 and 8.
    [...]

    Link time optimization is as valid as cross-function optimization *as
    long as* it doesn't change the defined behavior of the program.

    It always does; the interaction of a translation unit with another
    is an externally visible aspect of the C program. (That can be inferred
    from the rules which forbid semantic analysis across translation
    units, only linkage.)

    That's why we can have a real world security issue caused by zeroing
    being optimized away.

    The rules spelled out in ISO C allow us to unit test a translation
    unit by linking it to some harness, and be sure it has exactly the
    same behaviors when linked to the production program.

    If I have some translation unit in which there is a function foo, such
    that when I call foo, it then calls an external function bar, that's observable. I can link that unit to a program which supplies bar,
    containing a printf call, then call foo and verify that the printf call
    is executed.

    Since ISO C says that the semantic analysis has been done (that
    unit having gone through phase 7), we can take it for granted as a done-and-dusted property of that translation unit that it calls bar
    whenever its foo is invoked.
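
    A minimal sketch of the kind of harness being described (file names
    are mine; the foo/bar arrangement is the one from the text):

    /* unit.c -- the translation unit under test */
    void bar(void);
    void foo(void) { bar(); }

    /* harness.c -- supplies bar and observes that foo calls it */
    #include <stdio.h>
    void foo(void);
    void bar(void) { puts("bar was called"); }

    int main(void)
    {
        foo();
        return 0;
    }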

    Say I have a call to foo in main, and the definition of foo is in
    another translation unit. In the absence of LTO, the compiler will have
    to generate a call to foo. If LTO is able to determine that foo doesn't
    do anything, it can remove the code for the function call, and the
    resulting behavior of the linked program is unchanged.

    There are always situations in which optimizations that have been forbidden
    don't cause a problem, and are even desirable.

    If you have LTO turned on, you might be programming in GNU C or Clang C
    or whatever, not standard C.

    Sometimes programs have the same interpretation in GNU C and standard
    C, or the same interpretation to someone who doesn't care about certain differences.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Sat Mar 23 03:31:03 2024
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
    On 20/03/2024 19:54, Kaz Kylheku wrote:
    On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be identified under which that would have a problem executing, like MAX being in excess
    of available automatic storage.

    If the /*...*/ comment represents the elision of some security sensitive code, where the memset is intended to obliterate secret information, of course, that obliteration is not required to work.

    After the memset, the buffer has no next use, so all the assignments performed by memset to the bytes of buffer are dead assignments that can be elided.

    To securely clear memory, you have to use a function for that purpose that is not susceptible to optimization.

    If you're not doing anything stupid, like link time optimization, an external function in another translation unit (a function that the
    compiler doesn't recognize as being an alias or wrapper for memset)
    ought to suffice.

    Using LTO is not "stupid". Relying on people /not/ using LTO, or not using other valid optimisations, is "stupid".

    LTO is a nonconforming optimization. It destroys the concept that
    when a translation unit is translated, the semantic analysis is
    complete, such that the only remaining activity is resolution of
    external references (linkage), and that the semantic analysis of one
    translation unit does not use information about another translation
    unit.

    This has not yet changed in last April's N3096 draft, where
    translation phases 7 and 8 are:

    7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting
    tokens are syntactically and semantically analyzed and translated
    as a translation unit.

    8. All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    and before that, the Program Structure section says:

    The separate translation units of a program communicate by (for
    example) calls to functions whose identifiers have external linkage,
    manipulation of objects whose identifiers have external linkage, or
    manipulation of data files. Translation units may be separately
    translated and then later linked to produce an executable program.

    LTO deviates from the model that translation units are separate,
    and the conceptual steps of phases 7 and 8.
    [...]

    Link time optimization is as valid as cross-function optimization *as
    long as* it doesn't change the defined behavior of the program.

    It always does; the interaction of a translation unit with another
    is an externally visible aspect of the C program. (That can be inferred
    from the rules which forbid semantic analysis across translation
    units, only linkage.)

    That's why we can have a real world security issue caused by zeroing
    being optimized away.

    The rules spelled out in ISO C allow us to unit test a translation
    unit by linking it to some harness, and be sure it has exactly the
    same behaviors when linked to the production program.

    If I have some translation unit in which there is a function foo, such
    that when I call foo, it then calls an external function bar, that's observable. I can link that unit to a program which supplies bar,
    containing a printf call, then call foo and verify that the printf call
    is executed.

    Since ISO C says that the semantic analysis has been done (that
    unit having gone through phase 7), we can take it for granted as a done-and-dusted property of that translation unit that it calls bar
    whenever its foo is invoked.

    We can take it for granted that the output performed by the printf call
    will be performed, because output is observable behavior. If the
    external function bar is modified, the LTO step has to be redone.

    Say I have a call to foo in main, and the definition of foo is in
    another translation unit. In the absence of LTO, the compiler will have
    to generate a call to foo. If LTO is able to determine that foo doesn't
    do anything, it can remove the code for the function call, and the
    resulting behavior of the linked program is unchanged.

    There are always situations in which optimizations that have been forbidden
    don't cause a problem, and are even desirable.

    If you have LTO turned on, you might be programming in GNU C or Clang C
    or whatever, not standard C.

    Sometimes programs have the same interpretation in GNU C and standard
    C, or the same interpretation to someone who doesn't care about certain differences.

    Are you claiming that a function call is observable behavior?

    Consider:

    main.c:
    #include "foo.h"
    int main(void) {
        foo();
    }


    foo.h:
    #ifndef FOO_H
    #define FOO_H
    void foo(void);
    #endif


    foo.c:
    void foo(void) {
        // do nothing
    }


    Are you saying that the "call" instruction generated for the function
    call is *observable behavior*? If an implementation doesn't generate
    that "call" instruction because it's able to determine at link time that
    the call does nothing, that optimization is forbidden?

    I presume you'd agree that omitting the "call" instruction is allowed if
    the call and the function definition are in the same translation unit.
    What wording in the standard requires a "call" instruction to be
    generated if they're in different translation units?

    That's a trivial example, but other link time optimizations that don't
    change a program's observable behavior (insert weasel words about
    unspecified behavior) are also allowed.

    In phase 8:
    All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    I don't see anything about required CPU instructions.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sat Mar 23 03:35:52 2024
    On 3/21/24 16:46, Keith Thompson wrote:
    ....
    Link time optimization is as valid as cross-function optimization *as
    long as* it doesn't change the defined behavior of the program.

    Minor adjustment: due to unspecified behavior, some code can have
    multiple permitted behaviors. LTO could be conforming even if it changed
    the behavior, as long as it changes it to one of the other permitted
    behaviors. For implementation-defined behavior, the fact that the change
    could happen would have to be documented.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sat Mar 23 04:05:49 2024
    On 3/22/24 11:50, Kaz Kylheku wrote:
    On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ....
    Link time optimization is as valid as cross-function optimization *as
    long as* it doesn't change the defined behavior of the program.

    It always does; the interaction of a translation unit with another
    is an externally visible aspect of the C program.

    The standard makes no use of the concept of "externally visible aspects".

    "The least requirements on a conforming implementation are:
    — Volatile accesses to objects are evaluated strictly according to the
    rules of the abstract machine.
    — At program termination, all data written into files shall be identical
    to the result that execution of the program according to the abstract
    semantics would have produced.
    — The input and output dynamics of interactive devices shall take place
    as specified in 7.23.3.
    The intent of these requirements is that unbuffered or line-buffered
    output appear as soon as possible, to ensure that prompting messages
    appear prior to a program waiting for input.
    This is the observable behavior of the program." (5.1.2.3p6).

    The term "observable behavior" is italicized, an ISO convention
    indicating that the sentence in which that term is italicized
    constitutes the official definition of that term. Note, in particular,
    that the term does NOT mean "behavior which can be observed", which
    would otherwise be closely connected to your concept of "externally
    visible aspects".

    Note that "observable behavior" does NOT include function calls, not
    even calls to functions defined in different translation units.

    The standard explicitly permits optimizations which violate the abstract semantics, so long as they result in the same observable behavior as if
    the abstract semantics had been obeyed. Being able to express that
    concept is the only reason that the term "observable behavior" exists.

    ... (That can be inferred
    from the rules which forbid semantic analysis across translation
    units, only linkage.)

    I see no wording forbidding such analysis. The section you cite permits separate translation, but does not forbid whole-program translation.

    ....
    If I have some translation unit in which there is a function foo, such
    that when I call foo, it then calls an external function bar, that's observable.

    Not in the sense of "observable behavior" as that term is defined by the
    C standard.

    ....
    Since ISO C says that the semantic analysis has been done (that
    unit having gone through phase 7),

    A footnote makes it clear that the translation phases are purely
    conceptual, identifying the precedence between the different semantic
    rules that they specify. An implementation is not prohibited from
    intermingling the translation phases, so long as it produces the same observable behavior as if it had not intermingled them.


    ....
    If you have LTO turned on, you might be programming in GNU C or Clang C
    or whatever, not standard C.

    True, but you also could be programming in standard C.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sat Mar 23 04:14:49 2024
    On 3/21/24 14:13, Anton Shepelev wrote:
    ....
    I think this behavior (of a C compiler) rather stupid. In a
    low-level imperative language, the compiled program shall
    do whatever the programmer commands it to do.

    C is NOT that low a level of language. The standard explicitly allows implementations to use any method they find convenient to produce
    observable behavior which is consistent with the requirements of the
    standard. Despite describing how that behavior might be produced by the abstract machine, it explicitly allows an implementation to achieve that behavior by other means.

    If you want to tell a system not only what a program must do, but also
    how it must do it, you need to use a lower-level language than C. That's
    not what C is for.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sat Mar 23 04:15:15 2024
    On 3/22/24 11:33, Kaz Kylheku wrote:
    ....
    They are not fixable. Translation units are separate, subject
    to separate semantic analysis, which is settled prior to linkage.
    The semantic analysis of one translation unit must be carried out in the absence of any information about what is in another translation unit.

    The standard imposes no such requirement. It permits separate
    compilation. It does not mandate it.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Mar 23 04:20:03 2024
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    Since ISO C says that the semantic analysis has been done (that
    unit having gone through phase 7), we can take it for granted as a
    done-and-dusted property of that translation unit that it calls bar
    whenever its foo is invoked.

    We can take it for granted that the output performed by the printf call
    will be performed, because output is observable behavior. If the
    external function bar is modified, the LTO step has to be redone.

    That's what undeniably has to be done in the LTO world. Nothing that
    is done brings that world into conformance, though.

    Say I have a call to foo in main, and the definition of foo is in
    another translation unit. In the absence of LTO, the compiler will have to generate a call to foo. If LTO is able to determine that foo doesn't do anything, it can remove the code for the function call, and the
    resulting behavior of the linked program is unchanged.

    There are always situations in which optimizations that have been forbidden
    don't cause a problem, and are even desirable.

    If you have LTO turned on, you might be programming in GNU C or Clang C
    or whatever, not standard C.

    Sometimes programs have the same interpretation in GNU C and standard
    C, or the same interpretation to someone who doesn't care about certain
    differences.

    Are you claiming that a function call is observable behavior?

    Yes. It is the observable behavior of an unlinked translation unit.

    It can be observed by linking a harness to it, with a main() function
    and all else that is required to make it a complete program.

    That harness becomes an instrument for observation.

    Consider:

    main.c:
    #include "foo.h"
    int main(void) {
        foo();
    }


    foo.h:
    #ifndef FOO_H
    #define FOO_H
    void foo(void);
    #endif


    foo.c:
    void foo(void) {
        // do nothing
    }


    Are you saying that the "call" instruction generated for the function
    call is *observable behavior*?

    Of course; it can be observed externally, without doing any reverse
    engineering on the translated unit.

    External linkage is called "external" for a reason!

    If an implementation doesn't generate
    that "call" instruction because it's able to determine at link time that
    the call does nothing, that optimization is forbidden?

    The text says so. Translation units are separate; semantic analysis is
    finished in translation phase 7; linking in 8.

    Out of translation phases 1-7 we get a concrete artifact: the translated
    unit. That has externally visible features, like what symbols it
    requires. Its behavior with regard to those symbols can be empirically observed, validated by tests and expected to hold thereafter.

    Since semantic analysis is complete, any observable behavior can be
    taken to be a fact about that translated unit, a property of it, which
    will not change when it is subject to linkage. The truth cannot be
    clawed back, according to the way things are defined in the standard,
    and this is a good thing.

    I presume you'd agree that omitting the "call" instruction is allowed if
    the call and the function definition are in the same translation unit.

    Yes.

    And that's a way to get the effect of LTO portably, in a conforming
    way, in any implementation going back decades. Instead of linkage use
    #include "foo.c", #include "bar.c" (taking steps to ensure your internal
    names don't clash).
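
    A sketch of that structure, reusing the foo.c from the earlier example
    (bar.c and its bar() are hypothetical placeholders):

    /* whole_program.c -- one translation unit built by inclusion, so the
       compiler may optimize across the included files in a conforming way */
    #include "foo.c"
    #include "bar.c"

    int main(void)
    {
        foo();
        bar();
        return 0;
    }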

    LTO is more convenient in that you don't have to use an unusual
    program structure, and keeps your internal linkage scopes separate.
    Just don't pretend it's conforming to standard C, any more than
    -ffast-math.

    LTO is "vooodoo" though. The translation units contain intermediate
    code, not target code. The intermediate code continues to be subject
    to compiler passes when the translation units are brought together.
    Thus translation is going on, but the units are gone.

    What wording in the standard requires a "call" instruction to be
    generated if they're in different translation units?

    That's a trivial example, but other link time optimizations that don't
    change a program's observable behavior (insert weasel words about
    unspecified behavior) are also allowed.

    An example would be the removal of material that is not referenced,
    like functions not called anywhere, or entire translation units
    whose external names are not referenced. That can cause issues too,
    and I've run into them, but I can't call that nonconforming.
    Nothing is semantically analyzed across translation units, only the
    linkage graph itself, which may be found to be disconnected.

    In phase 8:
    All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    I don't see anything about required CPU instructions.

    I don't see anything about /removing/ instructions that have to be
    there according to the semantic analysis performed in order to
    translate those units from phases 1 - 7, and that can be confirmed
    to be present with a test harness.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Mar 23 04:28:08 2024
    On 2024-03-22, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
    On 3/21/24 16:46, Keith Thompson wrote:
    ...
    Link time optimization is as valid as cross-function optimization *as
    long as* it doesn't change the defined behavior of the program.

    Minor adjustment: due to unspecified behavior, some code can have
    multiple permitted behaviors. LTO could be conforming even if it changed
    the behavior, as long as it changes it to one of the other permitted behaviors. For implementation-defined behavior, the fact that the change could happen would have to be documented.

    Some unspecified behaviors can change at execution time, like the
    unspecified value of an uninitialized unsigned char object in
    a malloc-ed block.

    If the unspecified behavior of a translation unit is changed to another in
    a way that obviously requires semantic analysis (such that a change
    occurs in the translated unit that amounts to it having been
    re-translated) then that appears to violate the requirements in ISO C
    about semantic analysis being done in phase 7, and not any later.

    I think translation units can be retained in a form that has not
    completely gone through translation phase 7. Such that before linkage,
    analysis can take place which completes phase 7, before 8 begins.

    However, that analysis has to be done in isolation. The standard
    describes translation units as being separate.

    If we take N translation units from phases 1 to 6, and halfway through
    7, and then to complete the semantic analysis of phase 7, the translator
    peeks across all N units, then that is no longer proper separation of translation units right through phase 7. Combination of translation
    units can only begin in 8, by which time semantic analysis is done.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sat Mar 23 04:38:13 2024
    On 3/22/24 13:20, Kaz Kylheku wrote:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ....
    Are you claiming that a function call is observable behavior?

    Yes. It is the observable behavior of an unlinked translation unit.

    In the context of the C standard, "observable behavior" is a term with a precisely specified meaning which is NOT "behavior which can be
    observed". That definition does not cover function calls, not even those
    with external linkage. What the standard says about what optimizations
    are permitted is in terms of "observable behavior", NOT "behavior which
    can be observed".

    Are you saying that the "call" instruction generated for the function
    call is *observable behavior*?

    Of course; it can be observed externally, without doing any reverse engineering on the translated unit.

    And the C standard imposes no requirement that such behavior occur as
    described by the abstract semantics. Only actual observable behavior, as
    that term is defined by the C standard, must occur as if those semantics
    were followed - whether or not they actually were.

    ....
    If an implementation doesn't generate
    that "call" instruction because it's able to determine at link time that
    the call does nothing, that optimization is forbidden?

    The text says so. Translation units are separate; semantic analysis is finished in translation phase 7; linking in 8.

    Translation phases are specified solely for the purpose of expressing
    the precedence of the corresponding semantic rules. The standard
    explicitly allows for the phases to be intermingled or even done out of
    order, so long as the observable behavior is behavior that would be
    permitted if they had been done in the order specified.

    Out of translation phases 1-7 we get a concrete artifact: the translated unit. That has externally visible features, like what symbols it
    requires. Its behavior with regard to those symbols can be empirically observed, validated by tests and expected to hold thereafter.

    And the standard imposes no requirements on those externally visible
    features, only on some (but not ALL) of the behavior that results from executing the program.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sat Mar 23 04:38:40 2024
    On 3/22/24 13:28, Kaz Kylheku wrote:
    ....
    If the unspecified behavior of a translation unit is changed to another in
    a way that obviously requires semantic analysis (such that a change
    occurs in the translated unit that amounts to it having been
    re-translated) then that appears to violate the requirements in ISO C
    about semantic analysis being done in phase 7, and not any later.

    There is no such requirement. The translation phases are explicitly not required to be done in the specified order, so long as the result is one
    that would be permitted by doing them in that order.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Mar 23 04:42:19 2024
    On 22/03/2024 16:50, Kaz Kylheku wrote:
    On 2024-03-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-21, David Brown <david.brown@hesbynett.no> wrote:
    On 20/03/2024 19:54, Kaz Kylheku wrote:
    On 2024-03-20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    A "famous security bug":

    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }

    . Can you see what the bug is?

    I don't know about "the bug", but conditions can be identified under >>>>> which that would have a problem executing, like MAX being in excess
    of available automatic storage.

    If the /*...*/ comment represents the elision of some security sensitive >>>>> code, where the memset is intended to obliterate secret information, >>>>> of course, that obliteration is not required to work.

    After the memset, the buffer has no next use, so the all the assignments >>>>> performed by memset to the bytes of buffer are dead assignments that can >>>>> be elided.

    To securely clear memory, you have to use a function for that purpose >>>>> that is not susceptible to optimization.

    If you're not doing anything stupid, like link time optimization, an >>>>> external function in another translation unit (a function that the
    compiler doesn't recognize as being an alias or wrapper for memset)
    ought to suffice.

    Using LTO is not "stupid". Relying on people /not/ using LTO, or not
    using other valid optimisations, is "stupid".

    LTO is a nonconforming optimization. It destroys the concept that
    when a translation unit is translated, the semantic analysis is
    complete, such that the only remaining activity is resolution of
    external references (linkage), and that the semantic analysis of one
    translation unit does not use information about another translation
    unit.

    This has not yet changed in last April's N3096 draft, where
    translation phases 7 and 8 are:

    7. White-space characters separating tokens are no longer significant.
    Each preprocessing token is converted into a token. The resulting
    tokens are syntactically and semantically analyzed and translated
    as a translation unit.

    8. All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    and before that, the Program Structure section says:

    The separate translation units of a program communicate by (for
    example) calls to functions whose identifiers have external linkage,
    manipulation of objects whose identifiers have external linkage, or
    manipulation of data files. Translation units may be separately
    translated and then later linked to produce an executable program.

    LTO deviates from the model that translation units are separate,
    and the conceptual steps of phases 7 and 8.
    [...]

    Link time optimization is as valid as cross-function optimization *as
    long as* it doesn't change the defined behavior of the program.

    It always does; the interaction of a translation unit with another
    is an externally visible aspect of the C program.

    The C standards don't define a term "externally visible". They define "observable behaviour", and require that a conforming implementation
    generates a program that matches the "observable behaviour". This is in 5.1.2.2.2p6. Interaction between translation units is not part of the observable behaviour of a program, because it is not relevant to the
    concept of /running/ a program - it is only relevant when translating
    the source to the program image.

    Thus the "as if" rules apply - the compiler can do whatever it wants -
    up to and including asking ChatGPT for an exe file - as long as the
    result is a /program/ that gives the same "observable behaviour" as you
    would get from an abstract machine.

    You should read the footnotes to 5.1.1.2 "Translation phases".
    Footnotes are not normative, but they are helpful in explaining the
    meaning of the text. They note that compilers don't have to follow the details of the translation phases, and that source files, translation
    units, and translated translation units don't have to have one-to-one correspondences.

    The standard also does not say what the output of "translation" is - it
    does not have to be assembly or machine code. It can happily be an
    internal format, as used by gcc and clang/llvm. It does not define what "linking" is, or how the translated translation units are "collected
    into a program image" - combining the partially compiled units,
    optimising, and then generating a program image is well within that definition.

    (That can be inferred
    from the rules which forbid semantic analysis across translation
    units, only linkage.)

    The rules do not forbid semantic analysis across translation units -
    they merely do not /require/ it. You are making an inference without
    any justification that I can see.


    That's why we can have a real world security issue caused by zeroing
    being optimized away.

    No, it is not. We have real-world security issues for all sorts of
    reasons, including people mistakenly thinking they can force particular
    types of code generation by calling functions in different source files.

    (To be clear here, before LTO became common, that was a strategy that
    worked. There is a long history in C programming of dilemmas between
    writing code that you know works efficiently on current tools, or
    writing code that you know is guaranteed correct by the standards but is inefficient with current tools.)
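
    For concreteness, the strategy being referred to looks roughly like the
    sketch below (the file and function names are hypothetical):

    /* secure_clear.c -- a separate translation unit */
    #include <stddef.h>

    void secure_clear(void *p, size_t n)
    {
        unsigned char *q = p;
        while (n--)
            *q++ = 0;
    }

    /* caller.c -- without LTO the compiler cannot see the body of
       secure_clear(), so it must keep the call and hence the stores */
    #include <stddef.h>

    extern void secure_clear(void *p, size_t n);

    void f(void)
    {
        char buffer[64];
        /* ... secret material handled in buffer ... */
        secure_clear(buffer, sizeof buffer);
    }

    With -flto, or any other whole-program view, the optimiser can again see
    that the stores are dead - which is exactly the disagreement in this thread.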


    The rules spelled out in ISO C allow us to unit test a translation
    unit by linking it to some harness, and be sure it has exactly the
    same behaviors when linked to the production program.


    No, they don't.

    If the unit you are testing calls something outside that unit, you may
    get different behaviours when testing and when used in production. The
    only thing you can be sure of from testing is that if you find a bug
    during testing, you have a bug in the code. You can never use testing
    to be sure that the code works (with the exception of exhaustive testing
    of all possible inputs, which is rarely practical).

    If I have some translation unit in which there is a function foo, such
    that when I call foo, it then calls an external function bar, that's observable.

    5.1.2.2.1p6 lists the three things that C defines as "observable
    behaviour". Function calls - internal or external - are not amongst these.

    I can link that unit to a program which supplies bar,
    containing a printf call, then call foo and verify that the printf call
    is executed.

    Yes, you can. The printf call - or, more exactly, the "input and output dynamics" - are observable behaviour. The call to "bar", however, is not.

    The compiler, when compiling the source of "foo", will include a call to
    "bar" when it does not have the source code (or other detailed semantic information) for "bar" available at the time. But you are mistaken to
    think it does so because the call is "observable" or required by the C standard. It does so because it cannot prove that /running/ the
    function "bar" contains no observable behaviour, or otherwise affects
    the observable behaviour of the program. The compiler cannot skip the
    call unless it can be sure it is safe to do so - and if it knows nothing
    about the implementation of "bar", it must assume the worst.

    Sometimes the compiler may have additional information - such as if it
    is declared the gcc "const" or "pure" attributes (or the standardised "unsequenced" and "reproducible" attributes in the draft for the next C version after C23). This may allow a compiler to re-arrange calls, duplicating them, eliminating them, or re-ordering them in various ways.
    (The C2y draft includes running such functions once at startup for
    each input value, and preserving the results for later use, as a
    permissible optimisation. It does this without having changed the
    description of translation phases or observable behaviour. But of
    course it is still just a draft.)
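
    As a small illustration of the gcc attributes mentioned above (the
    declarations are hypothetical):

    /* A "pure" function has no side effects; a "const" function depends
       only on its argument values.  Declaring them lets the compiler
       merge, move or drop calls. */
    int checksum(const unsigned char *p, unsigned long n) __attribute__((pure));
    int clamp255(int x) __attribute__((const));

    int twice(const unsigned char *p, unsigned long n)
    {
        /* The two calls may legitimately be folded into one. */
        return checksum(p, n) + checksum(p, n);
    }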


    Since ISO C says that the semantic analysis has been done (that
    unit having gone through phase 7), we can take it for granted as a done-and-dusted property of that translation unit that it calls bar
    whenever its foo is invoked.

    No, we can't - see above. Nothing in the C standards forbids any
    additional analysis, or using other information in code generation.


    Say I have a call to foo in main, and the definition of foo is in
    another translation unit. In the absence of LTO, the compiler will have
    to generate a call to foo. If LTO is able to determine that foo doesn't
    do anything, it can remove the code for the function call, and the
    resulting behavior of the linked program is unchanged.

    There are always situations in which optimizations that have been forbidden
    don't cause a problem, and are even desirable.


    Can you give examples?

    You already mentioned "-fast-math" (and by implication, its various
    subflags in gcc, clang and icc). These are clearly documented as
    allowing some violations of the C standards (and not least, the IEEE
    floating point standards, which are stricter than those of C).


    If you have LTO turned on, you might be programming in GNU C or Clang C
    or whatever, not standard C.

    You might be programming in GNU C. You might be programming in a
    standard C version (modulo bugs in the compiler).


    Sometimes programs have the same interpretation in GNU C and standard
    C, or the same interpretation to someone who doesn't care about certain differences.


    (While I don't much like an "appeal to authority" argument, I think it's
    worth noting that the major C / C++ compilers, gcc, clang/llvm and MSVC,
    all support link-time optimisation. They also all work together with
    both the C and C++ standards committees. It would be quite the scandal
    if there were any truth in your claims and these compiler vendors were
    all breaking the rules of the languages they help to specify!)



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Mar 23 04:50:26 2024
    On 22/03/2024 16:33, Kaz Kylheku wrote:
    On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
    On 21/03/2024 21:21, Kaz Kylheku wrote:

    Eliminating dead stores is a very basic dataflow-driven optimization.

    Because memset is part of the C language, the compiler knows
    exactly what effect it has (that it's equivalent to setting
    all the bytes to zero, like a sequence of assignments).


    Yes.

    If you don't want a call to be optimized away, call your
    own function in another translation unit.

    No.

    There are several ways that guarantee your code will carry out the
    writes here (though none that guarantee the secret data is not also
    stored elsewhere). Using a function in a different TU is not one of
    these techniques. You do people a disfavour by recommending it.

    It demonstrably is.

    It depends on your compiler and the options you use. That is not a good choice - especially when better ones are available.
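
    For reference, a sketch of two such techniques (memset_s exists only
    where the implementation provides C11 Annex K; check __STDC_LIB_EXT1__):

    #include <stddef.h>

    void secure_clear(void *p, size_t n)
    {
        volatile unsigned char *q = p;
        while (n--)
            *q++ = 0;   /* in practice compilers do not remove accesses
                           made through a volatile-qualified lvalue */
    }

    /* Or, where Annex K is available:
       #define __STDC_WANT_LIB_EXT1__ 1
       #include <string.h>
       ...
       memset_s(buffer, sizeof buffer, 0, sizeof buffer);
    */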


    (And don't turn
    on nonconforming cross-translation-unit optimizations.)


    If I knew of any non-conforming cross-translation-unit optimisations in
    a compiler, I would avoid using them until the compiler vendor had fixed
    the bug in question.

    They are not fixable. Translation units are separate, subject
    to separate semantic analysis, which is settled prior to linkage.

    The semantic analysis of one translation unit must be carried out in the absence of any information about what is in another translation unit.


    "Proof by repeated assertion" does not hold.

    I have tried to explain the reality of what the C standards say in a
    couple of posts (including one that I had not posted before you wrote
    this one). I have tried to make things as clear as possible, and
    hopefully you will see the point.

    If not, then you must accept that you interpret the C standards in a
    different manner from the main compiler vendors, as well as some "big
    names" in this group. That is, of course, not proof in itself - but you
    must realise that for practical purposes you need to be aware of how
    others interpret the standard, both for your own coding and for the
    advice or recommendations you give to others.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Mar 23 05:13:19 2024
    On 22/03/2024 18:20, Kaz Kylheku wrote:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:


    Are you claiming that a function call is observable behavior?

    Yes. It is the observable behavior of an unlinked translation unit.

    It can be observed by linking a harness to it, with a main() function
    and all else that is required to make it a complete program.

    That harness becomes an instrument for observation.

    That is "observable" in the same sense that the size of a compiled
    object file is "observable" by executing "ls -l". It is not "observable behaviour" as defined by the C standards.

    C defines "observable behaviour" for /programs/. Not for translation
    units, or translated translation units (what one might call an "object
    file" - be it assembly, machine code, or internal compiler-specific
    formats).

    For C, it makes no sense to talk about "observable behaviour" for a
    unit. It is only by linking the unit to your test harness that you get
    a "program", which then has "observable behaviour".




    Are you saying that the "call" instruction generated for the function
    call is *observable behavior*?

    Of course; it can be observed externally, without doing any reverse engineering on the translated unit.

    The contents of an object file - or the instructions used in a complete program - are not "observable behaviour" in C. Again, I refer you to 5.1.2.2.2p6.


    If an implementation doesn't generate
    that "call" instruction because it's able to determine at link time that
    the call does nothing, that optimization is forbidden?

    The text says so. Translation units are separate; semantic analysis is finished in translation phase 7; linking in 8.

    The text also says (in footnotes) that the phases are for conceptual description only, and in practice they are typically folded together.


    What wording in the standard requires a "call" instruction to be
    generated if they're in different translation units?

    That's a trivial example, but other link time optimizations that don't
    change a program's observable behavior (insert weasel words about
    unspecified behavior) are also allowed.

    An example would be the removal of material that is not referenced,
    like functions not called anywhere, or entire translation units
    whose external names are not referenced. That can cause issues too,
    and I've run into them, but I can't call that nonconforming.
    Nothing is semantically analyzed across translation units, only the
    linkage graph itself, which may be found to be disconnected.


    Removal of unreferenced material at link time is very common. In some
    fields, it is standard practice to use compiler and linker flags geared
    at making this easier. It is not really any different than using static libraries - the linker will load all requested static libraries, then
    throw out all parts that are not transitively reachable from non-library
    code.

    The inclusion or not of material in the program image is not directly observable behaviour in C - there is no way to write portable C code to determine if the function "foo" has been included in the image despite
    never being referenced. (You can, of course, have the linker include information about the image inside the image itself and read that with volatile accesses from within the program.)

    In small-systems embedded programming, "-ffunction-sections" and "-fdata-sections", along with "-Wl,--gc-sections", are almost invariably
    used for gcc to reduce the size of the final image. It makes it much
    more practical to write re-usable code even if not all functions are
    used in any given application. I have never heard of it "causing
    issues", and I cannot see how it might be non-conforming. (And if it is
    not a conformance issue, how is it relevant here?)
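
    Concretely, the flags mentioned above are used along these lines (the
    file names are hypothetical):

    gcc -Os -ffunction-sections -fdata-sections -c app.c drivers.c
    gcc -Os -Wl,--gc-sections app.o drivers.o -o firmware.elf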

    In phase 8:
    All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    I don't see anything about required CPU instructions.

    I don't see anything about /removing/ instructions that have to be
    there according to the semantic analysis performed in order to
    translate those units from phases 1 - 7, and that can be confirmed
    to be present with a test harness.


    The C standard doesn't deal with CPU instructions. It does not have a
    concept of "running" a translated translation unit - you can only run a complete program, at which point there is no distinction between the translation units that are "collected" into the program image. It's all
    fused together into one big lump, with one set of observable behaviours.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Sat Mar 23 05:21:40 2024
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <433-929-6894@kylheku.com> writes:
    Since ISO C says that the semantic analysis has been done (that
    unit having gone through phase 7), we can take it for granted as a
    done-and-dusted property of that translation unit that it calls bar
    whenever its foo is invoked.

    We can take it for granted that the output performed by the printf call
    will be performed, because output is observable behavior. If the
    external function bar is modified, the LTO step has to be redone.

    That's what undeniably has to be done in the LTO world. Nothing that
    is done brings that world into conformance, though.

    Say I have a call to foo in main, and the definition of foo is in
    another translation unit. In the absence of LTO, the compiler will have
    to generate a call to foo. If LTO is able to determine that foo doesn't
    do anything, it can remove the code for the function call, and the
    resulting behavior of the linked program is unchanged.

    There are always situations in which optimizations that have been forbidden
    don't cause a problem, and are even desirable.

    If you have LTO turned on, you might be programming in GNU C or Clang C
    or whatever, not standard C.

    Sometimes programs have the same interpretation in GNU C and standard
    C, or the same interpretation to someone who doesn't care about certain
    differences.

    Are you claiming that a function call is observable behavior?

    Yes. It is the observable behavior of an unlinked translation unit.

    An unlinked translation unit has no observable behavior in the way that
    term is defined by the standard.

    It can be observed by linking a harness to it, with a main() function
    and all else that is required to make it a complete program.

    That harness becomes an instrument for observation.

    And a "call" instruction in a program consisting of a single translation
    unit can be observed in a variety of ways. That doesn't make it
    "observable behavior".

    Are you using the phrase "observable behavior" in a sense other than
    what's defined in N1570 5.1.2.3?

    [...]

    Are you saying that the "call" instruction generated for the function
    call is *observable behavior*?

    Of course; it can be observed externally, without doing any reverse engineering on the translated unit.

    Is the "call" instruction *observable behavior* as defined in 5.1.2.3?

    [...]

    In phase 8:
    All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    I don't see anything about required CPU instructions.

    I don't see anything about /removing/ instructions that have to be
    there according to the semantic analysis performed in order to
    translate those units from phases 1 - 7, and that can be confirmed
    to be present with a test harness.

    The standard doesn't mention either adding or removing instructions.

    Running a program under a test harness is effectively running a
    different program. Of course it can yield information about the
    original program, but in effect you're linking the program with a
    different set of libraries.

    I can use a test harness to observe whether a program uses an add or inc instruction to evaluate `i++` (assuming the CPU has both instructions).
    The standard doesn't care how the increment happens, as long as the
    result is correct. It doesn't care *whether* the increment happens
    unless the result affects the program's *observable behavior*.

    What in the description of translation phases 7 and 8 makes
    behavior-preserving optimizations valid in phase 7 and forbidden in
    phase 8? (Again, insert weasel words about unspecified behavior.)

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Mar 23 05:55:15 2024
    On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
    You should read the footnotes to 5.1.1.2 "Translation phases".
    Footnotes are not normative, but they are helpful in explaining the
    meaning of the text. They note that compilers don't have to follow the details of the translation phases, and that source files, translation
    units, and translated translation units don't have to have one-to-one correspondences.

    Yes, I'm aware of that. For instance preprocessing can all be jumbled
    into one process. But it has to produce that result.

    Even if translation phases 7 and 8 are combined, the semantic analysis
    of the individual translation unit has to appear to be settled before
    linkage. So for instance a translation unit could incrementally emerge
    from the semantic analysis steps, and those parts of it already analyzed
    (phase 7) could start to be linked to other translation units (phase 8).

    I'm just saying that certain information leakage is clearly permitted, regardless of how the phases are integrated.

    The standard also does not say what the output of "translation" is - it
    does not have to be assembly or machine code. It can happily be an
    internal format, as used by gcc and clang/llvm. It does not define what "linking" is, or how the translated translation units are "collected
    into a program image" - combining the partially compiled units,
    optimising, and then generating a program image is well within that definition.

    (That can be inferred
    from the rules which forbid semantic analysis across translation
    units, only linkage.)

    The rules do not forbid semantic analysis across translation units -
    they merely do not /require/ it. You are making an inference without
    any justification that I can see.

    Translation phase 7 is clearly about a single translation unit in
    isolation:

    "The resulting tokens are syntactically and semantically analyzed
    and translated as a translation unit."

    Not: "as a combination of multiple translation uints".

    5.1.1.1 clearly refers to "[t]he separate translation units of a
    program".

    LTO pretends that the program is still divided into the same translation
    units, while mingling them together in ways contrary to all those
    chapter 5 descriptions.

    The conforming way to obtain LTO is to actually combine multiple
    preprocessing translation units into one.
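
    A minimal sketch of that conforming approach (hypothetical file names);
    one wrapper file turns the two sources into a single preprocessing
    translation unit, so any cross-unit optimization happens inside ordinary
    phase-7 translation:

    /* whole_program.c */
    #include "foo.c"
    #include "bar.c"

    /* build: cc -O2 -c whole_program.c
       (file-scope statics and type names must not clash between the two
       sources for this to work) */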

    That's why we can have a real world security issue caused by zeroing
    being optimized away.

    No, it is not. We have real-world security issues for all sorts of
    reasons, including people mistakenly thinking they can force particular types of code generation by calling functions in different source files.

    In fact, that code generation is forced, when people do not use LTO,
    which is not enabled by default.

    The rules spelled out in ISO C allow us to unit test a translation
    unit by linking it to some harness, and be sure it has exactly the
    same behaviors when linked to the production program.

    No, they don't.

    If the unit you are testing calls something outside that unit, you may
    get different behaviours when testing and when used in production.

    Yes; if you do nonconforming things.

    only thing you can be sure of from testing is that if you find a bug
    during testing, you have a bug in the code. You can never use testing
    to be sure that the code works (with the exception of exhaustive testing
    of all possible inputs, which is rarely practical).

    LTO will break translation units that are simple enough to be trivially
    proven to have a certain behavior.

    If I have some translation unit in which there is a function foo, such
    that when I call foo, it then calls an external function bar, that's
    observable.

    5.1.2.2.1p6 lists the three things that C defines as "observable
    behaviour". Function calls - internal or external - are not amongst these.

    External calls are de facto observable, because we take it for granted
    that when we have a translation unit that calls a certain function, we can
    supply another translation unit which supplies that function. In
    that function we can communicate with the host environment to confirm
    that it was called.
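
    A sketch of the kind of harness being described (the names are
    hypothetical):

    /* unit.c -- the translation unit under test */
    extern void bar(void);

    void foo(void)
    {
        bar();
    }

    /* harness.c -- supplies bar and observes whether foo reaches it */
    #include <stdio.h>

    extern void foo(void);

    void bar(void)
    {
        printf("bar was called\n");   /* observable output */
    }

    int main(void)
    {
        foo();
        return 0;
    }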

    I can link that unit to a program which supplies bar,
    containing a printf call, then call foo and verify that the printf call
    is executed.

    Yes, you can. The printf call - or, more exactly, the "input and output dynamics" - are observable behaviour. The call to "bar", however, is not.

    If bar is not called, then the observable behavior of the printf
    doesn't occur either; they are linked by logic / cause-and-effect.

    A behavior that is not itself formally classified as observable can be discovered by logical linkage to be necessary for the production of
    observable behavior. It can be an "if, and only if" linkage.

    If an observable behavior B occurs if, and only if, some behavior A
    occurs, then the fact of whether A occurs or not is de facto observable.

    The compiler, when compiling the source of "foo", will include a call to "bar" when it does not have the source code (or other detailed semantic information) for "bar" available at the time.

    Translation phases 1 to 7 forbid processing material from another
    translation unit. Conforming semantic analysis of a translation unit has nothing but that translation unit.

    But you are mistaken to
    think it does so because the call is "observable" or required by the C standard.

    Sure; let's say that the call can be tied to observable behavior
    elsewhere such that the call occurs if and only if the observable
    behavior occurs.

    It does so because it cannot prove that /running/ the
    function "bar" contains no observable behaviour, or otherwise affects
    the observable behaviour of the program. The compiler cannot skip the
    call unless it can be sure it is safe to do so - and if it knows nothing about the implementation of "bar", it must assume the worst.

    The compiler cannot do any of this if it is in a conforming mode.

    But sure, the nonconforming LTO paradigm does have to adhere
    to sane rules that more or less follow what would have to happen if
    multiple preprocessing translation units were merged at the token level
    and thus analyzed together.

    Sometimes the compiler may have additional information - such as if it
    is declared the gcc "const" or "pure" attributes (or the standardised "unsequenced" and "reproducible" attributes in the draft for the next C version after C23).

    If the declarations are available only in another translation unit,
    they cannot be taken into account when analyzing this translation unit.

    Since ISO C says that the semantic analysis has been done (that
    unit having gone through phase 7), we can take it for granted as a
    done-and-dusted property of that translation unit that it calls bar
    whenever its foo is invoked.

    No, we can't - see above. Nothing in the C standards forbids any
    additional analysis, or using other information in code generation.

    Any semantic analysis performed must be that which is stated in translation
    phase 7, which happens for one translation unit, before considering
    linkage to other translation units.

    What forbids it is that no semantic analysis activity is described as
    taking place in translation phase 8, other than linkage.

    Say I have a call to foo in main, and the definition of foo is in
    another translation unit. In the absence of LTO, the compiler will have
    to generate a call to foo. If LTO is able to determine that foo doesn't
    do anything, it can remove the code for the function call, and the
    resulting behavior of the linked program is unchanged.

    There are always situations in which optimizations that have been forbidden
    don't cause a problem, and are even desirable.


    Can you give examples?

    You already mentioned "-fast-math" (and by implication, its various
    subflags in gcc, clang and icc). These are clearly documented as
    allowing some violations of the C standards (and not least, the IEEE floating point standards, which are stricter than those of C).

    Yes, and some people want that, learn how it works, and get their
    programs working with it, all the while knowing that it's
    nonconforming to IEEE and ISO C.

    Another tool in the box.

    (While I don't much like an "appeal to authority" argument, I think it's worth noting that the major C / C++ compilers, gcc, clang/llvm and MSVC,
    all support link-time optimisation. They also all work together with
    both the C and C++ standards committees. It would be quite the scandal
    if there were any truth in your claims and these compiler vendors were
    all breaking the rules of the languages they help to specify!)

    Why would it be?

    In the first place, all the implementations you mention have to be
    explicitly put into a nondefault configuration in order to resemble
    conforming ISO C implementations.

    LTO is not even enabled by default (for good reasons).

    A few goofballs who maintain GNU/Linux distros are turning on LTO for
    compiling upstream packages whose development they know nothing about
    beyond ./configure && make. (Luckily, the projects themselves can take countermeasures to defend against this.)

    I think the fact that LTO is almost certainly nonconforming deserves
    more attention, but not panic or anything like that.

    LTO should be made into a conforming feature that is optional.
    Translation phase 8 can be split into 8 and 9. In 8, translation units
    would be optionally partitioned into subsets. Each subset containing
    two or more translation units would be subjected to further semantic analysis, as a group, and turned into a subset translation unit.
    Phase 9 would be the same as the former 8.

    Whether an implementation supports subsetting and the manner in which
    units are indicated for subsetting would be implementation-defined, but
    it would be clear that there is a semantic difference, and that each implementation must support a translation mode in which the subsetting
    isn't performed.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Mar 23 06:27:32 2024
    On 2024-03-22, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
    And the C standard imposes no requirement that such behavior occur as described by the abstract semantics. Only actual observable behavior, as
    that term is defined by the C standard, must occur as if those semantics
    were followed - whether or not they actually were.

    But there is something. Though not normative text, EXAMPLE 1 gives
    the range of possibilities for optimization:

    EXAMPLE 1 An implementation might define a one-to-one correspondence
    between abstract and actual semantics: at every sequence point, the
    values of the actual objects would agree with those specified by the
    abstract semantics. The keyword volatile would then be redundant.

    Alternatively, an implementation might perform various optimizations
    within each translation unit, such that the actual semantics would agree
    with the abstract semantics only when making function calls across
    translation unit boundaries.

    I believe the intent of this example is to give the two extremes
    representing the full range of what is envisioned as permissible.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Sat Mar 23 06:39:33 2024
    On 3/21/2024 5:38 PM, Chris M. Thomasson wrote:
    On 3/21/2024 4:19 PM, Scott Lurndal wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/21/2024 1:21 PM, Scott Lurndal wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    "All of its “critical-sequences” are contained in externally assembled
    functions ( read all ) in order to prevent a rouge C compiler from

    As opposed to a viridian C compiler?

    I was worried about "overly aggressive" LTO messing around with my ASM.

    And you missed the oblique reference to the misspelling of 'rogue' as
    'rouge'.

    Yup! I sure did. I have red on my face!

    I wonder if I have a bit of dyslexia. Sometimes when I am typing along
    without looking at the keyboard, I can make a mistake that is backwards
    wrt two letters.

    For instance, spelling the word "careful" as "carfeul", car fuel? lol...
    The mistake I made with rogue vs rouge is that same swapping error as
    well. This is a "bad" one because spell checker does not flag it.

    It's strange because when I look at the keyboard while I am typing,
    well, that does not occur.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Mar 23 06:43:04 2024
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Is the "call" instruction *observable behavior* as defined in 5.1.2.3?

    No it isn't. The Boolean fact whether or not that call is taken can be
    tied to observable behavior elsewhere, though.


    [...]

    In phase 8:
    All external object and function references are resolved. Library
    components are linked to satisfy external references to functions
    and objects not defined in the current translation. All such
    translator output is collected into a program image which contains
    information needed for execution in its execution environment.

    I don't see anything about required CPU instructions.

    I don't see anything about /removing/ instructions that have to be
    there according to the semantic analysis performed in order to
    translate those units from phases 1 - 7, and that can be confirmed
    to be present with a test harness.

    The standard doesn't mention either adding or removing instructions.

    Running a program under a test harness is effectively running a
    different program. Of course it can yield information about the
    original program, but in effect you're linking the program with a
    different set of libraries.

    It's a different program, but the retained translation unit must be the
    same, except that the external references it makes are resolved to
    different entities.

    If in one program we have an observable behavior which implies that a
    call took place (that itself not being directly observable, by
    definition, I again acknowledge) then under the same conditions in
    another program, that call also has to take place, by the fact that the translation unit has not changed.

    I can use a test harness to observe whether a program uses an add or inc instruction to evaluate `i++` (assuming the CPU has both instructions).
    The standard doesn't care how the increment happens, as long as the
    result is correct. It doesn't care *whether* the increment happens
    unless the result affects the programs *observable behavior*.

    If i is an object with external linkage defined outside of some
    translation unit and some function in the translation unit
    unconditionally increments i (without further using its value), then
    that has to happen, even in a program in which nothing else uses i.

    By this blackbox method I'm describing, no, we cannot confirm whether
    it's by an inc instruction or whatever. Just, does it happen.

    In one test program we can tie that to observable behavior, like
    printing the value of i before and after calling that function.

    Though the increment isn't observable behavior (unless i is volatile?),
    since it has been confirmed that the translation unit does that, it does
    that.
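
    In code form, the situation being described is roughly (hypothetical
    names):

    /* unit.c */
    extern int i;

    void bump(void)
    {
        i++;    /* unconditional; the value is not otherwise used here */
    }

    /* test.c */
    #include <stdio.h>

    int i = 0;
    extern void bump(void);

    int main(void)
    {
        printf("%d\n", i);
        bump();
        printf("%d\n", i);   /* ties the increment to observable output */
        return 0;
    }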

    What in the description of translation phases 7 and 8 makes behavior-preserving optimizations valid in phase 7 and forbidden in
    phase 8? (Again, insert weasel words about unspecified behavior.)

    That translation phase 7 is described as completing semantic analysis, resulting in a translated unit which may be retained (moreover,
    analysis of a single unit, not multiple), and that 8 is described
    as only resolving references and linking.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sat Mar 23 07:26:22 2024
    On 22/03/2024 19:55, Kaz Kylheku wrote:
    On 2024-03-22, David Brown <david.brown@hesbynett.no> wrote:
    You should read the footnotes to 5.1.1.2 "Translation phases".
    Footnotes are not normative, but they are helpful in explaining the
    meaning of the text. They note that compilers don't have to follow the
    details of the translation phases, and that source files, translation
    units, and translated translation units don't have to have one-to-one
    correspondences.

    Yes, I'm aware of that. For instance preprocessing can all be jumbled
    into one process. But it has to produce that result.

    Even if translation phases 7 and 8 are combined, the semantic analysis
    of the individual translation unit has to appear to be settled before linkage. So for instance a translation unit could incrementally emerge
    from the semantic analysis steps, and those parts of it already analyzed (phase 7) could start to be linked to other translation units (phase 8).


    Again, you are inferring far too much here. The standard is /not/
    limiting like this.

    Compilers can make use of all sorts of additional information. They
    have always been able to do so. They can use extra information provided
    by compiler extensions - such as gcc attributes. They can use
    information from profiling to optimise based on real-world usage. They
    can analyse source code files and use that analysis for optimisation
    (and hopefully also static error checking).


    Consider this:

    A compiler can happily analyse each source code file in all kinds of
    ways, completely independently of what the C standards describe (or perhaps, by
    happy coincidence, using the same types of pre-processing and
    interpretation). This analysis can be stored in files or some other
    storage place. Do you agree that this is allowed, or do you think the C standards somehow ban it? Note that we are calling this "analysis" -
    not C compilation.

    Now the compiler starts the "real" compilation, passing through the translation phases one by one. When it gets to phase 7, it reads all
    this stored analysis information. (Nothing in the standards says the
    compiler can't pull in extra information - it is quite normal, for
    example, to pull in code snippets as part of the compilation process.)
    For each translation unit, it produces two outputs (in one "fat" object
    file) - one part is a relatively dumb translation that does not make use
    of the analysis, the other uses the analysis information to generate
    more optimal code. Both parts make up the "translator output" for the translation unit. Again, can you point to anything in the C standards
    that would forbid this?

    Then we come to phase 8. The compiler (or linker) reads all the
    "translator output" files needed for the complete program. It checks
    that it has the same set of input files as were used during the pre-compilation analysis. If they are all the same, then the analysis information about the different units is valid, and thus the
    optimisations using that extra information are valid. The "dumb
    translation" versions can be used as a fallback if the analysis was not
    valid - otherwise they are thrown out, and the more optimised versions
    are linked together.
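
    Gcc's "fat" LTO objects follow this pattern fairly closely; a sketch of
    the invocations (hypothetical file names):

    gcc -O2 -flto -ffat-lto-objects -c foo.c bar.c
    gcc -O2 -flto foo.o bar.o -o prog      # link using the stored analysis
    gcc -O2 -fno-lto foo.o bar.o -o prog   # fall back to the plain object code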

    There is nothing in the description of the translation phases that
    hinders this. All the compiler has to do is ensure that the final
    program - not any individual translation units - has correct observable behaviour.


    I would also refer you to section 1 of the C standards - "Scope". In particular, note that "This document does /not/ specify the mechanism by
    which C programs are transformed for use by a data-processing system". (Emphasis mine.) The workings of the compiler are not part of the standard.


    I'm just saying that certain information leakage is clearly permitted, regardless of how the phases are integrated.

    The standard also does not say what the output of "translation" is - it
    does not have to be assembly or machine code. It can happily be an
    internal format, as used by gcc and clang/llvm. It does not define what
    "linking" is, or how the translated translation units are "collected
    into a program image" - combining the partially compiled units,
    optimising, and then generating a program image is well within that
    definition.

    (That can be inferred
    from the rules which forbid semantic analysis across translation
    units, only linkage.)

    The rules do not forbid semantic analysis across translation units -
    they merely do not /require/ it. You are making an inference without
    any justification that I can see.

    Translation phase 7 is clearly about a single translation unit in
    isolation:

    "The resulting tokens are syntactically and semantically analyzed
    and translated as a translation unit."

    Not: "as a combination of multiple translation uints".

    The point is that many things are local to a translation unit, such as statics, type definitions, and so on. These are valid within the
    translation unit (within their scope, of course), and independent of identically named items in other translation units. It is about
    defining a kind of "unit of compilation" for the language semantics - it
    is /not/ restricting the behaviour of a compiler.

    LTO does not change the language semantics in any way. The language
    semantics determine the observable behaviour of the program, and we have already established that this must be unchanged. Generated instructions
    for a target are not part of the language semantics.



    5.1.1.1 clearly refers to "[t]he separate translation units of a
    program".

    It does so all in terms of what a compiler /may/ do.

    And there is never any specification of the result of a "translation".
    It can happily be byte-code, or internal toolchain-specific formats.


    LTO pretends that the program is still divided into the same translation units, while mingling them together in ways contrary to all those
    chapter 5 descriptions.

    No.


    The conforming way to obtain LTO is to actually combine multiple preprocessing translation units into one.


    You could do that if you like (after manipulating things to handle
    statics, type definitions, etc.).

    And you would then find that if "foo()" in "foo.c" called "bar()" in
    "bar.c", the call to "bar()" might be inlined, or omitted, or otherwise optimised, just as it could be if they were both defined in the same translation unit.

    The result would be the same kind of object code as you get with LTO -
    one in which the observable behaviour is as expected, but you might get different details in the generated code.

    I don't know why you would think that this kind of combination of units
    is conforming, but LTO is not. It's all the same thing in principle -
    the only difference is that real-world implementations of LTO are
    designed to be scalable, do as much as possible in parallel, and avoid re-doing work for files that don't change.

    Some link-time optimisation or "whole program optimisation" toolchains
    are aimed at small code bases (such as might fit into a small
    microcontroller) and combine all the source code together then handle it
    all at once. Again, the principles and the semantics are not any
    different from gcc LTO - it's just a different way of splitting up the work.

    That's why we can have a real world security issue caused by zeroing
    being optimized away.

    No, it is not. We have real-world security issues for all sorts of
    reasons, including people mistakenly thinking they can force particular
    types of code generation by calling functions in different source files.

    In fact, that code generation is forced, when people do not use LTO,
    which is not enabled by default.


    No, it is not.

    The C standards don't talk about LTO, or whether or not it is enabled,
    or what is "default", or even what kind of code generation you get.

    If the compiler knows that a function call will not have or affect
    observable behaviour, it can omit that call. It does not matter how it
    knows this. LTO is a very practical way to get this information, but it
    might not be the only way. Profile-guided optimisation information may provide the same information. So could attributes given in the function declaration (and a future C standard will likely support such attributes).

    But if the compiler doesn't know for sure that it is safe to omit the
    call, then it must generate it. Correctness trumps optimisation!

    The rules spelled out in ISO C allow us to unit test a translation
    unit by linking it to some harness, and be sure it has exactly the
    same behaviors when linked to the production program.

    No, they don't.

    If the unit you are testing calls something outside that unit, you may
    get different behaviours when testing and when used in production.

    Yes; if you do nonconforming things.

    No one is suggesting doing "nonconforming things".

    To give a simple example, suppose your unit is intended to perform some calculations and then call a callback with the result. In a test
    harness, you would provide a callback that checks the result against the expected value, and provides a pass/fail log message. In production
    use, you would provide a callback that pops up a window with the value,
    or sends it in an email to the user. The observable behaviour of the production program and the test program is very different.
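
    A sketch of that arrangement (the names are hypothetical):

    /* unit.c -- computes a result and hands it to a callback */
    void compute(int x, void (*report)(int))
    {
        report(x * x);
    }

    /* test.c -- the harness callback checks the value and logs pass/fail */
    #include <stdio.h>

    extern void compute(int x, void (*report)(int));

    static void check(int result)
    {
        puts(result == 25 ? "PASS" : "FAIL");
    }

    int main(void)
    {
        compute(5, check);  /* production code would pass a different callback */
        return 0;
    }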

    In fact, unless you are testing the production version, or you are
    producing a test harness, you would normally expect very different
    observable behaviours from any unit testing and real usage of the code.


    only thing you can be sure of from testing is that if you find a bug
    during testing, you have a bug in the code. You can never use testing
    to be sure that the code works (with the exception of exhaustive testing
    of all possible inputs, which is rarely practical).

    LTO will break translation units that are simple enough to be trivially proven to have a certain behavior.


    Again, claiming this will not make it true. You need to update your
    ideas about what observable behaviour actually is.

    If I have some translation unit in which there is a function foo, such
    that when I call foo, it then calls an external function bar, that's
    observable.

    5.1.2.2.1p6 lists the three things that C defines as "observable
    behaviour". Function calls - internal or external - are not amongst these.

    External calls are de facto observable,

    The phrase "de facto" is an admission that you understand that none of
    this is part of the /actual/ standards. You have dropped from "the
    official standards make this clear" down to "I think this".

    because we take it for granted that
    when we have a translation unit that calls a certain function, we can
    supply another translation unit which supplies that function. In
    that function we can communicate with the host environment to confirm
    that it was called.


    All such boundaries are lost in the link stage, before observable
    behaviour becomes relevant.

    I can link that unit to a program which supplies bar,
    containing a printf call, then call foo and verify that the printf call
    is executed.

    Yes, you can. The printf call - or, more exactly, the "input and output
    dynamics" - are observable behaviour. The call to "bar", however, is not.

    If bar is not called, then the observable behavior of the printf
    doesn't occur either; they are linked by logic / cause-and-effect.


    Nonsense.

    The compiler-generated code must produce the correct observable
    behaviour. It can do that however it likes. It can put a call to
    "printf" directly in "foo". It can replace the "printf" with a "puts"
    or a series of target-specific "write_a_char" calls if the results are
    the same.

    C is defined in terms of behaviour, not particular instruction
    sequences. If you write "x = y * 4;", the compiler can generate
    instructions that look like "x = y + y + y + y;", or "x = y * 2; x = x +
    y + y;", or "x = y << 8 - (2 * y + 3 * y - y)_;", or anything it likes
    as long as the result is correct (and obviously avoiding any extra
    overflows).


    A behavior that is not itself formally classified as observable can be discovered by logical linkage to be necessary for the production of observable behavior. It can be an "if, and only if" linkage.

    If an observable behavior B occurs if, and only if, some behavior A
    occurs, then the fact of whether A occurs or not is de facto observable.

    Calling it "de facto observable behaviour" is just confusing your understanding here. But you can well say that if B is observed, that
    means A must have happened.

    However, you have not in any way shown that A (in this case,
    instructions to call the function "bar") is the only way to result in
    the observable behaviour.


    The compiler, when compiling the source of "foo", will include a call to
    "bar" when it does not have the source code (or other detailed semantic
    information) for "bar" available at the time.

    Translation phases 1 to 7 forbid processing material from another
    translation unit.

    Nope.

    Conforming semantic analysis of a translation unit has
    nothing but that translation unit.


    Nope.

    But you are mistaken to
    think it does so because the call is "observable" or required by the C
    standard.

    Sure; let's say that the call can be tied to observable behavior
    elsewhere such that the call occurs if and only if the observable
    behavior occurs.


    That would be a better way to put it. But it is still not the case here.

    It does so because it cannot prove that /running/ the
    function "bar" contains no observable behaviour, or otherwise affects
    the observable behaviour of the program. The compiler cannot skip the
    call unless it can be sure it is safe to do so - and if it knows nothing
    about the implementation of "bar", it must assume the worst.

    The compiler cannot do any of this if it is in a conforming mode.

    The compiler can omit the call to "bar" if it is sure that it results in
    no observable behaviour. It cannot omit it if it is not sure of this.
    It is /that/ simple.


    But sure, the nonconforming LTO paradigm does have to adhere
    to sane rules that more or less follow what would have to happen if
    multiple preprocessing translation units were merged at the token level
    and thus analyzed together.

    Sometimes the compiler may have additional information - such as if it
    is declared the gcc "const" or "pure" attributes (or the standardised
    "unsequenced" and "reproducible" attributes in the draft for the next C
    version after C23).

    If the declarations are available only in another translation unit,
    they cannot be taken into account when analyzing this translation unit.


    Wrong.

    This is really the crux of your misunderstandings. You have read
    between the lines of the standard and imagined rules that don't exist.
    Once you realise that they are imaginary, I expect the rest to fall into place.


    Since ISO C says that the semantic analysis has been done (that
    unit having gone through phase 7), we can take it for granted as a
    done-and-dusted property of that translation unit that it calls bar
    whenever its foo is invoked.

    No, we can't - see above. Nothing in the C standards forbids any
    additional analysis, or using other information in code generation.

    Any semantic analysis performed must be that which is stated in translation
    phase 7, which happens for one translation unit, before considering
    linkage to other translation units.

    What forbids it is that no semantic analysis activity is described as
    taking place in translation phase 8, other than linkage.

    The C standards also don't describe drinking coffee while waiting for
    the compiler. Just because something is not mentioned, does not mean it
    is forbidden!


    Say I have a call to foo in main, and the definition of foo is in
    another translation unit. In the absence of LTO, the compiler will have
    to generate a call to foo. If LTO is able to determine that foo doesn't
    do anything, it can remove the code for the function call, and the
    resulting behavior of the linked program is unchanged.
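
    As a minimal two-file sketch of that scenario (the empty body of foo
    is just for illustration):

        /* foo.c */
        void foo(void) { /* does nothing */ }

        /* main.c - without LTO the compiler must emit a call instruction
           for foo here, because it cannot see that foo has no effect. */
        extern void foo(void);
        int main(void) { foo(); return 0; }

    Compiled separately and then linked, the call normally remains; built
    with something like "gcc -O2 -flto foo.c main.c", the link-time pass
    can see both definitions and is free to drop the call, and the
    observable behaviour of the program is the same either way.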

    There are always situations in which optimizations that have been forbidden
    don't cause a problem, and are even desirable.


    Can you give examples?

    You already mentioned "-ffast-math" (and by implication, its various
    subflags in gcc, clang and icc). These are clearly documented as
    allowing some violations of the C standards (and not least, the IEEE
    floating point standards, which are stricter than those of C).

    Yes, and some people want that, learn how it works, and get their
    programs working with it, all the while knowing that it's
    nonconforming to IEEE and ISO C.

    Indeed. I am "some people" in this context.


    Another tool in the box.

    Agreed.

    But "-ffast-math" was already covered, and is irrelevant precisely
    because it is entirely clear that it is potentially standards-violating.
    (But it is not "forbidden". I have yet to see any ISO C police
    enforcers at my office door, waving a warrant.)

    I wanted to know if you had other examples of what you see as standards-violating optimisations that are not documented as such.


    (While I don't much like an "appeal to authority" argument, I think it's
    worth noting that the major C / C++ compilers, gcc, clang/llvm and MSVC,
    all support link-time optimisation. They also all work together with
    both the C and C++ standards committees. It would be quite the scandal
    if there were any truth in your claims and these compiler vendors were
    all breaking the rules of the languages they help to specify!)

    Why would it be?

    It would run counter to the whole point of having a standard.


    In the first place, all the implementations you mention have to be
    explicitly put into a nondefault configuration in order to resemble conforming ISO C implementations.

    Yes, but they are clear about that. (At least, gcc is - I haven't read
    the documentation for clang as thoroughly, and have barely touched MSVC.)

    It is absolutely fine for a compiler to have conforming and
    non-conforming modes. But it is /not/ fine for it to have a major part
    of its optimisation that is as critically non-conforming as you seem to believe, and not even mention this fact.


    LTO is not even enabled by default (for good reasons).

    The good reasons are that not all setups support it (it needs particular linkers), it can significantly increase build times, it makes some kinds
    of debugging nearly impossible, it plays badly with other tools such as profilers and code coverage analysis, and you can have trouble if you
    are doing weird things with compiler and linker file interaction or some
    other kinds of non-standard C coding.

    And like many optimisations, it can change the behaviour of incorrect
    code that happens to work by luck with different choices of optimisation settings.

    Those are all very good reasons for not enabling it for default, when
    the results are often only a few percent improvement in efficiency (for
    some code, it can be a lot more helpful).

    Most compilers don't enable /any/ significant optimisation by default.


    A few goofballs who maintain GNU/Linux distros are turning on LTO for compiling upstream packages whose development they know nothing about
    beyond ./configure && make. (Luckily, the projects themselves can take countermeasures to defend against this.)

    I think the fact that LTO is almost certainly nonconforming deserves
    more attention, but not panic or anything like that.

    If it /were/ nonconforming, I think that would deserve huge attention.
    But it is not.


    LTO should be made into a conforming feature that is optional.
    Translation phase 8 can be split into 8 and 9. In 8, translation units
    would be optionally partitioned into subsets. Each subset containing
    two or more translation units would be subjected to further semantic
    analysis, as a group, and turned into a subset translation unit.
    Phase 9 would be the same as the former 8.

    Whether an implementation supports subsetting and the manner in which
    units are indicated for subsetting would be implementation-defined, but
    it would be clear that there is a semantic difference, and that each implementation must support a translation mode in which the subsetting
    isn't performed.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Sat Mar 23 08:41:43 2024
    On 22/03/2024 17:14, James Kuyper wrote:
    On 3/21/24 14:13, Anton Shepelev wrote:
    ...
    I think this behavior (of a C compiler) is rather stupid. In a
    low-level imperative language, the compiled program shall
    do whatever the programmer commands it to do.

    C is NOT that low a level of language. The standard explicitly allows implementations to use any method they find convenient to produce
    observable behavior which is consistent with the requirements of the standard. Despite describing how that behavior might be produced by the abstract machine, it explicitly allows an implementation to achieve that behavior by other means.

    If you want to tell a system not only what a program must do, but also
    how it must do it, you need to use a lower-level language than C.

    Which one?

    I don't think anyone seriously wants to switch to assembly for the sort
    of tasks they want to use C for.

    I agree with AS that a program should do what it's told by the
    programmer and the compiler should not get too smart.

    When /I/ implement such a language, then that's pretty much what happens.

    However, people also expect a reasonable amount of optimisation, which
    can involve taking some short-cuts or not doing precisely what the
    programmer wrote in every detail.

    So the line isn't clearly defined as to what is or isn't acceptable.

    But in this example where somebody has clearly requested an object to be zeroed, ignoring that instruction has crossed the line to unacceptable IMO.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Sat Mar 23 10:30:32 2024
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than
    C.

    Which one?

    Good question.

    I don't think anyone seriously wants to switch to assembly for the
    sort of tasks they want to use C for.

    Agreed. What some people seem to be looking for is a language that's
    about as portable as C, but where every language construct is required
    to result in generated code that performs the specified operation.
    There's a lot of handwaving in that description. "C without
    optimization", maybe?

    I'm not aware that any such language exists, at least in the mainstream
    (and I've looked at a *lot* of programming languages). I conclude that
    there just isn't enough demand for that kind of thing.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sat Mar 23 11:09:46 2024
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than
    C.

    Which one?

    Good question.

    I don't think anyone seriously wants to switch to assembly for the
    sort of tasks they want to use C for.

    Agreed. What some people seem to be looking for is a language that's
    about as portable as C, but where every language construct is required
    to result in generated code that performs the specified operation.
    There's a lot of handwaving in that description. "C without
    optimization", maybe?

    I'm not aware that any such language exists, at least in the mainstream
    (and I've looked at a *lot* of programming languages). I conclude that
    there just isn't enough demand for that kind of thing.

    I think you can more or less get something like that with the following strategy:

    - all memory accesses through pointers are performed as written.
    - local variables are aggressively optimized into registers.
    - basic optimizations:
      - constant folding, dead code elimination.
      - basic control flow ones: jump threading and the like.
      - basic data flow optimizations.
      - peephole, good instruction selection.

    In that environment, the way the programmer writes the code is the rest
    of the optimization. Want loop unrolling? Write it yourself.
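
    For instance, a sketch of doing the unrolling by hand (the function
    names are arbitrary):

        /* Rolled: one element per iteration. */
        void scale(float *a, int n, float k)
        {
            for (int i = 0; i < n; i++)
                a[i] *= k;
        }

        /* Unrolled by four in the source; the tail loop handles any
           remainder when n is not a multiple of 4. */
        void scale_unrolled4(float *a, int n, float k)
        {
            int i = 0;
            for (; i + 4 <= n; i += 4) {
                a[i]     *= k;
                a[i + 1] *= k;
                a[i + 2] *= k;
                a[i + 3] *= k;
            }
            for (; i < n; i++)
                a[i] *= k;
        }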


    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sat Mar 23 18:26:11 2024
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than
    C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I don't think anyone seriously wants to switch to assembly for the
    sort of tasks they want to use C for.

    Why not? Assembly provides the kind of control you're looking for; C
    does not. If that kind of control is important to you, you have to find
    a language which provides it. If not assembler or C, what would you use?

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Richard Kettlewell@3:633/280.2 to All on Sat Mar 23 20:20:43 2024
    David Brown <david.brown@hesbynett.no> writes:
    I have tried to explain the reality of what the C standards say in a
    couple of posts (including one that I had not posted before you wrote
    this one). I have tried to make things as clear as possible, and
    hopefully you will see the point.

    If not, then you must accept that you interpret the C standards in a
    different manner from the main compiler vendors, as well as some "big
    names" in this group. That is, of course, not proof in itself - but
    you must realise that for practical purposes you need to be aware of
    how others interpret the standard, both for your own coding and for
    the advice or recommendations you give to others.

    Agreed that the ship has sailed on whether LTO is a valid optimization.
    But it’s understandable why someone might reach a different conclusion.

    - Phase 7 says the tokens are “semantically analyzed and translated as a
    translation unit”.

    - Phase 8 does not use either verb, “analyzed” or “translated”.

    - At least two steps (in the abstract, as-if model) are explicitly
    happening in the “as a translation unit” level but not in any wider
    context.

    - The result of those two steps (“translator output”) is then
    “collected”.

    - Unless you somehow understand that “collected” implicitly includes
    further analysis and translation, it does not seem unnatural to
    conclude that many of the whole-program optimizations done by LTO
    implementations would be outside the spec.

    This would be very easy to address, by replacing “collected” with a word
    or phrase that makes clear that further analysis and translation can
    happen outside the “as a translation unit” context.

    Obviously this would violate the principle from the rationale that
    existing code (that uses TU boundaries to get memset to “work”) is important and existing implementations (LTO) are not, but C
    standardization has never actually behaved as if that is true anyway.

    --
    https://www.greenend.org.uk/rjk/

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: terraraq NNTP server (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Sat Mar 23 22:26:03 2024
    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than
    C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be C or nothing.

    I don't think anyone seriously wants to switch to assembly for the
    sort of tasks they want to use C for.

    Why not? Assembly provides the kind of control you're looking for; C
    does not. If that kind of control is important to you, you have to find
    a language which provides it. If not assembler or C, what would you use?

    Among non-mainstream ones, my own would fit the bill. Since I write the implementations, I can ensure the compiler doesn't have a mind of its own.

    However if somebody else tried to implement it, then I can't guarantee
    the same behaviour. This would need to somehow be enforced with a
    precise language spec, or mine would need to be a reference
    implementation with a lot of test cases.


    -----------------

    Take this program:

    #include <stdio.h>
    int main(void) {
        goto L;
        0x12345678;
    L:
        printf("Hello, World!\n");
    }

    If I use my compiler, then that 12345678 pattern gets compiled into the
    binary (because it is loaded into a register then discarded). That means
    I can use that value as a marker or sentinel which can be searched for.

    However no other compiler I tried will do that. If I instead change that
    line to:

    int a = 0x12345678;

    then a tcc-compiled binary will contain that value. So will
    lccwin32-compiled (with a warning). But not DMC or gcc.

    If I get rid of the 'goto' , then gcc-O0 will work, but still not DMC or gcc-O3.

    Here I can use `volatile` to ensure that value stays in, but not if I
    put the 'goto' back in!

    It's all too unpredictable.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Mar 24 02:36:02 2024
    On 22/03/2024 20:43, Kaz Kylheku wrote:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Is the "call" instruction *observable behavior* as defined in 5.1.2.3?



    Running a program under a test harness is effectively running a
    different program. Of course it can yield information about the
    original program, but in effect you're linking the program with a
    different set of libraries.

    It's a different program, but the retained translation unit must be the
    same, except that the external references it makes are resolved to
    different entities.

    That is true - /if/ you make the restriction that the translation unit
    is compiled completely to linkable machine code or assembly, and that it
    is not changed in any way when it is combined into the new program.
    Such a setup is common in practice, but it is in no way required by the
    C standards and does not apply for more advanced compilation and build scenarios.


    If in one program we have an observable behavior which implies that a
    call took place (that itself not being directly observable, by
    definition, I again acknowledge) then under the same conditions in
    another program, that call also has to take place, by the fact that the translation unit has not changed.

    Yes - again, /if/ you restrict your tools and build processes to make
    this true. (And though the call may still be there, it is still not observable behaviour, and it may no longer lead to any observable
    behaviour in the new program.)

    Basically, what you are saying is that if you have a compiler and build
    system that compiles individual translation units into fixed individual
    object files of linkable machine code, and these units are not
    recompiled when you link them again in new programs, then the machine
    code in for the externally linked functions defined in those translation
    units is not changed.

    I don't think anyone will argue with that - it is quite solid, and does
    not come as news to anybody familiar with compilers and build processes.

    The thing you get wrong is believing that the C standards require such a compiler and build system. They don't - and thus all your beliefs
    (about interaction across translation units) which depend on such a requirement, fall apart.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sun Mar 24 03:06:34 2024
    On 2024-03-23, Richard Kettlewell <invalid@invalid.invalid> wrote:
    David Brown <david.brown@hesbynett.no> writes:
    I have tried to explain the reality of what the C standards say in a
    couple of posts (including one that I had not posted before you wrote
    this one). I have tried to make things as clear as possible, and
    hopefully you will see the point.

    If not, then you must accept that you interpret the C standards in a
    different manner from the main compiler vendors, as well as some "big
    names" in this group. That is, of course, not proof in itself - but
    you must realise that for practical purposes you need to be aware of
    how others interpret the standard, both for your own coding and for
    the advice or recommendations you give to others.

    Agreed that the ship has sailed on whether LTO is a valid optimization.

    There is no question that LTO is a "valid" optimization for reasonable definitions of valid.

    But it’s understandable why someone might reach a different conclusion.

    That alone is a problem.

    - Phase 7 says the tokens are “semantically analyzed and translated as a
    translation unit”.

    - Phase 8 does not use either verb, “analyzed” or “translated”.

    That adds up to requirements that are /obviously/ violated by LTO.

    Someone might reach a different conclusion simply by reading the black-and-white text, which obviously spells out what is required.

    When reading the standard, you can't just ignore bits you think
    are wrong.

    It may be the case that a strictly conforming program cannot tell
    whether these requirements are violated.

    Strictly conforming programs are not the be-all and end-all of what is important.

    In the academic paradigm of a strictly conforming program, a security
    problem of bytes not being nulled out (or any other such thing) does not
    exist.

    This would be very easy to address, by replacing “collected” with a word or phrase that makes clear that further analysis and translation can
    happen outside the “as a translation unit” context.

    No, it's not that easy to address. The standard should make explicit
    provisions for LTO. There should be an optional translation phase
    between the current 7 and 8 in which translation units may be
    partitioned into subsets, and then subjected to semantic analysis
    and further translation within the subsets, prior to linking.

    The standard wouldn't describe how the partitioning is requested from
    the implementation, since it is part of the manner in which a program is presented to it. All implementations should support a translation mode
    in which no partitioning into subsets takes place.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sun Mar 24 03:07:47 2024
    On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
    On 22/03/2024 20:43, Kaz Kylheku wrote:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Is the "call" instruction *observable behavior* as defined in 5.1.2.3?



    Running a program under a test harness is effectively running a
    different program. Of course it can yield information about the
    original program, but in effect you're linking the program with a
    different set of libraries.

    It's a different program, but the retained translation unit must be the
    same, except that the external references it makes are resolved to
    different entities.

    That is true - /if/ you make the restriction that the translation unit
    is compiled completely to linkable machine code or assembly, and that it
    is not changed in any way when it is combined into the new program.
    Such a setup is common in practice, but it is in no way required by the
    C standards and does not apply for more advanced compilation and build scenarios.

    Well, it's only not required if you hand-wave away the sentences in
    section 5.

    You can't just do that!

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Mar 24 03:08:48 2024
    On 23/03/2024 10:20, Richard Kettlewell wrote:
    David Brown <david.brown@hesbynett.no> writes:
    I have tried to explain the reality of what the C standards say in a
    couple of posts (including one that I had not posted before you wrote
    this one). I have tried to make things as clear as possible, and
    hopefully you will see the point.

    If not, then you must accept that you interpret the C standards in a
    different manner from the main compiler vendors, as well as some "big
    names" in this group. That is, of course, not proof in itself - but
    you must realise that for practical purposes you need to be aware of
    how others interpret the standard, both for your own coding and for
    the advice or recommendations you give to others.

    Agreed that the ship has sailed on whether LTO is a valid optimization.
    But it’s understandable why someone might reach a different conclusion.

    I /do/ understand why Kaz thinks the way he does. I am just trying to
    show that his interpretation is wrong, so that he can better understand
    what is going on, and how to get the behaviour he wants.


    - Phase 7 says the tokens are “semantically analyzed and translated as a
    translation unit”.

    - Phase 8 does not use either verb, “analyzed” or “translated”.

    - At least two steps (in the abstract, as-if model) are explicitly
    happening in the “as a translation unit” level but not in any wider
    context.

    - The result of those two steps (“translator output”) is then
    “collected”.

    - Unless you somehow understand that “collected” implicitly includes
    further analysis and translation, it does not seem unnatural to
    conclude that many of the whole-program optimizations done by LTO
    implementations would be outside the spec.

    This would be very easy to address, by replacing “collected” with a word or phrase that makes clear that further analysis and translation can
    happen outside the “as a translation unit” context.


    I would be entirely happy to see clearer wording in the standards here,
    or at least some footnotes saying what is allowed or not allowed.

    Obviously this would violate the principle from the rationale that
    existing code (that uses TU boundaries to get memset to “work”) is important and existing implementations (LTO) are not, but C
    standardization has never actually behaved as if that is true anyway.


    Oh, I think the C standards committee have done quite well at that. But
    doing it /completely/ would clearly be impossible, as different people
    have different ideas about how they think C is defined, and how they
    think C compilers have to behave. In my line of work, I see plenty of
    old code that makes assumptions that are not remotely justified by the C standards, but which happened to work on the old or limited toolchain
    used by the person who wrote the code. If the C standards tried to
    codify such practices, or if C compilers tried to make sure that /all/
    code that worked with other compilers or older versions works on newer
    tools, progress on compilers would be completely stalled and we'd have
    no optimisations that weren't already in common use in the 1970's.

    What the standards committee try to say is that if code follows C
    standard N correctly, then when it is compiled under C standard N+1 it
    should have the same semantics and the same behaviour. And they do that reasonably, but not perfectly.

    It would be unreasonable to expect them to guarantee the behaviour of
    code under new standards when the code did not have guaranteed behaviour
    under the old standards. Using TU boundaries to "get memset to work"
    has never been guaranteed.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Mar 24 03:25:40 2024
    On 23/03/2024 01:09, Kaz Kylheku wrote:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than
    C.

    Which one?

    Good question.

    I have no answer here either.


    I don't think anyone seriously wants to switch to assembly for the
    sort of tasks they want to use C for.

    One of the stated motivations for creating C was to save people from
    writing code in assembly!


    Agreed. What some people seem to be looking for is a language that's
    about as portable as C, but where every language construct is required
    to result in generated code that performs the specified operation.
    There's a lot of handwaving in that description. "C without
    optimization", maybe?

    I'm not aware that any such language exists, at least in the mainstream
    (and I've looked at a *lot* of programming languages). I conclude that
    there just isn't enough demand for that kind of thing.

    I think lack of demand combines with it actually being an extremely
    difficult task.

    Consider something as simple as "x++;" in C. How could that be
    implemented? Perhaps the cpu has an "increment" instruction. Perhaps
    it has an "add immediate" instruction. Perhaps it needs to load 1 into
    a register, then use an "add" instruction. Perhaps "x" is in memory.
    Some cpus can execute an increment directly on the memory address as an
    atomic instruction. Some can do so, but only using specific (and more expensive) instructions. Some can't do it at all without locking
    mechanisms and synchronisation loops.

    So what does this user of this mythical LLL expect when he/she writes
    "x++;" ? If the language had been created in the days of 8086 on DOS,
    perhaps it would have been defined as an atomic operation - and now
    doing this atomically on an AArch64 device would be extremely inefficient.
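
    In C, the programmer has to spell out which of those meanings is
    intended; for example, a sketch using C11 <stdatomic.h>:

        #include <stdatomic.h>

        int plain_x;          /* ordinary object */
        _Atomic int atomic_x; /* atomic object */

        void bump(void)
        {
            plain_x++;        /* whatever instruction sequence the compiler
                                 likes; not atomic with respect to other threads */
            atomic_x++;       /* ++ on an _Atomic object is an atomic
                                 read-modify-write */
            atomic_fetch_add_explicit(&atomic_x, 1, memory_order_relaxed);
                              /* the same increment with weaker ordering,
                                 which can be cheaper on some targets */
        }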

    The big trouble with saying that the compiler should "do what I say" is
    that people have very different ideas about what they mean when they
    write things. You either have to have quite high-level and abstract definitions about meanings and give compilers a fair amount of freedom
    when implementing them (thus you get high-level languages defined by behaviours on abstract machines - like C and just about every other programming language), or you have to tie it tightly to the target
    processor (and you get assembly), or the language designer, the compiler implementers and the programmers all have to think exactly the same way
    (which really means one-person languages, like Bart's).


    I think you can more or less get something like that with the following strategy:

    - all memory accesses through pointers are performed as written.
    - local variables are aggressively optimized into registers.
    - basic optimizations:
    - constant folding, dead code elimination.
    - basic control flow ones: jump threading and the like.
    - basic data flow optimizations.
    - peephole, good instruction selection.

    In that environment, the way the programmer writes the code is the rest
    of the optimization. Want loop unrolling? Write it yourself.


    You might like to try to formalise this. You won't be the first to
    attempt it. But you might be the first to succeed, because no one
    (AFAIK) has managed it so far.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Mar 24 03:51:12 2024
    On 23/03/2024 12:26, bart wrote:
    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than
    C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be C or nothing.

    How much of a problem is it, really?

    My field is probably the place where low level programming is most
    ubiquitous. There are plenty of people who use assembly - for good
    reasons or for bad (or for reasons that some people think are good,
    other people think are bad). C is the most common choice.

    Other languages used for small systems embedded programming include C++,
    Ada, Forth, BASIC, Pascal, Lua, and Micropython. Forth is the only one
    that could be argued as lower-level or more "directly translated" than C.

    The trick to writing low-level code in C (or C++) is not to pretend that
    C is a "directly translated" language, or to fight with your compiler.
    It is to learn how to work /with/ your compiler and its optimisations to
    get what you need. Complaining that "LTO broke my code" does not make
    your product work. Arbitrarily disabling optimisations that you feel
    are "bad" or imagine to be non-conforming is just kicking the can down
    the road. You learn what /actually/ works - as guaranteed by the C
    standards, or by your compiler.

    Sometimes that means using compiler-specific or target-specific
    extensions. That's okay. No one ever suggested that pure C-standard C
    code was sufficient for all tasks. C was designed to allow some coding
    to be done in a highly portable and re-usable manner, and also to
    support non-portable systems programming relying on the implementation,
    and this has not changed. When I write code for low-level use on a
    specific microcontroller, I am not writing portable code anyway.

    So what language is lower level than C? GCC C (or clang C, or IAR C for
    the 8051, or any other specific C compiler).

    How would /I/ ensure that after "memset(buffer, 0, sizeof(buffer));"
    that the buffer was really written with zeros? I'd follow it with:

    asm ("" : "+m" (buffer));

    That's a gcc extension, but it will guarantee that the buffer is cleared
    - without any other costs.

    (Alternatively, I'd clear the memory using volatile writes, rather than memset.)
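
    That alternative might look something like this (a sketch; the helper
    name is made up):

        #include <stddef.h>

        /* Clear a buffer through a volatile-qualified pointer.  Each store
           is a volatile access, which the compiler is not free to remove,
           so the writes survive even aggressive optimisation. */
        static void secure_clear(void *p, size_t n)
        {
            volatile unsigned char *vp = p;
            while (n--)
                *vp++ = 0;
        }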


    I don't think anyone seriously wants to switch to assembly for the
    sort of tasks they want to use C for.

    Why not? Assembly provides the kind of control you're looking for; C
    does not. If that kind of control is important to you, you have to find
    a language which provides it. If not assembler or C, what would you use?

    Among non-mainstream ones, my own would fit the bill. Since I write the implementations, I can ensure the compiler doesn't have a mind of its own.

    However if somebody else tried to implement it, then I can't guarantee
    the same behaviour. This would need to somehow be enforced with a
    precise language spec, or mine would need to be a reference
    implementation with a lot of test cases.


    -----------------

    Take this program:

    #include <stdio.h>
    int main(void) {
    goto L;
    0x12345678;
    L:
    printf("Hello, World!\n");
    }

    If I use my compiler, then that 12345678 pattern gets compiled into the binary (because it is loaded into a register then discarded). That means
    I can use that value as a marker or sentinel which can be searched for.

    However no other compiler I tried will do that. If I instead change that line to:

    int a = 0x12345678;

    then a tcc-compiled binary will contain that value. So will lccwin32-compiled (with a warning). But not DMC or gcc.

    If I get rid of the 'goto' , then gcc-O0 will work, but still not DMC or gcc-O3.

    Here I can use `volatile` to ensure that value stays in, but not if I
    put the 'goto' back in!

    It's all too unpredictable.


    The /minimum/ requirements of the compiler are very predictable. The
    details beyond that are not - which is completely as expected. You are
    trying to achieve an effect that cannot be expressed in C, and thus it
    is folly to expect a simple way to achieve it with any C compiler. You
    will find that with many C compilers you can get what you want, but you
    have to write it in a way that suits the compiler. For gcc, you might
    do it by putting a const variable in an explicit linker section using a gcc-specific __attribute__. Maybe you can get it by using a volatile
    and /not/ removing the "goto".
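
    Something along these lines, for example (a sketch; the section name
    and variable name are arbitrary):

        /* gcc-specific: "used" forces the object to be emitted even though
           nothing references it, and "section" puts it somewhere easy to
           find in the binary. */
        __attribute__((used, section(".marker")))
        static const unsigned int binary_marker = 0x12345678;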

    But if you want to do something that has no semantic meaning in the
    language you are using, you can't expect compilers to support a
    particular way to achieve this!


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sun Mar 24 03:56:09 2024
    On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
    On 23/03/2024 10:20, Richard Kettlewell wrote:
    David Brown <david.brown@hesbynett.no> writes:
    I have tried to explain the reality of what the C standards say in a
    couple of posts (including one that I had not posted before you wrote
    this one). I have tried to make things as clear as possible, and
    hopefully you will see the point.

    If not, then you must accept that you interpret the C standards in a
    different manner from the main compiler vendors, as well as some "big
    names" in this group. That is, of course, not proof in itself - but
    you must realise that for practical purposes you need to be aware of
    how others interpret the standard, both for your own coding and for
    the advice or recommendations you give to others.

    Agreed that the ship has sailed on whether LTO is a valid optimization.
    But it’s understandable why someone might reach a different conclusion.

    I /do/ understand why Kaz thinks the way he does. I am just trying to
    show that his interpretation is wrong, so that he can better understand
    what is going on, and how to get the behaviour he wants.

    I'm just looking at what very plain, simple sentences are saying and
    taking it as-is.

    I would be entirely happy to see clearer wording in the standards here,
    or at least some footnotes saying what is allowed or not allowed.

    The wording isn't unclear in any way, though.

    What is needed is equally clear new wording which acknowledges the LTO
    model of program construction that is currently not described.

    That could be done without changing any of the existing wording.
    A new translation phase could be wedged between 7 and 8 stating
    that translation units may be optionally partitioned into subsets,
    and those subsets subjected to further semantic analysis and
    translation, resulting in merged translation units.

    The standard currently presents a reference model that is squarely based
    on traditional technology.

    If you read the Rationale for C89, mostly they were concerned with how different models of linkage treat multiply defined identifiers, and
    worked out a common specification that allows programs to be portable
    among those different linkage models.

    Ideas like LTO were not on the radar.

    It would be unreasonable to expect them to guarantee the behaviour of
    code under new standards when the code did not have guaranteed behaviour under the old standards. Using TU boundaries to "get memset to work"
    has never been guaranteed.

    memset is part of the language. It doesn't have to be a function
    in another translation unit that is reached via external linkage.
    The inclusion of <string.h> can bring in an inline or at least static definition. Compilers have treated memset as if it were a built-in
    primitive. That is justified. It is not part of my topic about LTO.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Sun Mar 24 04:58:21 2024
    On 23/03/2024 17:07, Kaz Kylheku wrote:
    On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
    On 22/03/2024 20:43, Kaz Kylheku wrote:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Is the "call" instruction *observable behavior* as defined in 5.1.2.3?



    Running a program under a test harness is effectively running a
    different program. Of course it can yield information about the
    original program, but in effect you're linking the program with a
    different set of libraries.

    It's a different program, but the retained translation unit must be the
    same, except that the external references it makes are resolved to
    different entities.

    That is true - /if/ you make the restriction that the translation unit
    is compiled completely to linkable machine code or assembly, and that it
    is not changed in any way when it is combined into the new program.
    Such a setup is common in practice, but it is in no way required by the
    C standards and does not apply for more advanced compilation and build
    scenarios.

    Well, it's only not required if you hand-wave away the sentences in
    section 5.

    You can't just do that!

    And it is only required if you read between the lines in section 5 and
    see things that simply are not there. You can't just do that!

    I believe we are at an impasse here, unless someone can think of a new
    point to make.

    One thing I would ask before leaving this - could you take a look at the latest draft for the next C standard after C23?

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf>

    Look at the definitions of the "reproducible" and "unsequenced" function
    type attributes in 6.7.13.8. In particular, look at the leeway
    explicitly given to the compiler for re-arranging code in 6.7.13.8.3p6
    and similar examples. Consider how that fits (or fails to fit) with
    your interpretation of the translation phases in section 5.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Sun Mar 24 05:44:42 2024
    bart <bc@freeuk.com> writes:

    On 23/03/2024 07:26, James Kuyper wrote:

    bart <bc@freeuk.com> writes:

    On 22/03/2024 17:14, James Kuyper wrote:

    [...]

    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than
    C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be C
    or nothing.

    If it has to be C or nothing, then it's nothing. Some people might
    not like that, but that's the way it is.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Sun Mar 24 06:58:56 2024
    On 23/03/2024 16:25, David Brown wrote:
    On 23/03/2024 01:09, Kaz Kylheku wrote:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    I'm not aware that any such language exists, at least in the mainstream
    (and I've looked at a *lot* of programming languages). I conclude that
    there just isn't enough demand for that kind of thing.

    I think lack of demand combines with it actually being an extremely difficult task.

    Consider something as simple as "x++;" in C. How could that be
    implemented? Perhaps the cpu has an "increment" instruction. Perhaps
    it has an "add immediate" instruction. Perhaps it needs to load 1 into
    a register, then use an "add" instruction. Perhaps "x" is in memory.
    Some cpus can execute an increment directly on the memory address as an atomic instruction. Some can do so, but only using specific (and more expensive) instructions. Some can't do it at all without locking
    mechanisms and synchronisation loops.

    So what does this user of this mythical LLL expect when he/she writes
    "x++;" ?

    This is not the issue that comes up in the OP (or the issue that was
    assumed as I don't think the OP has clarified).

    There it is not about micro-managing the implementation of x++, but the compiler deciding it isn't needed at all.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Sun Mar 24 08:21:58 2024
    On 23/03/2024 16:51, David Brown wrote:
    On 23/03/2024 12:26, bart wrote:
    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than
    C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be C
    or nothing.

    How much of a problem is it, really?

    My field is probably the place where low level programming is most ubiquitous. There are plenty of people who use assembly - for good
    reasons or for bad (or for reasons that some people think are good,
    other people think are bad). C is the most common choice.

    Other languages used for small systems embedded programming include C++, Ada, Forth, BASIC, Pascal, Lua, and Micropython. Forth is the only one
    that could be argued as lower-level or more "directly translated" than C.

    Well, Forth is certainly cruder than C (it's barely a language IMO). But
    I don't remember seeing anything in it resembling a type system that corresponds to the 'i8-i64 u8-u64 f32-f64' types typical in current
    hardware. (Imagine trying to create a precisely laid out struct.)

    It is just too weird. I think I'd rather take my chances with C.

    BASIC, ..., Lua, and Micropython.

    Hmm, I think my own scripting language is better at low level than any
    of these. It supports those low-level types for a start. And I can do
    stuff like this:

    println peek(0x40'0000, u16):"m"

    fun peek(addr, t=byte) = makeref(addr, t)^

    This displays 'MZ', the signature of the (low-)loaded EXE image on Windows

    Possibly it is even better than C; is this little program valid (no UB)
    C, even when it is known that the program is low-loaded:

    #include <stdio.h>
    typedef unsigned char byte;

    int main(void) {
        printf("%c%c\n", *(byte*)0x400000, *(byte*)0x400001);
    }

    This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
    programs at high addresses. The problem being that the address involved,
    while belonging to the program, is outside of any C data objects.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Sun Mar 24 03:51:58 2024
    On 3/23/24 12:07, Kaz Kylheku wrote:
    On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
    ....
    That is true - /if/ you make the restriction that the translation unit
    is compiled completely to linkable machine code or assembly, and that it
    is not changed in any way when it is combined into the new program.
    Such a setup is common in practice, but it is in no way required by the
    C standards and does not apply for more advanced compilation and build
    scenarios.

    Well, it's only not required if you hand-wave away the sentences in
    section 5.

    Or, you could read the whole of section 5. 5.1.2.3p6 makes it clear that
    all of the other requirements of the standard apply only insofar as the observable behavior of the program is concerned. Any method of achieving observable behavior that matches the behavior that would be permitted if
    the abstract semantics were followed, is permitted, even if the actual semantics producing that behavior are quite different from those specified.
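
    A standard illustration of that as-if freedom (a sketch; the function
    names are arbitrary): both of these compute the same value for every
    argument, so a compiler that can prove the equivalence is free to
    translate the loop into the closed form, even though the actual
    semantics then look nothing like the abstract machine's step-by-step
    loop.

        unsigned long long sum_loop(unsigned n)
        {
            unsigned long long s = 0;
            for (unsigned long long i = 1; i <= n; i++)
                s += i;
            return s;
        }

        unsigned long long sum_closed(unsigned n)
        {
            return (unsigned long long)n * (n + 1ULL) / 2;
        }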

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sun Mar 24 12:23:00 2024
    On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
    I believe we are at an impasse here, unless someone can think of a new
    point to make.

    How about a completely different one about a related but separate
    matter (small one).

    It has occurred to me that the definition of "translation unit" is
    lacking a little bit in regard to existing practice. Or that at least
    it could use a footnote:

    "A source file together with all the headers and source files included
    via the preprocessing directive #include is known as a preprocessing
    translation unit."

    But in fact, in actual compilers we can do something like this:

    gcc -DMAIN='int main(void) { puts("hello"); }'

    and then in the source file we can have

    #include <stdio.h>
    MAIN

    the point is that a translation unit's tokens can come from sources
    other than a source file and its included header files.

    Say we have:

    printf '#include <stdio.h>\nMAIN\n' | \
    gcc -DMAIN='int main(void) { puts("hello"); }' -x c -

    How we can subject this to a standard-based interpretation is
    to identify the output of printf piped into gcc, as well as
    the -DMAIN option, as being the "source file".

    "The text of the program is kept in units called source files, (or
    preprocessing files) in this document."

    Thus the unit in which we are keeping the source in the above shell
    script is identifiable as the content of the pipe, and the symbol MAIN.
    It is understood that the MAIN symbol precedes the content of the pipe.
    Those things together are the "source file".

    This is all fine, but could benefit from a footnote like "A source file
    need not be a single data unit accessible by name in a file system.
    Implementations may allow situations such as source code being
    dynamically generated and transmitted to the translator via an
    interprocess communication mechanism or a network. Furthermore,
    implementations may allow some tokens of a translation unit to be
    injected via a configuration mechanism, such as command line
    arguments."

    One thing I would ask before leaving this - could you take a look at the latest draft for the next C standard after C23?

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf>

    Thanks, I'm now using that, discontinuing most use of n3096.

    Look at the definitions of the "reproducible" and "unsequenced" function type attributes in 6.7.13.8. In particular, look at the leeway
    explicitly given to the compiler for re-arranging code in 6.7.13.8.3p6
    and similar examples. Consider how that fits (or fails to fit) with
    your interpretation of the translation phases in section 5.

    These are interesting and useful attributes. They are orthogonal to the translation unit issue, though.

    If we declare that a function in another translation unit is
    reproducible, and we call it twice with the same arguments, then
    two calls need not take place.

    That is not anything like LTO: the function attribute which drives
    those semantic possibilities comes from the same translation unit.

    If a function is attributed as "reproducible" or "unsequenced" in another translation unit, such that this is not visible to our current
    translation unit (the header file declaration for the function omits
    the attributes), then it looks like an ordinary function. If we call
    it twice, it gets called twice.

    There is no conflict between the semantics of these advanced attributes,
    and the claim that LTO is nonconforming.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Sun Mar 24 16:50:44 2024
    On 2024-03-23, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
    On 3/23/24 12:07, Kaz Kylheku wrote:
    On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
    ...
    That is true - /if/ you make the restriction that the translation unit
    is compiled completely to linkable machine code or assembly, and that it
    is not changed in any way when it is combined into the new program.
    Such a setup is common in practice, but it is in no way required by the
    C standards and does not apply for more advanced compilation and build
    scenarios.

    Well, it's only not required if you hand-wave away the sentences in
    section 5.

    Or, you could read the whole of section 5. 5.1.2.3p6 makes it clear that
    all of the other requirements of the standard apply only insofar as the

    Aha, so you agree there are requirements, just that the behavior they
    imply can be achieved without them being followed in every detail.

    observable behavior of the program is concerned.

    I believe what you're referring to is now in 5.1.2.4¶6 in N3220.

    Yes, you make an excellent point.

    If we make any claim about conformance, it has to be rooted in
    observable behavior, which is the determiner of conformance.

    But we will not find that problem in LTO. If any empirical test of an
    LTO implementation shows that there is a difference in the ISO C
    observable behavior of a strictly conforming program, that LTO
    implementation
    obviously has a bug, not LTO itself. (So why bother looking.) I mean,
    the absolute baseline requirement any LTO implementor strives toward is
    no change in observable behavior in a strictly conforming program, which
    would be a showstopper.

    At best we may be able to say that if those requirements with regard
    to translation phase 7 and 9 separation are assiduously followed, the implementation belongs to a certain identifiable class, which is
    suitable for certain purposes (or for certain ways of expressing those
    purposes in a program). Certain techniques will be reliable that
    would otherwise be not. However, since it is something not reflected in observable behavior (as defined in ISO C), the class division does not
    land along the line of conforming versus non-conforming.

    Any method of achieving observable behavior that matches the behavior
    that would be permitted if the abstract semantics were followed, is permitted, even if the actual semantics producing that behavior are
    quite different from those specified.

    I've never lost sight of that; however, in this case somehow,
    there is something different.

    The problem is that the requirements in question, the ones I have
    been concerned about, are not in fact necessary in the first place for
    establishing what the observable behavior is.

    It's not the case that the requirements are necessary, but then
    another path can be found to that observable behavior.

    That is to say, the description of translation phase 7 (for the purposes
    of observable behavior and conformance) could as well say that "the
    tokens are semantically analyzed and translated, possibly with the help
    of access to any truth whatsoever related to the entire program's
    observable behavior, by means of a magic oracle." As long as no
    falsehood in relation to observable behavior is relied upon by mistake,
    all is well as far as ensuring the right observable behavior,
    which is synonymous with conforming.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 00:21:32 2024
    On 24/03/2024 06:50, Kaz Kylheku wrote:
    On 2024-03-23, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
    On 3/23/24 12:07, Kaz Kylheku wrote:
    On 2024-03-23, David Brown <david.brown@hesbynett.no> wrote:
    ...
    That is true - /if/ you make the restriction that the translation unit
    is compiled completely to linkable machine code or assembly, and that it
    is not changed in any way when it is combined into the new program.
    Such a setup is common in practice, but it is in no way required by the
    C standards and does not apply for more advanced compilation and build
    scenarios.

    Well, it's only not required if you hand-wave away the sentences in
    section 5.

    Or, you could read the whole of section 5. 5.1.2.3p6 makes it clear that
    all of the other requirements of the standard apply only insofar as the

    Aha, so you agree there are requirements, just that the behavior they
    imply can be achieved without them being followed in every detail.

    observable behavior of the program is concerned.

    I believe what you're referring to is now in 5.1.2.4¶6 in N3220.

    Yes. Usually the C standards committee try to avoid inserting sections
    and the resulting changes in numbering, but they have, for some reason,
    given the first paragraph of 5.1.2 its own section number in n3220 and
    bumped everything down a step.


    Yes, you make an excellent point.

    If we make any claim about conformance, it has to be rooted in
    observable behavior, which is the determiner of conformance.

    Agreed.


    But we will not find that problem in LTO. If any empirical test of an
    LTO implementation shows that there is a difference in the ISO C
    observable behavior of a strictly conforming program, that LTO
    implementation
    obviously has a bug, not LTO itself.

    Yes. Any optimisation that changes the observable behaviour of a
    program (other than amongst alternative correct behaviours - sometimes
    there are several for the same input, as a result of unspecified
    behaviours) is invalid as an optimisation. (I am assuming the program
    does not execute any undefined behaviour - otherwise all bets are off.)

    This applies to all optimisations and to the compilation itself - optimisations don't get to change the observable behaviour. Equally,
    any re-arrangement of code or other effects of the compiler that don't
    change the observable behaviour are perfectly valid and don't imply non-conformity.

    (So why bother looking.) I mean,
    the absolute baseline requirement any LTO implementor strives toward is
    no change in observable behavior in a strictly conforming program, which would be a showstopper.


    Yes.

    I don't believe anyone - except you - has said anything otherwise. A C implementation is conforming if and only if it takes any correct C
    source code and generates a program image that always has correct
    observable behaviour when no undefined behaviour is executed. There are
    no extra imaginary requirements to be conforming, such as not being
    allowed to use extra information while compiling translation units.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 00:42:00 2024
    On 23/03/2024 20:58, bart wrote:
    On 23/03/2024 16:25, David Brown wrote:
    On 23/03/2024 01:09, Kaz Kylheku wrote:
    On 2024-03-22, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    I'm not aware that any such language exists, at least in the mainstream (and I've looked at a *lot* of programming languages). I conclude that there just isn't enough demand for that kind of thing.

    I think lack of demand combines with it actually being an extremely
    difficult task.

    Consider something as simple as "x++;" in C. How could that be
    implemented? Perhaps the cpu has an "increment" instruction. Perhaps
    it has an "add immediate" instruction. Perhaps it needs to load 1
    into a register, then use an "add" instruction. Perhaps "x" is in
    memory. Some cpus can execute an increment directly on the memory
    address as an atomic instruction. Some can do so, but only using
    specific (and more expensive) instructions. Some can't do it at all
    without locking mechanisms and synchronisation loops.

    So what does this user of this mythical LLL expect when he/she writes
    "x++;" ?

    This is not the issue that comes up in the OP (or the issue that was
    assumed as I don't think the OP has clarified).


    That is trivially true. I was picking a simple example and showing how difficult it is to try to define a language where "the compiler does
    exactly what I tell it to do". If it is that difficult to define the programmer's precise expectation of the behaviour of "x++;" at the
    lowest level, how could we hope to do it with anything like the OP's case?

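    (To make that concrete: the same source-level "increment" can be written with three different sets of guarantees in C itself, and a compiler may lower each one quite differently depending on the target. A minimal sketch, assuming C11 for the atomics:)

    #include <stdatomic.h>

    int plain_x;
    volatile int volatile_x;
    _Atomic int atomic_x;

    void bump(void)
    {
        plain_x++;                      /* may be kept in a register, folded or reordered */
        volatile_x++;                   /* a real read and a real write, but not atomic */
        atomic_fetch_add(&atomic_x, 1); /* atomic read-modify-write; may need special instructions */
    }
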
    It sounds easy to make lists of expected behaviour, like Kaz did and
    like you no doubt have (at least in your head, if not written down) for
    your own low-level language. Such lists are totally subjective, and
    thus inappropriate for general languages usable by a range of people for
    a range of tasks.

    There it is not about micro-managing the implementation of x++, but the compiler deciding it isn't needed at all.


    First you have to decide /exactly/ what you mean by "x++;", before you
    can decide if it is valid to remove it or not.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 01:22:25 2024
    On Sat, 23 Mar 2024 11:26:03 +0000
    bart <bc@freeuk.com> wrote:

    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language
    than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.


    Do you want mainstream of today, or does mainstream of the past also count?
    For the latter, I'd think that PL/M and BLISS are lower level than C.
    But I know neither so could be wrong.
    https://en.wikipedia.org/wiki/PL/M
    https://en.wikipedia.org/wiki/BLISS

    Ada also allows a certain degree of control over how things are done, but I
    am not sure that control is tighter than in C. I would think that in the
    majority of situations Ada's 'as if' rules are similar to C's.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 01:26:41 2024
    On Sat, 23 Mar 2024 11:26:03 +0000
    bart <bc@freeuk.com> wrote:

    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language
    than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be C
    or nothing.

    I don't think anyone seriously wants to switch to assembly for the
    sort of tasks they want to use C for.

    Why not? Assembly provides the kind of control you're looking for; C
    does not. If that kind of control is important to you, you have to
    find a language which provides it. If not assembler or C, what
    would you use?

    Among non-mainstream ones, my own would fit the bill. Since I write
    the implementations, I can ensure the compiler doesn't have a mind of
    its own.

    However if somebody else tried to implement it, then I can't
    guarantee the same behaviour. This would need to somehow be enforced
    with a precise language spec, or mine would need to be a reference implementation with a lot of test cases.


    -----------------

    Take this program:

    #include <stdio.h>
    int main(void) {
    goto L;
    0x12345678;
    L:
    printf("Hello, World!\n");
    }

    If I use my compiler, then that 12345678 pattern gets compiled into
    the binary (because it is loaded into a register then discarded).
    That means I can use that value as a marker or sentinel which can be
    searched for.


    Does it apply to your aarch64 compiler as well?


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 01:52:39 2024
    On 23/03/2024 22:21, bart wrote:
    On 23/03/2024 16:51, David Brown wrote:
    On 23/03/2024 12:26, bart wrote:
    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be C
    or nothing.

    How much of a problem is it, really?

    My field is probably the place where low level programming is most
    ubiquitous. There are plenty of people who use assembly - for good
    reasons or for bad (or for reasons that some people think are good,
    other people think are bad). C is the most common choice.

    Other languages used for small systems embedded programming include
    C++, Ada, Forth, BASIC, Pascal, Lua, and Micropython. Forth is the
    only one that could be argued as lower-level or more "directly
    translated" than C.

    Well, Forth is certainly cruder than C (it's barely a language IMO). But
    I don't remember seeing anything in it resembling a type system that corresponds to the 'i8-i64 u8-u64 f32-f64' types typical in current hardware. (Imagine trying to create a precisely laid out struct.)

    Forth can be considered a typeless language - you deal with cells (or
    double cells, etc.), which have contents but not types. And you can
    define structs with specific layouts quite easily. (Note that I've
    never tried this myself - my Forth experience is /very/ limited, and you
    will get much more accurate information in comp.lang.forth or another
    place Forth experts hang out.)

    A key thing you miss, in comparison to C, is the type checking and the structured identifier syntax.

    In C, if you have :

    struct foo {
    int32_t x;
    int8_t y;
    uint16_t z;
    };

    struct foo obj;

    obj.x = obj.y + obj.z;

    then you access the fields as "obj.x", etc. Your struct may or may not
    have padding, depending on the target and compiler (or compiler-specific extensions). If "obj2" is an object of a different type, then "obj2.x"
    might be a different field or a compile-time error if that type has no
    field "x".

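    (A quick way to see what a particular compiler actually did with that layout - a small sketch, assuming a hosted C99/C11 compiler - is to print the offsets and the size:)

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct foo {
        int32_t x;
        int8_t y;
        uint16_t z;
    };

    int main(void)
    {
        /* On typical ABIs this prints offsets 0, 4, 6 and size 8 (one
           padding byte after y); a packed variant would be 7 bytes. */
        printf("%zu %zu %zu %zu\n",
               offsetof(struct foo, x), offsetof(struct foo, y),
               offsetof(struct foo, z), sizeof(struct foo));
    }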

    In Forth, you write (again, I could be inaccurate here) :

    struct
    4 field >x
    1 field >y
    2 field >z
    constant /foo

    The names - including the punctuation (punctuation characters can be
    freely used in identifiers in Forth) - are arbitrary. This is
    equivalent to :

    : >x 0 + ;
    : >y 4 + ;
    : >z 5 + ;
    : /foo 7 ;

    You make your instance "obj" by :

    create obj /foo allot

    which makes "obj" the address of a block of 7 bytes - but does not give
    it a type in any sense. ("/foo" simply means "7").

    The equivalent of "obj.x = obj.y + obj.z" would be :

    obj >y c@ obj >z w@ + obj >x l!

    That is :

    1. Put the address of obj on the stack.
    2. Add 4 to it (the definition of >y)
    3. Use that as an address and fetch the 8-bit value from that address,
    putting it on the stack.
    4. Put the address of obj on the stack.
    5. Add 5 to it (the definition of >z)
    6. Use that as an address and fetch the 16-bit value from that address, putting it on the stack.
    7. Add the top two values from the stack and put the result on the stack.
    8. Put the address of obj on the stack.
    9. Add 0 to it (the definition of >x)
    10. Use that as an address and store the 32-bit value from the top of
    the stack to that address.


    I'm assuming this Forth uses 32-bit stack cells, and ignoring
    signed/unsigned issues for simplicity. There are, after all, better
    places to find Forth tutorials for the details.


    At no point is the definition of the struct type attached to "obj". In
    fact, there is no struct type - there's just some defined words for
    adding offsets to an address (or adding those values to anything else).
    You can just as well write "10 >y ." to do "printf("%i", 10 + 4);".


    There's therefore also no connection between the field accessor words
    and the type, or any requirement that they are only used with the right
    kind of object. On the other hand, suppose you wanted to dispense with storing the field "x" and calculate it as "p->y + p->z" every time you
    needed it. In C, you'd write:

    int32_t calc_x(const struct foo * p) { return p->y + p->z; }

    and replace uses of "obj.x" with "calc_x(&obj)".

    In Forth, you might have defined :

    : >x@ >x l@ ;
    : >y@ >y c@ ;
    : >z@ >z w@ ;

    and used >x@ as your accessor for reading obj.x (as "obj >x@") in the
    rest of your code. Now you can remove ">x" from the struct definition
    and write:

    : >x@ dup >y@ swap >z@ + ;

    and all your uses of "obj >x@" remain unchanged in the rest of your
    code, but now they calculate x on the fly.


    This is all /way/ off-topic for comp.lang.c, but it's perhaps
    interesting to see a completely different way of doing things in a very different language.

    And note that although Forth is often byte-compiled very directly to
    give you exactly the actions you specify in the source code, it is also sometimes compiled to machine code - using optimisations.


    It is just too weird. I think I'd rather take my chances with C.

    Forth does take some getting used to!


    BASIC, ..., Lua, and Micropython.

    Hmm, I think my own scripting language is better at low level than any
    of these.

    These all have one key advantage over your language - they are real
    languages, available for use by /other/ programmers for development of products.

    It supports those low-level types for a start. And I can do
    stuff like this:

    println peek(0x40'0000, u16):"m"

    fun peek(addr, t=byte) = makeref(addr, t)^

    This displays 'MZ', the signature of the (low-)loaded EXE image on Windows

    Possibly it is even better than C; is this little program valid (no UB)
    C, even when it is known that the program is low-loaded:

    #include <stdio.h>
    typedef unsigned char byte;

    int main(void) {
    printf("%c%c\n", *(byte*)0x400000, *(byte*)0x400001);
    }

    This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
    programs at high addresses. The problem being that the address involved, while belonging to the program, is outside of any C data objects.


    I think you are being quite unreasonable in blaming gcc - or C - for generating code that cannot access that particular arbitrary address!
    The addresses accessible in a program are defined by the OS and the
    target environment, not the language or compiler. And C has a perfectly
    good way of forcing access to addresses - use "volatile".



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 02:53:53 2024
    On Sat, 23 Mar 2024 21:21:58 +0000
    bart <bc@freeuk.com> wrote:

    On 23/03/2024 16:51, David Brown wrote:
    On 23/03/2024 12:26, bart wrote:
    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do,
    but also how it must do it, you need to use a lower-level
    language than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be
    C or nothing.

    How much of a problem is it, really?

    My field is probably the place where low level programming is most
    ubiquitous. There are plenty of people who use assembly - for good
    reasons or for bad (or for reasons that some people think are good,
    other people think are bad). C is the most common choice.

    Other languages used for small systems embedded programming include
    C++, Ada, Forth, BASIC, Pascal, Lua, and Micropython. Forth is the
    only one that could be argued as lower-level or more "directly
    translated" than C.

    Well, Forth is certainly cruder than C (it's barely a language IMO).
    But I don't remember seeing anything in it resembling a type system
    that corresponds to the 'i8-i64 u8-u64 f32-f64' types typical in
    current hardware. (Imagine trying to create a precisely laid out
    struct.)

    It is just too weird. I think I'd rather take my chances with C.

    BASIC, ..., Lua, and Micropython.

    Hmm, I think my own scripting language is better at low level than
    any of these. It supports those low-level types for a start. And I
    can do stuff like this:

    println peek(0x40'0000, u16):"m"

    fun peek(addr, t=byte) = makeref(addr, t)^

    This displays 'MZ', the signature of the (low-)loaded EXE image on
    Windows

    Possibly it is even better than C; is this little program valid (no
    UB) C, even when it is known that the program is low-loaded:

    #include <stdio.h>
    typedef unsigned char byte;

    int main(void) {
    printf("%c%c\n", *(byte*)0x400000, *(byte*)0x400001);
    }

    This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
    programs at high addresses. The problem being that the address
    involved, while belonging to the program, is outside of any C data
    objects.


    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
    char* p0 = (char*)((size_t)main & -(size_t)0x10000);
    printf("%c%c\n", p0[0], p0[1]);
    return 0;
    }


    That would work for small programs. Not necessarily for bigger
    programs.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Mon Mar 25 03:02:21 2024
    On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
    On 24/03/2024 06:50, Kaz Kylheku wrote:
    (So why bother looking.) I mean,
    the absolute baseline requirement any LTO implementor strives toward is
    no change in observable behavior in a strictly conforming program, which
    would be a showstopper.


    Yes.

    I don't believe anyone - except you - has said anything otherwise. A C implementation is conforming if and only if it takes any correct C
    source code and generates a program image that always has correct
    observable behaviour when no undefined behaviour is executed. There are
    no extra imaginary requirements to be conforming, such as not being
    allowed to use extra information while compiling translation units.

    But the requirement isn't imaginary. The "least requirements"
    paragraph doesn't mean that all other requirements are imaginary;
    most of them are necessary to describe the language so that we know
    how to find the observable behavior.

    It takes a modicum of inference to deduce that a certain explicitly
    stated requirement doesn't exist as far as observability/conformance.

    We are clearly not imagining the sentences which describe a classic
    translation and linkage model. The argument that they don't matter
    for conformance is different from the argument that we imagined
    something between the lines. It is the inference based on 5.1.2.4 that
    is between the lines; potentially between any pair of lines anywhere!

    Furthermore, the requirements may matter to other kinds of observability.

    In C programming, we don't always just care about ISO C observability.

    In safety critical coding, we might want to conduct a code review of
    the disassembly of an object file (does it correctly implement the
    intent we believe to be expressed in the source), and then retain that
    exact file until it needs to be recompiled. If the code is actually
    an intermediate code that is further translated during linking, that's
    not good; we face the prospect of reviewing potentially the entire image
    each time. Thus we might want an implementation which has a way of
    conforming to the classic linkage model (that happens to be conveniently described).

    We just may not confuse that conformance (private contract between
    implementor and user) with ISO C conformance, as I have.
    Sorry about that!

    What is significant is that the concept has support in ISO C wording.
    Such a contract can just refer to that: "our project requires the
    classic translation and linkage model that arises from the translation
    phases descriptions 7 and 8 being closely followed".
    As long as you have a way to disable LTO (or not enable it), you have
    that.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 03:27:12 2024
    On 24/03/2024 17:02, Kaz Kylheku wrote:
    On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
    On 24/03/2024 06:50, Kaz Kylheku wrote:
    (So why bother looking.) I mean,
    the absolute baseline requirement any LTO implementor strives toward is
    no change in observable behavior in a strictly conforming program, which would be a showstopper.


    Yes.

    I don't believe anyone - except you - has said anything otherwise. A C
    implementation is conforming if and only if it takes any correct C
    source code and generates a program image that always has correct
    observable behaviour when no undefined behaviour is executed. There are
    no extra imaginary requirements to be conforming, such as not being
    allowed to use extra information while compiling translation units.

    But the requirement isn't imaginary. The "least requirements"
    paragraph doesn't mean that all other requirements are imaginary;
    most of them are necessary to describe the language so that we know
    how to find the observable behavior.


    The text is not imaginary - your reading between the lines /is/. There
    is no rule in the C standards stopping the compiler from using
    additional information or knowledge about other parts of the program.

    It takes a modicum of inference to deduce that a certain explicitly
    stated requirement doesn't exist as far as observability/conformance.

    We are clearly not imagining the sentences which describe a classic translation and linkage model. The argument that they don't matter
    for conformance is different from the argument that we imagined
    something between the lines. It is the inference based on 5.1.2.4 that
    is between the lines; potentially between any pair of lines anywhere!

    Furthermore, the requirements may matter to other kinds of observability.

    In C programming, we don't always just care about ISO C observability.

    I agree on that. The C standards are not the be all and end all of
    things of interest to C programmers. If it were, we'd never have
    compilers with extensions.

    But it /is/ the only thing that matters when you talk about "conforming" compilers.

    If you want to say that LTO breaks some of the requirements that /you/
    have for the way /you/ want to do unit testing, that's absolutely fine.
    If you want to say that this applies to many other C developers, I'd
    prefer to see a bit of evidence or justification for the claim, but I'd
    take it seriously - I fully appreciate that people have needs beyond
    what the C standards give them.

    But that's not what you have been saying. You have been saying that LTO breaks the requirements of the C standards, and you are wrong about that.


    In safety critical coding, we might want to conduct a code review of
    the disassembly of an object file (does it correctly implement the
    intent we believe to be expressed in the source), and then retain that
    exact file until it needs to be recompiled.

    Sure. And for that reason, some developers in that field will not use
    LTO. I personally don't make much use of LTO because it makes software
    a pain to debug. I do, however, retain the full toolchain used for a
    project, including all build scripts and flags, libraries and compilers,
    and make sure my builds are reproducible on multiple computers - then
    any testing or reviews of the disassembly remain valid over time. With
    LTO, at least some parts may need to be re-validated after a build even
    for source code changes to apparently different parts of the program -
    that is a cost that must be weighed against the benefits of LTO. (I
    have considered doing LTO builds in parallel with non-LTO builds - using
    the LTO builds solely for more advanced static checking, while using the
    more debuggable non-LTO build for the "real" binary.)

    I have agreed that there are many reasons why LTO might not be a good
    choice for any given project. I have merely contested the claim that conformity is such a reason.

    If the code is actually
    an intermediate code that is further translated during linking, that's
    not good; we face the prospect of reviewing potentially the entire image
    each time. Thus we might want an implementation which has a way of conforming to the classic linkage model (that happens to be conveniently described).

    We just may not confuse that conformance (private contract between implementor and user) with ISO C conformance, as I have.
    Sorry about that!


    Are you saying that after dozens of posts back and forth where you made
    claims about non-conformity of C compilers handling of C code in
    comp.lang.c, with heavy references to the C standards which define the
    term "conformity", you are now saying that you were not talking about C standard conformity?

    What is significant is that the concept has support in ISO C wording.
    Such a contract can just refer to that: "our project requires the
    classic translation and linkage model that arises from the translation
    phases descriptions 7 and 8 being closely followed".
    As long as you have a way to disable LTO (or not enable it), you have
    that.





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Mon Mar 25 03:45:48 2024
    Richard Kettlewell <invalid@invalid.invalid> writes:

    David Brown <david.brown@hesbynett.no> writes:

    I have tried to explain the reality of what the C standards say in a
    couple of posts (including one that I had not posted before you wrote
    this one). I have tried to make things as clear as possible, and
    hopefully you will see the point.

    If not, then you must accept that you interpret the C standards in a
    different manner from the main compiler vendors, as well as some "big
    names" in this group. That is, of course, not proof in itself - but
    you must realise that for practical purposes you need to be aware of
    how others interpret the standard, both for your own coding and for
    the advice or recommendations you give to others.

    Agreed that the ship has sailed on whether LTO is a valid optimization.
    But it's understandable why someone might reach a different conclusion.
    [...]

    Granted that someone might follow reasoning like the comments you
    gave. Even so, some further reflection should be enough for them to
    reconsider their original assessment. In particular, the following:

    "A C program need not all be translated at the same time." This
    excerpt from the Standard implies that C programs may be translated
    in their entirety all at the same time.

    Notice the lead-in to section 5.1.1.2 p1, describing translation
    phases, says "The precedence among the syntax rules of translation
    is specified by the following phases." All of phases 1 through 8
    involve translation, but they are about when various forms of
    source recognition take place, not about when code is generated.

    The "semantically analyzed" in translation phase 7 is nothing more
    than type determination and verifying constraints are not violated.
    Nothing about these analyses changes if optimizations are carried
    out in translation phase 8.

    Notice that translation phase 8 says translator output is collected
    into a program image "which contains information needed for
    execution in its execution environment." A reasonable inference
    is that all code generation could occur at the end of translation
    phase 8, as part of producing that information.

    The first two points of paragraph 2 in section 1:
    This International Standard does not specify
    * the mechanism by which C programs are transformed for use
    by a data-processing system;
    * the mechanism by which C programs are invoked for use by a
    data-processing system;

    The key phrase in section 5.1.2.3: "The /least requirements/ on a
    conforming implementation are: [...]" [emphasis added].

    Nothing in the C standard requires an implementation to generate
    executable code. The output of translation phase 7 could be a machine-independent intermediate form. The output of translation
    phase 8 could be the same machine-independent intermediate form.
    Executing the program could be running an interpreter on the program "executable" holding only the machine-independent intermediate
    parts, and the interpreter might carry out optimizations at run
    time. All of these possibilities are allowed in a conforming
    implementation as long as the "least requirements" of 5.1.2.3 are
    met.

    It's a mistake to draw any firm conclusions based on reading parts
    of the standard in isolation. The C standard has been written as a
    cohesive whole, and it's important to understand it in the same way.

    Related to that, although the C standard gives explicit definitions
    for many words and phrases, it also uses words that it does not
    define (and presumably are not defined in any of the normative
    references, though that may be difficult to verify). When
    confronted with one of these non-defined terms, often arguments are
    made that a word means X or Y or Z, because of ... (fill in the
    blank). It's important to remember that, whatever the case is for X
    or Y or Z, what /we/ think doesn't matter; all that does matter is
    what the standard's authors (and members of the ISO C committee)
    think. The C standard means what the ISO C group thinks it means.
    They are the ultimate and sole authority. Any discussion about what
    the C standard requires that ignores that or pretends otherwise is
    a meaningless exercise.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Malcolm McLean@3:633/280.2 to All on Mon Mar 25 04:53:24 2024
    On 24/03/2024 16:45, Tim Rentsch wrote:
    The C standard means what the ISO C group thinks it means.
    They are the ultimate and sole authority. Any discussion about what
    the C standard requires that ignores that or pretends otherwise is
    a meaningless exercise.

    An intentionalist.
    But when a text has come about by a process of argument, negotiation and compromise and votes, is that position so easy to defend as it might
    appear to be for a simpler text?

    --
    Check out Basic Algorithms and my other books: https://www.lulu.com/spotlight/bgy1mm


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Mon Mar 25 05:58:24 2024
    On 24/03/2024 15:53, Michael S wrote:
    On Sat, 23 Mar 2024 21:21:58 +0000
    bart <bc@freeuk.com> wrote:

    On 23/03/2024 16:51, David Brown wrote:
    On 23/03/2024 12:26, bart wrote:
    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do,
    but also how it must do it, you need to use a lower-level
    language than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be
    C or nothing.

    How much of a problem is it, really?

    My field is probably the place where low level programming is most
    ubiquitous. There are plenty of people who use assembly - for good
    reasons or for bad (or for reasons that some people think are good,
    other people think are bad). C is the most common choice.

    Other languages used for small systems embedded programming include
    C++, Ada, Forth, BASIC, Pascal, Lua, and Micropython. Forth is the
    only one that could be argued as lower-level or more "directly
    translated" than C.

    Well, Forth is certainly cruder than C (it's barely a language IMO).
    But I don't remember seeing anything in it resembling a type system
    that corresponds to the 'i8-i64 u8-u64 f32-f64' types typical in
    current hardware. (Imagine trying to create a precisely laid out
    struct.)

    It is just too weird. I think I'd rather take my chances with C.

    > BASIC, ..., Lua, and Micropython.

    Hmm, I think my own scripting language is better at low level than
    any of these. It supports those low-level types for a start. And I
    can do stuff like this:

    println peek(0x40'0000, u16):"m"

    fun peek(addr, t=byte) = makeref(addr, t)^

    This displays 'MZ', the signature of the (low-)loaded EXE image on
    Windows

    Possibly it is even better than C; is this little program valid (no
    UB) C, even when it is known that the program is low-loaded:

    #include <stdio.h>
    typedef unsigned char byte;

    int main(void) {
    printf("%c%c\n", *(byte*)0x400000, *(byte*)0x400001);
    }

    This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
    programs at high addresses. The problem being that the address
    involved, while belonging to the program, is outside of any C data
    objects.



    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
    char* p0 = (char*)((size_t)main & -(size_t)0x10000);
    printf("%c%c\n", p0[0], p0[1]);
    return 0;
    }


    That would work for small programs. Not necessarily for bigger
    programs.


    I'm not sure how that works. Are EXE images always loaded at a multiple of
    64KB? I suppose on larger programs it could search backwards 64KB at a
    time (although it could also hit on a rogue 'MZ' in program data).

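    (Roughly like this, I imagine - a sketch only, assuming Windows' usual 64KB allocation granularity and treating the 'MZ' check as a heuristic; untested:)

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
        /* Round the address of main down to a 64KB boundary, then step back
           64KB at a time until the two-byte 'MZ' DOS signature is found.
           This can misfire on a stray 'MZ' and assumes the pages are readable. */
        const unsigned char *p =
            (const unsigned char *)((size_t)main & -(size_t)0x10000);
        while (!(p[0] == 'M' && p[1] == 'Z'))
            p -= 0x10000;
        printf("%c%c\n", p[0], p[1]);
        return 0;
    }
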
    My point however was whether C considered that p0[0] access UB because
    it doesn't point into any C data object.

    If so, it would make access to memory-mapped devices or frame-buffers,
    or implementing things like garbage collectors, problematical.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Mon Mar 25 06:12:35 2024
    On 24/03/2024 14:26, Michael S wrote:
    On Sat, 23 Mar 2024 11:26:03 +0000
    bart <bc@freeuk.com> wrote:

    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language
    than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be C
    or nothing.

    I don't think anyone seriously wants to switch to assembly for the
    sort of tasks they want to use C for.

    Why not? Assembly provides the kind of control you're looking for; C
    does not. If that kind of control is important to you, you have to
    find a language which provides it. If not assembler or C, what
    would you use?

    Among non-mainstream ones, my own would fit the bill. Since I write
    the implementations, I can ensure the compiler doesn't have a mind of
    its own.

    However if somebody else tried to implement it, then I can't
    guarantee the same behaviour. This would need to somehow be enforced
    with a precise language spec, or mine would need to be a reference
    implementation with a lot of test cases.


    -----------------

    Take this program:

    #include <stdio.h>
    int main(void) {
    goto L;
    0x12345678;
    L:
    printf("Hello, World!\n");
    }

    If I use my compiler, then that 12345678 pattern gets compiled into
    the binary (because it is loaded into a register then discarded).
    That means I can use that value as a marker or sentinel which can be
    searched for.


    Does it apply to your aarch64 compiler as well?

    I don't support arm64 as a native C target (only via intermediate C). Why, is
    there something peculiar about that architecture?

    I would expect that 0x12345678 pattern to still be in memory but
    probably not in an immediate instruction field. So if I wanted to mark a location in the code, I might need a different approach.

    If I ever do directly target that processor, I'll be able to tell you more.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 06:33:13 2024
    On Sun, 24 Mar 2024 19:12:35 +0000
    bart <bc@freeuk.com> wrote:

    On 24/03/2024 14:26, Michael S wrote:
    On Sat, 23 Mar 2024 11:26:03 +0000
    bart <bc@freeuk.com> wrote:

    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do,
    but also how it must do it, you need to use a lower-level
    language than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to be
    C or nothing.

    I don't think anyone seriously wants to switch to assembly for
    the sort of tasks they want to use C for.

    Why not? Assembly provides the kind of control you're looking
    for; C does not. If that kind of control is important to you, you
    have to find a language which provides it. If not assembler or C,
    what would you use?

    Among non-mainstream ones, my own would fit the bill. Since I write
    the implementations, I can ensure the compiler doesn't have a mind
    of its own.

    However if somebody else tried to implement it, then I can't
    guarantee the same behaviour. This would need to somehow be
    enforced with a precise language spec, or mine would need to be a
    reference implementation with a lot of test cases.


    -----------------

    Take this program:

    #include <stdio.h>
    int main(void) {
    goto L;
    0x12345678;
    L:
    printf("Hello, World!\n");
    }

    If I use my compiler, then that 12345678 pattern gets compiled into
    the binary (because it is loaded into a register then discarded).
    That means I can use that value as a marker or sentinel which can
    be searched for.


    Does it apply to your aarch64 compiler as well?

    I don't support arm64 as a native C (only via intermediate C). Why,
    is there something peculiar about that architecture?


    Nothing specific to ARM64 in this particular case. The same problem
    would apply to any instruction set with maximal instruction width <= 32
    bits.

    For smaller immediates, ARM64 does have encoding rules that can be
    surprising.

    I would expect that 0x12345678 pattern to still be in memory but
    probably not in an immediate instruction field.

    Not necessarily. A compiler can use several strategies.
    E.g. https://godbolt.org/z/vMcaxcs7G

    So if I wanted to mark
    a location in the code, I might need a different approach.


    Exactly.

    If I ever do directly target that processor, I'll be able to tell you
    more.
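    (One compiler-independent alternative - just a sketch, not something either compiler is claimed to do - is to put the pattern in a referenced data object, so the bytes end up in the image regardless of how immediates are encoded:)

    #include <stdio.h>

    /* Hypothetical marker: four bytes that can be searched for in the binary. */
    static const volatile unsigned char marker[4] = { 0x78, 0x56, 0x34, 0x12 };

    int main(void)
    {
        unsigned char first = marker[0];   /* volatile read keeps the array in the image */
        (void)first;
        printf("Hello, World!\n");
    }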





    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Chris M. Thomasson@3:633/280.2 to All on Mon Mar 25 06:45:45 2024
    On 3/24/2024 9:02 AM, Kaz Kylheku wrote:
    On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
    On 24/03/2024 06:50, Kaz Kylheku wrote:
    [...]
    In safety critical coding, we might want to conduct a code review of
    the disassembly of an object file (does it correctly implement the
    intent we believe to be expressed in the source), and then retain that
    exact file until wit needs to be recompiled.

    Before C/C++ 11, I was really worried about a hyper aggressive LTO
    messing around with my special ASM code for thread sync. So I would
    examine the disassembly. One alteration could break it! The problem is
    that it might not break right now... But a year from now, wrt running for extended periods of time. If a little shit LTO messed with my externally assembled functions, assembled into an .o and linked in, well, it could
    create a ticking time bomb of very subtle race conditions...

    NOT GOOD!

    [...]

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Mon Mar 25 06:56:02 2024
    On 24/03/2024 14:52, David Brown wrote:
    On 23/03/2024 22:21, bart wrote:

    Well, Forth is certainly cruder than C (it's barely a language IMO).
    But I don't remember seeing anything in it resembling a type system
    that corresponds to the 'i8-i64 u8-u64 f32-f64' types typical in
    current hardware. (Imagine trying to create a precisely laid out struct.)

    Forth can be considered a typeless language - you deal with cells (or
    double cells, etc.), which have contents but not types. And you can
    define structs with specific layouts quite easily. (Note that I've
    never tried this myself - my Forth experience is /very/ limited, and you will get much more accurate information in comp.lang.forth or another
    place Forth experts hang out.)

    A key thing you miss, in comparison to C, is the type checking and the structured identifier syntax.

    In C, if you have :

    struct foo {
    int32_t x;
    int8_t y;
    uint16_t z;
    };

    struct foo obj;

    obj.x = obj.y + obj.z;

    then you access the fields as "obj.x", etc. Your struct may or may not
    have padding, depending on the target and compiler (or compiler-specific extensions). If "obj2" is an object of a different type, then "obj2.x" might be a different field or a compile-time error if that type has no
    field "x".


    In Forth, you write (again, I could be inaccurate here) :

    struct
    4 field >x
    1 field >y
    2 field >z
    constant /foo

    <...>

    Thanks. You've demonstrated perfectly why I would never use Forth. I'd
    rather write in assembly.

    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.

    And note that although Forth is often byte-compiled very directly to
    give you exactly the actions you specify in the source code, it is also sometimes compiled to machine code - using optimisations.


    It is just too weird. I think I'd rather take my chances with C.

    Forth does take some getting used to!


    BASIC, ..., Lua, and Micropython.

    Hmm, I think my own scripting language is better at low level than any
    of these.

    These all have one key advantage over your language - they are real languages, available for use by /other/ programmers for development of products.

    My language exists. Anyone is welcome to reimplement elements of the
    design, since most script languages stink at low-level work or dealing
    with FFIs.

    It is not necessary for me to provide a concrete implementation for
    others to use. But here's one expressed as C code for 64-bit Linux:

    https://github.com/sal55/langs/blob/master/qu.c

    Build using:

    > gcc qu.c -oqu -lm -ldl -fno-builtin

    or using:

    > tcc qu.c -o qu -lm -ldl -fdollars-in-identifiers

    Run it like this:

    > ./qu -nosys hello

    'hello.q' should contain something like 'println "Hello, World"'.

    The -nosys is needed as it normally uses a WinAPI-based standard library.

    It can't run the 'peek/MZ' example since EXE layouts on Linux are
    different, and, if using gcc, 0x400000 is an illegal address.

    For something else, try creating test.q:

    type date = struct
    byte d,m
    u16 year
    end

    d := date(24,3,2024)

    println d, date.bytes

    Run as './qu -nosys test'. I don't have docs however. BTW here is your
    Forth example:

    type foo1 = struct
    int32 x
    int8 y
    word16 z
    end

    type foo2 = struct $caligned
    int32 x
    int8 y
    word16 z
    end

    println foo1.bytes
    println foo2.bytes

    There are two versions, one has no automatic padding, which is 7 bytes,
    and the other is 8 bytes in size.

    This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
    programs at high addresses. The problem being that the address
    involved, while belonging to the program, is outside of any C data
    objects.


    I think you are being quite unreasonable in blaming gcc - or C - for generating code that cannot access that particular arbitrary address!

    There were two separate points here. One is that a gcc-compiled version
    won't work because exe images are not loaded at 0x40'0000. The other was
    me speculating whether the access to 0x40'0000, even when valid memory
    for this process, was UB in C.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Mon Mar 25 07:49:43 2024
    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that (and
    it's a perfectly legitimate desire), there isn't enough demand to induce
    anyone to actually produce such a thing and for it to catch on.
    Developers have had decades to define and implement the kind of language
    you're talking about. Why haven't they?

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 09:38:26 2024
    On 24/03/2024 21:49, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that (and
    it's a perfectly legitimate desire), there isn't enough demand to induce anyone to actually produce such a thing and for it to catch on.

    I personally think (or speculate, if you feel the word is more
    appropriate given that I have no real evidence) that a major part of
    this is a lack of agreement on what optimisations these people want or
    don't want. I expect Kaz and Bart would agree that they want C
    compilers to be required to generate code that does what they mean it to
    do, and be able to optimise within those requirements to do the required
    job as efficiently as possible. But I expect they would disagree in
    many ways in regard to what they mean by it all - what optimisations are allowed, and what code /really/ means in their eyes.

    The best we can reasonably hope for is for a carefully considered
    document that describes minimum requirements, and for compilers to
    provide flags to allow fine-tuning so that programmers can get the
    results they want.

    And that is /exactly/ what we have with C and quality C compilers.
    Sure, none of this is perfect or an ideal fit for everyone and every
    task - but it is good enough that you'd need to come up with something
    quite extraordinary to make it attractive compared to C.

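    (As a concrete illustration of that kind of fine-tuning - these switches all exist in current gcc and clang, though which of them anyone actually wants is exactly the disagreement above; 'prog.c' is just a placeholder:)

    > gcc -O1 -fwrapv -fno-strict-aliasing -fno-delete-null-pointer-checks -fno-builtin prog.c -o prog
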
    Developers have had decades to define and implement the kind of language you're talking about. Why haven't they?



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 09:42:03 2024
    On Sun, 24 Mar 2024 13:49:43 -0700
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that
    (and it's a perfectly legitimate desire), there isn't enough demand
    to induce anyone to actually produce such a thing and for it to catch
    on.

    Such things are produced all the time. And yes, they fail to catch on.
    The most recent [half-hearted] attempt that hasn't yet realized it
    has no chance is called zig.

    Developers have had decades to define and implement the kind of
    language you're talking about. Why haven't they?


    Because C is a juggernaut?


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 09:43:32 2024
    On 24/03/2024 20:56, bart wrote:
    On 24/03/2024 14:52, David Brown wrote:
    On 23/03/2024 22:21, bart wrote:



    This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
    programs at high addresses. The problem being that the address
    involved, while belonging to the program, is outside of any C data
    objects.


    I think you are being quite unreasonable in blaming gcc - or C - for
    generating code that cannot access that particular arbitrary address!

    There were two separate points here. One is that a gcc-compiled version won't work because exe images are not loaded at 0x40'0000.

    I think that is because your gcc toolchain is creating 64-bit Windows binaries, while the others are creating 32-bit binaries. I could be
    wrong here, of course.

    The other was
    me speculating whether the access to 0x40'0000, even when valid memory
    for this process, was UB in C.


    Trying to access non-existent memory is UB, yes. I can't imagine a
    language where such a thing would be anything else than undefined
    behaviour, or defined as a hard run-time error.

    But you can run something with UB if you want - at your own risk,
    because C and the compiler give you no guarantees of what will happen.
    But if you write "x = *(volatile uint8_t *) 0x400000;", then you can
    guarantee that the code will at least /try/ to read that address. What happens depends on the OS, memory protection systems, etc. But it is
    not exactly difficult to do this kind of thing in C - that's why
    "volatile" exists.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Mon Mar 25 10:07:44 2024
    On 24/03/2024 20:49, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that (and
    it's a perfectly legitimate desire), there isn't enough demand to induce anyone to actually produce such a thing and for it to catch on.
    Developers have had decades to define and implement the kind of language you're talking about. Why haven't they?

    Perhaps many settle for using C but using a lesser C compiler or one
    with optimisation turned off.

    The language still has the UBs of C, whereas people want it more implementation-defined, but a weaker compiler is less likely to leverage
    that UB to do unexpected things or to wreck somebody's expectations of how
    a piece of code will behave.

    With the dominance of C it's hard to produce a competing language,
    especially if it looks like C. And people still want all the
    optimisations that are possible without taking advantage of UB or
    needing to delete chunks of the user's code.

    This is probably a similar discussion to that of makefiles; why isn't
    there a competing build system?

    C is often used as an intermediate language by compilers. There is a
    source language where a lot of this stuff is well-defined by its spec.
    There is a known target, or a small range of targets, on which the
    behaviour is also well-defined.

    However, in the middle you have C, where much of it isn't well-defined!
    People want a language which is like the hypothetical one discussed, one
    that doesn't get 'in the way' with its crazy ideas. But now it's either
    going to be a lot of work to generate C code that behaves as expected, or
    they have to settle for a non-optimising compiler and cross their fingers.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 10:39:47 2024
    On Sun, 24 Mar 2024 23:07:44 +0000
    bart <bc@freeuk.com> wrote:

    On 24/03/2024 20:49, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that
    (and it's a perfectly legitimate desire), there isn't enough demand
    to induce anyone to actually produce such a thing and for it to
    catch on. Developers have had decades to define and implement the
    kind of language you're talking about. Why haven't they?

    Perhaps many settle for using C but using a lesser C compiler or one
    with optimisation turned off.


    What is "lesser C compiler"?
    Something like IAR ? Yes, people use it.
    Something like TI? People use it when they have no other choice.
    20 years ago there were Diab Data, Kiel and a few others. I haven't heard
    about them lately.
    Microchip, I'd guess, still has its own compilers for many of their
    families, but that's because they have to. "Bigger" compilers don't want
    to support these chips.
    On the opposite edge of the scale, IBM has compilers for their mainframes
    and for POWER/AIX. The former are used widely. The latter are quickly
    losing to "bigger" compilers running on the same platform.
    As to tcc, mcc, lccwin etc... those are only used by hobbyists. Never by
    pros. The only "lesser" PC-hosted PC-targeting C compilers that are used
    by a significant number of pro developers are Intel and
    Borland/Embarcadero, the latter strictly for historical reasons.
    Embarcadero switched their dev suites to a "bigger" compiler quite a few
    years ago, but some people like their old stuff. Well, maybe the National Instruments compiler is still used? I really don't know.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Mon Mar 25 13:12:40 2024
    On 24/03/2024 23:39, Michael S wrote:
    On Sun, 24 Mar 2024 23:07:44 +0000
    bart <bc@freeuk.com> wrote:

    On 24/03/2024 20:49, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that
    (and it's a perfectly legitimate desire), there isn't enough demand
    to induce anyone to actually produce such a thing and for it to
    catch on. Developers have had decades to define and implement the
    kind of language you're talking about. Why haven't they?

    Perhaps many settle for using C but using a lesser C compiler or one
    with optimisation turned off.


    What is "lesser C compiler"?
    Something like IAR ? Yes, people use it.
    Something like TI? People use it when they have no other choice.
    20 years ago there were Diab Data, Kiel and few others. I didn't hear
    about them lately.
    Microchip, I'd guess, still has its own compilers for many of their
    families, but that's because they have to. "Bigger" compilers dont want
    to support this chips.
    On the opposite edge of scale, IBM has compilers for their mainframes
    and for POWER/AIX. The former are used widely. The later are quickly
    losing to "bigger' compilers running on the same platform.

    As to tcc, mcc, lccwin etc... those only used by hobbyists.

    AFAIK lccwin can be used commercially.

    And I would recommend tcc especially for transpiled code. Because it can process it very quickly, but also because the code should already be
    verified so it doesn't need deep analysis.

    Further, it doesn't have warnings about obscure, irrelevant matters and
    is unlikely to produce any surprises in the generated code.

    While mcc is my private tool which is my first choice for C code that
    /I/ write or generate.

    The bigger compilers I'm aware of are gcc, clang, and MSVC. Only gcc
    works on my Windows machine. For obscure but related reasons (as clang piggybacks onto MSVC) neither of those two currently work. There is also ZigCC, based around Clang I think, which at least works.

    I've never heard of IAR, TI, Diab Data or Keil. I've heard of DMC
    (that's another I use, but it's 32-bit), Watcom, Borland, TurboC and Intel.

    Never by
    pro.

    What's a 'pro'? I used to use my in-house languages and compilers for commercial software; the customer paid for the application, not for the language or compiler.

    The only "lesser" PC-hosted PC-targeting C compilers that are used
    by significant amount of pro developers are Intel and
    Borland/Embarcadero, the later strictly for historical reasons.
    Embarcadero switched their dev suits to "bigger" compiler quite a few
    years ago, but some people like their old stuff. Well, may be, National Instruments compiler still used? I really don't know.

    I guess you mean companies using big tools and big ecosystems that need equally big compilers to go with them.

    I mainly use, and develop, small, nippy tools and would rate them above
    any of the big, glossy ones.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 19:37:40 2024
    On 24/03/2024 23:42, Michael S wrote:
    On Sun, 24 Mar 2024 13:49:43 -0700
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that
    (and it's a perfectly legitimate desire), there isn't enough demand
    to induce anyone to actually produce such a thing and for it to catch
    on.

    Such things are produced all the time. Ah yes, they fail to catch on.
    The most recent [half-hearted] attempt that hasn't yet realized that it
    has no chance is called Zig.


    Languages like this are usually better in some ways than C (there's
    plenty of scope for that with C - we can do a better job of designing a language now than 50 years ago, not least because we can expect more
    from tools than we could 50 years ago).

    But they can never cover everything people want - people want
    contradictory things. Thus for everyone (except perhaps the language designers themselves) such new languages have big disadvantages as well
    as big advantages, and they will be missing some key features, seen from
    that person's perspective.

    And their execution models are invariably either only vaguely defined,
    or defined in terms of behaviour with an "as-if" rule to allow
    optimisation. Which means they are no better than C for people who
    think compilers should be blind translators. (And it also means that
    they will be no worse than C for people who understand more about
    programming languages and compilers, and for those that either don't
    know or don't care.)


    Zig is a language I've looked at, and it does have some nice things.
    But it will be a /long/ time before it is something I could consider
    using for my work, so it would be hobby only. And of course it has made
    some design decisions that I think are wrong, and are a big step down
    from the current leading alternative to C in many fields - C++.


    Developers have had decades to define and implement the kind of
    language you're talking about. Why haven't they?


    Because C is a juggernaut?


    Yes, it has a /huge/ momentum. That means that even if a new language
    comes along that is better than C in every way, it has to be /much/
    better to make it worth the effort to change. Rust is making a fair
    stab at this - it is no easy job.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 19:53:32 2024
    On 25/03/2024 00:39, Michael S wrote:
    On Sun, 24 Mar 2024 23:07:44 +0000
    bart <bc@freeuk.com> wrote:

    On 24/03/2024 20:49, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that
    (and it's a perfectly legitimate desire), there isn't enough demand
    to induce anyone to actually produce such a thing and for it to
    catch on. Developers have had decades to define and implement the
    kind of language you're talking about. Why haven't they?

    Perhaps many settle for using C but using a lesser C compiler or one
    with optimisation turned off.


    What is "lesser C compiler"?
    Something like IAR ? Yes, people use it.

    I am not sure how IAR's tools would count as a "lesser C compiler".
    They make very solid C tools for specific embedded targets. And they
    have lots of the optimisations that some people get worked up about - including, IIRC, whole-program optimisations.

    Something like TI? People use it when they have no other choice.

    TI's C tools are a bit more varied in quality over their range of
    targets. They have a particularly bizarre non-conformity: they do not
    zero-initialise variables with static storage duration that have no
    explicit initialisation - a fact that is documented as a small note in
    the middle of the manual (for the two TI compiler manuals I have read).
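    For reference, this is what the C standard itself guarantees - a minimal
    sketch; a compiler that leaves these objects non-zero at startup is
    non-conforming:

    #include <assert.h>

    /* Objects with static storage duration and no explicit initialiser
       must start out as all-zero under standard C. */
    static int counter;
    static int table[8];

    int main(void)
    {
        assert(counter == 0);
        assert(table[3] == 0);   /* fails on a non-conforming implementation */
        return 0;
    }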

    20 years ago there were Diab Data, Keil and a few others. I haven't heard
    about them lately.

    I tried out Diab Data for the 68k some 25 years ago. It was /way/
    better than anything else around, but outside our budget at the time.
    People sometimes complain that type-based alias analysis, or
    optimisations based on the UB of signed integer overflow are somehow
    "new" optimisations by "evil" gcc developers designed to "win
    benchmarks" even though they "break" user code. Diab Data was doing
    this kind of optimisation long before gcc.

    I've never been a fan of Keil. Perhaps it's because their main target
    was the 8051, an architecture that was outdated when it was introduced
    in 1980 and that survived decades longer than it should have. But their compiler is only a "lesser C compiler" in the sense that the target is
    really bad for efficient C, so they have to have a lot of extra keywords
    and extensions to let people get decent results. Thus you program in
    "Keil 8051 C", not standard C. But the compiler does a /lot/ of
    optimisation, including very advanced whole-program optimisation.

    Microchip, I'd guess, still has its own compilers for many of their
    families, but that's because they have to. "Bigger" compilers don't want
    to support these chips.

    I think some of Microchip's old tools are the ones that I used that
    really could be called "lesser C compilers". One I remember had support
    for structs, and support for arrays, but not for arrays of structs or
    structs containing arrays.

    At the opposite end of the scale, IBM has compilers for their mainframes
    and for POWER/AIX. The former are used widely. The latter are quickly
    losing to "bigger" compilers running on the same platform.
    As to tcc, mcc, lccwin etc... those are only used by hobbyists. Never by
    pros. The only "lesser" PC-hosted PC-targeting C compilers that are used
    by a significant number of pro developers are Intel and
    Borland/Embarcadero, the latter strictly for historical reasons.
    Embarcadero switched their dev suites to a "bigger" compiler quite a few
    years ago, but some people like their old stuff. Well, maybe the National
    Instruments compiler is still used? I really don't know.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 19:58:37 2024
    On 25/03/2024 03:12, bart wrote:
    On 24/03/2024 23:39, Michael S wrote:
    On Sun, 24 Mar 2024 23:07:44 +0000
    bart <bc@freeuk.com> wrote:

    On 24/03/2024 20:49, Keith Thompson wrote:
    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that
    (and it's a perfectly legitimate desire), there isn't enough demand
    to induce anyone to actually produce such a thing and for it to
    catch on. Developers have had decades to define and implement the
    kind of language you're talking about. Why haven't they?
    Perhaps many settle for using C but using a lesser C compiler or one
    with optimisation turned off.


    What is "lesser C compiler"?
    Something like IAR ? Yes, people use it.
    Something like TI? People use it when they have no other choice.
    20 years ago there were Diab Data, Kiel and few others. I didn't hear
    about them lately.
    Microchip, I'd guess, still has its own compilers for many of their
    families, but that's because they have to. "Bigger" compilers dont want
    to support this chips.
    On the opposite edge of scale, IBM has compilers for their mainframes
    and for POWER/AIX. The former are used widely. The later are quickly
    losing to "bigger' compilers running on the same platform.

    As to tcc, mcc, lccwin etc... those are only used by hobbyists.

    AFAIK lccwin can be used commercially.

    "/Can/ be used commercially" does not imply "/is/ used professionally".
    I'm sure there are some people who use it in their work, but I would
    expect that in any statistics about compiler usage, it would be in the
    "Others < 0.1%" category.

    I guess you mean companies using big tools and big ecosystems that need equally big compilers to go with them.

    I mainly use, and develop, small, nippy tools and would rate them above any of the big, glossy ones.


    Then you use a different rating system than the vast majority of professionals. That, of course, is your free choice to make - just
    don't be surprised when others disagree with you.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 22:04:50 2024
    On Sun, 24 Mar 2024 18:58:24 +0000
    bart <bc@freeuk.com> wrote:

    On 24/03/2024 15:53, Michael S wrote:
    On Sat, 23 Mar 2024 21:21:58 +0000
    bart <bc@freeuk.com> wrote:
    On 23/03/2024 16:51, David Brown wrote:
    On 23/03/2024 12:26, bart wrote:
    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do,
    but also how it must do it, you need to use a lower-level
    language than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    If there is no such choice, then this is the problem: it has to
    be C or nothing.

    How much of a problem is it, really?

    My field is probably the place where low level programming is most
    ubiquitous. There are plenty of people who use assembly - for
    good reasons or for bad (or for reasons that some people think
    are good, other people think are bad). C is the most common
    choice.

    Other languages used for small systems embedded programming
    include C++, Ada, Forth, BASIC, Pascal, Lua, and Micropython.
    Forth is the only one that could be argued as lower-level or more
    "directly translated" than C. =20

    Well, Forth is certainly cruder than C (it's barely a language
    IMO). But I don't remember seeing anything in it resembling a type
    system that corresponds to the 'i8-i64 u8-u64 f32-f64' types
    typical in current hardware. (Imagine trying to create a precisely
    laid out struct.)

    It is just too weird. I think I'd rather take my chances with C.

    > BASIC, ..., Lua, and Micropython.

    Hmm, I think my own scripting language is better at low level than
    any of these. It supports those low-level types for a start. And I
    can do stuff like this:

    println peek(0x40'0000, u16):"m"

    fun peek(addr, t=byte) = makeref(addr, t)^

    This displays 'MZ', the signature of the (low-)loaded EXE image on
    Windows

    Possibly it is even better than C; is this little program valid (no
    UB) C, even when it is known that the program is low-loaded:

    #include <stdio.h>
    typedef unsigned char byte;

    int main(void) {
    printf("%c%c\n", *(byte*)0x400000, *(byte*)0x400001);
    }

    This works on DMC, tcc, mcc, lccwin, but not gcc because that loads
    programs at high addresses. The problem being that the address
    involved, while belonging to the program, is outside of any C data
    objects.

    =20
    =20
    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
    char* p0 = (char*)((size_t)main & -(size_t)0x10000);
    printf("%c%c\n", p0[0], p0[1]);
    return 0;
    }

    That would work for small programs. Not necessarily for bigger
    programs.

    I'm not sure how that works.

    Neither do I.

    Are EXE images always loaded at multiple
    of 64KB? I suppose on larger programs it could search backwards 64KB
    at a time (although it could also hit on a rogue 'MZ' in program
    data).

    My point however was whether C considered that p0[0] access UB
    because it doesn't point into any C data object.


    Well, C does not even have to run on von Neumann computers. Harvard or
    even less obvious targets than Harvard are both legal and used in
    practice. So, of course, any use of a code object as a data object is UB.

    The code below is not UB, IMHO. But it is not portable outside of
    compilers based on the gcc infrastructure, and probably not portable
    outside of Windows either. But who needs to look for 'MZ' outside Windows?

    #include <stdio.h>

    /* Symbol placed by GNU-style linkers at the load address of the image. */
    extern char __image_base__[];

    int main(void)
    {
        /* A PE image starts with the DOS header signature "MZ". */
        printf("%c%c\n", __image_base__[0], __image_base__[1]);
        return 0;
    }


    If so, it would make access to memory-mapped devices or
    frame-buffers, or implementing things like garbage collectors,
    problematical.

    Implementation-defined linker tricks like the one above can be used
    (or should I say "are used" ?) for implementing dynamic memory. I don't
    see why they can't be used to implement GC. They are not portable, but
    I don't believe that they are UB. At least not as long as one does not
    consider data races.
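    The same kind of trick is common in embedded work. As a sketch of the
    idea - assuming the toolchain provides an _end-style symbol marking the
    end of the statically allocated data (the actual name varies by linker
    script) - a trivial bump allocator needs nothing else:

    #include <stddef.h>

    /* Hypothetical linker-provided symbol; the real name (_end,
       __heap_start, ...) depends on the toolchain and linker script. */
    extern char _end[];

    static char *heap_top = _end;

    /* Minimal bump allocator - no bounds checking, illustration only. */
    void *bump_alloc(size_t n)
    {
        void *p = heap_top;
        heap_top += (n + 7u) & ~(size_t)7;   /* keep 8-byte alignment */
        return p;
    }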










    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 22:16:28 2024
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Mon Mar 25 22:24:24 2024
    On Mon, 25 Mar 2024 13:04:50 +0200
    Michael S <already5chosen@yahoo.com> wrote:


    extern char __image_base__[];


    This symbol is a little more portable. It works both for 32-bit and
    64-bit builds, and for both the gcc and the MSVC link infrastructures.

    extern char __ImageBase[];
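    As a minimal usage sketch, the earlier 'MZ' check can be written against
    this symbol, so the same source should build with either toolchain:

    #include <stdio.h>

    /* Provided by the linker on Windows toolchains (MSVC and MinGW-style). */
    extern char __ImageBase[];

    int main(void)
    {
        /* A PE image starts with the DOS header signature "MZ". */
        printf("%c%c\n", __ImageBase[0], __ImageBase[1]);
        return 0;
    }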






    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Mon Mar 25 23:26:01 2024
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't have
    all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If you
    don't know why some compilers generate binaries that have memory mapped
    at 0x400000, and others do not, fair enough. I am curious, but it's not
    at all important.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Mar 26 00:11:17 2024
    On Mon, 25 Mar 2024 13:26:01 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't have
    all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If
    you don't know why some compilers generate binaries that have memory
    mapped at 0x400000, and others do not, fair enough. I am curious,
    but it's not at all important.


    I am not an expert, but it does not look like the problem is directly
    related to the compiler or linker. All 32-bit Windows compilers/linkers,
    including gcc, clang and MSVC, by default put the symbol ___ImageBase at
    address 4 MB. However, the loader relocates it to wherever it wants,
    typically much higher.
    I don't know for sure why the loader does it to images generated by gcc,
    clang and MSVC and does not do it to images generated by lccwin and
    others, but I have an educated guess: most likely, these other compilers
    link by default with an option similar to Microsoft's /FIXED https://learn.microsoft.com/en-us/cpp/build/reference/fixed-fixed-base-address?view=msvc-170

    The option disables ASLR and thus can shorten app load time and make
    performance just a little snappier. Still, I wouldn't make it the default.

    To get similar behavior with [32-bit] MSVC, the user can specify '/link
    /FIXED' on the command line. I don't know how to do it with the gcc
    variant supplied with msys2. But, I'd guess, if you google for long
    enough, you can find it.






    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Malcolm McLean@3:633/280.2 to All on Tue Mar 26 00:26:32 2024
    On 25/03/2024 08:58, David Brown wrote:
    On 25/03/2024 03:12, bart wrote:
    On 24/03/2024 23:39, Michael S wrote:

    As to tcc, mcc, lccwin etc... those only used by hobbyists.

    AFAIK lccwin can be used commercially.

    "/Can/ be used commercially" does not imply "/is/ used professionally".
    I'm sure there are some people who use it in their work, but I would
    expect that in any statistics about compiler usage, it would be in the "Others < 0.1%" category.


    lccwin is used to compile C functions with an interface which makes them
    callable from Matlab. Whilst I haven't written Matlab code commercially
    and it would be rare to do so, I have written Matlab code
    professionally, and that is quite common. I probably also made rather
    heavier use of the C interfaces than was really justified.

    My Matlab File Exchange submissions are still going strong. But the gem,
    the faded bar chart, hasn't been valued, and hasn't attracted any stars. Matlab users, download and give some love.

    --
    Check out Basic Algorithms and my other books: https://www.lulu.com/spotlight/bgy1mm


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Mar 26 00:43:38 2024
    On Mon, 25 Mar 2024 13:26:32 +0000
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> wrote:

    On 25/03/2024 08:58, David Brown wrote:
    On 25/03/2024 03:12, bart wrote:
    On 24/03/2024 23:39, Michael S wrote:

    As to tcc, mcc, lccwin etc... those only used by hobbyists.

    AFAIK lccwin can be used commercially.

    "/Can/ be used commercially" does not imply "/is/ used
    professionally". I'm sure there are some people who use it in their
    work, but I would expect that in any statistics about compiler
    usage, it would be in the "Others < 0.1%" category.


    lccwin is used to compile C functions with an interface which makes
    them callable from Matlab.

    On which platform?
    By which percentage of people that write C functions callable from
    Matlab?

    Whilst I haven't written Matlab code
    commercially and it would be rare to do so, I have written Matlab
    code professionally, and that is quite common. I probably also made
    rather heavier use of the C interfaces than was really justified.


    Back in the DOS days, I don't remember lccwin among the compilers that
    Mathworks recognized as allowed to build MEX modules. In fact, I don't
    remember the existence of lccwin.
    On Win64 the point is moot, because there exists a unified platform ABI
    that everybody follows. But back in the DOS and Win32 days, official
    Mathworks blessing was important.

    My Matlab File Exchange submissions are still going strong. But the
    gem, the faded bar chart, hasn't been valued, and hasn't attracted
    any stars. Matlab users, download and give some love.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Mar 26 02:17:15 2024
    On 24/03/2024 19:58, bart wrote:
    On 24/03/2024 15:53, Michael S wrote:

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
    char* p0 = (char*)((size_t)main & -(size_t)0x10000);
    printf("%c%c\n", p0[0], p0[1]);
    return 0;
    }


    That would work for small programs. Not necessarily for bigger
    programs.


    I'm not sure how that works. Are EXE images always loaded at multiple of 64KB? I suppose on larger programs it could search backwards 64KB at a
    time (although it could also hit on a rogue 'MZ' in program data).

    My point however was whether C considered that p0[0] access UB because
    it doesn't point into any C data object.

    As it stands in the code, I believe it is undefined behaviour.


    If so, it would make access to memory-mapped devices or frame-buffers,
    or implementing things like garbage collectors, problematical.

    As I wrote (more than once), the C way to handle this is "volatile".
    Volatile accesses are implementation-defined, and are "observable
    behaviour" - therefore the compiler generates code that makes exactly
    the read and write accesses you ask for when they are done using a volatile-qualified lvalue.
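    A minimal sketch of what that looks like in practice - the register
    address and bit layout here are made up purely for illustration:

    #include <stdint.h>

    /* Hypothetical memory-mapped status register at a fixed address. */
    #define STATUS_REG (*(volatile uint32_t *)0x40021000u)

    uint32_t wait_until_ready(void)
    {
        /* Every read really happens, in order, because the lvalue is
           volatile-qualified; the compiler may not cache or elide it. */
        while ((STATUS_REG & 0x1u) == 0) {
            /* busy-wait */
        }
        return STATUS_REG;
    }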



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Mar 26 02:30:01 2024
    On 25/03/2024 14:11, Michael S wrote:
    On Mon, 25 Mar 2024 13:26:01 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't have
    all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If
    you don't know why some compilers generate binaries that have memory
    mapped at 0x400000, and others do not, fair enough. I am curious,
    but it's not at all important.


    I am not an expert, but it does not look like the problem is directly
    related to compiler or linker. All 32-bit Windows compilers/linkers,

    (I get the impression that at least some of the compilers in question
    are 64-bit, but I can't be sure, especially as Bart is using them on
    Windows while I mainly use Linux for development.)

    including gcc, clang and MSVC, by default put symbol ___ImageBase at
    address 4 MB. However loader relocates it to wherever it wants,
    typically much higher.

    OK.

    Is this a kind of address randomisation thing? That would make sense,
    since such randomisation makes various kinds of attacks a good deal
    harder, and hardening against attacks is something gcc and binutils (I'm guessing that's the linker involved here) take seriously.

    I don't know for sure why loader does it to images generated by gcc,
    clang and MSVC and does not do it to images generated by lccwin and
    others, but I have an educated guess: most likely, these other compilers
    link by default with an option similar to Microsoft's /Fixed https://learn.microsoft.com/en-us/cpp/build/reference/fixed-fixed-base-address?view=msvc-170

    The option disables ASLR and thus can shorten app load time and make performance just a little snappier. Still, I wouldn't make it default.


    Maybe it makes things easier for some kinds of code generation? It
    might reduce the need for position-independent code.

    Most of my work is in small-systems embedded programming, where you do
    not normally have an MMU and addresses tend to be fixed. (There are
    occasions when you want to be able to load code at different addresses,
    but then you need to make sure it is generated as position-independent.)
    So while I know more than most programmers about linking on those
    systems, it is in a specific area of programming. It's not something I
    have looked at for Windows at all.

    To get similar behavior with [32-bit] MSVC user can specify '/linker
    /fixed' on the command line. I don't know how to do it with gcc variant supplied with msys2. But, I'd guess, if you google for long enough, you
    can find it.


    It is presumably a linker flag, rather than a gcc flag. And I don't
    know if Bart's gcc setup is using the common binutils linker, or
    something else.

    Thanks for the information, anyway.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Tue Mar 26 02:54:09 2024
    Michael S <already5chosen@yahoo.com> writes:
    On Sun, 24 Mar 2024 13:49:43 -0700
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    bart <bc@freeuk.com> writes:
    [...]
    But what people want are the conveniences and familiarity of a HLL,
    without the bloody-mindedness of an optimising C compiler.
    [...]

    Exactly which people want that?

    The evidence suggests that, while some people undoubtedly want that
    (and it's a perfectly legitimate desire), there isn't enough demand
    to induce anyone to actually produce such a thing and for it to catch
    on.

    Such things are produced all the time. Ah yes, they fail to catch on.
    The most recent [half-hearted] attempt that hasn't yet realized that it
    has no chance is called Zig.

    Does Zig have those characteristics because its language definition says
    so, or because there's a single implementation that happens to work that
    way? I took a quick look at the documentation and didn't see anything
    definitive.

    Developers have had decades to define and implement the kind of
    language you're talking about. Why haven't they?


    Because C is a juggernaut?

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Tue Mar 26 03:06:24 2024
    On 25/03/2024 12:26, David Brown wrote:
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't have
    all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If you don't know why some compilers generate binaries that have memory mapped
    at 0x400000, and others do not, fair enough. I am curious, but it's not
    at all important.


    In the PE EXE format, the default image load base is specified in a
    special header in the file:

    Magic: 20B
    Link version: 1.0
    Code size: 512 200
    Idata size: 1024 400
    Zdata size: 512
    Entry point: 4096 1000 in data:0
    Code base: 4096
    Image base: 4194304 400000
    Section align: 4096

    By convention it is at 0x40'0000 (I've no idea why).

    More recently, dynamic loading, regardless of what it says in the PE
    header, has become popular with linkers. So, while there is still a
    fixed value in the Image Base field, which might be 0x140000000, it gets
    loaded at some random address, usually in high memory above 2GB.

    I don't know what's responsible for that, but presumably the OS must be
    in on the act.

    To make this possible, both for loading above 2GB, and for loading at an address not known by the linker, the code inside the EXE must be position-independent, and have relocation info for any absolute 64-bit
    static addresses. 32-bit static addresses won't work.

    If I take this C program:

    #include <stdio.h>
    int main(void) {
    printf("%p\n", main);
    }

    This shows 0000000000401000 when compiled with mcc or tcc, or
    0000000000401020 with lccwin32 (the exact address of 'main' relative to
    the image base will vary). With DMC (32 bits) it's 0040210. All load at 0x400000.

    With gcc, it shows: 00007ff6e63a1591.

    Dynamic loading can be disabled by passing --disable-dynamicbase to ld,
    then it might show something like 0000000140001000, which corresponds to
    the default Image Base field in the EXE header.

    Not dynamic, but still high.

    (My compilers, both for C and M, did not generate code suitable for high-loading until a few months ago. That didn't matter since the EXEs
    loaded at the fixed 0x400000 address. But it can matter for DLL files
    and will do for OBJ files, since the latter would need to use an
    external linker.

    So if I do this with a mix of mcc and gcc:

    C:\c>mcc test -c
    Compiling test.c to test.obj

    C:\c>gcc test.obj

    C:\c>a
    00007FF613311540

    I get the same high-loaded address. I don't think that Tiny C has that
    support yet for high-loading code.)

    To summarise: the high-loading is not directly to do with compilers, but
    the program that generates the EXE. But the compiler does need to
    generate code that could be loaded high if needed.
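    As a sketch (Windows-specific, using the usual windows.h definitions), a
    program can compare the preferred image base recorded in its own PE
    header with the address the loader actually chose:

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        /* The module handle of the EXE is also its actual load address. */
        unsigned char *base = (unsigned char *)GetModuleHandle(NULL);

        /* Follow the DOS header to the PE headers and read the preferred
           image base that the linker wrote into the file. */
        IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
        IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);

        printf("preferred base: %llx\n",
               (unsigned long long)nt->OptionalHeader.ImageBase);
        printf("actual base:    %p\n", (void *)base);
        return 0;
    }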

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Tue Mar 26 03:39:23 2024
    On 25/03/2024 13:11, Michael S wrote:
    On Mon, 25 Mar 2024 13:26:01 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't have
    all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If
    you don't know why some compilers generate binaries that have memory
    mapped at 0x400000, and others do not, fair enough. I am curious,
    but it's not at all important.


    I am not an expert, but it does not look like the problem is directly
    related to compiler or linker. All 32-bit Windows compilers/linkers, including gcc, clang and MSVC, by default put symbol ___ImageBase at
    address 4 MB. However loader relocates it to wherever it wants,
    typically much higher.
    I don't know for sure why loader does it to images generated by gcc,
    clang and MSVC and does not do it to images generated by lccwin and
    others, but I have an educated guess: most likely, these other compilers
    link by default with an option similar to Microsoft's /Fixed https://learn.microsoft.com/en-us/cpp/build/reference/fixed-fixed-base-address?view=msvc-170

    It's all up to the options written to the EXE file headers.

    By setting the same options (plus generating base-reloc tables, plus
    ensuring the code can run above 2GB), I can get the EXEs written by my
    two compilers (for C and for my language) to be loaded at a high address
    too.

    My compilers don't use a linker.

    Some of those options are normally used only for DLLs; they would need
    to be set for EXEs too.

    This was just an experiment; I will try adding it as a formal option to
    each compiler.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Mar 26 03:51:18 2024
    On Mon, 25 Mar 2024 16:06:24 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 12:26, David Brown wrote:
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.

    It seems, you are.

    It happens - and it was not unexpected here, as I said. I don't
    have all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If
    you don't know why some compilers generate binaries that have
    memory mapped at 0x400000, and others do not, fair enough. I am
    curious, but it's not at all important.

    In the PE EXE format, the default image load base is specified in a
    special header in the file:

    Magic: 20B
    Link version: 1.0
    Code size: 512 200
    Idata size: 1024 400
    Zdata size: 512
    Entry point: 4096 1000 in data:0
    Code base: 4096
    Image base: 4194304 400000
    Section align: 4096

    By convention it is at 0x40'0000 (I've no idea why).

    More recently, dynamic loading, regardless of what it says in the PE
    header, has become popular with linkers. So, while there is still a
    fixed value in the Image Base field, which might be 0x140000000, it
    gets loaded at some random address, usually in high memory above 2GB.

    I don't know what's responsible for that, but presumably the OS must
    be in on the act.

    To make this possible, both for loading above 2GB, and for loading at
    an address not known by the linker, the code inside the EXE must be
    position-independent, and have relocation info for any absolute
    64-bit static addresses. 32-bit static addresses won't work.


    I don't understand why you say that the EXE must be position-independent.
    I never learned the PE format in depth (and learned only the absolute
    minimum of ELF, just enough to be able to load images in a simple
    embedded scenario), but my impression always was that a PE EXE contains
    plenty of relocation info for a loader, so it (the loader) can modify (I
    think the professional argot uses the word 'fix') non-PIC code at load
    time to run at any chosen position.
    Am I wrong about it?




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Scott Lurndal@3:633/280.2 to All on Tue Mar 26 04:24:31 2024
    Reply-To: slp53@pacbell.net

    David Brown <david.brown@hesbynett.no> writes:
    On 25/03/2024 00:39, Michael S wrote:

    I tried out Diab Data for the 68k some 25 years ago. It was /way/
    better than anything else around, but outside our budget at the time.

    We used them for our 88k-based systems in the early '90s; they were,
    as you say, way better than anything else at the time (Moto
    was shipping a version of PCC, and gcc was rather primitive).


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: UsenetServer - www.usenetserver.com (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Mar 26 05:07:35 2024
    On 25/03/2024 17:06, bart wrote:
    On 25/03/2024 12:26, David Brown wrote:
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't have
    all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If
    you don't know why some compilers generate binaries that have memory
    mapped at 0x400000, and others do not, fair enough. I am curious, but
    it's not at all important.


    In the PE EXE format, the default image load base is specified in a
    special header in the file:

    Magic: 20B
    Link version: 1.0
    Code size: 512 200
    Idata size: 1024 400
    Zdata size: 512
    Entry point: 4096 1000 in data:0
    Code base: 4096
    Image base: 4194304 400000
    Section align: 4096

    By convention it is at 0x40'0000 (I've no idea why).

    More recently, dynamic loading, regardless of what it says in the PE
    header, has become popular with linkers. So, while there is still a
    fixed value in the Image Base file, which might be 0x140000000, it gets loaded at some random address, usually in high memory above 2GB.

    I don't know what's responsible for that, but presumably the OS must be
    in on the act.

    To make this possible, both for loading above 2GB, and for loading at an address not known by the linker, the code inside the EXE must be position-independent, and have relocation info for any absolute 64-bit static addresses. 32-bit static addresses won't work.

    If I take this C program:

    #include <stdio.h>
    int main(void) {
    printf("%p\n", main);
    }

    This shows 0000000000401000 when compiled with mcc or tcc, or 0000000000401020 with lccwin32 (the exact address of 'main' relative to
    the image base will vary). With DMC (32 bits) it's 0040210. All load at 0x400000.

    With gcc, it shows: 00007ff6e63a1591.

    Dynamic loading can be disabled by passing --disable-dynamicbase to ld,
    then it might show something like 0000000140001000, which corresponds to
    the default Image Base file in the EXE header

    Not dynamic, but still high.

    (My compilers, both for C and M, did not generate code suitable for high-loading until a few months ago. That didn't matter since the EXEs loaded at the fixed 0x400000 address. But it can matter for DLL files
    and will do for OBJ files, since the latter would need to use an
    external linker.

    So if I do this with a mix of mcc and gcc:

    C:\c>mcc test -c
    Compiling test.c to test.obj

    C:\c>gcc test.obj

    C:\c>a
    00007FF613311540

    I get the same high-loaded address. I don't think that Tiny C has that support yet for high-loading code.)

    To summarise: the high-loading is not directly to do with compilers, but
    the program that generates the EXE. But the compiler does need to
    generate code that could be loaded high if needed.

    Thanks for that explanation - it fills in some blanks in my understanding.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Tue Mar 26 05:10:23 2024
    On 25/03/2024 16:51, Michael S wrote:
    On Mon, 25 Mar 2024 16:06:24 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 12:26, David Brown wrote:
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't
    have all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If
    you don't know why some compilers generate binaries that have
    memory mapped at 0x400000, and others do not, fair enough. I am
    curious, but it's not at all important.


    In the PE EXE format, the default image load base is specified in a
    special header in the file:

    Magic: 20B
    Link version: 1.0
    Code size: 512 200
    Idata size: 1024 400
    Zdata size: 512
    Entry point: 4096 1000 in data:0
    Code base: 4096
    Image base: 4194304 400000
    Section align: 4096

    By convention it is at 0x40'0000 (I've no idea why).

    More recently, dynamic loading, regardless of what it says in the PE
    header, has become popular with linkers. So, while there is still a
    fixed value in the Image Base file, which might be 0x140000000, it
    gets loaded at some random address, usually in high memory above 2GB.

    I don't know what's responsible for that, but presumably the OS must
    be in on the act.

    To make this possible, both for loading above 2GB, and for loading at
    an address not known by the linker, the code inside the EXE must be
    position-independent, and have relocation info for any absolute
    64-bit static addresses. 32-bit static addresses won't work.


    I don't understand why you say that EXE must be position-independent.
    I never learned PE format in depth (and learned only absolute minimum of
    elf, just enough to be able to load images in simple embedded
    scenario), but my impression always was that PE EXE contains plenty of relocation info for a loader, so it (loader) can modify (I think
    professional argot uses the word 'fix') non-PIC at load time to run at
    any chosen position.
    Am I wrong about it?


    A PE EXE designed to run only at the given image base won't be position-independent, so it can't be moved anywhere else.

    There isn't enough info to make it possible, especially before position-independent addressing modes for x64 came along (that is, using offsets from the RIP instruction pointer instead of 32-bit absolute addresses).

    Take this C program:

    int abc;
    int* ptr = &abc;

    int main(void) {
    int x;
    x = abc;
    }

    Some of the assembly generated is this:

    abc: resb 4

    ptr: dq abc
    ...
    mov eax, [abc]

    That last reference is an absolute 32-bit address, for example it might
    have address 0x00403000 when loaded at 0x400000.

    If the program is instead loaded at 0x78230000, there is no reloc info
    to tell it that that particular 32-bit value, plus the 64-bit field initialising ptr, must be adjusted.

    RIP-relative addressing (I think sometimes called PIC) can fix that
    second reference:

    mov eax, [rip:abc]

    But it only works for code, not data; that initialisation is still absolute.

    When a DLL is generated instead, those will need to be moved (to avoid multiple DLLs all based at the same address). In that case,
    base-relocation tables are needed: a list of addresses that contain a
    field that needs relocating, and what type and size of reloc is needed.

    The same info is needed for an EXE if it contains flags saying that the EXE
    could be loaded at an arbitrary address.
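    As a simplified sketch of what the loader then does with such a table
    (the real PE .reloc section packs entries into per-page blocks, but the
    principle is the same):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Simplified relocation entry: the offset, from the image base, of a
       64-bit field holding an absolute address. */
    struct reloc { uint32_t offset; };

    void apply_relocs(uint8_t *image, uint64_t preferred_base,
                      uint64_t actual_base,
                      const struct reloc *relocs, size_t count)
    {
        uint64_t delta = actual_base - preferred_base;
        for (size_t i = 0; i < count; i++) {
            /* Add the load delta to each absolute address field. */
            uint64_t value;
            memcpy(&value, image + relocs[i].offset, sizeof value);
            value += delta;
            memcpy(image + relocs[i].offset, &value, sizeof value);
        }
    }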


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Tue Mar 26 07:01:40 2024
    On 25/03/2024 19:10, bart wrote:
    On 25/03/2024 16:51, Michael S wrote:
    On Mon, 25 Mar 2024 16:06:24 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 12:26, David Brown wrote:
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.

    It seems, you are.

    It happens - and it was not unexpected here, as I said. I don't
    have all these compilers installed to test.

    But it would be helpful if you had a /little/ more information. If
    you don't know why some compilers generate binaries that have
    memory mapped at 0x400000, and others do not, fair enough. I am
    curious, but it's not at all important.

    In the PE EXE format, the default image load base is specified in a
    special header in the file:

    Magic: 20B
    Link version: 1.0
    Code size: 512 200
    Idata size: 1024 400
    Zdata size: 512
    Entry point: 4096 1000 in data:0
    Code base: 4096
    Image base: 4194304 400000
    Section align: 4096

    By convention it is at 0x40'0000 (I've no idea why).

    More recently, dynamic loading, regardless of what it says in the PE
    header, has become popular with linkers. So, while there is still a
    fixed value in the Image Base file, which might be 0x140000000, it
    gets loaded at some random address, usually in high memory above 2GB.

    I don't know what's responsible for that, but presumably the OS must
    be in on the act.

    To make this possible, both for loading above 2GB, and for loading at
    an address not known by the linker, the code inside the EXE must be
    position-independent, and have relocation info for any absolute
    64-bit static addresses. 32-bit static addresses won't work.


    I don't understand why you say that EXE must be position-independent.
    I never learned PE format in depth (and learned only absolute minimum of
    elf, just enough to be able to load images in simple embedded
    scenario), but my impression always was that PE EXE contains plenty of
    relocation info for a loader, so it (loader) can modify (I think
    professional argot uses the word 'fix') non-PIC at load time to run at
    any chosen position.
    Am I wrong about it?


    A PE EXE designed to run only at the given image base won't be position-independent, so it can't be moved anywhere else.

    There isn't enough info to make it possible, especially before position-independent addressing modes for x64 came along (that is, using offsets from the RIP instruction pointer instead of 32-bit absolute addresses).

    Take this C program:

    int abc;
    int* ptr = &abc;

    int main(void) {
    int x;
    x = abc;
    }

    Some of the assembly generated is this:

    abc: resb 4

    ptr: dq abc
    ...
    mov eax, [abc]

    That last reference is an absolute 32-bit address, for example it might
    have address 0x00403000 when loaded at 0x400000.

    If the program is instead loaded at 0x78230000, there is no reloc info
    to tell it that that particular 32-bit value, plus the 64-bit field initialising ptr, must be adjusted.

    RIP-relative addressing (I think sometimes called PIC), can fix that
    second reference:

    mov eax, [rip:abc]

    But it only works for code, not data; that initialisation is still
    absolute.

    When a DLL is generated instead, those will need to be moved (to avoid multiple DLLs all based at the same address). In that case,
    base-relocation tables are needed: a list of addresses that contain a
    field that needs relocating, and what type and size of reloc is needed.

    The same info is needed for an EXE if it contains flags saying that the EXE could be loaded at an arbitrary address.


    I have a few comments about this. One is that PIC is "Position
    Independent Code", while PID is "Position Independent Data". Enabling
    PIC on a compiler may imply PID as well, or they may be independent.
    This can all cause significant run-time costs as access to non-local
    data and functions has at least one extra layer of indirection - though
    doing it via a register like RIP reduces that overhead quite a bit.

    An alternative method is to have the linker generate a file that
    contains the executable before the final linking, and a link relocation
    table. This is similar to a linkable object file - a reference to the
    address of the variable "abc" would be replaced by 0x00000000 in the
    machine code, and an entry in the relocation table would say "fill in
    the address of abc at position Y from the start of the code section".

    Then the program is loaded into memory by a link-loader that fills these
    blank fields, just the same way as a static linker does when generating
    the image. This complicates the loading mechanism and makes it slower
    to start code, but it runs faster.
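    A minimal sketch of that fill-in step - the fixup record and the toy
    symbol lookup are made up for illustration:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* One entry of the link relocation table described above: "write the
       final address of this symbol at this offset into the code section". */
    struct fixup {
        const char *symbol;   /* e.g. "abc" */
        uint32_t    offset;   /* where the zero-filled field sits */
    };

    /* Toy symbol lookup; a real link-loader would consult the image's
       symbol table. */
    static uint64_t lookup(const char *name)
    {
        return strcmp(name, "abc") == 0 ? 0x0000000000403000u : 0u;
    }

    void link_load(uint8_t *code, const struct fixup *fixups, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint64_t addr = lookup(fixups[i].symbol);
            /* Fill the blank field with the symbol's real address. */
            memcpy(code + fixups[i].offset, &addr, sizeof addr);
        }
    }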

    There are other ways to do things, possibly combinations of these.

    I believe the COFF format, which is the base for Windows executable
    formats, supports such relocation tables. That does not mean that they
    are supported or used on Windows, of course. You know more about what
    is actually used in PE format files than I do.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Mar 26 08:05:01 2024
    On Mon, 25 Mar 2024 18:10:23 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 16:51, Michael S wrote:
    On Mon, 25 Mar 2024 16:06:24 +0000
    bart <bc@freeuk.com> wrote:
    On 25/03/2024 12:26, David Brown wrote:
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.

    It seems, you are.

    It happens - and it was not unexpected here, as I said. I don't
    have all these compilers installed to test.

    But it would be helpful if you had a /little/ more information.
    If you don't know why some compilers generate binaries that have
    memory mapped at 0x400000, and others do not, fair enough. I am
    curious, but it's not at all important.

    In the PE EXE format, the default image load base is specified in a
    special header in the file:

    Magic: 20B
    Link version: 1.0
    Code size: 512 200
    Idata size: 1024 400
    Zdata size: 512
    Entry point: 4096 1000 in data:0
    Code base: 4096
    Image base: 4194304 400000
    Section align: 4096

    By convention it is at 0x40'0000 (I've no idea why).

    More recently, dynamic loading, regardless of what it says in the
    PE header, has become popular with linkers. So, while there is
    still a fixed value in the Image Base file, which might be
    0x140000000, it gets loaded at some random address, usually in
    high memory above 2GB.

    I don't know what's responsible for that, but presumably the OS
    must be in on the act.

    To make this possible, both for loading above 2GB, and for loading
    at an address not known by the linker, the code inside the EXE
    must be position-independent, and have relocation info for any
    absolute 64-bit static addresses. 32-bit static addresses won't
    work.

    I don't understand why you say that EXE must be
    position-independent. I never learned PE format in depth (and
    learned only absolute minimum of elf, just enough to be able to
    load images in simple embedded scenario), but my impression always
    was that PE EXE contains plenty of relocation info for a loader, so
    it (loader) can modify (I think professional argot uses the word
    'fix') non-PIC at load time to run at any chosen position.
    Am I wrong about it?

    A PE EXE designed to run only at the given image base won't be
    position-independent, so it can't be moved anywhere else.

    There isn't enough info to make it possible, especially before
    position-independent addressing modes for x64 came along (that is,
    using offsets from the RIP instruction pointer instead of 32-bit
    absolute addresses).

    Take this C program:

    int abc;
    int* ptr = &abc;

    int main(void) {
    int x;
    x = abc;
    }

    Some of the assembly generated is this:

    abc: resb 4

    ptr: dq abc
    ...
    mov eax, [abc]

    That last reference is an absolute 32-bit address, for example it
    might have address 0x00403000 when loaded at 0x400000.

    If the program is instead loaded at 0x78230000, there is no reloc
    info to tell it that that particular 32-bit value, plus the 64-bit
    field initialising ptr, must be adjusted.

    RIP-relative addressing (I think sometimes called PIC) can fix that
    second reference:

    mov eax, [rip:abc]

    But it only works for code, not data; that initialisation is still
    absolute.

    When a DLL is generated instead, those will need to be moved (to
    avoid multiple DLLs all based at the same address). In that case,
    base-relocation tables are needed: a list of addresses that contain a
    field that needs relocating, and what type and size of reloc is
    needed.

    The same info is needed for an EXE if it contains flags saying that the
    EXE could be loaded at an arbitrary address.

    Your explanation exactly matches what I was imagining.
    The technology for relocating non-PIC code is already here, in the file
    format definitions and in the OS loader code. The linker, or the part of
    the compiler that serves the role of the linker, can decide not to
    generate the required tables. Operating in such a mode has small benefits
    in EXE size and in quicker load time, but IMHO nowadays it should be used
    rarely, only in special situations, rather than serve as the default of
    the tool.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Tue Mar 26 08:25:27 2024
    On 25/03/2024 21:05, Michael S wrote:
    On Mon, 25 Mar 2024 18:10:23 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 16:51, Michael S wrote:
    On Mon, 25 Mar 2024 16:06:24 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 12:26, David Brown wrote:
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't
    have all these compilers installed to test.

    But it would be helpful if you had a /little/ more information.
    If you don't know why some compilers generate binaries that have
    memory mapped at 0x400000, and others do not, fair enough. I am
    curious, but it's not at all important.


    In the PE EXE format, the default image load base is specified in a
    special header in the file:

    Magic: 20B
    Link version: 1.0
    Code size: 512 200
    Idata size: 1024 400
    Zdata size: 512
    Entry point: 4096 1000 in data:0
    Code base: 4096
    Image base: 4194304 400000
    Section align: 4096

    By convention it is at 0x40'0000 (I've no idea why).

    More recently, dynamic loading, regardless of what it says in the
    PE header, has become popular with linkers. So, while there is
    still a fixed value in the Image Base file, which might be
    0x140000000, it gets loaded at some random address, usually in
    high memory above 2GB.

    I don't know what's responsible for that, but presumably the OS
    must be in on the act.

    To make this possible, both for loading above 2GB, and for loading
    at an address not known by the linker, the code inside the EXE
    must be position-independent, and have relocation info for any
    absolute 64-bit static addresses. 32-bit static addresses won't
    work.

    I don't understand why you say that EXE must be
    position-independent. I never learned PE format in depth (and
    learned only absolute minimum of elf, just enough to be able to
    load images in simple embedded scenario), but my impression always
    was that PE EXE contains plenty of relocation info for a loader, so
    it (loader) can modify (I think professional argot uses the word
    'fix') non-PIC at load time to run at any chosen position.
    Am I wrong about it?


    A PE EXE designed to run only at the image base given won't be
    position-independent, so it can't be moved anywhere else.

    There isn't enough info to make it possible, especially before
    position-independent addressing modes for x64 came along (that is,
    using offset to the RIP intruction pointer instead of 32-bit absolute
    addresses).

    Take this C program:

    int abc;
    int* ptr = &abc;

    int main(void) {
    int x;
    x = abc;
    }

    Some of the assembly generated is this:

    abc: resb 4

    ptr: dq abc
    ...
    mov eax, [abc]

    That last reference is an absolute 32-bit address, for example it
    might have address 0x00403000 when loaded at 0x400000.

    If the program is instead loaded at 0x78230000, there is no reloc
    info to tell it that that particular 32-bit value, plus the 64-bit
    field initialising ptr, must be adjusted.

    RIP-relative addressing (I think sometimes called PIC), can fix that
    second reference:

    mov eax, [rip:abc]

    But it only works for code, not data; that initialisation is still
    absolute.

    When a DLL is generated instead, those will need to be moved (to
    avoid multiple DLLs all based at the same address). In that case,
    base-relocation tables are needed: a list of addresses that contain a
    field that needs relocating, and what type and size of reloc is
    needed.

    The same info is needed for EXE if it contains flags saying that the
    EXE could be loaded at an arbitrary address.


    Your explanation exactly matches what I was imagining.
    The technology for relocation of non-PIC code is already here, in file
    format definitions and in OS loader code. The linker or the part of
    compiler that serves the role of linker can decide to not generate
    required tables. Operation in such mode will have small benefits in EXE
    size and in quicker load time, but IMHO nowadays it should be used
    rarely, only in special situations rather than serve as a default of the tool.

    There are two aspects to be considered:

    * Relocating a program to a different address below 2GB

    * Relocating a program to any address including above 2GB

    The first can be accommodated with tables derived from the reloc info of object files.

    But the second requires compiler cooperation in generating code that
    will work above 2GB.

    Part of that can be done with RIP-relative address modes as I touched
    on. But not all; RIP-relative won't work here:

    movsx rax, dword [i]
    mov rax, [rbx*8 + abc]

    where the address works with registers. This requires something like:

    lea rcx, [rip:abc] # or mov rcx, abc (64-bit abs addr)
    mov rax, [rbx*8 + rcx]

    This is specific to x64, but other processors will have their issues.
    Like ARM64 which doesn't even have the 32-bit displacement used with rip
    here.
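
    For the first aspect, here is a rough sketch of what a loader does with
    those tables once the image has been mapped somewhere other than its
    preferred base. This is my illustration rather than real loader code; it
    assumes the usual PE .reloc layout (blocks of 16-bit entries, type 10
    for 64-bit fields, type 3 for 32-bit ones), and the names are made up:

    #include <stddef.h>
    #include <stdint.h>

    /* Header of one .reloc block, covering a 4 KB page of the image. */
    typedef struct {
        uint32_t VirtualAddress;   /* RVA of the page this block covers */
        uint32_t SizeOfBlock;      /* header plus 16-bit entries, in bytes */
    } BaseRelocBlock;

    /* delta = actual load address - preferred image base */
    static void apply_base_relocs(uint8_t *image, const uint8_t *reloc,
                                  size_t reloc_size, int64_t delta)
    {
        const uint8_t *end = reloc + reloc_size;
        while (reloc < end) {
            const BaseRelocBlock *blk = (const BaseRelocBlock *)reloc;
            const uint16_t *entry = (const uint16_t *)(reloc + sizeof *blk);
            size_t count = (blk->SizeOfBlock - sizeof *blk) / 2;
            for (size_t i = 0; i < count; i++) {
                unsigned type   = entry[i] >> 12;     /* high 4 bits: type   */
                unsigned offset = entry[i] & 0xFFFu;  /* low 12 bits: offset */
                uint8_t *where = image + blk->VirtualAddress + offset;
                if (type == 10)                       /* 64-bit absolute field */
                    *(uint64_t *)where += (uint64_t)delta;
                else if (type == 3)                   /* 32-bit absolute field */
                    *(uint32_t *)where += (uint32_t)delta;
                /* type 0 entries are alignment padding and are skipped */
            }
            reloc += blk->SizeOfBlock;
        }
    }

    The second aspect is what such tables alone cannot fix: a 32-bit field
    simply cannot hold an address above 2GB, so the compiler has to emit
    RIP-relative or full 64-bit addressing in the first place.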


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Michael S@3:633/280.2 to All on Tue Mar 26 10:31:03 2024
    On Mon, 25 Mar 2024 21:25:27 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 21:05, Michael S wrote:
    On Mon, 25 Mar 2024 18:10:23 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 16:51, Michael S wrote:
    On Mon, 25 Mar 2024 16:06:24 +0000
    bart <bc@freeuk.com> wrote:

    On 25/03/2024 12:26, David Brown wrote:
    On 25/03/2024 12:16, Michael S wrote:
    On Sun, 24 Mar 2024 23:43:32 +0100
    David Brown <david.brown@hesbynett.no> wrote:

    I could be wrong here, of course.


    It seems, you are.


    It happens - and it was not unexpected here, as I said. I don't
    have all these compilers installed to test.

    But it would be helpful if you had a /little/ more information.
    If you don't know why some compilers generate binaries that have
    memory mapped at 0x400000, and others do not, fair enough. I am
    curious, but it's not at all important.


    In the PE EXE format, the default image load base is specified
    in a special header in the file:

    Magic: 20B
    Link version: 1.0
    Code size: 512 200
    Idata size: 1024 400
    Zdata size: 512
    Entry point: 4096 1000 in data:0
    Code base: 4096
    Image base: 4194304 400000
    Section align: 4096

    By convention it is at 0x40'0000 (I've no idea why).

    More recently, dynamic loading, regardless of what it says in the
    PE header, has become popular with linkers. So, while there is
    still a fixed value in the Image Base field, which might be
    0x140000000, it gets loaded at some random address, usually in
    high memory above 2GB.

    I don't know what's responsible for that, but presumably the OS
    must be in on the act.

    To make this possible, both for loading above 2GB, and for
    loading at an address not known by the linker, the code inside
    the EXE must be position-independent, and have relocation info
    for any absolute 64-bit static addresses. 32-bit static
    addresses won't work.

    I don't understand why you say that EXE must be
    position-independent. I never learned PE format in depth (and
    learned only absolute minimum of elf, just enough to be able to
    load images in simple embedded scenario), but my impression always
    was that PE EXE contains plenty of relocation info for a loader,
    so it (loader) can modify (I think professional argot uses the
    word 'fix') non-PIC at load time to run at any chosen position.
    Am I wrong about it?


    A PE EXE designed to run only at the image base given won't be
    position-independent, so it can't be moved anywhere else.

    There isn't enough info to make it possible, especially before
    position-independent addressing modes for x64 came along (that is,
    using an offset from the RIP instruction pointer instead of 32-bit
    absolute addresses).

    Take this C program:

    int abc;
    int* ptr = &abc;

    int main(void) {
    int x;
    x = abc;
    }

    Some of the assembly generated is this:

    abc: resb 4

    ptr: dq abc
    ...
    mov eax, [abc]

    That last reference is an absolute 32-bit address, for example it
    might have address 0x00403000 when loaded at 0x400000.

    If the program is instead loaded at 0x78230000, there is no reloc
    info to tell it that that particular 32-bit value, plus the 64-bit
    field initialising ptr, must be adjusted.

    RIP-relative addressing (I think sometimes called PIC), can fix
    that second reference:

    mov eax, [rip:abc]

    But it only works for code, not data; that initialisation is still
    absolute.

    When a DLL is generated instead, those will need to be moved (to
    avoid multiple DLLs all based at the same address). In that case,
    base-relocation tables are needed: a list of addresses that
    contain a field that needs relocating, and what type and size of
    reloc is needed.

    The same info is needed for EXE if it contains flags saying that
    the EXE could be loaded at an arbitrary address.

    Your explanation exactly matches what I was imagining.
    The technology for relocation of non-PIC code is already here, in
    file format definitions and in OS loader code. The linker or the
    part of compiler that serves the role of linker can decide to not
    generate required tables. Operation in such mode will have small
    benefits in EXE size and in quicker load time, but IMHO nowadays it
    should be used rarely, only in special situations rather than serve
    as a default of the tool.

    There are two aspects to be considered:

    * Relocating a program to a different address below 2GB

    * Relocating a program to any address including above 2GB

    The first can be accommodated with tables derived from the reloc info
    of object files.

    But the second requires compiler cooperation in generating code that
    will work above 2GB.

    Part of that can be done with RIP-relative address modes as I touched
    on. But not all; RIP-relative won't work here:

    movsx rax, dword [i]
    mov rax, [rbx*8 + abc]

    where the address works with registers. This requires something like:

    lea rcx, [rip:abc] # or mov rcx, abc (64-bit abs addr)
    mov rax, [rbx*8 + rcx]

    This is specific to x64, but other processors will have their issues.
    Like ARM64 which doesn't even have the 32-bit displacement used with
    rip here.


    You mean, when compiler knows that the program is loaded at low address
    and when combined data segments are relatively small then compiler can
    use zero-extended or sign-extended 32-bit literals to form 64-bit
    addresses of static/global objects?
    I see how relocation of such program is a problem in 64-bit mode, but
    still fail to see how similar problem could happen in 32-bit mode.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From bart@3:633/280.2 to All on Tue Mar 26 11:34:14 2024
    On 25/03/2024 23:31, Michael S wrote:
    On Mon, 25 Mar 2024 21:25:27 +0000
    bart <bc@freeuk.com> wrote:

    Your explanation exactly matches what I was imagining.
    The technology for relocation of non-PIC code is already here, in
    file format definitions and in OS loader code. The linker or the
    part of compiler that serves the role of linker can decide to not
    generate required tables. Operation in such mode will have small
    benefits in EXE size and in quicker load time, but IMHO nowadays it
    should be used rarely, only in special situations rather than serve
    as a default of the tool.

    There are two aspects to be considered:

    * Relocating a program to a different address below 2GB

    * Relocating a program to any address including above 2GB

    The first can be accommodated with tables derived from the reloc info
    of object files.

    But the second requires compiler cooperation in generating code that
    will work above 2GB.

    Part of that can be done with RIP-relative address modes as I touched
    on. But not all; RIP-relative won't work here:

    movsx rax, dword [i]
    mov rax, [rbx*8 + abc]

    where the address works with registers. This requires something like:

    lea rcx, [rip:abc] # or mov rcx, abc (64-bit abs addr)
    mov rax, [rbx*8 + rcx]

    This is specific to x64, but other processors will have their issues.
    Like ARM64 which doesn't even have the 32-bit displacement used with
    rip here.


    You mean, when compiler knows that the program is loaded at low address
    and when combined data segments are relatively small then compiler can
    use zero-extended or sign-extended 32-bit literals to form 64-bit
    addresses of static/global objects?
    I see how relocation of such program is a problem in 64-bit mode, but
    still fail to see how similar problem could happen in 32-bit mode.


    At 32 bits the problems of high-loading disappear, as programs and data
    need to fit into 2GB.

    Some problems with relocating remain. RIP-relative can't be used, as I
    believe that works only in 64-bit mode.

    What remains are the base-relocations, which had in the past only been
    needed when generating dynamic libraries like DLLs. They just weren't a
    thing for EXE.

    This then reduces to whether the C toolset will generate the right EXE.
    Either it does or doesn't, but you can always choose a different compiler.

    All I can tell you is that of the suite of 5 compilers I've tried, 4 of
    them, including in 32-bit mode if supported, don't generate an EXE that
    will be loaded at an arbitrary address. Only gcc will do that.

    The same goes for Clang run at rextester.com: that doesn't load high
    (but it could also be an old version).

    Maybe some don't think it's that important. But it's not as
    straightforward as you seem to think. Yes, it might have been a bit
    simpler with 32 bits, but it wasn't trendy then, and now not many still
    use 32 bits.
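
    If anyone wants to check what their own toolchain produces, a quick and
    admittedly rough test (my illustration, not from any of the compilers
    above) is to print a code address and a data address and see whether
    they land near the traditional bases or somewhere high:

    #include <stdio.h>

    static int marker;     /* a static data object */

    int main(void)
    {
        /* Converting a function pointer to void* is a common extension
           rather than strict ISO C, but it is enough for a quick look.
           Fixed-base images tend to show values near 0x400000 (32-bit) or
           0x140000000 (64-bit); high-loaded images show much larger
           addresses that vary from run to run. */
        printf("main   is at %p\n", (void *)main);
        printf("marker is at %p\n", (void *)&marker);
        return 0;
    }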

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Stefan Ram@3:633/280.2 to All on Wed Mar 27 21:30:31 2024
    ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
    A "famous security bug":
    void f( void )
    { char buffer[ MAX ];
    /* . . . */
    memset( buffer, 0, sizeof( buffer )); }
    . Can you see what the bug is?
    (I have already read the answer; I post it as a pastime.)

    I was reading this under the heading "State postconditions".
    It was suggested that the code should rather be:

    void f()
    { char buffer[MAX];
    /* . . . */
    memset( buffer, 0, sizeof( buffer ));
    Ensures( buffer[ 0 ]== 0 ); }

    ("Ensures" states a postcondition). Now, according to the text
    I read, the compiler cannot eliminate the "memset" anymore.

    Here are some thoughts of mine on this:

    With the "buffer[ 0 ]== 0", I wonder, as per the "as if rule",
    whether the compiler would still be permitted to replace
    the memset by just "buffer[ 0 ]= 0;".

    So, what would be a bit more "paranoid" would then be:

    for( size_t i = 0; i < sizeof( buffer ); ++i )
    Ensures( buffer[ i ]== 0 );

    or,

    i = mylib_random( sizeof( buffer ));
    Ensures( buffer[ i ]== 0 );

    . How could one implement "Ensures" in C? The first thing that
    comes to mind is a call to "assert" of course.

    But I also have to think of an "escape" Chandler Carruth mentioned
    it in one talk. IIRC, it was something along the lines of

    static void escape( volatile void * p )
    { asm volatile( "" : : "g"(p) : "memory" ); }

    (which might not be standard C). Now, if you call "escape( buffer )"
    at the end of the definition of the function "f" above, the compiler
    knows that the contents of buffer have become visible to the outside
    world, so that the effects of the "memset" operation become visible
    externally, which means that the "memset" call cannot be elided.
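
    Putting those pieces together, a minimal self-contained sketch of how
    "f" might look with both ideas; "Ensures" is written naively as assert
    (so it still disappears when NDEBUG is defined), "escape" is the
    GCC/Clang-style extension from above, and MAX is just a placeholder:

    #include <assert.h>
    #include <string.h>

    #define MAX 64                       /* placeholder size for the sketch */
    #define Ensures(cond) assert(cond)   /* naive: gone when NDEBUG is set  */

    /* Not standard C: an empty asm with a "memory" clobber, so the compiler
       must assume the pointed-to bytes are observable afterwards. */
    static void escape( volatile void * p )
    { __asm__ volatile( "" : : "g"(p) : "memory" ); }

    void f( void )
    { char buffer[ MAX ];
      /* . . . */
      memset( buffer, 0, sizeof( buffer ));
      Ensures( buffer[ 0 ]== 0 );
      escape( buffer ); /* keeps the memset from being treated as dead */ }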

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Stefan Ram (3:633/280.2@fidonet)
  • From Stefan Ram@3:633/280.2 to All on Wed Mar 27 21:35:27 2024
    ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
    void f()
    { char buffer[MAX];
    /* . . . */
    memset( buffer, 0, sizeof( buffer ));
    Ensures( buffer[ 0 ]== 0 ); }

    Oh, and now I see a potential bug in this:
    "buffer[ 0 ]" assumes that MAX > 0.

    (ISO C forbids "char buffer[ 0 ];", but the code
    might be used on some nonstandard implementation.)

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Stefan Ram (3:633/280.2@fidonet)
  • From Richard Kettlewell@3:633/280.2 to All on Wed Mar 27 22:12:03 2024
    ram@zedat.fu-berlin.de (Stefan Ram) writes:
    i = mylib_random( sizeof( buffer ));
    Ensures( buffer[ i ]== 0 );

    . How could one implement "Ensures" in C? The first thing that
    comes to mind is a call to "assert" of course.

    The assert gets compiled out too.

    But I also have to think of an "escape" Chandler Carruth mentioned
    it in one talk. IIRC, it was something along the lines of

    static void escape( volatile void * p )
    { asm volatile( "" : : "g"(p) : "memory" ); }

    (which might not be standard C). Now, if you call "escape( buffer )"
    at the end of the definition of the function "f" above, the compiler
    knows that the contents of buffer have become visible to the outside
    world, so that the effects of the "memset" operation become visible
    externally, which means that the "memset" call cannot be elided.

    Indeed it’s not standard C, but variants of it are a common strategy on compilers that support it.

    The flaw is that any data from the target buffer that’s been copied into registers or other temporary storage isn’t erased. How much that matters
    is situational. In principle C23’s memset_explicit could address this.
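
    Where memset_explicit (or a platform call such as explicit_bzero or
    SecureZeroMemory) isn't available, one widely used fallback - sketched
    here, not quoted from any particular library - is to call memset through
    a volatile function pointer, so the compiler can no longer prove the
    call is dead:

    #include <string.h>

    /* The volatile qualification forces the call to be made: the compiler
       cannot assume the pointer still designates memset at the call site. */
    static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

    static void secure_clear(void *p, size_t n)
    {
        memset_ptr(p, 0, n);
    }

    As noted above, this only stops the final stores from being optimised
    away; it does nothing about copies left in registers or spill slots.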

    --
    https://www.greenend.org.uk/rjk/

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: terraraq NNTP server (3:633/280.2@fidonet)
  • From Kaz Kylheku@3:633/280.2 to All on Thu Mar 28 08:06:12 2024
    On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
    On 24/03/2024 17:02, Kaz Kylheku wrote:
    On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
    On 24/03/2024 06:50, Kaz Kylheku wrote:
    (So why bother looking.) I mean,
    the absolute baseline requirement any LTO implementor strives toward is
    no change in observable behavior in a strictly conforming program, which
    would be a showstopper.


    Yes.

    I don't believe anyone - except you - has said anything otherwise. A C
    implementation is conforming if and only if it takes any correct C
    source code and generates a program image that always has correct
    observable behaviour when no undefined behaviour is executed. There are
    no extra imaginary requirements to be conforming, such as not being
    allowed to use extra information while compiling translation units.

    But the requirement isn't imaginary. The "least requirements"
    paragraph doesn't mean that all other requirements are imaginary;
    most of them are necessary to describe the language so that we know
    how to find the observable behavior.


    The text is not imaginary - your reading between the lines /is/. There
    is no rule in the C standards stopping the compiler from using
    additional information or knowledge about other parts of the program.

    Sure there is; just not in a way that speaks to the formal notion of conformance. The text is there, and a user and implementor can use
    that as a touchstone for agreeing on something outside of conformance.

    In safety critical coding, we might want to conduct a code review of
    the disassembly of an object file (does it correctly implement the
    intent we believe to be expressed in the source), and then retain that
    exact file until it needs to be recompiled.

    Sure. And for that reason, some developers in that field will not use
    LTO. I personally don't make much use of LTO because it makes software
    a pain to debug.

    So, in that situation, your requirement can be articulated in a way that
    refers to the descriptions in ISO C. You're having your translation
    units semantically analyzed according to the abstract separation between
    phase 7 and 8 (which is not required to be followed for conformance).

    We can identify the LTO switch in the compiler as hinging around
    whether the abstract semantics is followed or not. (Just we can't tell
    using observable behavior.)

    This seems like a good thing.

    We just may not confuse that conformance (private contract between
    implementor and user) with ISO C conformance, as I have.
    Sorry about that!


    Are you saying that after dozens of posts back and forth where you made
    claims about non-conformity of C compilers' handling of C code in
    comp.lang.c, with heavy references to the C standards which define the
    term "conformity", you are now saying that you were not talking about C
    standard conformity?

    Certainly not! I was wrongly talking about that one and only
    conformance.

    Once again, sorry about that.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Anton Shepelev@3:633/280.2 to All on Thu Mar 28 20:23:27 2024
    Kaz Kylheku:

    If C compilers warned about every piece of dead code that
    is eliminated, you'd be up to your ears in diagnostics all
    day.

    Is so much dead code a defect in the source or a benign
    consequence of a well-pondered decision?

    If you do want the code deleted, that doesn't always mean
    you can do it yourself. What gets eliminated can be target
    dependent:

    switch (sizeof (long)) {
    case 4: ...
    case 8: ..
    }

    The case above is IMHO best handled by conditional
    compilation, even though more work may be required to
    dispatch on a type size in the preprocessor.
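
    A minimal sketch of that preprocessor dispatch, using <limits.h> since
    sizeof is not available to #if:

    #include <limits.h>

    #if LONG_MAX == 2147483647L
    /* code for a 32-bit long */
    #elif LONG_MAX == 9223372036854775807L
    /* code for a 64-bit long */
    #else
    #error "unsupported size of long"
    #endif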

    Because memset is part of the C language, the compiler
    knows exactly what effect it has (that it's equivalent to
    setting all the bytes to zero, like a sequence of
    assignments).

    Yes, it is an instance of a special case relying upon hard-
    coded information. Why not, however, let the programmer
    eliminate this dead code from his code, if it /is/ dead?

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Thu Mar 28 20:52:20 2024
    On Sat, 23 Mar 2024 11:26:03 +0000
    bart <bc@freeuk.com> wrote:

    On 23/03/2024 07:26, James Kuyper wrote:
    bart <bc@freeuk.com> writes:
    On 22/03/2024 17:14, James Kuyper wrote:
    [...]
    If you want to tell a system not only what a program must do, but
    also how it must do it, you need to use a lower-level language
    than C.

    Which one?

    That's up to you. The point is, C is NOT that language.

    I'm asking which /mainstream/ HLL is lower level than C. So
    specifically ruling out assembly.

    I don't know of any, and said nothing to suggest that there is one. I'm
    only pointing out that if that's important to you, you must either find
    such a language, or create it (as you seem to already be doing). If, as
    you imply, there's no such mainstream HLL, that implies that there aren't
    enough people sharing your preferences to make such an HLL popular
    enough to qualify as mainstream.
    I certainly don't care how my programs achieve their observable
    behavior, and I'm only too happy to let machine-language experts use
    their specialized expertise to create compilers which achieve that
    behavior in whatever way is best for the target system. I have no desire
    to spend my time acquiring that expertise.




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From James Kuyper@3:633/280.2 to All on Thu Mar 28 21:14:24 2024
    On 24/03/2024 19:58, bart wrote:
    On 24/03/2024 15:53, Michael S wrote:

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
    char* p0 = (char*)((size_t)main & -(size_t)0x10000);
    printf("%c%c\n", p0[0], p0[1]);
    return 0;
    }


    That would work for small programs. Not necessarily for bigger
    programs.


    I'm not sure how that works. Are EXE images always loaded at a multiple
    of 64KB? I suppose on larger programs it could search backwards 64KB at a
    time (although it could also hit on a rogue 'MZ' in program data).

    My point however was whether C considered that p0[0] access UB because
    it doesn't point into any C data object.

    Here's what the standard says about (size_t)main:
    "... the result is implementation-defined. If the result cannot be
    represented in the integer type, the behavior is undefined. The result
    need not be in the range of values of any integer type."

    Here's what the standard says about the conversion to char*:
    "the result is implementation-defined, might not be correctly aligned,
    might not point to an entity of the referenced type, and might produce
    an indeterminate representation when stored into an object."

    Alignment cannot be an issue with char*, but the other two problems
    remain. In particular, I think you're assuming that, when converted back
    to a pointer, the resulting pointer will point 0x10000 bytes further on
    in memory. There's no such guarantee.

    p0[0] is defined as *(p0+0). As a result, the relevant wording occurs in
    the description of the unary * operator.

    "If the operand points to a function, the result is a function
    designator; if it points to an object, the result is an lvalue
    designating the object."

    Here's the most fundamental problem: there's no guarantee that p0[0]
    points at a C object. There's a very good chance, if the code does what
    you're hoping it will do, that it points inside a function. As a result,
    the following applies:

    "If an invalid value has been assigned to the pointer, the behavior of
    the unary * operator is undefined."

    So, an implementation is free to define the behavior of such code so
    that it does what you want - but the C standard doesn't even come close
    to mandating that it do so.
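
    For what it's worth, the same idea written with uintptr_t - the integer
    type the standard actually provides (when it exists) for round-tripping
    object pointers - avoids the worry about the value not fitting in
    size_t. This is only a sketch; as discussed above, the function-pointer
    cast and any eventual dereference are still not covered by the standard:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* The function-pointer-to-void* cast is a common extension. */
        uintptr_t u  = (uintptr_t)(void *)main;
        char     *p0 = (char *)(void *)(u & ~(uintptr_t)0xFFFF);

        /* Printing the pointer value is fine; dereferencing it, as in the
           original example, is not guaranteed to be. */
        printf("%p\n", (void *)p0);
        return 0;
    }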


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Fri Mar 29 05:07:28 2024
    On 27/03/2024 22:06, Kaz Kylheku wrote:
    On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
    On 24/03/2024 17:02, Kaz Kylheku wrote:
    On 2024-03-24, David Brown <david.brown@hesbynett.no> wrote:
    On 24/03/2024 06:50, Kaz Kylheku wrote:
    (So why bother looking.) I mean,
    the absolute baseline requirement any LTO implementor strives toward is
    no change in observable behavior in a strictly conforming program, which
    would be a showstopper.


    Yes.

    I don't believe anyone - except you - has said anything otherwise. A C
    implementation is conforming if and only if it takes any correct C
    source code and generates a program image that always has correct
    observable behaviour when no undefined behaviour is executed. There are
    no extra imaginary requirements to be conforming, such as not being
    allowed to use extra information while compiling translation units.

    But the requirement isn't imaginary. The "least requirements"
    paragraph doesn't mean that all other requirements are imaginary;
    most of them are necessary to describe the language so that we know
    how to find the observable behavior.


    The text is not imaginary - your reading between the lines /is/. There
    is no rule in the C standards stopping the compiler from using
    additional information or knowledge about other parts of the program.

    Sure there is; just not in a way that speaks to the formal notion of conformance. The text is there, and a user and implementor can use
    that as a touchstone for agreeing on something outside of conformance.


    Users and implementers can agree on requirements that are outside the requirements of the standards - that is certainly true. A user will
    require many things of a compiler that are not in the standard - the
    system it runs on, its speed, its cost, the quality of the error
    messages, and countless other things.

    Those are not mentioned in the C standards, but are without doubt
    important to users.

    However, you can't claim there are things in the C standards that have implications about things that are not related to conformance to the C standards! And you can't claim that violating something that is based
    on /your/ requirements outside of the C standards makes a compiler non-conforming in the context of the C standards.

    You are free to say that /you/ require a particular behaviour from your compiler, and that LTO violates conformity with /your/ requirements.
    And you can happily use a reference to the C standards to help explain
    your additional requirements. You just don't get to say that the C
    standards make requirements that they don't contain.

    In safety critical coding, we might want to conduct a code review of
    the disassembly of an object file (does it correctly implement the
    intent we believe to be expressed in the source), and then retain that
    exact file until it needs to be recompiled.

    Sure. And for that reason, some developers in that field will not use
    LTO. I personally don't make much use of LTO because it makes software
    a pain to debug.

    So, in that situation, your requirement can be articulated in a way that refers to the descriptions in ISO C.

    No, not remotely.

    My requirements for debugging are not covered in the C standards in any
    way. Currently, enabling LTO in gcc makes the code generation difficult
    for single-step debugging - it is often very difficult to see which
    assembly instructions match up with which piece of source code. I
    fine-tune other optimisation flags too in order to give a better balance
    (for my own personal definition of "better") between code efficiency and
    ease of debugging. I do not suspect gcc LTO of generating incorrect or non-conforming code.

    When you choose not to enable LTO, you are making exactly the same kind
    of decision (except you do so for testability, rather than debuggability).

    You're having your translation
    units semantically analyzed according to the abstract separation between phase 7 and 8 (which is not required to be followed for conformance).


    That is completely irrelevant to me. What /is/ relevant, is that code
    is not moved around too much and it is thus easier to follow when single-stepping or doing other debugging. I may also disable inlining
    and other inter-procedural optimisations within units - something that
    clearly has no relevance to conformity.

    We can identify the LTO switch in the compiler as hinging around
    whether the abstract semantics is followed or not. (Just we can't tell
    using observable behavior.)

    No, we can't. LTO is a fully valid, conforming optimisation that does not
    affect the abstract semantics of the language in any way.

    But it might affect other requirements outside of the C standards and
    their definition of the semantics of the language.


    This seems like a good thing.

    It's a good thing that people get the choice to balance different
    requirements beyond the C standards. (And they even get some options
    that affect conformity, because that is not always important to all users.)


    We just may not confuse that conformance (private contract between
    implementor and user) with ISO C conformance, as I have.
    Sorry about that!


    Are you saying that after dozens of posts back and forth where you made
    claims about non-conformity of C compilers' handling of C code in
    comp.lang.c, with heavy references to the C standards which define the
    term "conformity", you are now saying that you were not talking about C
    standard conformity?

    Certainly not! I was wrongly talking about that one and only
    conformance.

    Once again, sorry about that.


    OK. Let's try to be clear - "conformance" on its own, in c.l.c., means conformity to the C standards. If you or I are talking about conforming
    to a different set of requirements, we need to be explicit about it.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Tim Rentsch@3:633/280.2 to All on Thu Apr 18 05:10:28 2024
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

    On 24/03/2024 16:45, Tim Rentsch wrote:

    The C standard means what the ISO C group thinks it means.
    They are the ultimate and sole authority. Any discussion about what
    the C standard requires that ignores that or pretends otherwise is
    a meaningless exercise.

    An intentionalist.

    That is a misunderstanding of what I said.

    But when a text has come about by a process of argument, negotiation
    and compromise and votes, is that position so easy to defend as it
    might appear to be for a simpler text?

    It's not a position, it's an observation. The ISO C committee is
    the recognized authority for judgment about the meaning of the C
    standard. Whatever discussion may have gone into writing the
    document is irrelevant; all that matters is that the ISO C
    group went through the approved ISO process, and hence the world
    at large defers to their view as being authoritative on the
    question of how to read the text of the standard.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From David Brown@3:633/280.2 to All on Thu Apr 18 18:20:19 2024
    On 17/04/2024 21:10, Tim Rentsch wrote:
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:

    On 24/03/2024 16:45, Tim Rentsch wrote:

    The C standard means what the ISO C group thinks it means.
    They are the ultimate and sole authority. Any discussion about what
    the C standard requires that ignores that or pretends otherwise is
    a meaningless exercise.

    An intentionalist.

    That is a misunderstanding of what I said.

    But when a text has come about by a process of argument, negotiation
    and compromise and votes, is that position so easy to defend as it
    might appear to be for a simpler text?

    It's not a position, it's an observation. The ISO C committee is
    the recognized authority for judgment about the meaning of the C
    standard. Whatever discussion may have gone into writing the
    document is irrelevant; all that matters is that the ISO C
    group went through the approved ISO process, and hence the world
    at large defers to their view as being authoritative on the
    question of how to read the text of the standard.

    You can't have it both ways.

    One interpretation is that the /text/ of the standard is the be-all and end-all of "the C standard", in which case what the ISO C group thinks
    is irrelevant. It is only the written word that matters.

    The other is that it is the beliefs and intentions of the ISO C group,
    as the C authority, that defines "the C standard", in which case the
    written standard is just an approximate summary of how they define the language. Any other published writings or discussions, such as
    rationale documents, WG documents, Jens Gustedt's Blog, C compilers and libraries written by committee members, etc., are relevant to
    understanding the group's interpretation of and meaning behind the standard.

    You can't claim that /only/ the text matters and also that /only/ the committee's judgement matters.


    I think most people would say that the text of the C standard is authoritative, not the committee or their opinions, judgements, thoughts
    or interpretations. If the text does not match their intentions, or is
    - in their opinion - misunderstood by others, then it is their job to
    revise or update the standard document.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (3:633/280.2@fidonet)
  • From Keith Thompson@3:633/280.2 to All on Fri Apr 19 07:26:01 2024
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
    On 24/03/2024 16:45, Tim Rentsch wrote:
    The C standard means what the ISO C group thinks it means.
    They are the ultimate and sole authority. Any discussion about what
    the C standard requires that ignores that or pretends otherwise is
    a meaningless exercise.

    An intentionalist.

    That is a misunderstanding of what I said.

    But when a text has come about by a process of argument, negotiation
    and compromise and votes, is that position so easy to defend as it
    might appear to be for a simpler text?

    It's not a position, it's an observation. The ISO C committee is
    the recognized authority for judgment about the meaning of the C
    standard. Whatever discussion may have gone into writing the
    document is irrelevant; all that matters is that the ISO C
    group went through the approved ISO process, and hence the world
    at large defers to their view as being authoritative on the
    question of how to read the text of the standard.

    I agree only to some extent.

    I agree that the committee is the primary authority on what the words
    they publish mean. If a passage in the standard is unclear, it's the
    committee that will publish an official response to any defect report. Sometimes that response will be something like "The current wording is
    clear enough, and here's what it means".

    But most of the standard has not been subject to such defect reports,
    and the only source of information we have *or need* is the standard
    itself. If the standard says, to pick an entirely arbitrary example,
    "A pointer to any object type may be converted to a pointer to
    void and back again; the result shall compare equal to the original
    pointer.", I don't need the committee to explain what that means.

    It's the committee's job to publish words whose meaning is sufficiently
    clear, and overall I'd say they've done that job reasonably well.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Medtronic
    void Void(void) { Void(); } /* The recursive call of the void */

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: None to speak of (3:633/280.2@fidonet)