• Re: Recent history of vi

    From Nuno Silva@3:633/10 to All on Sun Nov 16 00:43:50 2025
    On 2025-11-15, rbowman wrote:

    On Sat, 15 Nov 2025 09:59:39 +0000, Nuno Silva wrote:

    On 2025-11-15, rbowman wrote:

    On Fri, 14 Nov 2025 16:09:47 -0500, c186282 wrote:

    WordStar and close variants were VERY popular back in the day. Kind of everyone's "first word processor".
    Everyone used it alongside Lotus-123.

    It was bundled on the Osborne 1 CP/M machine. I got a lot of miles out
    of it as a programming editor in the text mode. When I finally moved to
    the DOS world I bought Brief.

    https://en.wikipedia.org/wiki/Brief_(text_editor)

    'ed' wasn't much fun. I think I may have had a freeware clone of vi
    that was no Joy either. I'm guessing 95% of the people who say 'I use vi'
    never have. Most Linux distros bring up Vim if you type 'vi'. One
    exception is Arch. 'vi' is a hard link to ex which comes up in the
    visual mode for that old timey flavor.

    IIRC that source was lost or elusive for a long time, or perhaps held
    back by lack of permission to distribute?

    Like Unix itself ed and vi had licensing problems.

    https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-original-unices-under-bsd-license

    Somehow at least the legacy vi code escaped the AT&T, UNIX System Laboratories, Novell, Caldera, SCO mess.

    How did it escape the SCO mess? Wouldn't such a release from Caldera
    rely on Santa Cruz Operation having gotten ownership of the code from
    Novell? And if I'm reading Wikipedia right,[0] Novell still having the
    rights played a role in the later mess involving the SCO Group?

    Or is there something that I'm overlooking here?

    [0] https://enwp.org/SCO_v._Novell

    --
    Nuno Silva

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Pancho@3:633/10 to All on Sun Nov 16 09:14:32 2025
    On 11/16/25 00:43, Eli the Bearded wrote:


    It's fine to like nano or emacs or vscode or whatever. But that
    just means you are not coming from a place that can judge my
    appreciation of the features of vi(m).


    Yes, it is a question of taste and not morals.

    My taste includes both vi and vscode. ;-)




  • From The Natural Philosopher@3:633/10 to All on Sun Nov 16 10:33:32 2025
    On 16/11/2025 05:11, Ted Nolan <tednolan> wrote:
    In article <10fasl6$3p4r1$3@dont-email.me>,
    Lawrence D’Oliveiro <ldo@nz.invalid> wrote:
    On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:

    I find it takes a lot of munging to get vim to *really* work like vi.

    To me, that sounds like someone saying “it takes a lot of munging to get a Trabant to *really* work like a Morris Minor”. I can’t imagine myself
    wanting to use either.

    Well, squids & kids, but my fingers do vi automatically. Anything else
    not so much.

    I find that depressing.

    I used to have to write reams of code in 'vi'. Horrible

    Today I use whatever-it-is-that-typing-vi-brings-up to edit the odd
    config file with root permissions, as it is marginally quicker than
    invoking a GUI text editor and managing root perms.
    And it's quicker than learning nano and I don't need joe for a single
    line of /etc/whatever.



    --
    "Women actually are capable of being far more than the feminists will
    let them."




  • From Chris Ahlstrom@3:633/10 to All on Sun Nov 16 09:31:08 2025
    rbowman wrote this post by blinking in Morse code:

    On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:


    I find it takes a lot of munging to get vim to *really* work like vi.
    The one on FreeBSD which I think is technically "nex" is much closer out
    of the box.

    :he compatible has the disclaimer

    When this option is set, numerous other options are set to make Vim as Vi-compatible as possible.

    The Arch vi is the real thing.

    You sure about that? ex-vi, vim.tiny, nvi are all close, but not
    even this one is the "real" vi:

    <https://github.com/n-t-roff/heirloom-ex-vi>

    For example, it adds UTF-8 support.

    I've no idea what version it is because
    real vi doesn't do --version or much of anything useful.

    I first learned vi using pc-vi around 1985. My first try at it
    confused the *hell* out of me, but eventually I got the hang of
    it. I also used microEmacs for a while.

    Apparently Bill Joy hates vi now. It's like Dennis Ritchie using
    Windows :-(

    But...

    <https://anders.unix.se/2015/10/26/interview-with-dennis-ritchie-2003/?>

    Q: Could you please describe a typical work day at Bell Labs?
    What software do you use?

    A: I tend to come in late unless there’s a meeting, but spend
    a fair amount of time tending to e-mail communication. My
    own environment (on PC hardware) actually runs Windows NT,
    but it is used mainly as a graphics terminal connected to a
    Plan 9 server, in a way approximately analogous to an X
    windows client. The connection at home is now via cable
    modem (until last summer ISDN), and Ethernet at the office.
    Any editing, software work, and mail is done in this
    exported Plan 9. For stuff like getting Excel and Word
    things, plus much WWW browsing, I revert to NT.

    --
    I'm not a level-headed person... -- Bruce Perens

  • From Chris Ahlstrom@3:633/10 to All on Sun Nov 16 09:35:17 2025
    Eli the Bearded wrote this post by blinking in Morse code:

    In comp.os.linux.misc, rbowman <bowman@montana.com> wrote:

    <brevsnip>

    The Arch vi is the real thing. I've no idea what version it is because
    real vi doesn't do --version or much of anything useful.

    In vi, the standard way to get the version is with ":version". It looks
    like arch is using Heirloom Vi:

    https://ex-vi.sourceforge.net/

    In nvi, :version yields

    Version nvi-1.81.6 (2007-11-18) The CSRG, University of California, Berkeley.

    That is a port of old code with many multibyte (eg UTF-8) fixes. It
    should work with hardcopy terminals, which a lot of other vi
    implementations (including vim) will not do. Those others expect you
    to use ex mode on hardcopy terminals.

    I learned vi on Digital Unix, A/UX, HP-UX, SunOS 4, and Solaris
    2.(various), but I dabbled in vi clones for a long time, and was using
    vim back in the 2.x versions. Elvis is still the default vi in
    Slackware, and I've used recent versions of elvis for that reason. nvi
    is default on NetBSD, and probably that FreeBSD one mentioned above. I
    use NetBSD regularly and other BSDs very rarely.

    In the vim distro there are sample macro packages. The ones to run
    Conway's Game of Life were written by me on a Solaris box. The Solaris
    vi can run them, but eventually it crashes out because there is a bug
    that makes real vi (at least real vi of that era) forget marks after a
    while. Vim will just work. Neovim fails to even start.

    On the Debian system I'm working on right now those macros are in /usr/share/vim/vim90/macros/life/

    Elijah
    ------
    admits elvis is a pretty good vi imitation, but still not perfect

    I wouldn't mind an old vi clone with syntax highlight. That's my
    biggest crutch for writing code.

    --
    The early bird gets the coffee left over from the night before.

  • From Chris Ahlstrom@3:633/10 to All on Sun Nov 16 09:49:10 2025
    The Natural Philosopher wrote this post by blinking in Morse code:

    On 16/11/2025 05:11, Ted Nolan <tednolan> wrote:
    In article <10fasl6$3p4r1$3@dont-email.me>,
    Lawrence D’Oliveiro <ldo@nz.invalid> wrote:
    On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:

    I find it takes a lot of munging to get vim to *really* work like vi.

    To me, that sounds like someone saying “it takes a lot of munging to get a
    Trabant to *really* work like a Morris Minor”. I can’t imagine myself
    wanting to use either.

    Well, squids & kids, but my fingers do vi automatically. Anything else
    not so much.

    I find that depressing.

    I used to have to write reams of code in 'vi'. Horrible

    I still do. (Though it is vim).

    Today I use whatever-it-is-that-typing-vi-brings-up to edit the odd
    config file with root permissions, as it is marginally quicker than
    invoking a GUI text editor and managing root perms.
    And it's quicker than learning nano and I don't need joe for a single
    line of /etc/whatever.

    We are old and stuck in our ways :-)

    My first editor (other than a keypunch) was TECO, in a computer
    class, on a PDP-11, early 1980's.

    --
    You would if you could but you can't so you won't.

  • From Chris Ahlstrom@3:633/10 to All on Sun Nov 16 09:50:35 2025
    Waldek Hebisch wrote this post by blinking in Morse code:

    In alt.folklore.computers rbowman <bowman@montana.com> wrote:
    On Sat, 15 Nov 2025 09:59:39 +0000, Nuno Silva wrote:

    On 2025-11-15, rbowman wrote:

    On Fri, 14 Nov 2025 16:09:47 -0500, c186282 wrote:

    WordStar and close variants were VERY popular back in the day. Kind of everyone's "first word processor".
    Everyone used it alongside Lotus-123.

    It was bundled on the Osborne 1 CP/M machine. I got a lot of miles out
    of it as a programming editor in the text mode. When I finally moved to
    the DOS world I bought Brief.

    https://en.wikipedia.org/wiki/Brief_(text_editor)

    'ed' wasn't much fun. I think I may have had a freeware clone of vi
    that was no Joy either. I'm guessing 95% of the people who say 'I use vi'
    never have. Most Linux distros bring up Vim if you type 'vi'. One
    exception is Arch. 'vi' is a hard link to ex which comes up in the
    visual mode for that old timey flavor.

    IIRC that source was lost or elusive for a long time, or perhaps held
    back by lack of permission to distribute?

    Like Unix itself ed and vi had licensing problems.

    https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-original-unices-under-bsd-license

    Somehow at least the legacy vi code escaped the AT&T, UNIX System
    Laboratories, Novell, Caldera, SCO mess.

    Around 1992 I fetched 'elvis' from the net. I did not use it
    much, but a guy who was previously using real vi found it
    to be a reasonable replacement.

    Later, there was nvi. And after that Linux distributions
    switched to vim.

    Stevie was a clone of vi, and Vim was based on Stevie.

    There was an implementation of SteVIe for the Atari ST iirc.

    --
    Children are natural mimics who act like their parents despite every
    effort to teach them good manners.

  • From Mechanicjay@3:633/10 to All on Sun Nov 16 15:40:22 2025
    On Sun, 16 Nov 2025 09:14:32 +0000, Pancho <Pancho.Jones@protonmail.com> wrote:
    On 11/16/25 00:43, Eli the Bearded wrote:


    It's fine to like nano or emacs or vscode or whatever. But that
    just means you are not coming from a place that can judge my
    appreciation of the features of vi(m).


    Yes, it is a question of taste and not morals.

    My taste includes both vi and vscode. ;-)

    Some years back I moved all my PHP work out of Eclipse and into vim. With a
    few plugins I get modern conveniences like a debug console, code style enforcement and syntax validation. I was inspired to make this move by a friend
    of mine who does core engine work at MongoDB describing his VIM setup.

    Between my fingers knowing vi and not having to think about it, and the distraction free environment that doesn't fight me, it greatly increased my overall satisfaction with time spent writing software.

    :version on my workstation shows:
    VIM - Vi IMproved 9.1 (2024 Jan 02, compiled Oct 10 2025 02:26:29)

    --
    Sent from my Personal DECstation 5000/25

  • From Mechanicjay@3:633/10 to All on Sun Nov 16 15:47:30 2025
    On 16 Nov 2025 09:35:17 -0500, Chris Ahlstrom <OFeem1987@teleworm.us> wrote:
    Eli the Bearded wrote this post by blinking in Morse code:

    In comp.os.linux.misc, rbowman <bowman@montana.com> wrote:

    <brevsnip>

    The Arch vi is the real thing. I've no idea what version it is because
    real vi doesn't do --version or much of anything useful.

    In vi, the standard way to get the version is with ":version". It looks
    like arch is using Heirloom Vi:

    https://ex-vi.sourceforge.net/

    In nvi, :version yields

    Version nvi-1.81.6 (2007-11-18) The CSRG, University of California, Berkeley.

    That is a port of old code with many multibyte (eg UTF-8) fixes. It
    should work with hardcopy terminals, which a lot of other vi
    implementations (including vim) will not do. Those others expect you
    to use ex mode on hardcopy terminals.

    I learned vi on Digital Unix, A/UX, HP-UX, SunOS 4, and Solaris
    2.(various), but I dabbled in vi clones for a long time, and was using
    vim back in the 2.x versions. Elvis is still the default vi in
    Slackware, and I've used recent versions of elvis for that reason. nvi
    is default on NetBSD, and probably that FreeBSD one mentioned above. I
    use NetBSD regularly and other BSDs very rarely.


    On this Ultrix 4.5 box, :version yields:
    Version 3.7, 18-Oct-85

    I was missing some nice features, like the ruler at the bottom and the ability to set an autowrap at 80 cols, which makes posting a message like this much easier, so I built me a newer vim:
    VIM - Vi IMproved 6.3 (2004 June 7, compiled Nov 11 2025 00:38:15)

    --
    Sent from my Personal DECstation 5000/25

  • From Richard Kettlewell@3:633/10 to All on Sun Nov 16 16:14:20 2025
    Nuno Silva <nunojsilva@invalid.invalid> writes:
    On 2025-11-15, rbowman wrote:
    IIRC that source was lost or elusive for a long time, or perhaps held
    back by lack of permission to distribute?

    Like Unix itself ed and vi had licensing problems.

    https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-original-unices-under-bsd-license

    Somehow at least the legacy vi code escaped the AT&T, UNIX System
    Laboratories, Novell, Caldera, SCO mess.

    How did it escape the SCO mess? Wouldn't such a release from Caldera
    rely on Santa Cruz Operation having gotten ownership of the code from
    Novell? And if I'm reading Wikipedia right,[0] Novell still having the rights played a role in the later mess involving the SCO Group?

    Or is there something that I'm overlooking here?

    [0] https://enwp.org/SCO_v._Novell

    I think the timeline is as follows:

    1976 Original ex, a fork of ed
    1977 vi, as a mode of ex
    1992 USL lawsuit filed
    1993 USL purchased by Novell
    1993 32V copyright invalidated by court[1]
    1994 4.4BSD-Lite released, excluding vi
    1994 USL lawsuit settled
    1995 Novell sell UnixWare to SCO
    2000 SCO sell Unix assets to Caldera Systems
    2002 Caldera releases 32V and V1...V7 under a BSD licence
    2002 Caldera renames to SCO Group
    2003 SCO Group (i.e. Caldera) sues IBM over supposed AT&T code in Linux
    2004 SCO Group sues Novell over ownership of AT&T code
    2007 SCO loses against Novell
    2007 SCO bankrupt, trustees continue legal action
    2021 SCO Group's lawsuits against IBM finally settled

    I suspect the 1993 decision means that vi was actually free and clear
    from that point onwards, with the 2002 relicensing being a legal no-op;
    at any rate it was before the point that Caldera turned evil and before
    anyone picked a fight over what Caldera had actually bought.

    4.4BSD-Lite already had nvi and other clones emerged; possibly nobody
    cared much about the original as a result.

    [1] https://web.archive.org/web/20180307020845/http://sco.tuxrocks.com/Docs/USL/Doc-92.html
    “Consequently, I find that Plaintiff has failed to demonstrate a
    likelihood that it can successfully defend its copyright in
    32V. Plaintiff's claims of copyright violations are not a basis for
    injunctive relief.”

    --
    https://www.greenend.org.uk/rjk/

  • From Joe Makowiec@3:633/10 to All on Sun Nov 16 18:38:36 2025
    On 16 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:

    Some years back I moved all my PHP work out of Eclipse and into vim.
    With a few plugins I get modern conveniences like a debug
    console, code style enforcement and syntax validation.

    If you don't mind my asking, which plugins? I see several web pages out
    there with suggestions, but I'd be curious to see another set.

    --
    Joe Makowiec
    http://makowiec.org/
    Email: http://makowiec.org/contact/?Joe
    Usenet Improvement Project: http://twovoyagers.com/improve-usenet.org/

  • From The Natural Philosopher@3:633/10 to All on Sun Nov 16 20:26:27 2025
    On 16/11/2025 14:49, Chris Ahlstrom wrote:
    I used to have to write reams of code in 'vi'. Horrible
    I still do. (Though it is vim).

    I have a GUI. Geany is SO much nicer...
    --
    "Nature does not give up the winter because people dislike the cold."

    -- Confucius


  • From The Natural Philosopher@3:633/10 to All on Sun Nov 16 20:30:19 2025
    On 16/11/2025 20:27, Ted Nolan <tednolan> wrote:
    I tend to be looking at a log file, control-z out to check something else
    and then "fg" back

    Not since a GUI gave me unlimited consoles...on the same monitor


    --
    “A leader is best when people barely know he exists. Of a good leader,
    who talks little, when his work is done, his aim fulfilled, they will say,
    ‘We did this ourselves.’”

    -- Lao Tzu, Tao Te Ching


  • From Lawrence D’Oliveiro@3:633/10 to All on Sun Nov 16 20:57:23 2025
    On Sun, 16 Nov 2025 10:33:32 +0000, The Natural Philosopher wrote:

    Today I use whatever-it-is-that-typing-vi-brings-up to edit the odd
    config file with root permissions, as it is marginally quicker than
    invoking a GUI text editor and managing root perms.

    emacs -nw

    Don’t do GUI stuff as root.

  • From Lawrence D’Oliveiro@3:633/10 to All on Sun Nov 16 20:59:35 2025
    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    Lack of utf-8 would be an issue for some things, but mostly not.

    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.


  • From Lawrence D’Oliveiro@3:633/10 to All on Sun Nov 16 21:01:13 2025
    On Sun, 16 Nov 2025 09:14:32 +0000, Pancho wrote:

    My taste includes both vi and vscode. ;-)

    Visual Studio Code makes Emacs look petite.

  • From Lawrence D’Oliveiro@3:633/10 to All on Sun Nov 16 21:04:20 2025
    On Sun, 16 Nov 2025 20:30:19 +0000, The Natural Philosopher wrote:

    On 16/11/2025 20:27, Ted Nolan <tednolan> wrote:

    I tend to be looking at a log file, control-z out to check something
    else and then "fg" back

    Not since a GUI gave me unlimited consoles...on the same monitor

    I have maybe 20 Konsole tabs currently open. No need to look at logs in
    any editor: just use regular log-display commands (e.g. journalctl) for
    that. If I need to, I can copy and paste between editor windows and
    terminal windows.

  • From Lawrence D’Oliveiro@3:633/10 to All on Sun Nov 16 21:19:31 2025
    On 16 Nov 2025 21:04:20 GMT, Ted Nolan <tednolan> wrote:

    In article <10fddvm$dsjl$3@dont-email.me>,
    Lawrence D’Oliveiro <ldo@nz.invalid> wrote:

    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    Lack of utf-8 would be an issue for some things, but mostly not.

    Without UTF-8, you could not have ????? or ??????

    Right. Don't need those.

    You can’t even see them properly!

  • From Nuno Silva@3:633/10 to All on Sun Nov 16 23:13:58 2025
    On 2025-11-16, Lawrence D’Oliveiro wrote:

    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    Lack of utf-8 would be an issue for some things, but mostly not.

    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.

    Those three are all in iso8859-15 and in Mac OS Roman...

    And I'd guess only the first isn't in latin1.

    As for curly quotes, I think not having these might actually be a
    feature :-P

    --
    Nuno Silva

  • From Niklas Karlsson@3:633/10 to All on Sun Nov 16 23:51:47 2025
    On 2025-11-16, Nuno Silva <nunojsilva@invalid.invalid> wrote:
    On 2025-11-16, Lawrence D’Oliveiro wrote:

    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    Lack of utf-8 would be an issue for some things, but mostly not.

    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.

    Those three are all in iso8859-15 and in Mac OS Roman...

    Huh. I wouldn't have expected "€" to be in Mac OS Roman, but you're
    right. As of a certain version, anyhow.

    Niklas
    --
    "Avoid hyperbole at all costs, its the most destructive argument on
    the planet" - Mark McIntyre in comp.lang.c

  • From Lawrence D’Oliveiro@3:633/10 to All on Mon Nov 17 00:43:05 2025
    On 16 Nov 2025 23:18:50 GMT, Ted Nolan <tednolan> wrote:

    Bingo on the "smart" quotes!

    I also like using ?? and ?? as metasyntactic brackets, but I expect
    French people will interpret those as quotes ...

  • From Eli the Bearded@3:633/10 to All on Mon Nov 17 02:32:32 2025
    In comp.os.linux.misc, Chris Ahlstrom <OFeem1987@teleworm.us> wrote:
    You sure about that. ex-vi, vim.tiny, nvi are all close, but not
    even this one is the "real" vi:

    <https://github.com/n-t-roff/heirloom-ex-vi>

    For example, it adds UTF-8 support.

    https://github.com/n-t-roff/ex-1.1

    Starts in ex mode; to get vi, you need to ask for it at the : prompt.

    There's also

    https://github.com/n-t-roff/ex-2.2

    for a more modern flavor.

    Elijah
    ------
    will the real Ship of Theseus please sail home?

  • From Chris Ahlstrom@3:633/10 to All on Mon Nov 17 11:24:36 2025
    rbowman wrote this post by blinking in Morse code:

    On Sun, 16 Nov 2025 09:50:35 -0500, Chris Ahlstrom wrote:

    There was an implementation of SteVIe for the Atari ST iirc.

    ST Editor for Vi Enthusiasts.

    Moolenaar extended Stevie for his Amiga. The Amiga spawned a lot of software.

    https://en.wikipedia.org/wiki/Fred_Fish

    Cool!

    Also I installed the gulam unix-like shell on the ST, using vi to
    edit. Oh so long ago.

    --
    A friend of mine is into Voodoo Acupuncture. You don't have to go.
    You'll just be walking down the street and... Ooohh, that's much better.
    -- Steven Wright

  • From Chris Ahlstrom@3:633/10 to All on Mon Nov 17 11:39:23 2025
    rbowman wrote this post by blinking in Morse code:

    On Sun, 16 Nov 2025 20:30:19 +0000, The Natural Philosopher wrote:

    On 16/11/2025 20:27, Ted Nolan <tednolan> wrote:
    I tend to be looking at a log file, control-z out to check something
    else and then "fg" back

    Not since a GUI gave me unlimited consoles...on the same monitor

    While I use Vim for quick edits of a config file or with ssh, gVim is what
    I mostly use for that reason. I may use the menu once in a blue moon to change the font or theme.

    I use gvim for editing two to four files side-by-side, especially
    when copying code.

    Also gvimdiff and git difftool: [diff] tool = gvimdiff.

    For me, gvimdiff beats both kdiff3 and Beyond Compare for
    highlighting differences.

    --
    System going down at 1:45 this afternoon for disk crashing.

  • From Chris Ahlstrom@3:633/10 to All on Mon Nov 17 11:44:37 2025
    The Natural Philosopher wrote this post by blinking in Morse code:

    On 16/11/2025 14:49, Chris Ahlstrom wrote:
    I used to have to write reams of code in 'vi'. Horrible

    That reminds me of when the DOS team I once worked on used edlin
    as *the* editor. Wotta pain editing in 64k segments on a 1000k
    file.

    I see that FreeDOS provides a fairly faithful edlin.

    Even the primitive vi was nicer than TECO and edlin.

    I still do. (Though it is vim).

    I have a GUI. Geany is SO much nicer...

    gvim when it's a help, vim usually, and always when ssh'ing.

    --
    Debian is the Jedi operating system: "Always two there are, a master and
    an apprentice".
    -- Simon Richter on debian-devel

  • From Scott Lurndal@3:633/10 to All on Mon Nov 17 19:39:24 2025
    Nuno Silva <nunojsilva@invalid.invalid> writes:
    On 2025-11-15, rbowman wrote:


    IIRC that source was lost or elusive for a long time, or perhaps held
    back by lack of permission to distribute?

    Like Unix itself ed and vi had licensing problems.

    https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-
    original-unices-under-bsd-license

    Somehow at least the legacy vi code escaped the AT&T, UNIX System
    Laboratories, Novell, Caldera, SCO mess.

    How did it escape the SCO mess? Wouldn't such a release from Caldera
    rely on Santa Cruz Operation having gotten ownership of the code from
    Novell? And if I'm reading Wikipedia right,[0] Novell still having the
    rights played a role in the later mess involving the SCO Group?

    Or is there something that I'm overlooking here?

    Bill Joy's changes to ex(1) that implemented vi(1) mode were
    initially released in BSD. It was later open-sourced via Solaris.

    https://en.wikipedia.org/wiki/Vi_(text_editor)

  • From Mechanicjay@3:633/10 to All on Tue Nov 18 03:55:50 2025
    On 16 Nov 2025 Joe Makowiec <makowiec@invalid.invalid> wrote:
    On 16 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:

    Some years back I moved all my PHP work out of Eclipse and into vim.
    With a a few plugins I get modern conveniences like a debug
    console, code style enforcement and syntax validation.

    If you don't mind my asking, which plugins? I see several web pages out
    there with suggestions, but I'd be curious to see another set.

    Sure thing!

    Here's the plugin section from my .vimrc:

    Plugin 'VundleVim/Vundle.vim'
    Plugin 'itchyny/lightline.vim'
    Plugin 'tpope/vim-fugitive'
    Plugin 'joonty/vdebug'
    Bundle 'joonty/vim-phpqa.git'
    Bundle 'stephpy/vim-php-cs-fixer'

    Then at the bottom some calls to run the code style fixer and generate new ctags on each save:

    autocmd BufWritePost *.php silent! call PhpCsFixerFixFile()
    autocmd BufWritePost *.php silent! !eval 'ctags -f php.tags --languages=PHP -R' &

    This of course requires having some other software installed on the workstation, such as PHP Code Sniffer, PHP Mess Detector, ctags, the pecl Xdebug
    extension and...I think that's it. I set this up quite a few years ago and haven't had to mess with it much. It would be a bit of a process of discovery to get it all set up with the pieces in place again.



  • From The Natural Philosopher@3:633/10 to All on Tue Nov 18 12:09:42 2025
    On 18/11/2025 08:02, Ian wrote:
    On 2025-11-17, Lawrence D’Oliveiro <ldo@nz.invalid> wrote:
    On Mon, 17 Nov 2025 08:24:27 -0000 (UTC), Ian wrote:

    * Ubuntu isn't my preferred choice for servers, or anything really, but this
    particular application was developed for it, and I haven't got the time or
    inclination to port it to a different distribution.

    What exactly was there about it that needed porting?

    I have no idea. It is available as an "apt-get install" on the latest Ubuntu, from the standard repos, documented, tested and "supported". It isn't available
    in the standard repos on other distributions, so that would need time and effort to locate a compatible 3rd party binary, or compile from source. Even if that "just works" it's already more effort and risk than installing Ubuntu and using the provided package, as this is on a dedicated VM anyway.

    Sometimes you just need things to work, and don't want another adventure...

    Hear hear!

    Dependency hell from non-distro packages...
    --
    Truth welcomes investigation because truth knows investigation will lead
    to converts. It is deception that uses all the other techniques.


  • From Johnny Billquist@3:633/10 to All on Tue Nov 18 20:04:38 2025
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    Lack of utf-8 would be an issue for some things, but mostly not.

    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    Johnny


  • From Eli the Bearded@3:633/10 to All on Tue Nov 18 20:29:55 2025
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA4: ¤

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER

  • From Carlos E.R.@3:633/10 to All on Wed Nov 19 02:37:46 2025
    On 2025-11-18 20:04, Johnny Billquist wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    Lack of utf-8 would be an issue for some things, but mostly not.

    Without UTF-8, you could not have ??? or ??? or ?? or those curly
    quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    But with the transmission you have to transmit first what charset you
    are going to use, and then you are limited by it, and the recipient must
    have the same map, and be able to use it. Perhaps he has to use his own
    map instead.
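Carlos's point about the recipient needing the same map is easy to demonstrate. A minimal Python sketch using only the standard codecs; the byte 0xE4 and the three code pages are arbitrary picks for illustration:

```python
import unicodedata

# One byte, three 8-bit code pages. Unless the charset travels with the
# data, the receiver cannot tell which reading the sender meant.
RAW = bytes([0xE4])
CODE_PAGES = ["latin-1", "koi8-r", "cp437"]


def decode_all(data: bytes) -> dict[str, str]:
    """Decode the same bytes under each code page in CODE_PAGES."""
    return {name: data.decode(name) for name in CODE_PAGES}


if __name__ == "__main__":
    for name, text in decode_all(RAW).items():
        print(f"{name:8} -> U+{ord(text):04X} {unicodedata.name(text)}")
```

Each code page turns the same byte into a different letter, so the bytes on their own are ambiguous.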

    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Kettlewell@3:633/10 to All on Wed Nov 19 08:24:02 2025
    Eli the Bearded <*@eli.users.panix.com> writes:
    Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA4: ¤

    Correct, latin-1 doesn’t have the euro symbol; latin-15 does. Neither
    has proper quotes. By now both are retrocomputing, really.
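The Latin-1 vs. Latin-9 (ISO 8859-15) difference can be checked directly with Python's built-in codecs. A small sketch; `can_encode` is an illustrative helper, not a library function:

```python
import unicodedata

# ISO 8859-1 (Latin-1) vs ISO 8859-15 (Latin-9), using Python's built-in codecs.
CURRENCY_BYTE = bytes([0xA4])


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for codec in ("latin-1", "iso8859-15"):
        ch = CURRENCY_BYTE.decode(codec)
        print(f"0xA4 in {codec:10} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
    # Euro sign, then left and right double quotation marks ("curly quotes").
    for ch in ("\u20ac", "\u201c", "\u201d"):
        print(f"U+{ord(ch):04X} latin-1={can_encode(ch, 'latin-1')} "
              f"latin-9={can_encode(ch, 'iso8859-15')}")
```

Latin-9 reuses 0xA4 for the euro sign, and neither table has room for the curly quotes.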

    --
    https://www.greenend.org.uk/rjk/

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Joe Makowiec@3:633/10 to All on Wed Nov 19 13:11:43 2025
    On 17 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:

    I set this up quite a few years ago and haven't had to mess with it
    much. It would be a bit of a process of discovery to get it all
    setup with the pieces in place again.

    Thanks. I know what you mean - spend hours or days getting something set
    up; it just runs; something else updates, which blows up the original
    thing; spend hours or days relearning the original thing... I tend to
    leave myself hints in config files, but that doesn't always help.

    --
    Joe Makowiec
    http://makowiec.org/
    Email: http://makowiec.org/contact/?Joe
    Usenet Improvement Project: http://twovoyagers.com/improve-usenet.org/

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Eric Pozharski@3:633/10 to All on Wed Nov 19 13:02:05 2025
    with <akjvulxcnk.ln2@Telcontar.valinor> Carlos E.R. wrote:
    On 2025-11-18 20:04, Johnny Billquist wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    Lack of utf-8 would be an issue for some things, but mostly not.
    Without UTF-8, you could not have ??? or ??? or ?? or those curly
    quotes.
    Of course you could. They exist just fine in Latin-1 (hmm, maybe not
    the quotes...).
    But with the transmission you have to transmit first what charset you
    are going to use, and then you are limited by it, and the recipient
    must have the same map, and be able to use it. Perhaps he has to use
    his own map instead.

    If only there was some arrangement to make it work. And RFC2047 readily
    offers some. And that would be a nail in UTF-8's coffin.
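For reference, RFC 2047 encoded-words look roughly like this. A minimal sketch using Python's standard `email.header` module; the helpers `encode_header` and `decode_any` are names made up for this example:

```python
# RFC 2047 "encoded-words" tag each piece of a header with its charset,
# e.g. =?iso-8859-1?q?...?=, so the reader knows which map to apply.
from email.header import Header, decode_header, make_header


def encode_header(text: str, charset: str) -> str:
    """Encode text as RFC 2047 encoded-word(s) in the given charset."""
    return Header(text, charset=charset).encode()


def decode_any(encoded: str) -> str:
    """Decode RFC 2047 encoded-words back into a Python str."""
    return str(make_header(decode_header(encoded)))


if __name__ == "__main__":
    subject = "Caf\u00e9 cr\u00e8me"
    for cs in ("iso-8859-1", "utf-8"):
        wire = encode_header(subject, cs)
        print(wire)  # an =?<charset>?...?= encoded-word
        assert decode_any(wire) == subject
```

RFC 2047 only covers header fields such as Subject; message bodies declare their charset through the MIME Content-Type header instead.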

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Wed Nov 19 20:12:03 2025
    On Wed, 19 Nov 2025 13:11:43 -0000 (UTC), Joe Makowiec wrote:

    On 17 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:

    I set this up quite a few years ago and haven't had to mess with it
    much. It would be a bit of a process of discovery to get it all
    setup with the pieces in place again.

    Thanks. I know what you mean - spend hours or days getting something set
    up; it just runs; something else updates, which blows up the original
    thing; spend hours or days relearning the original thing... I tend to
    leave myself hints in config files, but that doesn't always help.

    This is why I have taken to writing up notes about certain custom builds.

    --- PyGate Linux v1.5
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Stéphane CARPENTIER@3:633/10 to All on Fri Nov 21 19:55:07 2025
    On 17-11-2025, Lawrence D’Oliveiro <ldo@nz.invalid> wrote:
    On 16 Nov 2025 23:18:50 GMT, Ted Nolan <tednolan> wrote:

    Bingo on the "smart" quotes!

    I also like using «» and ‹› as metasyntactic brackets, but I expect
    French people will interpret those as quotes ...

    Of course those are quotes.

    --
    If you have time to waste:
    https://scarpet42.gitlab.io

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Stéphane CARPENTIER@3:633/10 to All on Fri Nov 21 19:58:12 2025
    On 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA4: ¤

    They created the latin9 from the latin1 to add this € symbol.

    --
    If you have time to waste:
    https://scarpet42.gitlab.io

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Fri Nov 21 20:27:28 2025
    On 21 Nov 2025 19:55:07 GMT, Stéphane CARPENTIER wrote:

    On 17-11-2025, Lawrence D’Oliveiro <ldo@nz.invalid> wrote:

    I also like using «» and ‹› as metasyntactic brackets, but I expect
    French people will interpret those as quotes ...

    Of course those are quotes.

    I want more paired bracketing symbols. ;)

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Niklas Karlsson@3:633/10 to All on Fri Nov 21 21:14:08 2025
    On 2025-11-21, Stéphane CARPENTIER <sc@fiat-linux.fr> wrote:
    On 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic
    currency placeholder at 0xA4: ¤

    They created the latin9 from the latin1 to add this € symbol.

    I thought that was Latin-15.

    Niklas
    --
    The bloody handle on the back of an E450 isn't until you try to use it as
    such, then it becomes less of a handle and more bloody.
    -- Gary Barnes in asr

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Fri Nov 21 19:10:53 2025
    On 11/21/25 12:58, Stéphane CARPENTIER wrote:
    On 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic
    currency placeholder at 0xA4: ¤

    They created the latin9 from the latin1 to add this € symbol.


    I thought it was Latin-15

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Eli the Bearded@3:633/10 to All on Sat Nov 22 03:20:31 2025
    In comp.os.linux.misc, Lawrence D’Oliveiro <ldo@nz.invalid> wrote:
    On 21 Nov 2025 19:55:07 GMT, Stéphane CARPENTIER wrote:
    On 17-11-2025, Lawrence D’Oliveiro <ldo@nz.invalid> wrote:
    I also like using «» and ‹› as metasyntactic brackets, but I expect
    French people will interpret those as quotes ...
    Of course those are quotes.
    I want more paired bracketing symbols. ;)

    https://qaz.wtf/qz/blosxom/2022/06/02/matchpairs

    TL;DR: 186 pairs in Unicode

    Elijah
    ------
    the vim "set matchpairs" line is too long for Usenet

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Sat Nov 22 05:57:28 2025
    On Sat, 22 Nov 2025 03:20:31 -0000 (UTC), Eli the Bearded wrote:

    In comp.os.linux.misc, Lawrence DOliveiro <ldo@nz.invalid> wrote:

    I want more paired bracketing symbols. ;)

    https://qaz.wtf/qz/blosxom/2022/06/02/matchpairs

    TL;DR: 186 pairs in Unicode

    Interesting. And more than I expected.

    Had trouble making out the difference between “left-handed” and
    “right-handed” versions of the “interlaced pentagram” -- those are only obvious
    at larger sizes.

    Most of the rest (that my font can show) seem quite legible.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Stéphane CARPENTIER@3:633/10 to All on Sat Nov 22 10:23:21 2025
    On 22-11-2025, Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/21/25 12:58, Stéphane CARPENTIER wrote:
    On 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic
    currency placeholder at 0xA4: ¤

    They created the latin9 from the latin1 to add this € symbol.


    I thought it was Latin-15

    No, it's ISO-8859-15, but it is called Latin-9.

    --
    If you have time to waste:
    https://scarpet42.gitlab.io

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Johnny Billquist@3:633/10 to All on Sat Nov 22 17:55:14 2025
    On 2025-11-18 21:29, Eli the Bearded wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA4: ¤

    Yeah. Sorry. That came in 8859-15.

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER
    That doesn't even make sense. UTF-8 is just a way to encode large integers
    in a variable sequence of 8-bit bytes.
    Which of course makes it a mess to figure out how long a string actually is. But I guess what you actually mean is that you like Unicode better than 8859-whatever.
    I couldn't disagree more. Endless ways to represent the exact same
    character, and weird things like sometimes having a separate codepoint
    for units or prefixes, but sometimes using normal ASCII for them, and
    then you have sometimes different codepoints because of colors, but
    sometimes not.
    It's a trainwreck, but now we're stuck with it. :(
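To make "how long is a string" concrete, here is a small Python sketch (the sample strings are arbitrary): each sample displays as a single character, yet the byte, UTF-16 unit and code point counts all disagree.

```python
# "How long is this string?" has several answers once text is Unicode.
# Each sample below displays as one character on screen.
SAMPLES = {
    "precomposed e-acute": "\u00e9",          # U+00E9
    "decomposed e-acute": "e\u0301",          # 'e' + U+0301 COMBINING ACUTE ACCENT
    "Spanish flag": "\U0001F1EA\U0001F1F8",   # two REGIONAL INDICATOR symbols
}


def lengths(s: str) -> tuple[int, int, int]:
    """Return (UTF-8 bytes, UTF-16 code units, code points) for s."""
    utf8_bytes = len(s.encode("utf-8"))
    utf16_units = len(s.encode("utf-16-le")) // 2  # -le variant: no byte-order mark
    code_points = len(s)
    return utf8_bytes, utf16_units, code_points


if __name__ == "__main__":
    for label, s in SAMPLES.items():
        utf8_bytes, utf16_units, code_points = lengths(s)
        print(f"{label:20} UTF-8 bytes={utf8_bytes} UTF-16 units={utf16_units} "
              f"code points={code_points}")
```

Counting what a reader would call characters (grapheme clusters) needs a segmentation algorithm on top of this; as far as I know, Python's standard library does not provide one.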

    Johnny


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Alexander Schreiber@3:633/10 to All on Sat Nov 22 19:20:28 2025
    Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-18 21:29, Eli the Bearded wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic
    currency placeholder at 0xA4: ¤

    Yeah. Sorry. That came in 8859-15.

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER
    That doesn't even make sense. UTF-8 is just a way to encode large integers
    in a variable sequence of 8-bit bytes.
    Which of course makes it a mess to figure out how long a string actually is. But I guess what you actually mean is that you like Unicode better than 8859-whatever.
    I couldn't disagree more. Endless ways to represent the exact same character, and weird things like sometimes having a separate codepoint
    for units or prefixes, but sometimes using normal ASCII for them, and
    then you have sometimes different codepoints because of colors, but sometimes not.

    Well, a big part of the reason is that human writing systems across
    the globe are, in fact, quite an impressive mess from an engineering
    point of view, mostly not being properly designed and all that. ;-)

    It's a trainwreck, but now we're stuck with it. :(

    At least it sorta mostly kinda works for a somewhat wide range of
    languages and scripts and you can have different scripts (latin,
    cyrillic, arabic and others) in the same text. Which beats having to
    figure out which code page to use for which text by quite a margin.
    I _have_ been through the mess of "US ASCII works, good luck with
    anything beyond that" that was text processing on e.g. MS-DOS (and
    variants) and early Windows. And my native language (German) uses
    essentially US-ASCII plus only a small number of letters outside of
    that to begin with. Imagine if your native script has _no_ overlap
    with that.
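Alexander's point about mixing scripts in one text can be checked with a short Python sketch. The sample words (German, Russian and Arabic) are arbitrary and written as escapes; `encodable_words` is an illustrative helper. Each ISO 8859 part covers at most one of the scripts, while UTF-8 covers the whole line:

```python
# One line mixing Latin, Cyrillic and Arabic script
# ("Strasse Moskva al-Qahira"), written with escapes so the source stays ASCII.
MIXED = ("Stra\u00dfe "
         "\u041c\u043e\u0441\u043a\u0432\u0430 "
         "\u0627\u0644\u0642\u0627\u0647\u0631\u0629")


def encodable_words(text: str, codec: str) -> list[str]:
    """Return the whitespace-separated words of text that codec can represent."""
    ok = []
    for word in text.split():
        try:
            word.encode(codec)
        except UnicodeEncodeError:
            continue
        ok.append(word)
    return ok


if __name__ == "__main__":
    total = len(MIXED.split())
    for codec in ("latin-1", "iso8859-5", "iso8859-6", "utf-8"):
        n = len(encodable_words(MIXED, codec))
        print(f"{codec:10} can represent {n} of {total} words")
```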

    Kind regards,
    Alex.
    --
    "Opportunity is missed by most people because it is dressed in overalls and
    looks like work." -- Thomas A. Edison

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Carlos E.R.@3:633/10 to All on Sat Nov 22 20:25:17 2025
    On 2025-11-22 17:55, Johnny Billquist wrote:
    On 2025-11-18 21:29, Eli the Bearded wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly
    quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic
    currency placeholder at 0xA4: ¤

    Yeah. Sorry. That came in 8859-15.

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER
    That doesn't even make sense. UTF-8 is just a way to encode large integers
    in a variable sequence of 8-bit bytes.
    Which of course makes it a mess to figure out how long a string actually
    is.
    But I guess what you actually mean is that you like Unicode better than 8859-whatever.
    I couldn't disagree more. Endless ways to represent the exact same character, and weird things like sometimes having a separate codepoint
    for units or prefixes, but sometimes using normal ASCII for them, and
    then you have sometimes different codepoints because of colors, but sometimes not.
    It's a trainwreck, but now we're stuck with it. :(

    Encode large integers? No.


    <https://en.wikipedia.org/wiki/UTF-8>

    UTF-8

    UTF-8 is a character encoding standard used for electronic
    communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit.[1] As of July 2025, almost every webpage is transmitted as UTF-8.[2]

    UTF-8 supports all 1,112,064[3] valid Unicode code points using a variable-width encoding of one to four one-byte (8-bit) code units.

    Code points with lower numerical values, which tend to occur more
    frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with
    the same binary value as ASCII, so that a UTF-8-encoded file using only
    those characters is identical to an ASCII file. Most software designed
    for any extended ASCII can read and write UTF-8, and this results in
    fewer internationalization issues than any alternative text encoding.[4][5]

    UTF-8 is dominant for all countries/languages on the internet, is used
    in most standards, often the only allowed encoding, and is supported by
    all modern operating systems and programming languages.


    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Sat Nov 22 21:43:56 2025
    On Sat, 22 Nov 2025 19:20:28 +0100, Alexander Schreiber wrote:

    And my native language (German) uses essentially US-ASCII plus only
    a small number of letters outside of that to begin with. Imagine if
    your native script has _no_ overlap with that.

    In pre-Unicode days, the major Western European languages were the next-best-supported, in terms of computer encodings, after ASCII.

    You don’t have to go very far from there to find ones that were a little harder to deal with ...

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Carlos E.R.@3:633/10 to All on Sun Nov 23 00:23:57 2025
    On 2025-11-22 22:43, Lawrence D’Oliveiro wrote:
    On Sat, 22 Nov 2025 19:20:28 +0100, Alexander Schreiber wrote:

    And my native language (German) uses essentially US-ASCII plus only
    a small number of letters outside of that to begin with. Imagine if
    your native script has _no_ overlap with that.

    In pre-Unicode days, the major Western European languages were the next-best-supported, in terms of computer encodings, after ASCII.

    You don’t have to go very far from there to find ones that were a little harder to deal with ...

    It amazes me that computers can handle Chinese. Not only display, but keyboards.

    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Sun Nov 23 02:56:17 2025
    On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:

    It amazes me that computers can handle Chinese. Not only display, but keyboards.

    It’s a trick. They enter syllables using the Roman alphabet. It then pops
    up candidate characters that they pick from.
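Lawrence's description of phonetic input can be sketched as a toy in Python. This is an illustration under heavy assumptions: `CANDIDATES` is a tiny made-up table, and real input methods use large dictionaries and rank candidates by frequency and context.

```python
# Toy model of pinyin-style input: type a romanized syllable, get a list of
# candidate characters, pick one by number. CANDIDATES is a tiny hypothetical
# table for illustration only, not real input-method data.
CANDIDATES: dict[str, list[str]] = {
    "ma": ["妈", "麻", "马", "骂", "吗"],
    "zhong": ["中", "种", "重"],
    "wen": ["文", "问", "闻"],
}


def lookup(syllable: str) -> list[str]:
    """Return the candidate characters for a syllable (empty list if unknown)."""
    return CANDIDATES.get(syllable.lower(), [])


def compose(keystrokes: list[tuple[str, int]]) -> str:
    """Build text from (syllable, 1-based candidate number) pairs."""
    out = []
    for syllable, choice in keystrokes:
        options = lookup(syllable)
        if not 1 <= choice <= len(options):
            raise ValueError(f"no candidate {choice} for {syllable!r}")
        out.append(options[choice - 1])
    return "".join(out)


if __name__ == "__main__":
    for n, ch in enumerate(lookup("ma"), start=1):
        print(n, ch)
    print(compose([("zhong", 1), ("wen", 1)]))  # "Chinese (language)"
```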

    I once helped create a business card for our Mayor, for a trip to our
    Chinese sister city, on a Macintosh. A colleague from the Chinese language department had worked out the text; I operated the text input system to
    lay out the card. I recall the fonts were all bitmaps anyway, but the
    layout quality was deemed acceptable (what choice did they have?).

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Sat Nov 22 20:18:41 2025
    On 11/22/25 16:23, Carlos E.R. wrote:
    On 2025-11-22 22:43, Lawrence D’Oliveiro wrote:
    On Sat, 22 Nov 2025 19:20:28 +0100, Alexander Schreiber wrote:

    And my native language (German) uses essentially US-ASCII plus only
    a small number of letters outside of that to begin with. Imagine if
    your native script has _no_ overlap with that.

    In pre-Unicode days, the major Western European languages were the
    next-best-supported, in terms of computer encodings, after ASCII.

    You don’t have to go very far from there to find ones that were a little
    harder to deal with ...

    It amazes me that computers can handle Chinese. Not only display, but keyboards.


    I just read an article about the Chinese typewriter invented by the
    writer Lin Yutang. Apparently his original model has just been
    re-discovered.

    I was going to add a description, but I see it is described here:

    https://en.wikipedia.org/wiki/Chinese_typewriter#MingKwai_design

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From The Natural Philosopher@3:633/10 to All on Sun Nov 23 09:42:30 2025
    On 23/11/2025 02:17, rbowman wrote:
    On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:

    It amazes me that computers can handle Chinese. Not only display, but
    keyboards.

    https://www.youtube.com/watch?v=iWi-9LJ4dg4

    Japanese is as bad. There are over 2000 kanji characters you have to know
    to be reasonably literate. Both China and Japan have tried to simplify
    that character set for centuries and have gotten it down to four or five thousand though the exact count isn't known.

    I can't imagine...

    I saw a YouTube video on that. Chinese is essentially a dog's breakfast
    second time around

    --
    Civilization exists by geological consent, subject to change without notice.
    -- Will Durant


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Anthk NM@3:633/10 to All on Sun Nov 23 12:48:58 2025
    On 2025-11-16, Ted Nolan <tednolan> <ted@loft.tnolan.com> wrote:
    In article <mnumuiF7n72U5@mid.individual.net>,
    rbowman <bowman@montana.com> wrote:
    On Sun, 16 Nov 2025 09:49:10 -0500, Chris Ahlstrom wrote:

    The Natural Philosopher wrote this post by blinking in Morse code:

    On 16/11/2025 05:11, Ted Nolan <tednolan> wrote:
    In article <10fasl6$3p4r1$3@dont-email.me>,
    Lawrence D’Oliveiro <ldo@nz.invalid> wrote:
    On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:

    I find it takes a lot of munging to get vim to *really* work like vi.

    To me, that sounds like someone saying “it takes a lot of munging
    to get a Trabant to *really* work like a Morris Minor”. I can’t
    imagine myself wanting to use either.

    Well, squids & kids, but my fingers do vi automatically. Anything
    else not so much.

    I find that depressing.

    I used to have to write reams of code in 'vi'. Horrible

    I still do. (Though it is vim).

    Back to my original statement that most people who say they use vi are
    using vim and would be very unhappy with vi.


    I would not. Lack of utf-8 would be an issue for some things, but
    mostly not.

    With nvi (nvi2 under OpenBSD ports) I just set, in ~/.exrc:

    set showmode ruler
    set ts=2
    set ht=2

    And done. A status line, the mode line, sane tabs and Unicode.
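What `ts` (tabstop) controls can be shown with a few lines of Python: a TAB advances to the next column that is a multiple of the tab width. A sketch assuming a single line and one display column per character (which does not hold for wide CJK characters); Python's own `str.expandtabs` does the same job and is used as a cross-check.

```python
# What vi's tabstop (ts) controls: a TAB moves to the next column that is a
# multiple of the tab width. Assumes one line, one display column per character.
def expand_tabs(line: str, tabstop: int = 8) -> str:
    """Replace each TAB with spaces up to the next multiple of tabstop."""
    if tabstop < 1:
        raise ValueError("tabstop must be positive")
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tabstop - (col % tabstop)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


if __name__ == "__main__":
    line = "x\tif\t(y)"
    for ts in (8, 4, 2):
        print(f"ts={ts}: |{expand_tabs(line, ts)}|")
```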

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Carlos E.R.@3:633/10 to All on Sun Nov 23 14:59:38 2025
    On 2025-11-23 03:17, rbowman wrote:
    On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:

    It amazes me that computers can handle Chinese. Not only display, but
    keyboards.

    https://www.youtube.com/watch?v=iWi-9LJ4dg4

    Quite curious, thanks.


    Japanese is as bad. There are over 2000 kanji characters you have to know
    to be reasonably literate. Both China and Japan have tried to simplify
    that character set for centuries and have gotten it down to four or five thousand though the exact count isn't known.

    I can't imagine...


    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Sun Nov 23 20:11:06 2025
    On 23 Nov 2025 17:51:18 GMT, Ted Nolan <tednolan> wrote:

    8 is the One True TS!

    Pretty useless setting, long abandoned.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bobbie Sellers@3:633/10 to All on Sun Nov 23 13:09:56 2025


    On 11/23/25 05:59, Carlos E.R. wrote:
    On 2025-11-23 03:17, rbowman wrote:
    On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:

    It amazes me that computers can handle Chinese. Not only display, but
    keyboards.

    https://www.youtube.com/watch?v=iWi-9LJ4dg4

    Quite curious, thanks.


    Japanese is as bad. There are over 2000 kanji characters you have to know
    to be reasonably literate. Both China and Japan have tried to simplify
    that character set for centuries and have gotten it down to four or five
    thousand though the exact count isn't known.

    I can't imagine...



    2000 kanji is not all.
    The syllabary of Japanese is 40 characters for Japanese and another 40
    for foreign words. Serious students go to classes, just like the
    Japanese do, to learn this stuff, and the Japanese in turn take English
    classes to learn the technicalities of English. Back in the 19th
    Century some Japanese educators advocated moving completely to English,
    but that would have meant giving up the language of their ancestors,
    and that was a step too far.

    I got interested in the idea of learning enough to read manga, but in
    my 70s it was a bit too late for that endeavor.

    bliss

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Sun Nov 23 22:57:08 2025
    On Sun, 23 Nov 2025 13:09:56 -0800, Bobbie Sellers wrote:

    Back in the 19th Century some Japanese educators advocated moving
    completely to English but that would mean giving up on the language
    of their ancestors and that was a step too far.

    Apparently Mao Zedong discussed with Josef Stalin the idea of
    abandoning traditional Chinese characters in favour of a Roman-based
    script. Stalin told him the Chinese writing system was beautiful, and
    should be kept.

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Eric Pozharski@3:633/10 to All on Tue Nov 25 10:26:31 2025
    with <10fvcgr$3d6mu$1@paganini.bofh.team> Waldek Hebisch wrote:
    In alt.folklore.computers Eric Pozharski <apple.universe@posteo.net>
    wrote:
    with <akjvulxcnk.ln2@Telcontar.valinor> Carlos E.R. wrote:
    On 2025-11-18 20:04, Johnny Billquist wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

    *SKIP* [ 5 lines 6 levels deep]
    But with the transmission you have to transmit first what charset
    you are going to use, and then you are limited by it, and the
    recipient must have the same map, and be able to use it. Perhaps he
    has to use his own map instead.
    If only there was some arrangement to make it work. And RFC2047
    readily offers some. And that would be a nail in UTF-8's coffin.
    Each ISO code page is (was???) supposed to have an escape sequence to
    switch to that code page. There is (was???) a standard (ISO 2022 ???)
    that outlined how switching was supposed to work.
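The ISO 2022 switching mechanism is easy to see with the ISO-2022-JP codec that ships with Python, used here as one concrete member of the ISO 2022 family. A sketch; `segments` is an illustrative helper that only recognises the two escape sequences this example produces:

```python
# ISO 2022 switches character sets in-band with escape sequences.
# Python's built-in ISO-2022-JP codec makes the switches visible:
#   ESC $ B  designates JIS X 0208 (two bytes per character)
#   ESC ( B  designates ASCII again
ESCAPES = {
    b"\x1b$B": "JIS X 0208",
    b"\x1b(B": "ASCII",
}


def segments(data: bytes) -> list[tuple[str, bytes]]:
    """Split ISO-2022-JP bytes into (charset, payload) runs.

    Only the two escape sequences above are recognised; the initial
    state is ASCII, as in ISO-2022-JP.
    """
    runs: list[tuple[str, bytes]] = []
    current = "ASCII"
    buf = bytearray()
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in ESCAPES:
            if buf:
                runs.append((current, bytes(buf)))
                buf = bytearray()
            current = ESCAPES[seq]
            i += 3
        else:
            buf.append(data[i])
            i += 1
    if buf:
        runs.append((current, bytes(buf)))
    return runs


if __name__ == "__main__":
    wire = "a\u3042b".encode("iso2022_jp")  # 'a', HIRAGANA LETTER A, 'b'
    print(wire)
    print(segments(wire))
```

The decoder has to track that state byte by byte, which is part of why stateless UTF-8 won out for interchange.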

    Granted, finding details of ISO through wiki isn't an easy task and
    details are rather irrelevant now anyway. That being said, scaling
    escape codes would be a solution on the surface.

    IIUC Emacs Mule used this scheme (possibly modified). AFAIK they
    dumped it in favour of UTF-8.

    Imagine what a beautiful mess it would be. Instead we have UTF-8. It's
    a shame.

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Tue Nov 25 20:05:48 2025
    On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

    Instead we have UTF-8. It's a shame.

    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From c186282@3:633/10 to All on Tue Nov 25 23:04:42 2025
    On 11/25/25 15:05, Lawrence D’Oliveiro wrote:
    On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

    Instead we have UTF-8. It's a shame.

    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    Um, kinda yea :-)

    I understand the reason for unicode, but
    that doesn't mean I like it. Always avoid
    whenever possible.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Johnny Billquist@3:633/10 to All on Thu Nov 27 19:55:29 2025
    On 2025-11-22 19:20, Alexander Schreiber wrote:
    Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-18 21:29, Eli the Bearded wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic
    currency placeholder at 0xA4: ¤

    Yeah. Sorry. That came in 8859-15.

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER
    That doesn't even make sense. UTF-8 is just a way to encode large integers
    in a variable sequence of 8-bit bytes.
    Which of course makes it a mess to figure out how long a string actually is.
    But I guess what you actually mean is that you like Unicode better than
    8859-whatever.
    I couldn't disagree more. Endless ways to represent the exact same
    character, and weird things like sometimes having a separate codepoint
    for units or prefixes, but sometimes using normal ASCII for them, and
    then you have sometimes different codepoints because of colors, but
    sometimes not.

    Well, a big part of the reason is that human writing systems across
    the globe are, in fact, quite an impressive mess from an engineering
    point of view, mostly not being properly designed and all that. ;-)

    I know. But the Unicode wreck can't be blamed on the human writing
    system "mess". It created one completely on its own.
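The "several ways to write the same character" complaint is what Unicode normalization exists to paper over. A short Python sketch with the standard `unicodedata` module; `same_text` is an illustrative helper and the sample pairs are arbitrary:

```python
import unicodedata

# Pairs that look the same (or nearly so) but differ as code points.
PAIRS = [
    ("\u00e9", "e\u0301"),   # precomposed e-acute vs 'e' + COMBINING ACUTE ACCENT
    ("\u212b", "\u00c5"),    # ANGSTROM SIGN vs LATIN CAPITAL LETTER A WITH RING ABOVE
    ("\u212a", "K"),         # KELVIN SIGN vs plain ASCII 'K'
]


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    for a, b in PAIRS:
        print(f"{a!a} vs {b!a}: raw equal={a == b}, NFC equal={same_text(a, b)}")
    # A unit packed into one compatibility code point only folds under NFKC.
    km = "\u339e"  # SQUARE KM
    print("NFC:", unicodedata.normalize("NFC", km) == "km",
          "NFKC:", unicodedata.normalize("NFKC", km) == "km")
```

SQUARE KM (U+339E) only becomes plain "km" under the compatibility form NFKC, not under NFC, which is one concrete case of the unit inconsistency Johnny describes.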

    It's a trainwreck, but now we're stuck with it. :(

    At least it sorta mostly kinda works for a somewhat wide range of
    languages and scripts and you can have different scripts (latin,
    cyrillic, arabic and others) in the same text. Which beats having to
    figure out which code page to use for which text by quite a margin.
    I _have_ been through the mess of "US ASCII works, good luck with
    anything beyond that" that was text processing on e.g. MS-DOS (and
    variants) and early Windows. And my native language (German) uses
    essentially US-ASCII plus only a small number of letters outside of
    that to begin with. Imagine if your native script has _no_ overlap
    with that.
    Just because there was a problem, it doesn't follow that Unicode was a good solution.

    Johnny


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Johnny Billquist@3:633/10 to All on Thu Nov 27 20:02:25 2025
    On 2025-11-22 20:25, Carlos E.R. wrote:
    On 2025-11-22 17:55, Johnny Billquist wrote:
    On 2025-11-18 21:29, Eli the Bearded wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D’Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly
    quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic
    currency placeholder at 0xA4: ¤

    Yeah. Sorry. That came in 8859-15.

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER
    That doesn't even make sense. UTF-8 is just a way to encode large
    integers in a variable sequence of 8-bit bytes.
    Which of course makes it a mess to figure out how long a string
    actually is.
    But I guess what you actually mean is that you like Unicode better
    than 8859-whatever.
    I couldn't disagree more. Endless ways to represent the exact same
    character, and weird things like sometimes having a separate codepoint
    for units or prefixes, but sometimes using normal ASCII for them, and
    then you have sometimes different codepoints because of colors, but
    sometimes not.
    It's a trainwreck, but now we're stuck with it. :(

    Encode large integers? No.

    Ok. Call it "encode Unicode" then if that makes you happier. And Unicode codepoints can be described as integers (in fact, they are, which is why
    you see U+nnnn, where nnnn is a hex value, for codepoints), and have a
    range of just over 2^20 (U+0000 to U+10FFFF, so 21 bits).

    UTF-8 isn't defining any characters, just defining a way to represent
    Unicode characters using a variable number of 8-bit bytes.
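    A minimal Python sketch (Python is used for the examples here purely for illustration) of the point that UTF-8 is a variable-length byte encoding of code point numbers. The hand-written encoder below follows the RFC 3629 bit layout and is checked against Python's built-in codec; the helper name `utf8_encode` is hypothetical. Because one code point takes 1 to 4 bytes, the byte count of a string is not its character count.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (RFC 3629 bit layout)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:      # 1 byte:  0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


text = "A\u00e9\u20ac\U0001F600"  # A, e-acute, euro sign, grinning face
encoded = b"".join(utf8_encode(ord(ch)) for ch in text)
print(len(text), "code points,", len(encoded), "bytes")  # 4 code points, 10 bytes
assert encoded == text.encode("utf-8")  # agrees with Python's own codec
```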

    Johnny


  • From Johnny Billquist@3:633/10 to All on Thu Nov 27 20:10:08 2025
    On 2025-11-25 21:05, Lawrence D'Oliveiro wrote:
    On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

    Instead we have UTF-8. It's a shame.

    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    In which way is it worse? It's the same character set, just encoded
    using one or two 16-bit values instead of 1-4 8-bit values. You still
    need to extract the actual Unicode value out of that encoding before
    showing anything and vice versa.
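    The "one or two 16-bit values" can be sketched in Python as follows (a minimal illustration; `utf16_units` is a hypothetical helper). Code points below U+10000 take one 16-bit unit; anything above is split into a surrogate pair.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units (16-bit integers) for one scalar value."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]                 # BMP: exactly one 16-bit unit
    v = cp - 0x10000                # 20 bits remain after the offset
    return [0xD800 | (v >> 10),     # high surrogate carries the top 10 bits
            0xDC00 | (v & 0x3FF)]   # low surrogate carries the bottom 10 bits


for ch in ("A", "\u20ac", "\U0001F600"):
    units = utf16_units(ord(ch))
    print(f"U+{ord(ch):04X} ->", " ".join(f"{u:04X}" for u in units))
# U+0041 -> 0041
# U+20AC -> 20AC
# U+1F600 -> D83D DE00
```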

    Johnny


  • From Carlos E.R.@3:633/10 to All on Thu Nov 27 20:16:47 2025
    On 2025-11-27 20:02, Johnny Billquist wrote:
    On 2025-11-22 20:25, Carlos E.R. wrote:
    On 2025-11-22 17:55, Johnny Billquist wrote:
    On 2025-11-18 21:29, Eli the Bearded wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly
    quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the
    generic currency placeholder at 0xA4: ¤

    Yeah. Sorry. That came in 8859-15.

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER
    That doesn't even make sense. UTF-8 is just a way to encode large
    integers in a variable sequence of 8-bit bytes.
    Which of course makes it a mess to figure out how long a string
    actually is.
    But I guess what you actually mean is that you like Unicode better
    than 8859-whatever.
    I couldn't disagree more. Endless ways to represent the exact same
    character, and weird things like sometimes having a separate
    codepoint for units or prefixes, but sometimes using normal ASCII for
    them, and then you have sometimes different codepoints because of
    colors, but sometimes not.
    It's a trainwreck, but now we're stuck with it. :(

    Encode large integers? No.

    Ok. Call it "encode Unicode" then if that makes you happier. And Unicode codepoints can be described as integers (in fact, they are, which is why
    you see U+nnnn, where nnnn is a hex value, for codepoints), and have a
    range of just over 2^20 (U+0000 to U+10FFFF, so 21 bits).

    UTF-8 isn't defining any characters, just defining a way to represent Unicode characters using a variable number of 8-bit bytes.

    And 8-bit ASCII letters are also numbers representing characters. That's
    how computers work.

    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

  • From Scott Lurndal@3:633/10 to All on Thu Nov 27 20:19:28 2025
    Johnny Billquist <bqt@softjar.se> writes:
    On 2025-11-25 21:05, Lawrence D'Oliveiro wrote:
    On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

    Instead we have UTF-8. It's a shame.

    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    In which way is it worse? It's the same character set, just encoded
    using one or two 16-bit values instead of 1-4 8-bit values. You still
    need to extract the actual Unicode value out of that encoding before
    showing anything and vice versa.

    Because the endianness can vary, and thus UTF-16 requires a BOM.

    UTF-16 should have been a non-starter.
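    The byte-order issue can be seen directly with Python's codecs (a sketch for illustration). Python's plain "utf-16" codec writes a BOM, while the explicitly labelled "utf-16-le" and "utf-16-be" variants fix the byte order by name and omit it, so a BOM is what lets a reader recover the order when nothing else declares it.

```python
text = "Hi"

le = text.encode("utf-16-le")    # explicit byte order, no BOM: b'H\x00i\x00'
be = text.encode("utf-16-be")    # explicit byte order, no BOM: b'\x00H\x00i'
tagged = text.encode("utf-16")   # BOM first, then native byte order

print(le, be, tagged[:2])
# The same bytes read with the wrong byte order give different characters:
print(le.decode("utf-16-be"))    # U+4800 U+6900, not "Hi"
# With a BOM the decoder can work the byte order out for itself:
print(tagged.decode("utf-16"))   # Hi
```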

  • From Richard Kettlewell@3:633/10 to All on Thu Nov 27 20:44:03 2025
    Johnny Billquist <bqt@softjar.se> writes:
    Lawrence D'Oliveiro wrote:
    Eric Pozharski wrote:
    Instead we have UTF-8. It's a shame.
    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    In which way is it worse? It's the same character set, just encoded
    using one or two 16-bit values instead of 1-4 8-bit values. You still
    need to extract the actual Unicode value out of that encoding before
    showing anything and vice versa.

    UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
    protocols etc. The upgrade from ASCII is easy.

    UTF-32 loses that advantage, and that and its endianness-dependence make
    it a poor choice in most contexts, but in return you get the property
    that one code point is one code unit, useful when processing strings in Unicode-aware ways.

    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all possible worlds. About the only thing to recommend it is that it can be the most compact representation in certain contexts.
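    A small Python comparison of the three encodings on a few sample strings (chosen for illustration, using the explicit-byte-order codecs so no BOM is counted):

```python
def encoded_sizes(s: str) -> dict[str, int]:
    """Bytes needed for s in each encoding (explicit byte order, so no BOM)."""
    return {enc: len(s.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# ASCII text is byte-for-byte unchanged in UTF-8: the easy upgrade path.
assert "hello".encode("utf-8") == "hello".encode("ascii")

for s in ("hello", "h\u00e9llo", "\u65e5\u672c\u8a9e", "\U0001F600"):
    print(ascii(s), len(s), encoded_sizes(s))
```

    UTF-32 always spends 4 bytes per code point. UTF-16 needs two units for the single emoji, so it is not fixed-width either, but for the three CJK characters it takes 6 bytes against UTF-8's 9, which is the kind of context where it is the most compact.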

    --
    https://www.greenend.org.uk/rjk/

  • From Lawrence D'Oliveiro@3:633/10 to All on Thu Nov 27 21:18:55 2025
    On Thu, 27 Nov 2025 20:16:47 +0100, Carlos E.R. wrote:

    And 8 bit ascii letters are also numbers representing characters. That's
    how computers work.

    Everything that computers deal with is a number. Think of a computer
    program as a very large integer. There's even a name for this: the Gödel number.

  • From Chris Ahlstrom@3:633/10 to All on Fri Nov 28 07:54:45 2025
    Richard Kettlewell wrote this post by blinking in Morse code:

    Johnny Billquist <bqt@softjar.se> writes:
    Lawrence D'Oliveiro wrote:
    Eric Pozharski wrote:
    Instead we have UTF-8. It's a shame.
    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    In which way is it worse? It's the same character set, just encoded
    using one or two 16-bit values instead of 1-4 8-bit values. You still
    need to extract the actual Unicode value out of that encoding before
    showing anything and vice versa.

    UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
    protocols etc. The upgrade from ASCII is easy.

    UTF-32 loses that advantage, and that and its endianness-dependence make
    it a poor choice in most contexts, but in return you get the property
    that one code point is one code unit, useful when processing strings in Unicode-aware ways.

    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all possible worlds. About the only thing to recommend it is that it can be the most compact representation in certain contexts.

    Certain contexts like ... Windows? :-)

    --
    Oh, give me a home,
    Where the buffalo roam,
    And I'll show you a house with a really messy kitchen.

  • From Alexander Schreiber@3:633/10 to All on Fri Nov 28 22:08:53 2025
    Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-22 19:20, Alexander Schreiber wrote:
    Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-18 21:29, Eli the Bearded wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA4: ¤

    Yeah. Sorry. That came in 8859-15.

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER
    That doesn't even make sense. UTF-8 is just a way to encode large integers in a variable sequence of 8-bit bytes.
    Which of course makes it a mess to figure out how long a string actually is.
    But I guess what you actually mean is that you like Unicode better than
    8859-whatever.
    I couldn't disagree more. Endless ways to represent the exact same
    character, and weird things like sometimes having a separate codepoint
    for units or prefixes, but sometimes using normal ASCII for them, and
    then you have sometimes different codepoints because of colors, but
    sometimes not.

    Well, a big part of the reason is that human writing systems across
    the globe are, in fact, quite an impressive mess from an engineering
    point of view, mostly not being properly designed and all that. ;-)

    I know. But the Unicode wreck can't be blamed on the human writing
    system "mess". It created one completely on its own.

    It's a trainwreck, but now we're stuck with it. :(

    At least it sorta mostly kinda works for a somewhat wide range of
    languages and scripts and you can have different scripts (latin,
    cyrillic, arabic and others) in the same text. Which beats having to
    figure out which code page to use for which text by quite a margin.
    I _have_ been through the mess of "US ASCII works, good luck with
    anything beyond that" that was text processing on e.g. MS-DOS (and
    variants) and early Windows. And my native language (German) uses
    essentially US-ASCII plus only a small number of letters outside of
    that to begin with. Imagine if your native script has _no_ overlap
    with that.
    Just because there was a problem, it doesn't follow that Unicode was a good solution.

    I'm not claiming it is a good solution, but it is the solution we ended up
    with that reasonably covers a lot of the problem space. Given that:
    - it covers a wide and very irregular problem space
    - it
    - it is, due to the problem scope, a design by committee
    ending up with a solution that is a bit of a mess is hardly avoidable.

    It has the property of "working well enough most of the time", which is
    already a big impediment to anyone spending the time, money and brains
    in order to:
    - come up with a New And Improved Design That Surely Has No Warts
    - establish it as the new standard

    Honestly: not happening.

    Kind regards,
    Alex.
    --
    "Opportunity is missed by most people because it is dressed in overalls and
    looks like work." -- Thomas A. Edison

  • From Alexander Schreiber@3:633/10 to All on Fri Nov 28 22:10:35 2025
    Richard Kettlewell <invalid@invalid.invalid> wrote:
    Johnny Billquist <bqt@softjar.se> writes:
    Lawrence D'Oliveiro wrote:
    Eric Pozharski wrote:
    Instead we have UTF-8. It's a shame.
    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    In which way is it worse? It's the same character set, just encoded
    using one or two 16-bit values instead of 1-4 8-bit values. You still
    need to extract the actual Unicode value out of that encoding before
    showing anything and vice versa.

    UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
    protocols etc. The upgrade from ASCII is easy.

    UTF-32 loses that advantage, and that and its endianness-dependence make
    it a poor choice in most contexts, but in return you get the property
    that one code point is one code unit, useful when processing strings in Unicode-aware ways.

    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all possible worlds.

    Thus making it the _perfect_ choice of encoding for Microsoft Windows.

    SCNR,
    Alex.
    --
    "Opportunity is missed by most people because it is dressed in overalls and
    looks like work." -- Thomas A. Edison

  • From David Goodwin@3:633/10 to All on Sat Nov 29 13:13:26 2025
    In article <slrn10ik3ub.2dppt.als@mordor.angband.thangorodrim.de>, als@usenet.thangorodrim.de says...

    Richard Kettlewell <invalid@invalid.invalid> wrote:
    Johnny Billquist <bqt@softjar.se> writes:
    Lawrence D'Oliveiro wrote:
    Eric Pozharski wrote:
    Instead we have UTF-8. It's a shame.
    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    In which way is it worse? It's the same character set, just encoded
    using one or two 16-bit values instead of 1-4 8-bit values. You still
    need to extract the actual Unicode value out of that encoding before
    showing anything and vice versa.

    UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
    protocols etc. The upgrade from ASCII is easy.

    UTF-32 loses that advantage, and that and its endianness-dependence make
    it a poor choice in most contexts, but in return you get the property
    that one code point is one code unit, useful when processing strings in Unicode-aware ways.

    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the fixed-width encoding of UTF-32 in return. It's the worst of all possible worlds.

    Thus making it the _perfect_ choice of encoding for Microsoft Windows.

    To be fair, Windows NT was an early adopter of Unicode and at the time
    that meant 16 bits per character (UCS-2). Betas and SDKs had already
    been in developers' hands for a year, and the first release was only a
    few months away, when UTF-8 was first presented.

    Windows NT has supported UTF-8 for a couple of years now, but as it's
    still a fairly recent thing there are still some rough edges and I
    expect not much uses it yet.

  • From Lawrence D'Oliveiro@3:633/10 to All on Sat Nov 29 02:57:15 2025
    On Sat, 29 Nov 2025 13:13:26 +1300, David Goodwin wrote:

    To be fair, Windows NT was an early adopter of Unicode and at the time
    that meant 16 bits per character (UCS-2).

    It did, indeed. Java was another one that bought into the whole UCS-2
    thing at just the wrong time.

    And then the Unicode folks went "on second thoughts, let's widen it to
    include past writing systems as well, not just present-day ones. And let's
    add some other fun stuff while we're at it".

    Linux, on the other hand, simply ignored the whole issue. The kernel just
    says "ASCII slash is the path component separator, ASCII NUL is the path terminator", and leaves the rest up to userland.

    And UTF-8 fits rather neatly into that.
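    Why UTF-8 fits: every byte of a multi-byte UTF-8 sequence has its high bit set, so the bytes 0x00 and 0x2F ('/') only ever appear as themselves. A brute-force Python check over all code points, plus a byte-level split of a path (a sketch; the kernel itself treats names as opaque bytes and does not validate UTF-8, and the function name is hypothetical):

```python
def multibyte_utf8_is_path_safe() -> bool:
    """True if no non-ASCII scalar value's UTF-8 form contains a byte < 0x80.

    Every byte of a multi-byte UTF-8 sequence has its high bit set, so the
    bytes 0x00 (NUL) and 0x2F ('/') can only ever appear as themselves.
    """
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True


safe = multibyte_utf8_is_path_safe()
print(safe)  # True

# Splitting raw bytes on b"/" (roughly what the kernel sees) keeps names intact.
path = "/home/caf\u00e9/\u65e5\u672c\u8a9e.txt".encode("utf-8")
parts = [part.decode("utf-8") for part in path.split(b"/")]
print(parts)
```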

  • From The Natural Philosopher@3:633/10 to All on Sat Nov 29 11:20:08 2025
    On 28/11/2025 21:10, Alexander Schreiber wrote:
    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all possible
    worlds.
    Thus making it the _perfect_ choice of encoding for Microsoft Windows.

    SCNR,
    Alex.

    Indeed.


    --
    "Nature does not give up the winter because people dislike the cold."

    -- Confucius


  • From Carlos E.R.@3:633/10 to All on Sat Nov 29 13:35:39 2025
    On 2025-11-29 01:13, David Goodwin wrote:
    In article <slrn10ik3ub.2dppt.als@mordor.angband.thangorodrim.de>, als@usenet.thangorodrim.de says...

    Richard Kettlewell <invalid@invalid.invalid> wrote:
    Johnny Billquist <bqt@softjar.se> writes:
    Lawrence D'Oliveiro wrote:
    Eric Pozharski wrote:
    Instead we have UTF-8. It's a shame.
    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    In which way is it worse? It's the same character set, just encoded
    using one or two 16-bit values instead of 1-4 8-bit values. You still
    need to extract the actual Unicode value out of that encoding before
    showing anything and vice versa.

    UTF-8 has the advantage that we can still use byte-oriented string
    representations, in programming languages, file formats, network
    protocols etc. The upgrade from ASCII is easy.

    UTF-32 loses that advantage, and that and its endianness-dependence make
    it a poor choice in most contexts, but in return you get the property
    that one code point is one code unit, useful when processing strings in
    Unicode-aware ways.

    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all possible
    worlds.

    Thus making it the _perfect_ choice of encoding for Microsoft Windows.

    To be fair, Windows NT was an early adopter of Unicode and at the time
    that meant 16 bits per character (UCS-2). Betas and SDKs had already
    been in developers' hands for a year, and the first release was only a
    few months away, when UTF-8 was first presented.

    Windows NT has supported UTF-8 for a couple of years now, but as it's
    still a fairly recent thing there are still some rough edges and I
    expect not much uses it yet.

    I'm curious. My Thunderbird in Linux uses UTF-8 by default. What would
    TB use in Windows 10 or 11?


    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

  • From Richard Kettlewell@3:633/10 to All on Sat Nov 29 13:45:47 2025
    "Carlos E.R." <robin_listas@es.invalid> writes:
    On 2025-11-29 01:13, David Goodwin wrote:
    To be fair, Windows NT was an early adopter of Unicode and at the
    time that meant 16 bits per character (UCS-2). Betas and SDKs had
    already been in developers' hands for a year, and the first release
    was only a few months away, when UTF-8 was first presented.

    Windows NT has supported UTF-8 for a couple of years now, but as it's
    still a fairly recent thing there are still some rough edges and I
    expect not much uses it yet.

    I'm curious. My Thunderbird in Linux uses UTF-8 by default. What would
    TB use in Windows 10 or 11?

    Applications don't need any OS support to use UTF-8. Thunderbird has
    used UTF-8 on Windows since at least 2007 (based on my mail archives),
    and probably since before it was spun out from the browser.

    Date: Thu, 03 May 2007 16:53:52 +0100
    User-Agent: Thunderbird 1.5.0.10 (Windows/20070221)
    Content-Type: text/plain; charset=utf-8

    Use of UTF-8 in encoding-sensitive Windows API calls is relatively
    recent, though AFAIK it goes back more than a couple of years.

    As David says the reason for UTF-16 in Windows is that it adopted
    Unicode both before it outgrew 16-bit representations, and before UTF-8
    was standardized. In short, Microsoft were ahead of the curve. The slow adoption of UTF-8 is less justifiable.

    --
    https://www.greenend.org.uk/rjk/

  • From Stéphane CARPENTIER@3:633/10 to All on Sat Nov 29 15:06:07 2025
    On 29-11-2025, Carlos E.R. <robin_listas@es.invalid> wrote:
    I'm curious. My Thunderbird in Linux uses UTF-8 by default. What would
    TB use in Windows 10 or 11?

    I don't know about Windows 11. But in France, Windows 10 still uses
    either CP1252 or utf-8 depending on how you create documents.
    For example, if you use Notepad ("Bloc-notes" in French), it's utf-8.
    But if you create a text document in Excel and save it as CSV, it's
    cp1252. And there is no way around it: Microsoft decides for you what
    you should use. For Firefox, I'm not sure. I'd say it's cp1252 because
    it looks like that, but I can't be 100% sure about that.

    For me cp1252 is just like ISO-8859-(1 or 15): a thing of the past whose
    only place should be in a museum. Agreed, utf-8 has some issues, but
    everything else is just worse.
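    The practical difference between these encodings shows up in the bytes. A short Python sketch using the standard codec names "cp1252", "utf-8" and "iso8859_15"; plain Latin-1 cannot encode the euro sign at all:

```python
s = "caf\u00e9 5\u20ac"           # "café 5€"

cp1252 = s.encode("cp1252")      # one byte per character; € is 0x80 here
utf8 = s.encode("utf-8")         # é takes 2 bytes, € takes 3
latin9 = s.encode("iso8859_15")  # ISO-8859-15 puts € at 0xA4 instead

print(cp1252.hex(" "))   # 63 61 66 e9 20 35 80
print(utf8.hex(" "))     # 63 61 66 c3 a9 20 35 e2 82 ac
print(latin9.hex(" "))   # 63 61 66 e9 20 35 a4

# Reading UTF-8 bytes as cp1252 gives the familiar mojibake:
print(utf8.decode("cp1252"))     # cafÃ© 5â‚¬
```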

    --
    If you have time to waste:
    https://scarpet42.gitlab.io

  • From Peter Flass@3:633/10 to All on Sat Nov 29 12:45:58 2025
    On 11/29/25 04:20, The Natural Philosopher wrote:
    On 28/11/2025 21:10, Alexander Schreiber wrote:
    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all possible
    worlds.
    Thus making it the _perfect_ choice of encoding for Microsoft Windows.


    IBM mainframes and System i use UTF-16.



  • From The Natural Philosopher@3:633/10 to All on Sun Nov 30 10:37:07 2025
    On 29/11/2025 19:45, Peter Flass wrote:
    On 11/29/25 04:20, The Natural Philosopher wrote:
    On 28/11/2025 21:10, Alexander Schreiber wrote:
    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all
    possible worlds.
    Thus making it the _perfect_ choice of encoding for Microsoft Windows.


    IBM mainframes and System i use UTF-16.


    I would have thought the font would be a function of software, not tied
    to any hardware.

    There is probably an argument for either 32 or 64 bit characters these
    days. Same as integers grew from 16 to 64 bit...



    --
    "The great thing about Glasgow is that if there's a nuclear attack it'll
    look exactly the same afterwards."

    Billy Connolly


  • From Richard Kettlewell@3:633/10 to All on Sun Nov 30 11:06:54 2025
    The Natural Philosopher <tnp@invalid.invalid> writes:
    On 29/11/2025 19:45, Peter Flass wrote:
    On 11/29/25 04:20, The Natural Philosopher wrote:
    On 28/11/2025 21:10, Alexander Schreiber wrote:
    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all
    possible worlds.
    Thus making it the _perfect_ choice of encoding for Microsoft Windows.

    IBM mainframes and System i use UTF-16.

    I would have thought the font would be a function of software, not
    tied to any hardware.

    Yes. There's a lot of confusion in this thread. OS APIs may work in one encoding, a language runtime might prefer another, and an application
    may represent its strings in yet another (or more than one, if it's
    dealing with web pages, emails, etc). Hardware makes very little
    difference beyond the extent to which hardware is bound to particular
    operating systems (although trying to do UTF-32 on a Z80 would be rather inconvenient).

    There is probably an argument for either 32 or 64 bit characters these
    days. Same as integers grew from 16 to 64 bit...

    C has had a wide character type (wchar_t) available for years. It's
    basically unusable; almost everyone uses char and UTF-8, because all
    the existing software works that way and it's hugely cheaper to adopt
    UTF-8 and smooth off a few sharp edges than to rewrite everything to use
    a new character type. There are absolutely use cases for switching
    temporarily to 32-bit characters but for the most part, it's just not
    worth it.

    Go is similar. Strings are made of bytes, normally encoded as UTF-8, but
    it's straightforward to get the 32-bit code points out when you need
    them. Rust has a 32-bit 'char' but the String type is UTF-8 (with a
    separate type for non-UTF-8 strings).
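    A rough Python analogue of that model (not Go or Rust code): keep the data as UTF-8 bytes and decode to code points only when they are needed. Python's errors="replace" maps invalid bytes to U+FFFD, much as Go does when ranging over an invalid string.

```python
data = "na\u00efve \u65e5\u672c".encode("utf-8")  # raw UTF-8 bytes

print(len(data))                        # 13: length in bytes
text = data.decode("utf-8")             # decode only when code points are needed
print(len(text))                        # 8: length in code points
print([f"U+{ord(ch):04X}" for ch in text])

# Invalid bytes can be mapped to U+FFFD instead of raising an error:
print(ascii(b"ok\xff".decode("utf-8", errors="replace")))  # 'ok\ufffd'
```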

    Python takes a different approach. Its internal string representation dynamically picks 8, 16 or 32 bits depending on the string contents,
    with UTF-8 created on demand and cached.
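    The effect of CPython's flexible representation (PEP 393, CPython 3.3 and later) can be observed with sys.getsizeof. A sketch only: the exact byte counts depend on the Python version and platform, but the ordering should hold.

```python
import sys

samples = {
    "ASCII only (1 byte/char)": "a" * 1000,
    "max < U+10000 (2 bytes/char)": "\u0100" * 1000,
    "needs > U+FFFF (4 bytes/char)": "\U0001F600" * 1000,
}
for label, s in samples.items():
    # Same len() in every case; the memory footprint differs.
    print(f"{label:30} len={len(s)} getsizeof={sys.getsizeof(s)}")
```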

    --
    https://www.greenend.org.uk/rjk/

  • From Nuno Silva@3:633/10 to All on Sun Nov 30 12:29:57 2025
    On 2025-11-30, The Natural Philosopher wrote:

    On 29/11/2025 19:45, Peter Flass wrote:
    On 11/29/25 04:20, The Natural Philosopher wrote:
    On 28/11/2025 21:10, Alexander Schreiber wrote:
    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all
    possible worlds.
    Thus making it the _perfect_ choice of encoding for Microsoft Windows.

    IBM mainframes and System i use UTF-16.


    I would have thought the font would be a function of software, not
    tied to any hardware.

    There is probably an argument for either 32 or 64 bit characters these
    days. Same as integers grew from 16 to 64 bit...

    This is not about fonts, this is about encodings.

    --
    Nuno Silva

  • From The Natural Philosopher@3:633/10 to All on Sun Nov 30 13:23:02 2025
    On 30/11/2025 12:29, Nuno Silva wrote:
    On 2025-11-30, The Natural Philosopher wrote:

    On 29/11/2025 19:45, Peter Flass wrote:
    On 11/29/25 04:20, The Natural Philosopher wrote:
    On 28/11/2025 21:10, Alexander Schreiber wrote:
    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all
    possible worlds.
    Thus making it the _perfect_ choice of encoding for Microsoft Windows.

    IBM mainframes and System i use UTF-16.


    I would have thought the font would be a function of software, not
    tied to any hardware.

    There is probably an argument for either 32 or 64 bit characters these
    days. Same as integers grew from 16 to 64 bit...

    This is not about fonts, this is about encodings.

    Well fonts that contain UTF-8 etc....

    Þat is þe pointe...

    --
    "An intellectual is a person knowledgeable in one field who speaks out
    only in others..."

    Tom Wolfe


  • From Lawrence D'Oliveiro@3:633/10 to All on Sun Nov 30 16:20:11 2025
    On Sun, 30 Nov 2025 11:06:54 +0000, Richard Kettlewell wrote:

    Python takes a different approach. Its internal string
    representation dynamically picks 8, 16 or 32 bits depending on the
    string contents, with UTF-8 created on demand and cached.

    Its 'str' type (immutable) is nominally UTF-32.

    It has separate 'bytes' (immutable) and 'bytearray' (mutable) types.
    You have to explicitly convert between these and 'str'.
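    The explicit conversions look like this in practice (a minimal sketch): str holds code points, and moving to or from bytes or bytearray always names an encoding.

```python
s = "na\u00efve"              # str: an immutable sequence of code points
b = s.encode("utf-8")        # str -> bytes: an encoding must be chosen
buf = bytearray(b)           # bytearray: a mutable copy of those bytes
buf[0:1] = b"N"              # edit the raw bytes in place
print(buf.decode("utf-8"))   # bytearray -> str, again naming an encoding

try:
    s + b                    # no implicit conversion between str and bytes
except TypeError as exc:
    print("TypeError:", exc)
```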

  • From Rich@3:633/10 to All on Sun Nov 30 16:56:58 2025
    In comp.os.linux.misc The Natural Philosopher <tnp@invalid.invalid> wrote:
    On 30/11/2025 12:29, Nuno Silva wrote:
    On 2025-11-30, The Natural Philosopher wrote:

    On 29/11/2025 19:45, Peter Flass wrote:
    On 11/29/25 04:20, The Natural Philosopher wrote:
    On 28/11/2025 21:10, Alexander Schreiber wrote:
    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all
    possible worlds.
    Thus making it the _perfect_ choice of encoding for Microsoft Windows.

    IBM mainframes and System i use UTF-16.


    I would have thought the font would be a function of software, not
    tied to any hardware.

    There is probably an argument for either 32 or 64 bit characters these
    days. Same as integers grew from 16 to 64 bit...

    This is not about fonts, this is about encodings.

    Well fonts that contain UTF-8 etc....

    Technically, fonts don't contain UTF-8 (nor any other encoding).

    Fonts contain numbered glyphs (character drawings). Usually, modern
    fonts number the glyphs using the code point numbers assigned to
    characters by Unicode. But your mileage will vary with older font
    files as to what numbering scheme they used (I suppose if one looked
    long enough one could find a font file somewhere that used EBCDIC
    character numbering assignments).

    UTF-8 is a code point (character number) encoding. A way to store the "numbers" that reference which font glyph to display on disk/in
    memory/on the wire/etc.
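    The split between encoding and font can be sketched in Python. In practice a font maps code points to glyph indices through a lookup table (the 'cmap' table in OpenType); the tiny table and glyph IDs below are hypothetical. Whatever encoding the bytes were stored in, they are decoded to code points first, and only then looked up.

```python
# Hypothetical miniature character map: code point -> glyph ID.
# Real fonts (e.g. an OpenType 'cmap' table) are much larger; these IDs are made up.
CMAP = {0x41: 36, 0xE9: 102, 0x20AC: 190}
NOTDEF = 0  # glyph 0 is conventionally ".notdef", the missing-glyph box


def glyph_ids(encoded: bytes, encoding: str) -> list[int]:
    """Decode stored bytes to code points, then look each one up in the font."""
    return [CMAP.get(ord(ch), NOTDEF) for ch in encoded.decode(encoding)]


text = "A\u00e9\u20ac?"  # A, e-acute, euro sign, and a character the font lacks
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    # Different bytes on disk, identical glyph lookups once decoded.
    print(enc, text.encode(enc).hex(" "), glyph_ids(text.encode(enc), enc))
```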


  • From David Goodwin@3:633/10 to All on Mon Dec 1 11:09:40 2025
    In article <mp3gv8Ff0euU1@mid.individual.net>, bowman@montana.com
    says...
    of WCHAR and TCHAR based on a compiler flag, and
    so forth it's a lot of fun.

    I had a play around with this last year, and if you use TCHAR and its
    related functions consistently it turns out to be relatively easy to
    write code that can be built as either Unicode or "ANSI" (really some
    local codepage, potentially with a multibyte encoding) by just changing
    the compiler flag. But it's easy to slip up if you're not careful, so you really need to be building both variants regularly and I suspect most
    didn't bother.

    The only real complication I ran into is that some newer Windows APIs
    are available only in their Unicode form so in a non-Unicode build I'd
    have to convert strings to/from Unicode in a few places. Will be
    interesting to see if that eventually changes now that UTF-8 support is
    a thing.

    Today UTF-8 is implemented as just another "non-unicode" multibyte
    character set, so if you wanted to switch your app to UTF-8 and were
    using TCHARs everywhere I think you'd just turn off the unicode compiler
    flag and tell Windows (via the app manifest) that you want to use the
    UTF-8 codepage. I've not tried this myself yet, but it's on my to-do
    list.

  • From David Goodwin@3:633/10 to All on Mon Dec 1 11:18:02 2025
    In article <10ght0p$gnag$2@dont-email.me>, rich@example.invalid says...

    UTF-8 is a code point (character number) encoding. A way to store the "numbers" that reference which font glyph to display on disk/in
    memory/on the wire/etc.


    And to further complicate matters, what looks like a single character (grapheme) to the user may be encoded as multiple code points combined together. So even if you're using UTF-32, characters are still variable length.
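    A small Python illustration of one such case: "é" can be a single precomposed code point or a base letter plus a combining accent. Normalization (here via the standard unicodedata module) makes them comparable, but neither form changes the fact that len() counts code points, not user-visible characters. Full grapheme-cluster segmentation (Unicode UAX #29) is not in Python's standard library; third-party packages provide it.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)          # False: different code point sequences
# Normalizing to NFC composes the pair, so the two now compare equal:
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```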

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Carlos E.R.@3:633/10 to All on Mon Dec 1 01:55:05 2025
    On 2025-11-30 23:18, David Goodwin wrote:
    In article <10ght0p$gnag$2@dont-email.me>, rich@example.invalid says...

    UTF-8 is a code point (character number) encoding. A way to store the
    "numbers" that reference which font glyph to display on disk/in
    memory/on the wire/etc.


    And to further complicate matters, what looks like a single character (grapheme) to the user may be encoded as multiple code points combined together. So even if you're using UTF-32, characters are still variable length.

    The flags in my signature are two chars each.

    The Spanish flag emoji is represented by the Unicode sequence U+1F1EA
    U+1F1F8, which combines the Regional Indicator Symbol Letters 'E' and
    'S'. In UTF-8, this corresponds to the hexadecimal byte sequence F0 9F
    87 AA F0 9F 87 B8, or it can be written as the HTML entities &#x1f1ea;&#x1f1f8;
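    Byte sequences like these are easy to get wrong by hand, so here is a
    short Python sketch that computes them instead of quoting them: the
    flag's two code points in UTF-8, UTF-16 and UTF-32, plus the HTML
    numeric character references.

```python
# Spanish flag: REGIONAL INDICATOR SYMBOL LETTER E + ... LETTER S.
flag_es = "\U0001F1EA\U0001F1F8"

utf8 = flag_es.encode("utf-8")
utf16 = flag_es.encode("utf-16-be")  # explicit byte order, so no BOM
utf32 = flag_es.encode("utf-32-be")

print(utf8.hex(" ").upper())                            # F0 9F 87 AA F0 9F 87 B8
print(len(flag_es), len(utf8), len(utf16), len(utf32))  # 2 8 8 8

# One HTML numeric character reference per code point.
entities = "".join(f"&#x{ord(c):x};" for c in flag_es)
print(entities)  # &#x1f1ea;&#x1f1f8;
```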

    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Richard Kettlewell@3:633/10 to All on Mon Dec 1 08:54:34 2025
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    Richard Kettlewell wrote:
    Python takes a different approach. Its internal string
    representation dynamically picks 8, 16 or 32 bits depending on the
    string contents, with UTF-8 created on demand and cached.

    Its “str” type (immutable) is nominally UTF-32.

    No. RTFM. At the Python level, str is a sequence of values that
    represent Unicode code points. There is no statement that they are
    UTF-32. For all the Python programmer knows it could be packed 21-bit or
    3-byte fields, among other possibilities; they would not be able to tell
    the difference from Python.

    The internal representation in the C implementation of Python is as
    stated above. (In principle other implementations could make other
    decisions.)
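    A minimal Python sketch of both levels described above. At the
    language level, `str` is a sequence of code points. The second part
    peeks at a CPython implementation detail (the flexible string
    representation of PEP 393) via `sys.getsizeof`; the absolute numbers
    vary by version and platform, so only their ordering is shown.

```python
import sys

# Language level: len() counts code points and ord() returns their numbers.
s = "a\u00e9\U0001F600"          # 'a', 'é', and an emoji outside the BMP
print(len(s))                    # 3
print([hex(ord(c)) for c in s])  # ['0x61', '0xe9', '0x1f600']

# CPython detail: storage per code point is 1, 2 or 4 bytes, chosen by
# the widest code point in the string.
n = 1000
one_byte = sys.getsizeof("a" * n)            # all code points below U+0100
two_byte = sys.getsizeof("\u0101" * n)       # widest is within U+0100..U+FFFF
four_byte = sys.getsizeof("\U0001F600" * n)  # widest is above U+FFFF
print(one_byte < two_byte < four_byte)       # True on CPython
```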

    --
    https://www.greenend.org.uk/rjk/

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Marc Haber@3:633/10 to All on Sun Nov 30 13:29:13 2025
    Peter Flass <Peter@Iron-Spring.com> wrote:
    On 11/29/25 04:20, The Natural Philosopher wrote:
    On 28/11/2025 21:10, Alexander Schreiber wrote:
    UTF-16 has neither advantage. Upgrading from ASCII can't be done
    compatibly with existing applications, and you don't even get the
    fixed-width encoding of UTF-32 in return. It's the worst of all possible
    worlds.
    Thus making it the _perfect_ choice of encoding for Microsoft Windows.
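    Both complaints about UTF-16 can be seen in a few lines of Python (a
    sketch, not tied to any particular platform): ASCII text does not stay
    byte-identical, and code points above U+FFFF need a surrogate pair, so
    the encoding is variable-width. The helper `decode_surrogates` is a
    hypothetical name that applies the standard surrogate arithmetic.

```python
# ASCII text is byte-for-byte identical in UTF-8, but not in UTF-16.
print("A".encode("utf-8"))      # b'A'
print("A".encode("utf-16-le"))  # b'A\x00'

def decode_surrogates(high: int, low: int) -> int:
    """Recombine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# Code points above U+FFFF need two 16-bit code units (a surrogate pair).
units = "\U0001F600".encode("utf-16-be")
high = int.from_bytes(units[0:2], "big")
low = int.from_bytes(units[2:4], "big")
print(hex(high), hex(low))                # 0xd83d 0xde00
print(hex(decode_surrogates(high, low)))  # 0x1f600
```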


    IBM mainframes and System i use UTF-16.

    Doesn't that depend on the OS that is being used there?

    Gre
    Marc
    -- 
    ----------------------------------------------------------------------------
    Marc Haber         |   " Questions are the          | Mail address in header
    Rhein-Neckar, DE   |     Beginning of Wisdom "      |
    Nordisch by Nature |  Lt. Worf, TNG "Rightful Heir" | Phone: *49 6224 1600402

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Johnny Billquist@3:633/10 to All on Wed Dec 3 13:37:13 2025
    On 2025-11-28 22:08, Alexander Schreiber wrote:
    Johnny Billquist <bqt@softjar.se> wrote:
    Just because there was a problem, it doesn't follow that Unicode was a
    good solution.

    I'm not claiming it is a good solution, but it is the solution we ended up with that reasonably covers a lot of the problem space. Given that:
    - it covers a wide and very irregular problem space
    - it
    - it is, due to the problem scope, a design by committee
    ending up with a solution that is a bit of a mess is hardly avoidable.

    It has the property of "working well enough most of the time", which is already a big impediment to anyone spending the time, money and brains
    in order to:
    - come up with a New And Improved Design That Surely Has No Warts
    - establish it as the new standard

    Honestly: not happening.

    I know that Unicode is here to stay. Said as much before. But it has
    introduced a whole range of problems that people tend to pretend don't
    exist. The most immediate one that comes to mind is all kinds of
    scammers creating fake domains to phish with, using known, trusted
    company names with letters replaced by characters that look visually
    equivalent but are actually different, and then using those domains to
    fool people into giving up information such as passwords, account
    numbers, money, and god knows what else.
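    The look-alike trick can be shown with a crude mixed-script check in
    Python 3.9+. Real defences use the Unicode Script property and the
    confusables data of UTS #39; Python's `unicodedata` module does not
    expose Script, so the hypothetical helper `scripts_used` takes the
    first word of each letter's character name as a rough stand-in.

```python
import unicodedata

def scripts_used(label: str) -> set[str]:
    """Crude script guess per letter: the first word of its Unicode
    character name (e.g. 'LATIN', 'CYRILLIC', 'GREEK')."""
    return {unicodedata.name(c).split()[0] for c in label if c.isalpha()}

genuine = "apple"
spoof = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A (U+0430)

print(genuine == spoof)               # False, though they may render identically
print(sorted(scripts_used(genuine)))  # ['LATIN']
print(sorted(scripts_used(spoof)))    # ['CYRILLIC', 'LATIN']

# Flag labels whose letters come from more than one script.
for label in (genuine, spoof):
    if len(scripts_used(label)) > 1:
        print(f"suspicious mixed-script label: {label!r}")
```

    A name-prefix heuristic like this will also flag legitimate mixes
    (Japanese text routinely mixes scripts), which is why production code
    relies on per-script rules rather than a single "more than one" test.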

    A big part of the problem is that Unicode doesn't even seem to have known
    what problem it was supposed to solve. Was it about representing
    different characters that have different meanings? Was it about
    representing the same characters but with different visual effects? Was
    it supposed to be some kind of generic system to modify characters
    through some clever system design?
    As it is, it's sort of all of these, but none of them properly.

    And it makes it a hellhole to deal with.

    But yes. Now it exists. We are not going to replace it.

    Johnny


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Johnny Billquist@3:633/10 to All on Wed Dec 3 13:38:34 2025
    On 2025-11-27 20:16, Carlos E.R. wrote:
    On 2025-11-27 20:02, Johnny Billquist wrote:
    On 2025-11-22 20:25, Carlos E.R. wrote:
    On 2025-11-22 17:55, Johnny Billquist wrote:
    On 2025-11-18 21:29, Eli the Bearded wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
    Without UTF-8, you could not have ??? or ??? or ?? or those
    curly quotes.
    Of course you could.
    They exist just fine in Latin-1 (hmm, maybe not the quotes...).

    The Latin-1 I know does not have a Euro symbol. It does have the
    generic currency placeholder at 0xA4: ¤

    Yeah. Sorry. That came in 8859-15.
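    A quick Python check of the Latin-1 versus Latin-9 (ISO 8859-15)
    difference: the same byte, 0xA4, is the generic currency sign in one
    and the euro sign in the other, and Latin-1 cannot encode the euro at
    all.

```python
# In ISO 8859-15 (Latin-9) byte 0xA4 is the euro sign; in ISO 8859-1
# (Latin-1) the same byte is U+00A4 CURRENCY SIGN.
print("\u20ac".encode("iso8859-15"))  # b'\xa4'
print(b"\xa4".decode("latin-1"))      # ¤

# Latin-1 simply has no euro sign.
try:
    "\u20ac".encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode:", exc.object[exc.start])
```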

    Elijah
    ------
    likes utf-8 better than iso-8859-$WHATEVER
    That doesn't even make sense. UTF-8 is just a way to encode large
    integers in a variable sequence of 8-bit bytes.
    Which of course makes it a mess to figure out how long a string
    actually is.
    But I guess what you actually mean is that you like Unicode better
    than 8859-whatever.
    I couldn't disagree more. Endless ways to represent the exact same
    character, and weird things like sometimes having a separate
    codepoint for units or prefixes, but sometimes using normal ASCII
    for them, and then you have sometimes different codepoints because
    of colors, but sometimes not.
    It's a trainwreck, but now we're stuck with it. :(

    Encode large integers? No.

    Ok. Call it "encode Unicode" then if that makes you happier. And
    Unicode codepoints can be described as integers (in fact, they are,
    which is why you see U+nnnn, where nnnn is a hex value, for
    codepoints), and have a range of just over 2^20 (U+0000 to U+10FFFF).

    UTF-8 isn't defining any characters, just defining a way to represent
    Unicode characters using a variable number of 8-bit bytes.
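    To make "UTF-8 defines no characters, only a byte representation"
    concrete, here is a hand-written Python encoder following the bit
    patterns of RFC 3629, cross-checked against Python's built-in codec.
    `utf8_encode` is a hypothetical name for illustration; real code
    should just call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    # Surrogates and values above U+10FFFF are not Unicode scalar values.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# Cross-check against Python's built-in codec for a few sample code points.
for cp in (0x41, 0xE9, 0x20AC, 0x1F1EA, 0x10FFFF):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```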

    And 8-bit ASCII letters are also numbers representing characters. That's
    how computers work.

    Right. So both ASCII and Unicode use numbers to represent characters.
    Note that UTF-8 didn't get mentioned in that sentence.

    Johnny


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Johnny Billquist@3:633/10 to All on Wed Dec 3 13:44:02 2025
    On 2025-11-27 21:19, Scott Lurndal wrote:
    Johnny Billquist <bqt@softjar.se> writes:
    On 2025-11-25 21:05, Lawrence D'Oliveiro wrote:
    On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

    Instead we have UTF-8. It's a shame.

    Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

    In which way is it worse? It's the same character set, just encoded
    using one or two 16-bit values instead of 1-4 8-bit values. You still
    need to extract the actual Unicode value out of that encoding before
    showing anything and vice versa.

    Because the endianness can vary, and thus UTF-16 requires a BOM.

    UTF-16 should have been a non-starter.

    Well, then use UTF-16BE or UTF-16LE, and a BOM is not only not required,
    but is actually forbidden.
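    How this plays out in practice can be seen with Python's codecs (a
    sketch of Python's behaviour, not a statement about every UTF-16
    implementation): the plain "utf-16" codec writes and consumes a BOM,
    while the explicit-order codecs neither write one nor strip it, so a
    leading U+FEFF stays in the text as an ordinary character.

```python
# The plain "utf-16" codec writes a BOM in the machine's native byte order.
data = "A".encode("utf-16")
print(data[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True

# With the byte order named explicitly, no BOM is written.
print("A".encode("utf-16-be"))  # b'\x00A'
print("A".encode("utf-16-le"))  # b'A\x00'

# When reading, only plain "utf-16" treats a leading U+FEFF as a BOM;
# the explicit-order codec keeps it as ZERO WIDTH NO-BREAK SPACE.
with_bom = b"\xfe\xff\x00A"
print(repr(with_bom.decode("utf-16")))     # 'A'
print(repr(with_bom.decode("utf-16-be")))  # '\ufeffA'
```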

    Johnny


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Carlos E.R.@3:633/10 to All on Wed Dec 3 13:56:25 2025
    On 2025-12-03 13:37, Johnny Billquist wrote:
    On 2025-11-28 22:08, Alexander Schreiber wrote:
    Johnny Billquist <bqt@softjar.se> wrote:
    Just because there was a problem, it doesn't follow that Unicode was a good solution.

    I'm not claiming it is a good solution, but it is the solution we ended up
    with that reasonably covers a lot of the problem space. Given that:
    - it covers a wide and very irregular problem space
    - it
    - it is, due to the problem scope, a design by committee
    ending up with a solution that is a bit of a mess is hardly avoidable.

    It has the property of "working well enough most of the time", which is
    already a big impediment to anyone spending the time, money and brains
    in order to:
    - come up with a New And Improved Design That Surely Has No Warts
    - establish it as the new standard

    Honestly: not happening.

    I know that Unicode is here to stay. Said as much before. But it has
    introduced a whole range of problems that people tend to pretend don't
    exist. The most immediate one that comes to mind is all kinds of
    scammers creating fake domains to phish with, using known, trusted
    company names with letters replaced by characters that look visually
    equivalent but are actually different, and then using those domains to
    fool people into giving up information such as passwords, account
    numbers, money, and god knows what else.

    A big part of the problem is that Unicode doesn't even seem to have known
    what problem it was supposed to solve.

    No? The problem is that ASCII only represents the USA view of the alphabet.

    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From The Natural Philosopher@3:633/10 to All on Wed Dec 3 13:58:54 2025
    On 03/12/2025 12:56, Carlos E.R. wrote:
    e.

    No? The problem is that ASCII only represents the USA view of the alphabet.

    The problem is that ASCII only represents the USA view of *ONE* alphabet.
    Þat is þe problem...

    And, worse, many writing methods do not use alphabets...

    --
    It is the folly of too many to mistake the echo of a London coffee-house
    for the voice of the kingdom.

    Jonathan Swift



    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Wed Dec 3 07:39:24 2025
    On 12/3/25 05:37, Johnny Billquist wrote:
    On 2025-11-28 22:08, Alexander Schreiber wrote:
    Johnny Billquist <bqt@softjar.se> wrote:
    Just because there was a problem, it doesn't follow that Unicode was a good solution.

    I'm not claiming it is a good solution, but it is the solution we ended up
    with that reasonably covers a lot of the problem space. Given that:
    - it covers a wide and very irregular problem space
    - it
    - it is, due to the problem scope, a design by committee
    ending up with a solution that is a bit of a mess is hardly avoidable.

    It has the property of "working well enough most of the time", which is
    already a big impediment to anyone spending the time, money and brains
    in order to:
    - come up with a New And Improved Design That Surely Has No Warts
    - establish it as the new standard

    Honestly: not happening.

    I know that Unicode is here to stay. Said as much before. But it has
    introduced a whole range of problems that people tend to pretend don't
    exist. The most immediate one that comes to mind is all kinds of
    scammers creating fake domains to phish with, using known, trusted
    company names with letters replaced by characters that look visually
    equivalent but are actually different, and then using those domains to
    fool people into giving up information such as passwords, account
    numbers, money, and god knows what else.

    I discovered this when I tried to set up spam filters and couldn't
    figure out why they weren't working.


    A big part of the problem is that Unicode doesn't even seem to have known
    what problem it was supposed to solve. Was it about representing
    different characters that have different meanings? Was it about
    representing the same characters but with different visual effects? Was
    it supposed to be some kind of generic system to modify characters
    through some clever system design?
    As it is, it's sort of all of these, but none of them properly.

    It's supposed to be about the meanings of the characters. Capital 'A' in
    any font is the same Unicode character, but two characters that look
    identical but have different meanings are two.

    The biggest problem I have with any Unicode representation except (I
    think) UTF-32 is that a program has no way of knowing how long a string
    is without encoding/decoding it. Given a string of characters in some codepage, how many bytes does it occupy when converted to UTF-8? Given a
    UTF-8 character string, how many character positions does it occupy
    when displayed on a screen, for example?
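    Both questions can be answered, but only by walking the string, as this
    Python sketch shows. `utf8_length` and `display_columns` are
    hypothetical helpers; the width rule (combining marks take 0 columns,
    East Asian wide/fullwidth characters take 2, everything else 1) is a
    rough assumption. Real terminals use wcwidth-style tables, and
    ambiguous-width characters and emoji sequences are not handled here.

```python
import unicodedata

def utf8_length(text: str) -> int:
    """Bytes needed to store text as UTF-8, without building the bytes."""
    total = 0
    for c in text:
        cp = ord(c)
        if cp < 0x80:
            total += 1
        elif cp < 0x800:
            total += 2
        elif cp < 0x10000:
            total += 3
        else:
            total += 4
    return total

def display_columns(text: str) -> int:
    """Rough terminal width of text (see the assumptions above)."""
    width = 0
    for c in text:
        if unicodedata.combining(c):
            continue  # combining marks overlay the previous character
        width += 2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
    return width

legacy = "Größe".encode("cp1252")  # 5 bytes in an 8-bit codepage
text = legacy.decode("cp1252")
print(len(legacy), utf8_length(text))                       # 5 7
print(display_columns("e\u0301"), display_columns("日本"))  # 1 4
```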


    And it makes it a hellhole to deal with.





    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Diego Garcia@3:633/10 to All on Wed Dec 3 14:40:06 2025
    On Wed, 3 Dec 2025 13:37:13 +0100, Johnny Billquist wrote:


    I know that Unicode is here to stay. Said as much before. But it has
    introduced a whole range of problems that people tend to pretend don't
    exist. The most immediate one that comes to mind is all kinds of
    scammers creating fake domains to phish with, using known, trusted
    company names with letters replaced by characters that look visually
    equivalent but are actually different, and then using those domains to
    fool people into giving up information such as passwords, account
    numbers, money, and god knows what else.


    That's not actually a problem. Punycode has been developed to deal
    with it.

    All software should now be offering a dual display of all FQDNs as
    both Unicode and Punycode.
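    A minimal sketch of such a dual display in Python, using the built-in
    "idna" codec. That codec implements IDNA 2003 (RFC 3490); browsers
    follow newer rules (UTS #46 / IDNA 2008), so results can differ for
    some inputs. `dual_display` is a hypothetical helper name.

```python
# Each non-ASCII label of a domain gets an ASCII "xn--" Punycode form.
print("bücher.example".encode("idna"))  # b'xn--bcher-kva.example'

# A look-alike with a Cyrillic 'а' (U+0430) gets a visibly different form.
lookalike = "\u0430pple.com"
ascii_form = lookalike.encode("idna")
print(ascii_form.startswith(b"xn--"), ascii_form != b"apple.com")  # True True

def dual_display(domain: str) -> str:
    """Show the Unicode and Punycode forms of a domain side by side."""
    return f"{domain} ({domain.encode('idna').decode('ascii')})"

print(dual_display("bücher.example"))  # bücher.example (xn--bcher-kva.example)
```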


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Carlos E.R.@3:633/10 to All on Wed Dec 3 15:40:17 2025
    On 2025-12-03 14:58, The Natural Philosopher wrote:
    On 03/12/2025 12:56, Carlos E.R. wrote:
    e.

    No? The problem is that ASCII only represents the USA view of the
    alphabet.

    The problem is that ASCII only represents the USA view of *ONE* alphabet.
    Þat is þe problem...

    And, worse, many writing methods do not use alphabets...

    Right.

    --
    Cheers, Carlos.
    ES🇪🇸, EU🇪🇺;

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Eli the Bearded@3:633/10 to All on Thu Dec 4 07:00:08 2025
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    I know that Unicode is here to stay. Said as much before. But it has
    introduced a whole range of problems that people tend to pretend don't
    exist. The most immediate one that comes to mind is all kinds of
    scammers creating fake domains to phish with, using known, trusted
    company names with letters replaced by characters that look visually
    equivalent but are actually different, and then using those domains to
    fool people into giving up information such as passwords, account
    numbers, money, and god knows what else.

    As opposed to scammers posting as J0HNNY BILLQUIST, or Johnny Bi11quist,
    or JOHNNY BILLQUlST in ordinary ASCII. More alphabets compound the
    problem, sure, but it was always there.

    A big part of the problem is that Unicode doesn't even seem to have known
    what problem it was supposed to solve. Was it about representing
    different characters that have different meanings? Was it about
    representing the same characters but with different visual effects? Was
    it supposed to be some kind of generic system to modify characters
    through some clever system design?

    It's pretty much never about "visual effects" although there are semantic differences to some visually similar characters. Math is a big offender
    in wanting ? meaning something different than Z or ?. But you could argue
    that Japanese style "fullwidth" ? is a visual effect.

    I would say the problem Unicode is trying to solve, albeit with some inconsistency, is the communication of all written languages in a
    standardized system of encoding. There are huge problems in that many
    written languages have implicit presentation rules based on context. The fullwidth Roman alphabet, for example, is there because English letters
    in Japanese text are supposed to be the same size to fit the grid of the surrounding material.
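    A short Python sketch of how Unicode keeps such variants distinct yet
    related: fullwidth Latin letters have their own code points
    (U+FF21..U+FF3A for A-Z), and NFKC normalization folds these
    "compatibility" characters back to plain letters while NFC does not.
    The garbled characters in the quoted post could not be recovered, so
    U+2124 DOUBLE-STRUCK CAPITAL Z is used here purely as an assumed
    example of a mathematical letter.

```python
import unicodedata

# Fullwidth A, B, C from the Halfwidth and Fullwidth Forms block.
fullwidth = "".join(chr(0xFF21 + i) for i in range(3))
print(fullwidth, unicodedata.east_asian_width(fullwidth[0]))  # ＡＢＣ F

# NFKC folds compatibility variants, including the double-struck Z.
print(unicodedata.normalize("NFKC", fullwidth))  # ABC
print(unicodedata.normalize("NFKC", "\u2124"))   # Z

# NFC keeps them distinct, so the choice of normalization form matters.
print(unicodedata.normalize("NFC", "\u2124") == "Z")  # False
```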

    At different stages Unicode has solved this problem in different ways.
    More recently there has been a trend towards encoding things with
    combining characters (backspace overstrike style in the old manual
    typewriter days) and with ligatures of a sort. Flags being represented
    as a pair of "regional indicator" letters, where the letters are the
    same country codes used in DNS, is an example of that.
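    The regional-indicator scheme is simple enough to sketch in Python:
    each ASCII letter maps to a code point offset from U+1F1E6 REGIONAL
    INDICATOR SYMBOL LETTER A. The helper names `flag` and `country_code`
    are hypothetical. Whether a pair actually renders as a flag depends on
    the font and platform.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A

def flag(code: str) -> str:
    """Build a flag sequence from a two-letter code such as 'ES'."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

def country_code(flag_seq: str) -> str:
    """Inverse of flag(): map regional indicators back to ASCII letters."""
    return "".join(chr(ord(c) - REGIONAL_INDICATOR_A + ord("A")) for c in flag_seq)

es = flag("es")
print([f"U+{ord(c):X}" for c in es])  # ['U+1F1EA', 'U+1F1F8']
print(country_code(es))               # ES
```

    Strictly, the letters follow ISO 3166-1 two-letter region codes, which
    mostly but not always match DNS country-code TLDs (the UK flag is
    G + B, while its TLD is .uk).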

    Elijah
    ------
    "Weird AI != Weird Al" being a confusable forming some recent jokes

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Eli the Bearded@3:633/10 to All on Thu Dec 4 07:15:57 2025
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    On 2025-11-27 20:16, Carlos E.R. wrote:
    And 8-bit ASCII letters are also numbers representing characters. That's
    how computers work.

    Not all computers are 8-bit.

    Right. So both ASCII and Unicode use numbers to represent characters.
    Note that UTF-8 didn't get mentioned in that sentence.

    What is your point? Touch-tone phones and rotary (pulse) dialing use
    electrical signals to represent numbers. Most-significant-bit computers
    and least-significant-bit wire communication of ASCII or EBCDIC or UTF-16
    all encode the letter capital A differently. (Are there parity bits?
    Stop bits? More variables!)

    You'll want to know both which characters are being represented and
    which encoding has been used to make sense of a message. (Figuring that
    out cryptanalyst-style is fine, but you are still figuring out the set
    and the encoding to do so.)

    Unicode is a set of numbered characters. UTF-8 or UTF-16 or ... is an
    encoding for those numbers.
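    The character-set versus encoding distinction in a few lines of
    Python: one abstract character, U+0041 LATIN CAPITAL LETTER A, becomes
    different bytes under different encodings. cp037 is assumed here as
    one representative EBCDIC code page among several; bit order on the
    wire, parity and stop bits sit below the level Python's codecs see and
    are not modelled.

```python
# One character, several byte representations.
codecs_to_try = ("ascii", "cp037", "utf-8", "utf-16-be", "utf-16-le", "utf-32-be")
encoded = {name: "A".encode(name) for name in codecs_to_try}

for name, data in encoded.items():
    print(f"{name:>9}: {data.hex(' ')}")
# ascii: 41, cp037: c1, utf-8: 41, utf-16-be: 00 41,
# utf-16-le: 41 00, utf-32-be: 00 00 00 41
```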

    Elijah
    ------
    .- ... -.-. .. .. ...-.-

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Stéphane CARPENTIER@3:633/10 to All on Fri Dec 5 20:52:52 2025
    On 04-12-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
    In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
    I know that Unicode is here to stay. Said as much before. But it has
    introduced a whole range of problems that people tend to pretend don't
    exist. The most immediate one that comes to mind is all kinds of
    scammers creating fake domains to phish with, using known, trusted
    company names with letters replaced by characters that look visually
    equivalent but are actually different, and then using those domains to
    fool people into giving up information such as passwords, account
    numbers, money, and god knows what else.

    As opposed to scammers posting as J0HNNY BILLQUIST, or Johnny Bi11quist,
    or JOHNNY BILLQUlST in ordinary ASCII. More alphabets compound the
    problem, sure, but it was always there.

    Agreed. I see only one issue clearly limited to Unicode. In most current
    writing systems the characters are displayed from left to right, in
    others from right to left and, to my knowledge only in old scripts, in
    boustrophedon. Unicode takes care of it, but not every tool takes care
    of it the same way. The issue is writing code in English and comments
    in Arabic on the same line. Done poorly, it just doesn't compile and
    it's not an issue. But if an attacker wants to exploit it, your text
    editor may make you believe the code is commented out when the compiler
    actually compiles it, so malicious code can be executed while you
    believe it's commented out.
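    This class of attack was published as "Trojan Source"
    (CVE-2021-42574). A common mitigation is simply to flag bidirectional
    control characters in source files. Below is a minimal Python 3.9+
    sketch; the character set is the one usually flagged, the helper name
    `find_bidi_controls` is hypothetical, and the sample line is a toy,
    not a working exploit.

```python
import unicodedata

# Explicit bidi formatting characters (embeddings, overrides, isolates)
# plus the implicit directional marks.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE RLE PDF LRO RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI RLI FSI PDI
    "\u200e", "\u200f", "\u061c",                      # LRM RLM ALM
}

def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for each bidi control found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits

# A line an editor may display differently from how a compiler reads it.
sample = "x = 1  # comment \u202e } \u2066 ok\n"
for hit in find_bidi_controls(sample):
    print(hit)
```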

    A big part of the problem is that Unicode doesn't even seem to have known
    what problem it was supposed to solve. Was it about representing
    different characters that have different meanings? Was it about
    representing the same characters but with different visual effects? Was
    it supposed to be some kind of generic system to modify characters
    through some clever system design?

    It's pretty much never about "visual effects" although there are semantic differences to some visually similar characters. Math is a big offender
    in wanting ? meaning something different than Z or ?. But you could argue that Japanese style "fullwidth" ? is a visual effect.

    Of course, rendering isn't handled by the encoding. That's the job of
    the font. I choose fonts that don't make me guess which character is
    written: 0 and O don't have to look alike, and 1, l and I can be
    easily told apart. If that's not the case on your computer and it
    matters, change the font, not the encoding.

    I would say the problem Unicode is trying to solve, albeit with some inconsistency, is the communication of all written languages in a standardized system of encoding.

    Yes. And it wasn't a small thing to solve, considering how limited
    ASCII was and how it was used everywhere.

    --
    If you have time to waste:
    https://scarpet42.gitlab.io

    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Peter Flass@3:633/10 to All on Fri Dec 5 15:57:34 2025
    On 12/5/25 13:52, Stéphane CARPENTIER wrote:
    [snip]

    Agreed. I see only one issue clearly limited to Unicode. In most current
    writing systems the characters are displayed from left to right, in
    others from right to left and, to my knowledge only in old scripts, in
    boustrophedon.

    I wonder, is there any way to do this now without a lot of work? Are the
    right-to-left characters different from the left-to-right ones?
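    On the second question: Unicode stores text in logical (reading) order,
    and the Bidirectional Algorithm (UAX #9) reorders it for display using
    each character's Bidi_Class property. So in that sense right-to-left
    characters are indeed different: they carry a different class. A
    minimal Python sketch reading that property:

```python
import unicodedata

# Bidi_Class values used by the Unicode Bidirectional Algorithm.
samples = {
    "Latin a": "a",           # L: strong left-to-right
    "Hebrew alef": "\u05d0",  # R: strong right-to-left
    "Arabic beh": "\u0628",   # AL: Arabic letter (right-to-left)
    "digit 1": "1",           # EN: European number (weak)
    "space": " ",             # WS: whitespace (neutral)
}
for label, ch in samples.items():
    print(f"{label:12} {unicodedata.bidirectional(ch)}")
```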

    Of course, rendering isn't handled by the encoding. That's the job of
    the font. I choose fonts that don't make me guess which character is
    written: 0 and O don't have to look alike, and 1, l and I can be
    easily told apart. If that's not the case on your computer and it
    matters, change the font, not the encoding.


    I usually spend a lot of time settling on fonts for an editor. Right now
    I'm using "IBM Plex Mono", but I've tried a bunch.


    --- PyGate Linux v1.5.1
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)