Forum: d0p3 BBS

Re: Recent history of vi

From Nuno Silva@3:633/10 to All on Sun Nov 16 00:43:50 2025

On 2025-11-15, rbowman wrote:

On Sat, 15 Nov 2025 09:59:39 +0000, Nuno Silva wrote:

On 2025-11-15, rbowman wrote:

On Fri, 14 Nov 2025 16:09:47 -0500, c186282 wrote:

WordStar and close variants were VERY popular back in the day. Kind >>>> of everyone's "first word processor".
Everyone used it alongside Lotus-123.

It was bundled on the Osborne 1 CP/M machine. I got a lot of miles out
of it as a programming editor in the text mode. When I finally moved to
the DOS world I bought Brief.

https://en.wikipedia.org/wiki/Brief_(text_editor)

'ed' wasn't much fun. I think I way have had a freeware clone of vi
that was no Joy either. I guessing 95% of the people who say 'I use vi'
never have. Most Linux distros bring up Vim if you type 'vi'. One
exception is Arch. 'vi' is a hard link to ex which comes up in the
visual mode for that old timey flavor.

IIRC that source was lost or elusive for a long time, or perhaps held
back by lack of permission to distribute?

Like Unix itself ed and vi had licensing problems.

https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases- original-unices-under-bsd-license

Somehow at least the legacy vi code escaped the AT&T, UNIX System Laboratories, Novel, Caldera, SCO mess.

How did it escape the SCO mess? Wouldn't such a release from Caldera
rely on Santa Cruz Operation having gotten ownership of the code from
Novell? And if I'm reading Wikipedia right,[0] Novell still having the
rights played a role in the later mess involving the SCO Group?

Or is there something that I'm overlooking here?

[0] https://enwp.org/SCO_v._Novell

--
Nuno Silva

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Pancho@3:633/10 to All on Sun Nov 16 09:14:32 2025

On 11/16/25 00:43, Eli the Bearded wrote:

It's fine to to like nano or emacs or vscode or whatever. But that
just means you are not coming from a place that can judge my
appreciation of the features of vi(m).

Yes, it is a question of taste and not morals.

My taste includes both vi and vscode. ;-)

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Sun Nov 16 10:33:32 2025

On 16/11/2025 05:11, Ted Nolan <tednolan> wrote:

In article <10fasl6$3p4r1$3@dont-email.me>,
Lawrence D�Oliveiro <ldo@nz.invalid> wrote:

On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:

I find it takes a lot of munging to get vim to *really* work like vi.

To me, that sounds like someone saying �??it takes a lot of munging to get a >> Trabant to *really* work like a Morris Minor�??. I can�??t imagine myself
wanting to use either.

Well, squids & kids, but my fingers do vi automatically. Anything else
not so much.

I find that depressing.

I used to have to write reams of code in 'vi'. Horrible

Today I use whatever-it-is-that-typing-vi-brings-up to edit the odd
config file with root permissions, as it is marginally quicker than
invoking a GUI text editor and managing root perms.
And its quicker than learning nano and I don't need joe for a single
line of /etc/whatever.

--
"Women actually are capable of being far more than the feminists will
let them."

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris Ahlstrom@3:633/10 to All on Sun Nov 16 09:31:08 2025

rbowman wrote this post by blinking in Morse code:

On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:

I find it takes a lot of munging to get vim to *really* work like vi.
The one on FreeBSD which I think is technically "nex" is much closer out
of the box.

:he compatible has the disclaimer

When this option is set, numerous other options are set to make Vim as Vi-compatible as possible.

The Arch vi is the real thing.

You sure about that. ex-vi, vim.tiny, nvi are all close, but not
even this one is the "real" vi:

<https://github.com/n-t-roff/heirloom-ex-vi>

For example, it adds UTF-8 support.

I've no idea what version it is because
real vi doesn't do --version or much of anything useful.

I first learned vi using pc-vi around 1985. My first try at it
confused the *hell* out of me, but eventually I got the hang of
it. I also used microEmacs for awhile.

Apparently Bill Joy hates vi now. It's like Dennis Ritchie using
Windows :-(

But...

<https://anders.unix.se/2015/10/26/interview-with-dennis-ritchie-2003/?>

Q: Could you please describe a typical work day at Bell Labs?
What software do you use?

A: I tend to come in late unless there?s a meeting, but spend
a fair amount of time tending to e-mail communication. My
own environment (on PC hardware) actually runs Windows NT,
but it is used mainly as a graphics terminal connected to a
Plan 9 server, in a way approximately analogous to an X
windows client. The connection at home is now via cable
modem (until last summer ISDN), and Ethernet at the office.
Any editing, software work, and mail is done in this
exported Plan 9. For stuff like getting Excel and Word
things, plus much WWW browsing, I revert to NT.

--
I'm not a level-headed person... -- Bruce Perens

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris Ahlstrom@3:633/10 to All on Sun Nov 16 09:35:17 2025

Eli the Bearded wrote this post by blinking in Morse code:

In comp.os.linux.misc, rbowman <bowman@montana.com> wrote:

<brevsnip>

The Arch vi is the real thing. I've no idea what version it is because
real vi doesn't do --version or much of anything useful.

In vi, the standard way to get the version is with ":version". It looks
like arch is using Heirloom Vi:

https://ex-vi.sourceforge.net/

In nvi, :version yields

Version nvi-1.81.6 (2007-11-18) The CSRG, University of California, Berkeley.

That is a port of old code with many multibyte (eg UTF-8) fixes. It
should work with hardcopy terminals, which a lot of other vi
implementations (including vim) will not do. Those others expect you
to use ex mode on hardcopy terminals.

I learned vi on Digital Unix, A/UX, HP-UX, SunOS 4, and Solaris
2.(various), but I dabbled in vi clones for a long time, and was using
vim back in the 2.x versions. Elvis is still the default vi in
Slackware, and I've used recent versions of elvis for that reason. nvi
is default on NetBSD, and probably that FreeBSD one mentioned above. I
use NetBSD regularly and other BSDs very rarely.

In the vim distro there are sample macro packages. The ones to run
Conway's Game of Life were written by me on a Solaris box. The Solaris
vi can run them, but eventually it crashes out because there is a bug
that makes real vi (at least real vi of that era) forget marks after a
while. Vim will just work. Neovim fails to even start.

On the Debian system I'm working on right now those macros are in /usr/share/vim/vim90/macros/life/

Elijah
------
admits elvis is a pretty good vi imitation, but still not perfect

I wouldn't mind an old vi clone with syntax highlight. That's my
biggest crutch for writing code.

--
The early bird gets the coffee left over from the night before.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris Ahlstrom@3:633/10 to All on Sun Nov 16 09:49:10 2025

The Natural Philosopher wrote this post by blinking in Morse code:

On 16/11/2025 05:11, Ted Nolan <tednolan> wrote:

In article <10fasl6$3p4r1$3@dont-email.me>,
Lawrence D�Oliveiro <ldo@nz.invalid> wrote:

On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:

I find it takes a lot of munging to get vim to *really* work like vi.

To me, that sounds like someone saying �??it takes a lot of munging to get a
Trabant to *really* work like a Morris Minor�??. I can�??t imagine myself >>> wanting to use either.

Well, squids & kids, but my fingers do vi automatically. Anything else
not so much.

I find that depressing.

I used to have to write reams of code in 'vi'. Horrible

I still do. (Though it is vim).

Today I use whatever-it-is-that-typing-vi-brings-up to edit the odd
config file with root permissions, as it is marginally quicker than
invoking a GUI text editor and managing root perms.
And its quicker than learning nano and I don't need joe for a single
line of /etc/whatever.

We are old and stuck in our ways :-)

My first editor (other than a keypunch) was TECO, in a computer
class, on a PDP-11, early 1980's.

--
You would if you could but you can't so you won't.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris Ahlstrom@3:633/10 to All on Sun Nov 16 09:50:35 2025

Waldek Hebisch wrote this post by blinking in Morse code:

In alt.folklore.computers rbowman <bowman@montana.com> wrote:

On Sat, 15 Nov 2025 09:59:39 +0000, Nuno Silva wrote:

On 2025-11-15, rbowman wrote:

On Fri, 14 Nov 2025 16:09:47 -0500, c186282 wrote:

WordStar and close variants were VERY popular back in the day. Kind >>>>> of everyone's "first word processor".
Everyone used it alongside Lotus-123.

It was bundled on the Osborne 1 CP/M machine. I got a lot of miles out >>>> of it as a programming editor in the text mode. When I finally moved to >>>> the DOS world I bought Brief.

https://en.wikipedia.org/wiki/Brief_(text_editor)

'ed' wasn't much fun. I think I way have had a freeware clone of vi
that was no Joy either. I guessing 95% of the people who say 'I use vi' >>>> never have. Most Linux distros bring up Vim if you type 'vi'. One
exception is Arch. 'vi' is a hard link to ex which comes up in the
visual mode for that old timey flavor.

IIRC that source was lost or elusive for a long time, or perhaps held
back by lack of permission to distribute?

Like Unix itself ed and vi had licensing problems.

https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-
original-unices-under-bsd-license

Somehow at least the legacy vi code escaped the AT&T, UNIX System
Laboratories, Novel, Caldera, SCO mess.

Around 1992 I fetched 'elvis' from the net. I did not use it
much, but a guy which was previously using real vi found it
to be a reasonable replacement.

Later, there was nvi. And after that Linux distributions
switched to vim.

Stevie was a clone of vi and Vim followed on Stevie.

There was an implementation of SteVIe for the Atari ST iirc.

--
Children are natural mimics who act like their parents despite every
effort to teach them good manners.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Mechanicjay@3:633/10 to All on Sun Nov 16 15:40:22 2025

On Sun, 16 Nov 2025 09:14:32 +0000, Pancho <Pancho.Jones@protonmail.com> wrote: >On 11/16/25 00:43, Eli the Bearded wrote:

It's fine to to like nano or emacs or vscode or whatever. But that
just means you are not coming from a place that can judge my
appreciation of the features of vi(m).

Yes, it is a question of taste and not morals.

My taste includes both vi and vscode. ;-)

Some years back I moved all my PHP work out of Eclipse and into vim. With a
a few plugins I get modern conveniences like a debug console, code style enforcement and syntax validation. I was inspired to make this move by a friend
of mine who does core engine work at mongoDB describing his VIM setup.

Between my fingers knowing vi and not having to think about it, and the distraction free environment that doesn't fight me, it greatly increased my overall satisfaction with time spent writing software.

:version on my workstation shows:
VIM - Vi IMproved 9.1 (2024 Jan 02, compiled Oct 10 2025 02:26:29)

--
Sent from my Personal DECstation 5000/25

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Mechanicjay@3:633/10 to All on Sun Nov 16 15:47:30 2025

On 16 Nov 2025 09:35:17 -0500, Chris Ahlstrom <OFeem1987@teleworm.us> wrote: >Eli the Bearded wrote this post by blinking in Morse code:

In comp.os.linux.misc, rbowman <bowman@montana.com> wrote:

<brevsnip>

The Arch vi is the real thing. I've no idea what version it is because
real vi doesn't do --version or much of anything useful.

In vi, the standard way to get the version is with ":version". It looks
like arch is using Heirloom Vi:

https://ex-vi.sourceforge.net/

In nvi, :version yields

Version nvi-1.81.6 (2007-11-18) The CSRG, University of California, Berkeley.

That is a port of old code with many multibyte (eg UTF-8) fixes. It
should work with hardcopy terminals, which a lot of other vi
implementations (including vim) will not do. Those others expect you
to use ex mode on hardcopy terminals.

I learned vi on Digital Unix, A/UX, HP-UX, SunOS 4, and Solaris
2.(various), but I dabbled in vi clones for a long time, and was using
vim back in the 2.x versions. Elvis is still the default vi in
Slackware, and I've used recent versions of elvis for that reason. nvi
is default on NetBSD, and probably that FreeBSD one mentioned above. I
use NetBSD regularly and other BSDs very rarely.

On this Ultrix 4.5 box, :version yeilds:
Version 3.7, 18-Oct-85

I was missing some nice features, like the ruler at the bottom and the ability to set an autowrap at 80 cols, which makes posting a message like this much easier, so I built me a newer vim:
VIM - Vi IMproved 6.3 (2004 June 7, compiled Nov 11 2025 00:38:15)

--
Sent from my Personal DECstation 5000/25

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Richard Kettlewell@3:633/10 to All on Sun Nov 16 16:14:20 2025

Nuno Silva <nunojsilva@invalid.invalid> writes:

On 2025-11-15, rbowman wrote:

IIRC that source was lost or elusive for a long time, or perhaps held
back by lack of permission to distribute?

Like Unix itself ed and vi had licensing problems.

https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-
original-unices-under-bsd-license

Somehow at least the legacy vi code escaped the AT&T, UNIX System
Laboratories, Novel, Caldera, SCO mess.

How did it escape the SCO mess? Wouldn't such a release from Caldera
rely on Santa Cruz Operation having gotten ownership of the code from
Novell? And if I'm reading Wikipedia right,[0] Novell still having the rights played a role in the later mess involving the SCO Group?

Or is there something that I'm overlooking here?

[0] https://enwp.org/SCO_v._Novell

I think the timeline is as follows:

1976 Original ex, a fork of ed
1977 vi, as a mode of ex
1992 USL lawsuit filed
1993 USL purchased by Novell
1993 32V copyright invalidated by court[1]
1994 4.4BSD-Lite released, excluding vi
1994 USL lawsuit settled
1995 Novell sell UnixWare to SCO
2000 SCO sell Unix assets to Caldera Systems
2002 Caldera releases 32V and V1...V7 under a BSD licence
2002 Caldera renames to SCO Group
2003 SCO Group (i.e. Caldera) sues IBM over supposed AT&T code in Linux
2004 SCO Group sues Novell over ownership of AT&T code
2007 SCO loses against Novell
2007 SCO bankrupt, trustees continue legal action
2021 SCO Group case lawsuits against IBM finally settled

I suspect the 1993 decision means that vi was actually free and clear
from that point onwards, with the 2002 relicensing being a legal no-op;
at any rate it was before the point that Caldera turned evil and before
anyone picked a fight over what Caldera had actually bought.

4.4BSD-Lite already had nvi and other clones emerged; possibly nobody
cared much about the original as a result.

[1] https://web.archive.org/web/20180307020845/http://sco.tuxrocks.com/Docs/USL/Doc-92.html
?Consequently, I find that Plaintiff has failed to demonstrate a
likelihood that it can successfully defend its copyright in
32V. Plaintiff's claims of copyright violations are not a basis for
injunctive relief.?

--
https://www.greenend.org.uk/rjk/

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Joe Makowiec@3:633/10 to All on Sun Nov 16 18:38:36 2025

On 16 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:

Some years back I moved all my PHP work out of Eclipse and into vim.
With a a few plugins I get modern conveniences like a debug
console, code style enforcement and syntax validation.

If you don't mind my asking, which plugins? I see several web pages out
there with suggestions, but I'd be curious to see another set.

--
Joe Makowiec
http://makowiec.org/
Email: http://makowiec.org/contact/?Joe
Usenet Improvement Project: http://twovoyagers.com/improve-usenet.org/

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Sun Nov 16 20:26:27 2025

On 16/11/2025 14:49, Chris Ahlstrom wrote:

I used to have to write reams of code in 'vi'. Horrible

I still do. (Though it is vim).

I have a GUI. Geany is SO much nicer...
--
"Nature does not give up the winter because people dislike the cold."

? Confucius

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Sun Nov 16 20:30:19 2025

On 16/11/2025 20:27, Ted Nolan <tednolan> wrote:

I tend to be looking at a log file, control-z out to check something else
and then "fg" back

Not since a GUI gave me unlimited consoles...on the same monitor

--
?A leader is best When people barely know he exists. Of a good leader,
who talks little,When his work is done, his aim fulfilled,They will say,
?We did this ourselves.?

? Lao Tzu, Tao Te Ching

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 16 20:57:23 2025

On Sun, 16 Nov 2025 10:33:32 +0000, The Natural Philosopher wrote:

Today I use whatever-it-is-that-typing-vi-brings-up to edit the odd
config file with root permissions, as it is marginally quicker than
invoking a GUI text editor and managing root perms.

emacs -nw

Don?t do GUI stuff as root.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 16 20:59:35 2025

On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

Lack of utf-8 would be an issue for some things, but mostly not.

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 16 21:01:13 2025

On Sun, 16 Nov 2025 09:14:32 +0000, Pancho wrote:

My taste includes both vi and vscode. ;-)

Visual Studio Code makes Emacs look petite.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 16 21:04:20 2025

On Sun, 16 Nov 2025 20:30:19 +0000, The Natural Philosopher wrote:

On 16/11/2025 20:27, Ted Nolan <tednolan> wrote:

I tend to be looking at a log file, control-z out to check something
else and then "fg" back

Not since a GUI gave me unlimited consoles...on the same monitor

I have maybe 20 Konsole tabs currently open. No need to look at logs in
any editor: just use regular log-display commands (e.g. journalctl) for
that. If I need to, I can copy and paste between editor windows and
terminal windows.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 16 21:19:31 2025

On 16 Nov 2025 21:04:20 GMT, Ted Nolan <tednolan> wrote:

In article <10fddvm$dsjl$3@dont-email.me>,
Lawrence D�Oliveiro <ldo@nz.invalid> wrote:

On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

Lack of utf-8 would be an issue for some things, but mostly not.

Without UTF-8, you could not have �??�?��?? or �????�??

Right. Don't need those.

You can?t even see them properly!

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Nuno Silva@3:633/10 to All on Sun Nov 16 23:13:58 2025

On 2025-11-16, Lawrence D?Oliveiro wrote:

On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

Lack of utf-8 would be an issue for some things, but mostly not.

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes.

Those three are all in iso8859-15 and in Mac OS Roman...

And I'd guess only the first isn't in latin1.

As for curly quotes, I think not having these might actually be a
feature :-P

--
Nuno Silva

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Niklas Karlsson@3:633/10 to All on Sun Nov 16 23:51:47 2025

On 2025-11-16, Nuno Silva <nunojsilva@invalid.invalid> wrote:

On 2025-11-16, Lawrence D?Oliveiro wrote:

On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

Lack of utf-8 would be an issue for some things, but mostly not.

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes.

Those three are all in iso8859-15 and in Mac OS Roman...

Huh. I wouldn't have expected "?" to be in Mac OS Roman, but you're
right. As of a certain version, anyhow.

Niklas
--
"Avoid hyperbole at all costs, its the most destructive argument on
the planet" - Mark McIntyre in comp.lang.c

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Mon Nov 17 00:43:05 2025

On 16 Nov 2025 23:18:50 GMT, Ted Nolan <tednolan> wrote:

Bingo on the "smart" quotes!

I also like using ?�? and ?�? as metasyntactic brackets, but I expect
French people will interpret those as quotes ...

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Eli the Bearded@3:633/10 to All on Mon Nov 17 02:32:32 2025

In comp.os.linux.misc, Chris Ahlstrom <OFeem1987@teleworm.us> wrote:

You sure about that. ex-vi, vim.tiny, nvi are all close, but not
even this one is the "real" vi:

<https://github.com/n-t-roff/heirloom-ex-vi>

For example, it adds UTF-8 support.

https://github.com/n-t-roff/ex-1.1

Starts in ex mode, to get vi, you need to ask for it at the : prompt.

There's also

https://github.com/n-t-roff/ex-2.2

for a more modern flavor.

Elijah
------
will the real Ship of Theseus please sail home?

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris Ahlstrom@3:633/10 to All on Mon Nov 17 11:24:36 2025

rbowman wrote this post by blinking in Morse code:

On Sun, 16 Nov 2025 09:50:35 -0500, Chris Ahlstrom wrote:

There was an implementation of SteVIe for the Atari ST iirc.

ST Editor for Vi Enthusiasts.

Moolenaar extended Stevie for his Amiga. The Amiga spawned a lot of software.

https://en.wikipedia.org/wiki/Fred_Fish

Cool!

Also I installed the gulam unix-like shell on the ST, using vi to
edit. Oh so long ago.

--
A friend of mine is into Voodoo Acupuncture. You don't have to go.
You'll just be walking down the street and... Ooohh, that's much better.
-- Steven Wright

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris Ahlstrom@3:633/10 to All on Mon Nov 17 11:39:23 2025

rbowman wrote this post by blinking in Morse code:

On Sun, 16 Nov 2025 20:30:19 +0000, The Natural Philosopher wrote:

On 16/11/2025 20:27, Ted Nolan <tednolan> wrote:

I tend to be looking at a log file, control-z out to check something
else and then "fg" back

Not since a GUI gave me unlimited consoles...on the same monitor

While I use Vim for quick edits of a config file or with ssh, gVim is what
I mostly use for that reason. I may use the menu once in a blue moon to change the font or theme.

I use gvim for editing two to 4 files side-by-side, especially
when copying code.

Also gvimdiff and git difftool: [diff] tool = gvimdiff.

For me, gvimdiff beats both kdiff3 and Beyond Compare for
highlighting differences.

--
System going down at 1:45 this afternoon for disk crashing.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris Ahlstrom@3:633/10 to All on Mon Nov 17 11:44:37 2025

The Natural Philosopher wrote this post by blinking in Morse code:

On 16/11/2025 14:49, Chris Ahlstrom wrote:

I used to have to write reams of code in 'vi'. Horrible

That reminds me of when the DOS team I once worked on used edlin
as *the* editor. Wotta a pain editing in 64k segments on a 1000k
file.

I see that FreeDOS provides a fairly faithful edlin.

Even the primitive vi was nicer than TECO and edlin.

I still do. (Though it is vim).

I have a GUI. Geany is SO much nicer...

gvim when it's a help, vim usually, and always when ssh'ing.

--
Debian is the Jedi operating system: "Always two there are, a master and
an apprentice".
-- Simon Richter on debian-devel

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Mon Nov 17 19:39:24 2025

Nuno Silva <nunojsilva@invalid.invalid> writes:

On 2025-11-15, rbowman wrote:

IIRC that source was lost or elusive for a long time, or perhaps held
back by lack of permission to distribute?

Like Unix itself ed and vi had licensing problems.

https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-
original-unices-under-bsd-license

Somehow at least the legacy vi code escaped the AT&T, UNIX System
Laboratories, Novel, Caldera, SCO mess.

How did it escape the SCO mess? Wouldn't such a release from Caldera
rely on Santa Cruz Operation having gotten ownership of the code from
Novell? And if I'm reading Wikipedia right,[0] Novell still having the >rights played a role in the later mess involving the SCO Group?

Or is there something that I'm overlooking here?

Bill Joy's changes to ex(1) that implemented vi(1) mode were
initially released in BSD. It was later opensourced via Solaris.

https://en.wikipedia.org/wiki/Vi_(text_editor)

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Mechanicjay@3:633/10 to All on Tue Nov 18 03:55:50 2025

On 16 Nov 2025 Joe Makowiec <makowiec@invalid.invalid> wrote:

On 16 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:

Some years back I moved all my PHP work out of Eclipse and into vim.
With a a few plugins I get modern conveniences like a debug
console, code style enforcement and syntax validation.

If you don't mind my asking, which plugins? I see several web pages out >there with suggestions, but I'd be curious to see another set.

Sure thing!

Here's the plugin section from my .vimrc:

Plugin 'VundleVim/Vundle.vim'
Plugin 'itchyny/lightline.vim'
Plugin 'tpope/vim-fugitive'
Plugin 'joonty/vdebug'
Bundle 'joonty/vim-phpqa.git'
Bundle 'stephpy/vim-php-cs-fixer'

Then at the bottom some calls to run the code style fixer and generate new ctags on each save:

autocmd BufWritePost *.php silent! call PhpCsFixerFixFile()
autocmd BufWritePost *.php silent! !eval 'ctags -f php.tags --languages=PHP -R' &

This of course requires having some other software installed on the workstation, such as PHP Code Sniffer, PHP Mess Detector, ctags, the pecl Xdebug
extention and...I think that's it. I set this up quite a few years ago and haven't had to mess with it much. It would be a bit of a process of discovery to get it all setup with the pieces in place again.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Tue Nov 18 12:09:42 2025

On 18/11/2025 08:02, Ian wrote:

On 2025-11-17, Lawrence D?Oliveiro <ldo@nz.invalid> wrote:

On Mon, 17 Nov 2025 08:24:27 -0000 (UTC), Ian wrote:

* Ubuntu isn't my preferred choice for servers, or anything really, but this
particular application was developed for it, and I haven't got the time or >>> inclination to port it to a different distribution.

What exactly was there about it that needed porting?

I have no idea. It is available as an "apt-get install" on the latest Ubuntu, from the standard repos, documented, tested and "supported". It isn't available
in the standard repos on other distributions, so that would need time and effort to locate a compatible 3rd party binary, or compile from source. Even if that "just works" it's alredy more effort and risk than installing Ubuntu and using the provided package, as this is on a dedicated VM anyway.

Sometimes You just need things to work, and don't want another adventure...

Hear hear!

Dependency hell from non-distro packages...
--
Truth welcomes investigation because truth knows investigation will lead
to converts. It is deception that uses all the other techniques.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Johnny Billquist@3:633/10 to All on Tue Nov 18 20:04:38 2025

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

Lack of utf-8 would be an issue for some things, but mostly not.

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

Johnny

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Eli the Bearded@3:633/10 to All on Tue Nov 18 20:29:55 2025

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA5: ?

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Wed Nov 19 02:37:46 2025

On 2025-11-18 20:04, Johnny Billquist wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

Lack of utf-8 would be an issue for some things, but mostly not.

Without UTF-8, you could not have ??? or ??? or ?�? or those curly
quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

But with the transmission you have to transmit first what charset you
are going to use, and then you are limited by it, and the recipient must
have the same map, and be able to use it. Perhaps he has to use his own
map instead.

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Richard Kettlewell@3:633/10 to All on Wed Nov 19 08:24:02 2025

Eli the Bearded <*@eli.users.panix.com> writes:

Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes. >> Of course you could.

They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA5: ?

Correct, latin-1 doesn?t have the euro symbol; latin-15 does. Neither
have proper quotes. By now both are retrocomputing, really.

--
https://www.greenend.org.uk/rjk/

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Joe Makowiec@3:633/10 to All on Wed Nov 19 13:11:43 2025

On 17 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:

I set this up quite a few years ago and haven't had to mess with it
much. It would be a bit of a process of discovery to get it all
setup with the pieces in place again.

Thanks. I know what you mean - spend hours or days getting something set
up; it just runs; something else updates, which blows up the original
thing; spend hours or days relearning the original thing... I tend to
leave myself hints in config files, but that doesn't always help.

--
Joe Makowiec
http://makowiec.org/
Email: http://makowiec.org/contact/?Joe
Usenet Improvement Project: http://twovoyagers.com/improve-usenet.org/

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Eric Pozharski@3:633/10 to All on Wed Nov 19 13:02:05 2025

with <akjvulxcnk.ln2@Telcontar.valinor> Carlos E.R. wrote:

On 2025-11-18 20:04, Johnny Billquist wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

Lack of utf-8 would be an issue for some things, but mostly not.

Without UTF-8, you could not have ??? or ??? or ?�? or those curly
quotes.

Of course you could. They exist just fine in Latin-1 (hmm, maybe not
the quotes...).

But with the transmission you have to transmit first what charset you
are going to use, and then you are limited by it, and the recipient
must have the same map, and be able to use it. Perhaps he has to use
his own map instead.

If only there was some arrangement to make it work. And RFC2047 readily
offers some. And that would be a nail for UTF-8 coffin.

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Wed Nov 19 20:12:03 2025

On Wed, 19 Nov 2025 13:11:43 -0000 (UTC), Joe Makowiec wrote:

On 17 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:

I set this up quite a few years ago and haven't had to mess with it
much. It would be a bit of a process of discovery to get it all
setup with the pieces in place again.

Thanks. I know what you mean - spend hours or days getting something set
up; it just runs; something else updates, which blows up the original
thing; spend hours or days relearning the original thing... I tend to
leave myself hints in config files, but that doesn't always help.

This is why I have taken to writing up notes about certain custom builds.

--- PyGate Linux v1.5
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Stéphane CARPENTIER@3:633/10 to All on Fri Nov 21 19:55:07 2025

Le 17-11-2025, Lawrence D?Oliveiro <ldo@nz.invalid> a �crit�:

On 16 Nov 2025 23:18:50 GMT, Ted Nolan <tednolan> wrote:

Bingo on the "smart" quotes!

I also like using ?�? and ?�? as metasyntactic brackets, but I expect
French people will interpret those as quotes ...

Of course those are quotes.

--
Si vous avez du temps � perdre :
https://scarpet42.gitlab.io

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Stéphane CARPENTIER@3:633/10 to All on Fri Nov 21 19:58:12 2025

Le 18-11-2025, Eli the Bearded <*@eli.users.panix.com> a �crit�:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes. >> Of course you could.

They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA5: ?

They created the latin9 from the latin1 to add this ? symbol.

--
Si vous avez du temps � perdre :
https://scarpet42.gitlab.io

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Fri Nov 21 20:27:28 2025

On 21 Nov 2025 19:55:07 GMT, St�phane CARPENTIER wrote:

Le 17-11-2025, Lawrence D?Oliveiro <ldo@nz.invalid> a �crit�:

I also like using ?�? and ?�? as metasyntactic brackets, but I expect
French people will interpret those as quotes ...

Of course those are quotes.

I want more paired bracketing symbols. ;)

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Niklas Karlsson@3:633/10 to All on Fri Nov 21 21:14:08 2025

On 2025-11-21, St�phane CARPENTIER <sc@fiat-linux.fr> wrote:

Le 18-11-2025, Eli the Bearded <*@eli.users.panix.com> a �crit�:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes. >>> Of course you could.

They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ?

They created the latin9 from the latin1 to add this ? symbol.

I thought that was Latin-15.

Niklas
--
The bloody handle on the back of an E450 isn't until you try to use it as
such, then it becomes less of a handle and more bloody.
-- Gary Barnes in asr

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Fri Nov 21 19:10:53 2025

On 11/21/25 12:58, St�phane CARPENTIER wrote:

Le 18-11-2025, Eli the Bearded <*@eli.users.panix.com> a �crit�:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes. >>> Of course you could.

They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ?

They created the latin9 from the latin1 to add this ? symbol.

I thought it was Latin-15

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Eli the Bearded@3:633/10 to All on Sat Nov 22 03:20:31 2025

In comp.os.linux.misc, Lawrence DOliveiro <ldo@nz.invalid> wrote:

On 21 Nov 2025 19:55:07 GMT, St�phane CARPENTIER wrote:

Le 17-11-2025, Lawrence D?Oliveiro <ldo@nz.invalid> a �crit�:

I also like using ?�? and ?�? as metasyntactic brackets, but I expect
French people will interpret those as quotes ...

Of course those are quotes.

I want more paired bracketing symbols. ;)

https://qaz.wtf/qz/blosxom/2022/06/02/matchpairs

TL;DR: 186 pairs in Unicode

Elijah
------
the vim "set matchedpairs" line is too long for Usenet

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sat Nov 22 05:57:28 2025

On Sat, 22 Nov 2025 03:20:31 -0000 (UTC), Eli the Bearded wrote:

In comp.os.linux.misc, Lawrence DOliveiro <ldo@nz.invalid> wrote:

I want more paired bracketing symbols. ;)

https://qaz.wtf/qz/blosxom/2022/06/02/matchpairs

TL;DR: 186 pairs in Unicode

Interesting. And more than I expected.

Had trouble making out the difference between ?left-handed? and ?right-
handed? versions of the ?interlaced pentagram? -- those are only obvious
at larger sizes.

Most of the rest (that my font can show) seem quite legible.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Stéphane CARPENTIER@3:633/10 to All on Sat Nov 22 10:23:21 2025

Le 22-11-2025, Peter Flass <Peter@Iron-Spring.com> a �crit�:

On 11/21/25 12:58, St�phane CARPENTIER wrote:

Le 18-11-2025, Eli the Bearded <*@eli.users.panix.com> a �crit�:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic >>> currency placeholder at 0xA5: ?

They created the latin9 from the latin1 to add this ? symbol.

I thought it was Latin-15

No, it's iso-8859-15 but latin9.

--
Si vous avez du temps � perdre :
https://scarpet42.gitlab.io

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Johnny Billquist@3:633/10 to All on Sat Nov 22 17:55:14 2025

On 2025-11-18 21:29, Eli the Bearded wrote:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes. >> Of course you could.

They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA5: ?

Yeah. Sorry. That came in 8859-15.

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

That don't even make sense. UTF-8 is just a way to encode large integers
in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually is. But I guess what you actually mean is that you like Unicode better than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but
sometimes not.
It's a trainwreck, but now we're stuck with it. :(

Johnny

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Alexander Schreiber@3:633/10 to All on Sat Nov 22 19:20:28 2025

Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-18 21:29, Eli the Bearded wrote:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes. >>> Of course you could.

They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ?

Yeah. Sorry. That came in 8859-15.

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

That don't even make sense. UTF-8 is just a way to encode large integers
in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually is. But I guess what you actually mean is that you like Unicode better than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but sometimes not.

Well, a big part of the reason is that human writing systems across
the globe are, in fact, quite an impressive mess from an engineering
point of view, mostly not being properly designed and all that. ;-)

It's a trainwreck, but now we're stuck with it. :(

At least it sorta mostly kinda works for a somewhat wide range of
languages and scripts and you can have different scripts (latin,
cyrillic, arabic and others) in the same text. Which beats having to
figure out which code page to use for which text by quite a margin.
I _have_ been through the mess of "US ASCII works, good luck with
anything beyond that" that was text processing on e.g. MS-DOS (and
variants) and early Windows. And my native language (German) uses
essentially US-ASCII plus only a small number of letters outside of
that to begin with. Imagine if your native script has _no_ overlap
with that.

Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Sat Nov 22 20:25:17 2025

On 2025-11-22 17:55, Johnny Billquist wrote:

On 2025-11-18 21:29, Eli the Bearded wrote:

In comp.os.linux.misc, Johnny Billquist� <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly
quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ?

Yeah. Sorry. That came in 8859-15.

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

That don't even make sense. UTF-8 is just a way to encode large integers
in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually
is.
But I guess what you actually mean is that you like Unicode better than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but sometimes not.
It's a trainwreck, but now we're stuck with it. :(

Encode large integers? No.

<https://en.wikipedia.org/wiki/UTF-8>

UTF-8

UTF-8 is a character encoding standard used for electronic
communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format ? 8-bit.[1] As of July 2025, almost every webpage is transmitted as UTF-8.[2]

UTF-8 supports all 1,112,064[3] valid Unicode code points using a variable-width encoding of one to four one-byte (8-bit) code units.

Code points with lower numerical values, which tend to occur more
frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with
the same binary value as ASCII, so that a UTF-8-encoded file using only
those characters is identical to an ASCII file. Most software designed
for any extended ASCII can read and write UTF-8, and this results in
fewer internationalization issues than any alternative text encoding.[4][5]

UTF-8 is dominant for all countries/languages on the internet, is used
in most standards, often the only allowed encoding, and is supported by
all modern operating systems and programming languages.

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sat Nov 22 21:43:56 2025

On Sat, 22 Nov 2025 19:20:28 +0100, Alexander Schreiber wrote:

And my native language (German) uses essentially US-ASCII plus only
a small number of letters outside of that to begin with. Imagine if
your native script has _no_ overlap with that.

In pre-Unicode days, the major Western European languages were the next- best-supported, in terms of computer encodings, after ASCII.

You don?t have to go very far from there to find ones that were a little harder to deal with ...

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Sun Nov 23 00:23:57 2025

On 2025-11-22 22:43, Lawrence D?Oliveiro wrote:

On Sat, 22 Nov 2025 19:20:28 +0100, Alexander Schreiber wrote:

And my native language (German) uses essentially US-ASCII plus only
a small number of letters outside of that to begin with. Imagine if
your native script has _no_ overlap with that.

In pre-Unicode days, the major Western European languages were the next- best-supported, in terms of computer encodings, after ASCII.

You don?t have to go very far from there to find ones that were a little harder to deal with ...

It amazes me that computers can handle Chinese. Not only display, but keyboards.

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 23 02:56:17 2025

On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:

It amazes me that computers can handle Chinese. Not only display, but keyboards.

It?s a trick. They enter syllables using the Roman alphabet. It then pops
up candidate characters that they pick from.

I once helped create a business card for our Mayor, for a trip to our
Chinese sister city, on a Macintosh. A colleague from the Chinese language department had worked out the text; I operated the text input system to
lay out the card. I recall the fonts were all bitmaps anyway, but the
layout quality was deemed acceptable (what choice did they have?).

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Sat Nov 22 20:18:41 2025

On 11/22/25 16:23, Carlos E.R. wrote:

On 2025-11-22 22:43, Lawrence D?Oliveiro wrote:

On Sat, 22 Nov 2025 19:20:28 +0100, Alexander Schreiber wrote:

And my native language (German) uses essentially US-ASCII plus only
a small number of letters outside of that to begin with. Imagine if
your native script has _no_ overlap with that.

In pre-Unicode days, the major Western European languages were the next-
best-supported, in terms of computer encodings, after ASCII.

You don?t have to go very far from there to find ones that were a little
harder to deal with ...

It amazes me that computers can handle Chinese. Not only display, but keyboards.

I just read an article about the Chinese typewriter invented by the
writer Lin Yutang. Apparently his original model has just been
re-discovered.

I was going to add a description, but I see it is described here:

https://en.wikipedia.org/wiki/Chinese_typewriter#MingKwai_design

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Sun Nov 23 09:42:30 2025

On 23/11/2025 02:17, rbowman wrote:

On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:

It amazes me that computers can handle Chinese. Not only display, but
keyboards.

https://www.youtube.com/watch?v=iWi-9LJ4dg4

Japanese is as bad. There are over 2000 kanji characters you have to know
to be reasonably literate. Both China and Japan have tried to simplify
that character set for centuries and have gotten it down to four or five thousand though the exact count isn't known.

I can't imagine...

I saw a you tube video on that. Chinese is essentially a dogs breakfast
second time around

--
Civilization exists by geological consent, subject to change without notice.
? Will Durant

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Anthk NM@3:633/10 to All on Sun Nov 23 12:48:58 2025

On 2025-11-16, Ted Nolan <tednolan> <ted@loft.tnolan.com> wrote:

In article <mnumuiF7n72U5@mid.individual.net>,
rbowman <bowman@montana.com> wrote:

On Sun, 16 Nov 2025 09:49:10 -0500, Chris Ahlstrom wrote:

The Natural Philosopher wrote this post by blinking in Morse code:

On 16/11/2025 05:11, Ted Nolan <tednolan> wrote:

In article <10fasl6$3p4r1$3@dont-email.me>,
Lawrence D�Oliveiro <ldo@nz.invalid> wrote:

On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:

I find it takes a lot of munging to get vim to *really* work like >>>>>>> vi.

To me, that sounds like someone saying �??it takes a lot of munging >>>>>> to get a Trabant to *really* work like a Morris Minor�??. I

can�??t

imagine myself wanting to use either.

Well, squids & kids, but my fingers do vi automatically. Anything
else not so much.

I find that depressing.

I used to have to write reams of code in 'vi'. Horrible

I still do. (Though it is vim).

Back to my original statement that most people who say they use vi are >>using vim and would be very unhappy with vi.

I would not. Lack of utf-8 would be an issue for some things, but
mostly not.

With nvi (nvi2 under OpenBSD ports) I just set at ~/.exrc

set showmode ruler
set ts=2
set ht=2

And done. A status line, the mode line, sane tabs and Unicode.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Sun Nov 23 14:59:38 2025

On 2025-11-23 03:17, rbowman wrote:

On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:

It amazes me that computers can handle Chinese. Not only display, but
keyboards.

https://www.youtube.com/watch?v=iWi-9LJ4dg4

Quite curious, thanks.

Japanese is as bad. There are over 2000 kanji characters you have to know
to be reasonably literate. Both China and Japan have tried to simplify
that character set for centuries and have gotten it down to four or five thousand though the exact count isn't known.

I can't imagine...

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 23 20:11:06 2025

On 23 Nov 2025 17:51:18 GMT, Ted Nolan <tednolan> wrote:

8 is the One True TS!

Pretty useless setting, long abandoned.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bobbie Sellers@3:633/10 to All on Sun Nov 23 13:09:56 2025

On 11/23/25 05:59, Carlos E.R. wrote:

On 2025-11-23 03:17, rbowman wrote:

On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:

It amazes me that computers can handle Chinese. Not only display, but
keyboards.

https://www.youtube.com/watch?v=iWi-9LJ4dg4

Quite curious, thanks.

Japanese is as bad. There are over 2000 kanji characters you have to know
to be reasonably literate. Both China and Japan have tried to simplify
that character set for centuries and have gotten it down to four or five
thousand though the exact count isn't known.

I can't imagine...

2000 kanji is not all.
The syllabary of Japanese is 40 characters for Japanese and another 40 for foreign
words. Serious students go to classes just like the Japanese do to
learn this stuff then
English classes as well for the Japanese to learn the technicalities of English. Back
in the 19th Century some Japanese educators advocated moving completely to English but that would mean giving up on the language of their ancestors
and that was
a step too far.

I got interested with the idea of learning enough to read manga but in my 70s
was a bit too late for that endeavor for me.

bliss

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 23 22:57:08 2025

On Sun, 23 Nov 2025 13:09:56 -0800, Bobbie Sellers wrote:

Back in the 19th Century some Japanese educators advocated moving
completely to English but that would mean giving up on the language
of their ancestors and that was a step too far.

Apparently Mao Zedong discussed with Josef Stalin the idea of
abandoning traditional Chinese characters in favour of a Roman-based
script. Stalin told him the Chinese writing system was beautiful, and
should be kept.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Eric Pozharski@3:633/10 to All on Tue Nov 25 10:26:31 2025

with <10fvcgr$3d6mu$1@paganini.bofh.team> Waldek Hebisch wrote:

In alt.folklore.computers Eric Pozharski <apple.universe@posteo.net>
wrote:

with <akjvulxcnk.ln2@Telcontar.valinor> Carlos E.R. wrote:

On 2025-11-18 20:04, Johnny Billquist wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:

*SKIP* [ 5 lines 6 levels deep]

But with the transmission you have to transmit first what charset
you are going to use, and then you are limited by it, and the
recipient must have the same map, and be able to use it. Perhaps he
has to use his own map instead.

If only there was some arrangement to make it work. And RFC2047
readily offers some. And that would be a nail for UTF-8 coffin.

Each ISO code page is (was???) supposed to have escapape sequence to
switch to that code page. There is (was???) standard (ISO 2022 ???)
that outlined how swiching was supposed to work.

Granted, finding details of ISO through wiki isn't an easy task and
details are rather irrelevant now anyway. That being said, scaling
escape codes would be solution on the surface.

IIUC Emacs Mule used this scheme (possibly modified. AFAIK they
dumped it in favour of UTF-8.

Imagine what a beutiful mess it would be. Instead we have UTF-8. It's
a shame.

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Tue Nov 25 20:05:48 2025

On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From c186282@3:633/10 to All on Tue Nov 25 23:04:42 2025

On 11/25/25 15:05, Lawrence D?Oliveiro wrote:

On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

Um, kinda yea :-)

I understand the reason for unicode, but
that doesn't mean I like it. Always avoid
whenever possible.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Johnny Billquist@3:633/10 to All on Thu Nov 27 19:55:29 2025

On 2025-11-22 19:20, Alexander Schreiber wrote:

Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-18 21:29, Eli the Bearded wrote:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic >>> currency placeholder at 0xA5: ?

Yeah. Sorry. That came in 8859-15.

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

That don't even make sense. UTF-8 is just a way to encode large integers
in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually is. >> But I guess what you actually mean is that you like Unicode better than
8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but
sometimes not.

Well, a big part of the reason is that human writing systems across
the globe are, in fact, quite an impressive mess from an engineering
point of view, mostly not being properly designed and all that. ;-)

I know. But the Unicode wreck can't be blamed on the human writing
system "mess". It created one completely on its own.

It's a trainwreck, but now we're stuck with it. :(

At least it sorta mostly kinda works for a somewhat wide range of
languages and scripts and you can have different scripts (latin,
cyrillic, arabic and others) in the same text. Which beats having to
figure out which code page to use for which text by quite a margin.
I _have_ been through the mess of "US ASCII works, good luck with
anything beyond that" that was text processing on e.g. MS-DOS (and
variants) and early Windows. And my native language (German) uses
essentially US-ASCII plus only a small number of letters outside of
that to begin with. Imagine if your native script has _no_ overlap
with that.

Just because there was a problem it don't follow that Unicode was a good solution.

Johnny

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Johnny Billquist@3:633/10 to All on Thu Nov 27 20:02:25 2025

On 2025-11-22 20:25, Carlos E.R. wrote:

On 2025-11-22 17:55, Johnny Billquist wrote:

On 2025-11-18 21:29, Eli the Bearded wrote:

In comp.os.linux.misc, Johnny Billquist� <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly
quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic >>> currency placeholder at 0xA5: ?

Yeah. Sorry. That came in 8859-15.

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

That don't even make sense. UTF-8 is just a way to encode large
integers in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string
actually is.
But I guess what you actually mean is that you like Unicode better
than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but
sometimes not.
It's a trainwreck, but now we're stuck with it. :(

Encode large integers? No.

Ok. Call it "encode Unicode" then if that makes you happier. And Unicode codepoints can be described as integers (in fact, they are, which is why
you see U+nnnn, where nnnn is a hex value, for codepoints), and have a
range of roughly 2^22.

UTF-8 isn't defining any characters, just defining a way to represent
Unicode characters using a variable number of 8-bit bytes.

Johnny

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Johnny Billquist@3:633/10 to All on Thu Nov 27 20:10:08 2025

On 2025-11-25 21:05, Lawrence D?Oliveiro wrote:

On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.

Johnny

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Thu Nov 27 20:16:47 2025

On 2025-11-27 20:02, Johnny Billquist wrote:

On 2025-11-22 20:25, Carlos E.R. wrote:

On 2025-11-22 17:55, Johnny Billquist wrote:

On 2025-11-18 21:29, Eli the Bearded wrote:

In comp.os.linux.misc, Johnny Billquist� <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly >>>>>> quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the
generic
currency placeholder at 0xA5: ?

Yeah. Sorry. That came in 8859-15.

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

That don't even make sense. UTF-8 is just a way to encode large
integers in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string
actually is.
But I guess what you actually mean is that you like Unicode better
than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate
codepoint for units or prefixes, but sometimes using normal ASCII for
them, and then you have sometimes different codepoints because of
colors, but sometimes not.
It's a trainwreck, but now we're stuck with it. :(

Encode large integers? No.

Ok. Call it "encode Unicode" then if that makes you happier. And Unicode codepoints can be described as integers (in fact, they are, which is why
you see U+nnnn, where nnnn is a hex value, for codepoints), and have a
range of roughly 2^22.

UTF-8 isn't defining any characters, just defining a way to represent Unicode characters using a variable number of 8-bit bytes.

And 8 bit ascii letters are also numbers representing characters. That's
how computers work.

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Thu Nov 27 20:19:28 2025

Johnny Billquist <bqt@softjar.se> writes:

On 2025-11-25 21:05, Lawrence D?Oliveiro wrote:

On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.

Because the endianness can vary, and thus UTF-16 requires a BOM.

UTF-16 should have been a non-starter.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Richard Kettlewell@3:633/10 to All on Thu Nov 27 20:44:03 2025

Johnny Billquist <bqt@softjar.se> writes:

Lawrence D?Oliveiro wrote:

Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.

UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.

UTF-32 loses that advantage, and that and its endianness-dependence make
it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in Unicode-aware ways.

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all possible worlds. About the only thing to recommend it is that it can be the most compact representation in certain contexts.

--
https://www.greenend.org.uk/rjk/

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Thu Nov 27 21:18:55 2025

On Thu, 27 Nov 2025 20:16:47 +0100, Carlos E.R. wrote:

And 8 bit ascii letters are also numbers representing characters. That's
how computers work.

Everything that computers deal with is a number. Think of a computer
program as a very large integer. There?s even a name for this: the G�del number.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris Ahlstrom@3:633/10 to All on Fri Nov 28 07:54:45 2025

Richard Kettlewell wrote this post by blinking in Morse code:

Johnny Billquist <bqt@softjar.se> writes:

Lawrence D?Oliveiro wrote:

Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.

UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.

UTF-32 loses that advantage, and that and its endianness-dependence make
it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in Unicode-aware ways.

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all possible worlds. About the only thing to recommend it is that it can be the most compact representation in certain contexts.

Certain contexts like ... Windows? :-)

--
Oh, give me a home,
Where the buffalo roam,
And I'll show you a house with a really messy kitchen.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Alexander Schreiber@3:633/10 to All on Fri Nov 28 22:08:53 2025

Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-22 19:20, Alexander Schreiber wrote:

Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-18 21:29, Eli the Bearded wrote:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those curly quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the generic >>>> currency placeholder at 0xA5: ?

Yeah. Sorry. That came in 8859-15.

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

That don't even make sense. UTF-8 is just a way to encode large integers >>> in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually is.
But I guess what you actually mean is that you like Unicode better than
8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but
sometimes not.

Well, a big part of the reason is that human writing systems across
the globe are, in fact, quite an impressive mess from an engineering
point of view, mostly not being properly designed and all that. ;-)

I know. But the Unicode wreck can't be blamed on the human writing
system "mess". It created one completely on its own.

It's a trainwreck, but now we're stuck with it. :(

At least it sorta mostly kinda works for a somewhat wide range of
languages and scripts and you can have different scripts (latin,
cyrillic, arabic and others) in the same text. Which beats having to
figure out which code page to use for which text by quite a margin.
I _have_ been through the mess of "US ASCII works, good luck with
anything beyond that" that was text processing on e.g. MS-DOS (and
variants) and early Windows. And my native language (German) uses
essentially US-ASCII plus only a small number of letters outside of
that to begin with. Imagine if your native script has _no_ overlap
with that.

Just because there was a problem it don't follow that Unicode was a good solution.

I'm not claiming it is a good solution, but it is the solution we ended up
with that reasonably covers a lot of the problem space. Given that:
- it covers a wide and very irregular problem space
- it
- it is, due to the problem scope, a design by committee
ending with a solution being bit of a mess is hardly avoidable.

It has the property of "working well enough most of the time", which is
already a big impediment to anyone spending the time, money and brains
in order to:
- come up with a New And Improved Design That Surely Has No Warts
- establish it as the new standard

Honestly: not happening.

Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Alexander Schreiber@3:633/10 to All on Fri Nov 28 22:10:35 2025

Richard Kettlewell <invalid@invalid.invalid> wrote:

Johnny Billquist <bqt@softjar.se> writes:

Lawrence D?Oliveiro wrote:

Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.

UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.

UTF-32 loses that advantage, and that and its endianness-dependence make
it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in Unicode-aware ways.

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all possible worlds.

Thus making it the _perfect_ choice of encoding for Microsoft Windows.

SCNR,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Goodwin@3:633/10 to All on Sat Nov 29 13:13:26 2025

In article <slrn10ik3ub.2dppt.als@mordor.angband.thangorodrim.de>, als@usenet.thangorodrim.de says...

Richard Kettlewell <invalid@invalid.invalid> wrote:

Johnny Billquist <bqt@softjar.se> writes:

Lawrence D?Oliveiro wrote:

Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.

UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.

UTF-32 loses that advantage, and that and its endianness-dependence make
it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in Unicode-aware ways.

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the fixed-width encoding of UTF-32 in return. It?s the worst of all possible worlds.

Thus making it the _perfect_ choice of encoding for Microsoft Windows.

To be fair, Windows NT was an early adopter of Unicode and at the time
that meant 16 bits per character (UCS-2). Betas and SDKs had already
been in developers hands for a year and the first release only a few
months away when UTF-8 was first presented.

Windows NT has supported UTF-8 for a couple of years now, but as its
still a fairly recent thing there are still some rough edges and I
expect not much uses it yet.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sat Nov 29 02:57:15 2025

On Sat, 29 Nov 2025 13:13:26 +1300, David Goodwin wrote:

To be fair, Windows NT was an early adopter of Unicode and at the time
that meant 16 bits per character (UCS-2).

It did, indeed. Java was another one that bought into the whole UCS-2
thing at just the wrong time.

And then the Unicode folks went ?on second thoughts, let?s widen it to
include past writing systems as well, not just present-day ones. And let?s
add some other fun stuff while we?re at it?.

Linux, on the other hand, simply ignored the whole issue. The kernel just
says ?ASCII slash is the path component separator, ASCII NUL is the path terminator?, and leaves the rest up to userland.

And UTF-8 fits rather neatly into that.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Sat Nov 29 11:20:08 2025

On 28/11/2025 21:10, Alexander Schreiber wrote:

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all possible
worlds.

Thus making it the_perfect_ choice of encoding for Microsoft Windows.

SCNR,
Alex.

Indeed.

--
"Nature does not give up the winter because people dislike the cold."

? Confucius

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Sat Nov 29 13:35:39 2025

On 2025-11-29 01:13, David Goodwin wrote:

In article <slrn10ik3ub.2dppt.als@mordor.angband.thangorodrim.de>, als@usenet.thangorodrim.de says...

Richard Kettlewell <invalid@invalid.invalid> wrote:

Johnny Billquist <bqt@softjar.se> writes:

Lawrence D?Oliveiro wrote:

Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.

UTF-8 has the advantage that we can still use byte-oriented string
representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.

UTF-32 loses that advantage, and that and its endianness-dependence make >>> it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in
Unicode-aware ways.

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all possible >>> worlds.

Thus making it the _perfect_ choice of encoding for Microsoft Windows.

To be fair, Windows NT was an early adopter of Unicode and at the time
that meant 16 bits per character (UCS-2). Betas and SDKs had already
been in developers hands for a year and the first release only a few
months away when UTF-8 was first presented.

Windows NT has supported UTF-8 for a couple of years now, but as its
still a fairly recent thing there are still some rough edges and I
expect not much uses it yet.

I'm curious. My Thunderbird in Linux uses UTF-8 by default. What would
use TB in Windows 10 or 11?

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Richard Kettlewell@3:633/10 to All on Sat Nov 29 13:45:47 2025

"Carlos E.R." <robin_listas@es.invalid> writes:

On 2025-11-29 01:13, David Goodwin wrote:

To be fair, Windows NT was an early adopter of Unicode and at the
time that meant 16 bits per character (UCS-2). Betas and SDKs had
already been in developers hands for a year and the first release
only a few months away when UTF-8 was first presented.

Windows NT has supported UTF-8 for a couple of years now, but as its
still a fairly recent thing there are still some rough edges and I
expect not much uses it yet.

I'm curious. My Thunderbird in Linux uses UTF-8 by default. What would
use TB in Windows 10 or 11?

Applications don?t need any OS support to use UTF-8. Thunderbird has
used UTF-8 on Windows since at least 2007 (based on my mail archives),
and probably since before it was spun out from the browser.

Date: Thu, 03 May 2007 16:53:52 +0100
User-Agent: Thunderbird 1.5.0.10 (Windows/20070221)
Content-Type: text/plain; charset=utf-8

Use of UTF-8 in encoding-sensitive Windows API calls is relatively
recent, though AFAIK a more than a couple of years ago.

As David says the reason for UTF-16 in Windows is that it adopted
Unicode both before it outgrew 16-bit representations, and before UTF-8
was standardized. In short, Microsoft were ahead of the curve. The slow adoption of UTF-8 is less justifiable.

--
https://www.greenend.org.uk/rjk/

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Stéphane CARPENTIER@3:633/10 to All on Sat Nov 29 15:06:07 2025

Le 29-11-2025, Carlos E.R. <robin_listas@es.invalid> a �crit�:

I'm curious. My Thunderbird in Linux uses UTF-8 by default. What would
use TB in Windows 10 or 11?

I don't know about Windows 11. But in France, Windows 10 is still using
either CP1252 or utf-8 depending on the way you are creating documents.
For example, if you are using, sorry I don't know the English name, the
"block note", it's utf-8. But if you are creating a text document with
Excel saving it as csv, it's cp1252. And there is no way around it:
Microsoft decides for you what you should use. For FireFox, I'm not
sure. I'd say it's cp1252 because it looks like that but I can't be 100%
sure about that.

For me cp1252 is just like ISO-8850-(1 or 15): a thing of the past which
only place should be in a museum. Agreed, utf-8 has some issues, but
everything else is just worse.

--
Si vous avez du temps � perdre :
https://scarpet42.gitlab.io

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Sat Nov 29 12:45:58 2025

On 11/29/25 04:20, The Natural Philosopher wrote:

On 28/11/2025 21:10, Alexander Schreiber wrote:

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all possible >>> worlds.

Thus making it the_perfect_� choice of encoding for Microsoft Windows.

IBM mainframes and System i use UTF-16.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Sun Nov 30 10:37:07 2025

On 29/11/2025 19:45, Peter Flass wrote:

On 11/29/25 04:20, The Natural Philosopher wrote:

On 28/11/2025 21:10, Alexander Schreiber wrote:

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all
possible
worlds.

Thus making it the_perfect_� choice of encoding for Microsoft Windows.

IBM mainframes and System i use UTF-16.

I would have thought the font would be a function of software, not tied
to any hardware.

There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...

--
"The great thing about Glasgow is that if there's a nuclear attack it'll
look exactly the same afterwards."

Billy Connolly

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Richard Kettlewell@3:633/10 to All on Sun Nov 30 11:06:54 2025

The Natural Philosopher <tnp@invalid.invalid> writes:

On 29/11/2025 19:45, Peter Flass wrote:

On 11/29/25 04:20, The Natural Philosopher wrote:

On 28/11/2025 21:10, Alexander Schreiber wrote:

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all
possible worlds.

Thus making it the_perfect_� choice of encoding for Microsoft Windows.

IBM mainframes and System i use UTF-16.

I would have thought the font would be a function of software, not
tied to any hardware.

Yes. There?s a lot of confusion in this thread. OS APIs may work in one encoding, a language runtime might prefer another, and an application
may represent its strings in yet another (or more than one, if it?s
dealing with web pages, emails, etc). Hardware makes very little
difference beyond the extent to which hardware is bound to particular
operating systems (although trying to do UTF-32 on a Z80 would be rather inconvenient).

There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...

C has had a wide character type (wchar_t) available for years. It?s
basically unusuable, almost everyone uses char and UTF-8, because all
the existing software works that way and it?s hugely cheaper to adopt
UTF-8 and smooth off a few sharp edges than to rewrite everything to use
a new character type. There are absolutely use cases for switching
temporarily to 32-bit characters but for the most part, it?s just not
worth it.

Go is similar. Strings are made of bytes, normally encoded as UTF-8, but
it?s straightforward to get the 32-bit code points out when you need
them. Rust has a 32-bit ?char? but the String type is UTF-8 (with a
separate type for non-UTF-8 strings).

Python takes a different approach. Its internal string representation dynamically picks 8, 16 or 32 bits depending on the string contents,
with UTF-8 created on demand and cached.

--
https://www.greenend.org.uk/rjk/

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Nuno Silva@3:633/10 to All on Sun Nov 30 12:29:57 2025

On 2025-11-30, The Natural Philosopher wrote:

On 29/11/2025 19:45, Peter Flass wrote:

On 11/29/25 04:20, The Natural Philosopher wrote:

On 28/11/2025 21:10, Alexander Schreiber wrote:

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all
possible
worlds.

Thus making it the_perfect_� choice of encoding for Microsoft Windows. >>>>

IBM mainframes and System i use UTF-16.

I would have thought the font would be a function of software, not
tied to any hardware.

There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...

This is not about fonts, this is about encodings.

--
Nuno Silva

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Sun Nov 30 13:23:02 2025

On 30/11/2025 12:29, Nuno Silva wrote:

On 2025-11-30, The Natural Philosopher wrote:

On 29/11/2025 19:45, Peter Flass wrote:

On 11/29/25 04:20, The Natural Philosopher wrote:

On 28/11/2025 21:10, Alexander Schreiber wrote:

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all
possible
worlds.

Thus making it the_perfect_� choice of encoding for Microsoft Windows. >>>>>

IBM mainframes and System i use UTF-16.

I would have thought the font would be a function of software, not
tied to any hardware.

There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...

This is not about fonts, this is about encodings.

Well fonts that contain UTF-8 etc....

?at is ?e pointe...

--
"An intellectual is a person knowledgeable in one field who speaks out
only in others...?

Tom Wolfe

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D?Oliveiro@3:633/10 to All on Sun Nov 30 16:20:11 2025

On Sun, 30 Nov 2025 11:06:54 +0000, Richard Kettlewell wrote:

Python takes a different approach. Its internal string
representation dynamically picks 8, 16 or 32 bits depending on the
string contents, with UTF-8 created on demand and cached.

Its ?str? type (immutable) is nominally UTF-32.

It has separate ?bytes? (immutable) and ?bytearray? (mutable) types.
You have to explicitly convert between these and ?str?.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Rich@3:633/10 to All on Sun Nov 30 16:56:58 2025

In comp.os.linux.misc The Natural Philosopher <tnp@invalid.invalid> wrote:

On 30/11/2025 12:29, Nuno Silva wrote:

On 2025-11-30, The Natural Philosopher wrote:

On 29/11/2025 19:45, Peter Flass wrote:

On 11/29/25 04:20, The Natural Philosopher wrote:

On 28/11/2025 21:10, Alexander Schreiber wrote:

UTF-16 has neither advantage. Upgrading from ASCII can?t be done >>>>>>> compatibly with existing applications, and you don?t even get the >>>>>>> fixed-width encoding of UTF-32 in return. It?s the worst of all
possible
worlds.

Thus making it the_perfect_� choice of encoding for Microsoft Windows. >>>>>>

IBM mainframes and System i use UTF-16.

I would have thought the font would be a function of software, not
tied to any hardware.

There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...

This is not about fonts, this is about encodings.

Well fonts that contain UTF-8 etc....

Technically, font's don't contain UTF-8 (nor any other encoding).

Font's contain numbered glyphs (character drawings). Usually, modern
font's number the glyphs using the code point numbers assigned to
characters by Unicode. But your mileage will vary with older font
files as to what numbering scheme they used (I suppose if one looked
long enough one could find a font file somewhere that used EBDIC
character numbering assignments).

UTF-8 is a code point (character number) encoding. A way to store the "numbers" that reference which font glyph to display on disk/in
memory/on the wire/etc.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Goodwin@3:633/10 to All on Mon Dec 1 11:09:40 2025

In article <mp3gv8Ff0euU1@mid.individual.net>, bowman@montana.com
says...

of WCHAR and TCHAR based on a compiler flag, and
so forth it's a lot of fun.

I had a play around with this last year, and if you use TCHAR and its
related functions consistently it turns out to be relatively easy to
write code that can be built as either Unicode or "ANSI" (really some
local codepage, potentially with a multibyte encoding) by just changing
the compiler flag. But its easy to slip up if you're not careful, so you really need to be building both variants regularly and I suspect most
didn't bother.

The only real complication I ran into is that some newer Windows APIs
are available only in their Unicode form so in a non-Unicode build I'd
have to convert strings to/from Unicode in a few places. Will be
interesting to see if that eventually changes now that UTF-8 support is
a thing.

Today UTF-8 is implemented as just another "non-unicode" multibyte
character set, so if you wanted to switch your app to UTF-8 and were
using TCHARs everywhere I think you'd just turn off the unicode compiler
flag and tell Windows (via the app manifest) that you want to use the
UTF-8 codepage. I've not tried this myself yet, but its on my to-do
list.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Goodwin@3:633/10 to All on Mon Dec 1 11:18:02 2025

In article <10ght0p$gnag$2@dont-email.me>, rich@example.invalid says...

UTF-8 is a code point (character number) encoding. A way to store the "numbers" that reference which font glyph to display on disk/in
memory/on the wire/etc.

And to further complicate matters, what looks like a single character (grapheme) to the user may be encoded as multiple code points combined together. So even if you're using UTF-32, characters are still variable length.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Mon Dec 1 01:55:05 2025

On 2025-11-30 23:18, David Goodwin wrote:

In article <10ght0p$gnag$2@dont-email.me>, rich@example.invalid says...

UTF-8 is a code point (character number) encoding. A way to store the
"numbers" that reference which font glyph to display on disk/in
memory/on the wire/etc.

And to further complicate matters, what looks like a single character (grapheme) to the user may be encoded as multiple code points combined together. So even if you're using UTF-32, characters are still variable length.

The flags in my signature are two chars each.

�The Spanish flag emoji is represented by the Unicode sequence U+1F1EA U+1F1F8, which combines the Regional Indicator Symbol Letters 'E' and
'S'. In UTF-8, this corresponds to the hexadecimal byte sequence F0 9F
87 8A F0 9F 87 88 or can be written as the HTML entity 🇪🇸�

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Richard Kettlewell@3:633/10 to All on Mon Dec 1 08:54:34 2025

Lawrence D?Oliveiro <ldo@nz.invalid> writes:

Richard Kettlewell wrote:

Python takes a different approach. Its internal string
representation dynamically picks 8, 16 or 32 bits depending on the
string contents, with UTF-8 created on demand and cached.

Its ?str? type (immutable) is nominally UTF-32.

No. RTFM. At the Python level, str is a sequence of values that
represent Unicode code points. There is no statement that they are
UTF-32. For all the Python programmer knows it could be packed 21-bit or
3-byte fields, among other possibilities; they would not be able to tell
the difference from Python.

The internal representation in the C implementation of Python is as
stated above. (In principle other implementations could make other
decisions.)

--
https://www.greenend.org.uk/rjk/

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Marc Haber@3:633/10 to All on Sun Nov 30 13:29:13 2025

Peter Flass <Peter@Iron-Spring.com> wrote:

On 11/29/25 04:20, The Natural Philosopher wrote:

On 28/11/2025 21:10, Alexander Schreiber wrote:

UTF-16 has neither advantage. Upgrading from ASCII can?t be done
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all possible >>>> worlds.

Thus making it the_perfect_� choice of encoding for Microsoft Windows.

IBM mainframes and System i use UTF-16.

Doesn't that depend on the OS that is being used there?

Gr��e
Marc
-- ---------------------------------------------------------------------------- Marc Haber | " Questions are the | Mailadresse im Header Rhein-Neckar, DE | Beginning of Wisdom " |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Johnny Billquist@3:633/10 to All on Wed Dec 3 13:37:13 2025

On 2025-11-28 22:08, Alexander Schreiber wrote:

Johnny Billquist <bqt@softjar.se> wrote:

Just because there was a problem it don't follow that Unicode was a good
solution.

I'm not claiming it is a good solution, but it is the solution we ended up with that reasonably covers a lot of the problem space. Given that:
- it covers a wide and very irregular problem space
- it
- it is, due to the problem scope, a design by committee
ending with a solution being bit of a mess is hardly avoidable.

It has the property of "working well enough most of the time", which is already a big impediment to anyone spending the time, money and brains
in order to:
- come up with a New And Improved Design That Surely Has No Warts
- establish it as the new standard

Honestly: not happening.

I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't
exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.

A big part of the problem is that Unicode don't even seem to have known
what problem is was supposed to solve. Was it about representing
different characters that have different meanings? Was it about
representing same characters but with different visual effects? Was it supposed to be some kind of generic system to modify characters through
some clever system design?
As it is, it's sortof all of these, but none of them properly.

And it makes it a hellhole to deal with.

But yes. Now it exists. We are not going to replace it.

Johnny

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Johnny Billquist@3:633/10 to All on Wed Dec 3 13:38:34 2025

On 2025-11-27 20:16, Carlos E.R. wrote:

On 2025-11-27 20:02, Johnny Billquist wrote:

On 2025-11-22 20:25, Carlos E.R. wrote:

On 2025-11-22 17:55, Johnny Billquist wrote:

On 2025-11-18 21:29, Eli the Bearded wrote:

In comp.os.linux.misc, Johnny Billquist� <bqt@softjar.se> wrote:

On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:

Without UTF-8, you could not have ??? or ??? or ?�? or those
curly quotes.

Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).

The Latin-1 I know does not have a Euro symbol. It does have the
generic
currency placeholder at 0xA5: ?

Yeah. Sorry. That came in 8859-15.

Elijah
------
likes utf-8 better than iso-8859-$WHATEVER

That don't even make sense. UTF-8 is just a way to encode large
integers in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string
actually is.
But I guess what you actually mean is that you like Unicode better
than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate
codepoint for units or prefixes, but sometimes using normal ASCII
for them, and then you have sometimes different codepoints because
of colors, but sometimes not.
It's a trainwreck, but now we're stuck with it. :(

Encode large integers? No.

Ok. Call it "encode Unicode" then if that makes you happier. And
Unicode codepoints can be described as integers (in fact, they are,
which is why you see U+nnnn, where nnnn is a hex value, for
codepoints), and have a range of roughly 2^22.

UTF-8 isn't defining any characters, just defining a way to represent
Unicode characters using a variable number of 8-bit bytes.

And 8 bit ascii letters are also numbers representing characters. That's
how computers work.

Right. So both ASCII and Unicode use numbers to represent characters.
Note that UTF-8 didn't get mentioned in that sentence.

Johnny

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Johnny Billquist@3:633/10 to All on Wed Dec 3 13:44:02 2025

On 2025-11-27 21:19, Scott Lurndal wrote:

Johnny Billquist <bqt@softjar.se> writes:

On 2025-11-25 21:05, Lawrence D?Oliveiro wrote:

On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:

Instead we have UTF-8. It's a shame.

Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)

In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.

Because the endianness can vary, and thus UTF-16 requires a BOM.

UTF-16 should have been a non-starter.

Well, then use UTF-16BE or UTF-16LE, and a BOM is not only not required,
but is actually forbidden.

Johnny

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Wed Dec 3 13:56:25 2025

On 2025-12-03 13:37, Johnny Billquist wrote:

On 2025-11-28 22:08, Alexander Schreiber wrote:

Johnny Billquist <bqt@softjar.se> wrote:

Just because there was a problem it don't follow that Unicode was a good >>> solution.

I'm not claiming it is a good solution, but it is the solution we
ended up
with that reasonably covers a lot of the problem space. Given that:
� - it covers a wide and very irregular problem space
� - it
� - it is, due to the problem scope, a design by committee
ending with a solution being bit of a mess is hardly avoidable.

It has the property of "working well enough most of the time", which is
already a big impediment to anyone spending the time, money and brains
in order to:
� - come up with a New And Improved Design That Surely Has No Warts
� - establish it as the new standard

Honestly: not happening.

I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.

A big part of the problem is that Unicode don't even seem to have known
what problem is was supposed to solve.

No? The problem is that ASCII only represent the USA view of the alphabet.

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From The Natural Philosopher@3:633/10 to All on Wed Dec 3 13:58:54 2025

On 03/12/2025 12:56, Carlos E.R. wrote:
e.

No? The problem is that ASCII only represent the USA view of the alphabet.

The problem is that ASCII only represent the USA view of *ONE* alphabet.
?at is ?e problem...

And, worse, many writing methods do not use alphabets...

--
It is the folly of too many to mistake the echo of a London coffee-house
for the voice of the kingdom.

Jonathan Swift

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Wed Dec 3 07:39:24 2025

On 12/3/25 05:37, Johnny Billquist wrote:

On 2025-11-28 22:08, Alexander Schreiber wrote:

Johnny Billquist <bqt@softjar.se> wrote:

Just because there was a problem it don't follow that Unicode was a good >>> solution.

I'm not claiming it is a good solution, but it is the solution we
ended up
with that reasonably covers a lot of the problem space. Given that:
� - it covers a wide and very irregular problem space
� - it
� - it is, due to the problem scope, a design by committee
ending with a solution being bit of a mess is hardly avoidable.

It has the property of "working well enough most of the time", which is
already a big impediment to anyone spending the time, money and brains
in order to:
� - come up with a New And Improved Design That Surely Has No Warts
� - establish it as the new standard

Honestly: not happening.

I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.

I discovered this when I tried to set up spam filters and couldn't
figure out why they weren't working.

A big part of the problem is that Unicode don't even seem to have known
what problem is was supposed to solve. Was it about representing
different characters that have different meanings? Was it about
representing same characters but with different visual effects? Was it supposed to be some kind of generic system to modify characters through
some clever system design?
As it is, it's sortof all of these, but none of them properly.

It's supposed to be about the meanings of the characters. Capital 'A' in
any font is the same Unicode character, but two characters that look
identical but have different meanings are two.

The biggest problem I have with any Unicode representation except (I
think) UTF-32 is that a program has no way of knowing how long a string
is without encoding/decoding it. Given a string of characters in some codepage, how many bytes does it occupy when converted to UTF-8? Given a
UTF-8 character string, how many character positions does it occupy,
say, for example, when displayed on a screen?

And it makes it a hellhole to deal with.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Diego Garcia@3:633/10 to All on Wed Dec 3 14:40:06 2025

On Wed, 3 Dec 2025 13:37:13 +0100, Johnny Billquist wrote:

I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.

That's not actually a problem. Punycode has been developed to deal
with it.

All software should now be offering a dual display of all FQDNs as
both Unicode and Punycode.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Carlos E.R.@3:633/10 to All on Wed Dec 3 15:40:17 2025

On 2025-12-03 14:58, The Natural Philosopher wrote:

On 03/12/2025 12:56, Carlos E.R. wrote:
e.

No? The problem is that ASCII only represent the USA view of the
alphabet.

The problem is that ASCII only represent the USA view of *ONE* alphabet.
?at is ?e problem...

And, worse, many writing methods do not use alphabets...

Right.

--
Cheers, Carlos.
ES??, EU??;

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Eli the Bearded@3:633/10 to All on Thu Dec 4 07:00:08 2025

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.

As opposed to scammers posting as J0HNNY BILLQUIST, or Johnny Bi11quist,
or JOHNNY BILLQUlST in ordinary ASCII. More alphabets compound the
problem, sure, but it was always there.

A big part of the problem is that Unicode don't even seem to have known
what problem is was supposed to solve. Was it about representing
different characters that have different meanings? Was it about
representing same characters but with different visual effects? Was it supposed to be some kind of generic system to modify characters through
some clever system design?

It's pretty much never about "visual effects" although there are semantic differences to some visually similar characters. Math is a big offender
in wanting ? meaning something different than Z or ?. But you could argue
that Japanese style "fullwidth" ? is a visual effect.

I would say the problem Unicode is trying to solve, albeit with some inconsistency, is the communication of all written languages in a
standardized system of encoding. There are huge problems in that many
written languages have implicit presentation rules based on context. The fullwidth Roman alphaphet, for example, is there because English letters
in Japanese text are supposed to be the same size to fit the grid of the surrounding material.

At different stages Unicode has solved this problem in different ways.
More recently there has been a trend towards encoding things with
combining characters (backspace overstrike style in the old manual
typewriter days) and with ligatures of a sort. Flags being represented
as a pair of "regional indicator" letters, where the letters are the
same country codes used in DNS, is an example of that.

Elijah
------
"Weird AI != Weird Al" being a confusable forming some recent jokes

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Eli the Bearded@3:633/10 to All on Thu Dec 4 07:15:57 2025

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

On 2025-11-27 20:16, Carlos E.R. wrote:

And 8 bit ascii letters are also numbers representing characters. That's
how computers work.

Not all computers are 8-bit.

Right. So both ASCII and Unicode use numbers to represent characters.
Note that UTF-8 didn't get mentioned in that sentence.

What is your point? Touch-tone phones and rotary (pulse) dialing use
electrical signals to represent numbers. Most-significan-bit computers
and least-significan-bit wire communication of ASCII or EBCDIC or UTF-16
all encode the letter capital A differently. (Are there parity bits?
Stop bits? More variables!)

You'll want to know both which characters are being represented and
which encoding has been used to make sense of a message. (And figuring
out cryptoanalist style is fine, but you are still figuring out the set
and the encoding to do so.)

Unicode is a set of numbered characters. UTF-8 or UTF-16 or ... is an
encoding for those numbers.

Elijah
------
.- ... -.-. .. .. ...-.-

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From St�phane CARPENTIER@3:633/10 to All on Fri Dec 5 20:52:52 2025

Le 04-12-2025, Eli the Bearded <*@eli.users.panix.com> a �crit�:

In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:

I know that Unicode is here to stay. Said as much before. But it has
introduced a whole range of problems that people tend to pretend don't
exist. The most immediate one coming to my mind are all kind of scammers
creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but
actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.

As opposed to scammers posting as J0HNNY BILLQUIST, or Johnny Bi11quist,
or JOHNNY BILLQUlST in ordinary ASCII. More alphabets compound the
problem, sure, but it was always there.

Agreed. I see only one issue clearly limited to UTF-8. In most of the actual writing systems the characters are displayed from left to right, others
from right to left and, to my knowledge only old scripts, in
boustrophedon. And UTF-8 takes care of it, but not every tool takes care
of it the same way. The issue being writing code in English and comments
in Arabic on the same line. Poorly done, it just doesn't compile and
it's not an issue. But if an attacker want to use it, your text editor
may make you believe the code is commented when the compiler doesn't
know it's compiled. And some mischievous code can be executed when you
believe it's commented.

A big part of the problem is that Unicode don't even seem to have known
what problem is was supposed to solve. Was it about representing
different characters that have different meanings? Was it about
representing same characters but with different visual effects? Was it
supposed to be some kind of generic system to modify characters through
some clever system design?

It's pretty much never about "visual effects" although there are semantic differences to some visually similar characters. Math is a big offender
in wanting ? meaning something different than Z or ?. But you could argue that Japanese style "fullwidth" ? is a visual effect.

Of course, the rendering isn't considered by the encoding. It's the
purpose of the font. I choose fonts which doesn't make me think about
the character written. The 0 and O doesn't have to be similar. Like 1
and l and I can be easily differentiated. If it's not the case on your
computer and if that matters, change the font, not the encoding.

I would say the problem Unicode is trying to solve, albeit with some inconsistency, is the communication of all written languages in a standardized system of encoding.

Yes. And it wasn't a small thing to solve considering the way too limited
ASCII was everywhere.

--
Si vous avez du temps � perdre :
https://scarpet42.gitlab.io

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Peter Flass@3:633/10 to All on Fri Dec 5 15:57:34 2025

On 12/5/25 13:52, St�phane CARPENTIER wrote:
[snip]

Agreed. I see only one issue clearly limited to UTF-8. In most of the actual writing systems the characters are displayed from left to right, others
from right to left and, to my knowledge only old scripts, in
boustrophedon.

I wonder, is there any way to do this now without a lot of work? Are the right-to-left charachers different from the left-to-right?

Of course, the rendering isn't considered by the encoding. It's the
purpose of the font. I choose fonts which doesn't make me think about
the character written. The 0 and O doesn't have to be similar. Like 1
and l and I can be easily differentiated. If it's not the case on your computer and if that matters, change the font, not the encoding.

I usually spend a lot of time settling on fonts for an editor. Right now
I'm using "IBM Plex Mono", but I've tried a bunch.

--- PyGate Linux v1.5.1
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

Who's Online
Recent Visitors
- Guest
  Sat Nov 22 17:37:30 2025
  from Meh. Nah via Telnet
- Guest
  Wed Nov 26 06:46:07 2025
  from Gremlintown, Az via Telnet
- Guest
  Thu Nov 27 12:02:51 2025
  from Gremlintown, Az via Raw
- John F Kennedy
  Sat Dec 6 10:35:57 2025
  from crazyworldbbs.com:2323 via Telnet

System Info

Sysop:	Tetrazocine
Location:	Melbourne, VIC, Australia
Users:	14
Nodes:	8 (0 / 8)
Uptime:	135:01:35
Calls:	185
Calls today:	1
Files:	21,502
Messages:	82,193

Re: Recent history of vi

Who's Online

Recent Visitors

System Info