On Sat, 15 Nov 2025 09:59:39 +0000, Nuno Silva wrote:
On 2025-11-15, rbowman wrote:
On Fri, 14 Nov 2025 16:09:47 -0500, c186282 wrote:
WordStar and close variants were VERY popular back in the day. Kind of everyone's "first word processor".
Everyone used it alongside Lotus-123.
It was bundled on the Osborne 1 CP/M machine. I got a lot of miles out
of it as a programming editor in the text mode. When I finally moved to
the DOS world I bought Brief.
https://en.wikipedia.org/wiki/Brief_(text_editor)
'ed' wasn't much fun. I think I may have had a freeware clone of vi
that was no Joy either. I'm guessing 95% of the people who say 'I use vi'
never have. Most Linux distros bring up Vim if you type 'vi'. One
exception is Arch. 'vi' is a hard link to ex which comes up in the
visual mode for that old timey flavor.
IIRC that source was lost or elusive for a long time, or perhaps held
back by lack of permission to distribute?
Like Unix itself ed and vi had licensing problems.
https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-original-unices-under-bsd-license
Somehow at least the legacy vi code escaped the AT&T, UNIX System Laboratories, Novell, Caldera, SCO mess.
It's fine to like nano or emacs or vscode or whatever. But that
just means you are not coming from a place that can judge my
appreciation of the features of vi(m).
In article <10fasl6$3p4r1$3@dont-email.me>,
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:
I find it takes a lot of munging to get vim to *really* work like vi.
To me, that sounds like someone saying "it takes a lot of munging to get a
Trabant to *really* work like a Morris Minor". I can't imagine myself
wanting to use either.
Well, squids & kids, but my fingers do vi automatically. Anything else
not so much.
On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:
I find it takes a lot of munging to get vim to *really* work like vi.
The one on FreeBSD which I think is technically "nex" is much closer out
of the box.
:he compatible has the disclaimer
When this option is set, numerous other options are set to make Vim as Vi-compatible as possible.
The Arch vi is the real thing.
I've no idea what version it is because
real vi doesn't do --version or much of anything useful.
In comp.os.linux.misc, rbowman <bowman@montana.com> wrote:
<brevsnip>
The Arch vi is the real thing. I've no idea what version it is because
real vi doesn't do --version or much of anything useful.
In vi, the standard way to get the version is with ":version". It looks
like arch is using Heirloom Vi:
https://ex-vi.sourceforge.net/
That is a port of old code with many multibyte (eg UTF-8) fixes. It
should work with hardcopy terminals, which a lot of other vi
implementations (including vim) will not do. Those others expect you
to use ex mode on hardcopy terminals.
I learned vi on Digital Unix, A/UX, HP-UX, SunOS 4, and Solaris
2.(various), but I dabbled in vi clones for a long time, and was using
vim back in the 2.x versions. Elvis is still the default vi in
Slackware, and I've used recent versions of elvis for that reason. nvi
is default on NetBSD, and probably that FreeBSD one mentioned above. I
use NetBSD regularly and other BSDs very rarely.
In the vim distro there are sample macro packages. The ones to run
Conway's Game of Life were written by me on a Solaris box. The Solaris
vi can run them, but eventually it crashes out because there is a bug
that makes real vi (at least real vi of that era) forget marks after a
while. Vim will just work. Neovim fails to even start.
On the Debian system I'm working on right now those macros are in /usr/share/vim/vim90/macros/life/
Elijah
------
admits elvis is a pretty good vi imitation, but still not perfect
On 16/11/2025 05:11, Ted Nolan <tednolan> wrote:
In article <10fasl6$3p4r1$3@dont-email.me>,
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:
I find it takes a lot of munging to get vim to *really* work like vi.
To me, that sounds like someone saying "it takes a lot of munging to get a
Trabant to *really* work like a Morris Minor". I can't imagine myself
wanting to use either.
Well, squids & kids, but my fingers do vi automatically. Anything else
not so much.
I find that depressing.
I used to have to write reams of code in 'vi'. Horrible
Today I use whatever-it-is-that-typing-vi-brings-up to edit the odd
config file with root permissions, as it is marginally quicker than
invoking a GUI text editor and managing root perms.
And it's quicker than learning nano, and I don't need joe for a single
line of /etc/whatever.
In alt.folklore.computers rbowman <bowman@montana.com> wrote:
On Sat, 15 Nov 2025 09:59:39 +0000, Nuno Silva wrote:
On 2025-11-15, rbowman wrote:
On Fri, 14 Nov 2025 16:09:47 -0500, c186282 wrote:
WordStar and close variants were VERY popular back in the day. Kind
of everyone's "first word processor".
Everyone used it alongside Lotus-123.
It was bundled on the Osborne 1 CP/M machine. I got a lot of miles out
of it as a programming editor in the text mode. When I finally moved to
the DOS world I bought Brief.
https://en.wikipedia.org/wiki/Brief_(text_editor)
'ed' wasn't much fun. I think I may have had a freeware clone of vi
that was no Joy either. I'm guessing 95% of the people who say 'I use vi'
never have. Most Linux distros bring up Vim if you type 'vi'. One
exception is Arch. 'vi' is a hard link to ex which comes up in the
visual mode for that old timey flavor.
IIRC that source was lost or elusive for a long time, or perhaps held
back by lack of permission to distribute?
Like Unix itself ed and vi had licensing problems.
https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-
original-unices-under-bsd-license
Somehow at least the legacy vi code escaped the AT&T, UNIX System
Laboratories, Novell, Caldera, SCO mess.
Around 1992 I fetched 'elvis' from the net. I did not use it
much, but a guy who was previously using real vi found it
to be a reasonable replacement.
Later, there was nvi. And after that Linux distributions
switched to vim.
Stevie was a clone of vi and Vim followed on Stevie.
It's fine to like nano or emacs or vscode or whatever. But that
just means you are not coming from a place that can judge my
appreciation of the features of vi(m).
Yes, it is a question of taste and not morals.
My taste includes both vi and vscode. ;-)
In comp.os.linux.misc, rbowman <bowman@montana.com> wrote:
<brevsnip>
The Arch vi is the real thing. I've no idea what version it is because
real vi doesn't do --version or much of anything useful.
In vi, the standard way to get the version is with ":version". It looks
like arch is using Heirloom Vi:
https://ex-vi.sourceforge.net/
In nvi, :version yields
Version nvi-1.81.6 (2007-11-18) The CSRG, University of California, Berkeley.
That is a port of old code with many multibyte (eg UTF-8) fixes. It
should work with hardcopy terminals, which a lot of other vi
implementations (including vim) will not do. Those others expect you
to use ex mode on hardcopy terminals.
I learned vi on Digital Unix, A/UX, HP-UX, SunOS 4, and Solaris
2.(various), but I dabbled in vi clones for a long time, and was using
vim back in the 2.x versions. Elvis is still the default vi in
Slackware, and I've used recent versions of elvis for that reason. nvi
is default on NetBSD, and probably that FreeBSD one mentioned above. I
use NetBSD regularly and other BSDs very rarely.
On 2025-11-15, rbowman wrote:
IIRC that source was lost or elusive for a long time, or perhaps held
back by lack of permission to distribute?
Like Unix itself ed and vi had licensing problems.
https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-
original-unices-under-bsd-license
Somehow at least the legacy vi code escaped the AT&T, UNIX System
Laboratories, Novell, Caldera, SCO mess.
How did it escape the SCO mess? Wouldn't such a release from Caldera
rely on Santa Cruz Operation having gotten ownership of the code from
Novell? And if I'm reading Wikipedia right,[0] Novell still having the rights played a role in the later mess involving the SCO Group?
Or is there something that I'm overlooking here?
[0] https://enwp.org/SCO_v._Novell
Some years back I moved all my PHP work out of Eclipse and into vim.
With a few plugins I get modern conveniences like a debug
console, code style enforcement and syntax validation.
I used to have to write reams of code in 'vi'. Horrible
I still do. (Though it is vim).
I tend to be looking at a log file, control-z out to check something else
and then "fg" back
Today I use whatever-it-is-that-typing-vi-brings-up to edit the odd
config file with root permissions, as it is marginally quicker than
invoking a GUI text editor and managing root perms.
Lack of utf-8 would be an issue for some things, but mostly not.
My taste includes both vi and vscode. ;-)
On 16/11/2025 20:27, Ted Nolan <tednolan> wrote:
I tend to be looking at a log file, control-z out to check something
else and then "fg" back
Not since a GUI gave me unlimited consoles...on the same monitor
In article <10fddvm$dsjl$3@dont-email.me>,
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
Right. Don't need those.
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
Lack of utf-8 would be an issue for some things, but mostly not.
Without UTF-8, you could not have ????? or ??????
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
Lack of utf-8 would be an issue for some things, but mostly not.
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
On 2025-11-16, Lawrence D?Oliveiro wrote:
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
Lack of utf-8 would be an issue for some things, but mostly not.
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Those three are all in iso8859-15 and in Mac OS Roman...
Bingo on the "smart" quotes!
You sure about that? ex-vi, vim.tiny, nvi are all close, but not
even this one is the "real" vi:
<https://github.com/n-t-roff/heirloom-ex-vi>
For example, it adds UTF-8 support.
On Sun, 16 Nov 2025 09:50:35 -0500, Chris Ahlstrom wrote:
There was an implementation of SteVIe for the Atari ST iirc.
ST Editor for Vi Enthusiasts.
Moolenaar extended Stevie for his Amiga. The Amiga spawned a lot of software.
https://en.wikipedia.org/wiki/Fred_Fish
On Sun, 16 Nov 2025 20:30:19 +0000, The Natural Philosopher wrote:
On 16/11/2025 20:27, Ted Nolan <tednolan> wrote:
I tend to be looking at a log file, control-z out to check something
else and then "fg" back
Not since a GUI gave me unlimited consoles...on the same monitor
While I use Vim for quick edits of a config file or with ssh, gVim is what
I mostly use for that reason. I may use the menu once in a blue moon to change the font or theme.
On 16/11/2025 14:49, Chris Ahlstrom wrote:
I used to have to write reams of code in 'vi'. Horrible
I still do. (Though it is vim).
I have a GUI. Geany is SO much nicer...
On 2025-11-15, rbowman wrote:
IIRC that source was lost or elusive for a long time, or perhaps held
back by lack of permission to distribute?
Like Unix itself ed and vi had licensing problems.
https://tech.slashdot.org/story/02/01/24/0146248/caldera-releases-
original-unices-under-bsd-license
Somehow at least the legacy vi code escaped the AT&T, UNIX System
Laboratories, Novell, Caldera, SCO mess.
How did it escape the SCO mess? Wouldn't such a release from Caldera
rely on Santa Cruz Operation having gotten ownership of the code from
Novell? And if I'm reading Wikipedia right,[0] Novell still having the >rights played a role in the later mess involving the SCO Group?
Or is there something that I'm overlooking here?
On 16 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:
Some years back I moved all my PHP work out of Eclipse and into vim.
With a few plugins I get modern conveniences like a debug
console, code style enforcement and syntax validation.
If you don't mind my asking, which plugins? I see several web pages out there with suggestions, but I'd be curious to see another set.
On 2025-11-17, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Mon, 17 Nov 2025 08:24:27 -0000 (UTC), Ian wrote:
* Ubuntu isn't my preferred choice for servers, or anything really, but this
particular application was developed for it, and I haven't got the time or
inclination to port it to a different distribution.
What exactly was there about it that needed porting?
I have no idea. It is available as an "apt-get install" on the latest Ubuntu,
from the standard repos, documented, tested and "supported". It isn't available
in the standard repos on other distributions, so that would need time and
effort to locate a compatible 3rd-party binary, or compile from source. Even if
that "just works" it's already more effort and risk than installing Ubuntu and
using the provided package, as this is on a dedicated VM anyway.
Sometimes you just need things to work, and don't want another adventure...
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
Lack of utf-8 would be an issue for some things, but mostly not.
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
Lack of utf-8 would be an issue for some things, but mostly not.
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA5: ¤
I set this up quite a few years ago and haven't had to mess with it
much. It would be a bit of a process of discovery to get it all
setup with the pieces in place again.
On 2025-11-18 20:04, Johnny Billquist wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
Lack of utf-8 would be an issue for some things, but mostly not.
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could. They exist just fine in Latin-1 (hmm, maybe not the quotes...).
But with the transmission you have to transmit first what charset you
are going to use, and then you are limited by it, and the recipient
must have the same map, and be able to use it. Perhaps he has to use
his own map instead.
On 17 Nov 2025 in comp.os.linux.misc, Mechanicjay wrote:
I set this up quite a few years ago and haven't had to mess with it
much. It would be a bit of a process of discovery to get it all
setup with the pieces in place again.
Thanks. I know what you mean - spend hours or days getting something set
up; it just runs; something else updates, which blows up the original
thing; spend hours or days relearning the original thing... I tend to
leave myself hints in config files, but that doesn't always help.
On 16 Nov 2025 23:18:50 GMT, Ted Nolan <tednolan> wrote:
Bingo on the "smart" quotes!
I also like using « and » as metasyntactic brackets, but I expect
French people will interpret those as quotes ...
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA5: ¤
On 17-11-2025, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
I also like using « and » as metasyntactic brackets, but I expect
French people will interpret those as quotes ...
Of course those are quotes.
On 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ¤
They created latin9 from latin1 to add this € symbol.
On 21 Nov 2025 19:55:07 GMT, Stéphane CARPENTIER wrote:
On 17-11-2025, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
I also like using « and » as metasyntactic brackets, but I expect
French people will interpret those as quotes ...
Of course those are quotes.
I want more paired bracketing symbols. ;)
In comp.os.linux.misc, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
I want more paired bracketing symbols. ;)
https://qaz.wtf/qz/blosxom/2022/06/02/matchpairs
TL;DR: 186 pairs in Unicode
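For the curious, a rough Python sketch of one way to count such pairs with the standard unicodedata module; note the linked page's figure of 186 comes from its own pairing rules, so this simpler Ps/Pe count will not match it exactly:

    # Rough sketch: count Unicode "open"/"close" punctuation codepoints.
    # This is just one possible definition of "paired brackets".
    import sys
    import unicodedata

    opens, closes = [], []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        cat = unicodedata.category(ch)
        if cat == "Ps":        # Punctuation, open
            opens.append(ch)
        elif cat == "Pe":      # Punctuation, close
            closes.append(ch)

    print(len(opens), "opening and", len(closes), "closing punctuation characters")
    print("sample:", " ".join(opens[:10]))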
On 11/21/25 12:58, Stéphane CARPENTIER wrote:
On 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ¤
They created latin9 from latin1 to add this € symbol.
I thought it was Latin-15
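A quick Python check of that point (assuming the stock codec names, where "iso8859-15" is the latin9 the posts mention):

    # The euro sign exists in ISO 8859-15 ("latin9") but not in ISO 8859-1.
    print("€".encode("iso8859-15"))        # b'\xa4' -- the euro took over 0xA4
    try:
        "€".encode("latin-1")
    except UnicodeEncodeError as err:
        print("not in Latin-1:", err)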
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA5: ¤
Elijah
------
likes utf-8 better than iso-8859-$WHATEVER
On 2025-11-18 21:29, Eli the Bearded wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ¤
Yeah. Sorry. That came in 8859-15.
Elijah
------
likes utf-8 better than iso-8859-$WHATEVER
That don't even make sense. UTF-8 is just a way to encode large integers
in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually is.
But I guess what you actually mean is that you like Unicode better than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but sometimes not.
It's a trainwreck, but now we're stuck with it. :(
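One concrete instance of the "exact same character, several representations" complaint, as a small Python sketch using the standard unicodedata module:

    # One visible "é" can be one codepoint (precomposed) or two (e + combining acute).
    import unicodedata

    a = "\u00e9"          # é as a single codepoint
    b = "e\u0301"         # e followed by COMBINING ACUTE ACCENT
    print(a, b, a == b)                             # look alike, compare unequal
    print(len(a), len(b))                           # 1 vs 2 codepoints
    print(unicodedata.normalize("NFC", b) == a)     # True after normalization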
On 2025-11-18 21:29, Eli the Bearded wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ¤
Yeah. Sorry. That came in 8859-15.
Elijah
------
likes utf-8 better than iso-8859-$WHATEVER
That don't even make sense. UTF-8 is just a way to encode large integers
in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually
is.
But I guess what you actually mean is that you like Unicode better than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but sometimes not.
It's a trainwreck, but now we're stuck with it. :(
And my native language (German) uses essentially US-ASCII plus only
a small number of letters outside of that to begin with. Imagine if
your native script has _no_ overlap with that.
On Sat, 22 Nov 2025 19:20:28 +0100, Alexander Schreiber wrote:
And my native language (German) uses essentially US-ASCII plus only
a small number of letters outside of that to begin with. Imagine if
your native script has _no_ overlap with that.
In pre-Unicode days, the major Western European languages were the next-best-supported, in terms of computer encodings, after ASCII.
You don't have to go very far from there to find ones that were a little harder to deal with ...
It amazes me that computers can handle Chinese. Not only display, but keyboards.
On 2025-11-22 22:43, Lawrence D'Oliveiro wrote:
On Sat, 22 Nov 2025 19:20:28 +0100, Alexander Schreiber wrote:
And my native language (German) uses essentially US-ASCII plus only
a small number of letters outside of that to begin with. Imagine if
your native script has _no_ overlap with that.
In pre-Unicode days, the major Western European languages were the next-
best-supported, in terms of computer encodings, after ASCII.
You don't have to go very far from there to find ones that were a little
harder to deal with ...
It amazes me that computers can handle Chinese. Not only display, but keyboards.
On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:
It amazes me that computers can handle Chinese. Not only display, but
keyboards.
https://www.youtube.com/watch?v=iWi-9LJ4dg4
Japanese is as bad. There are over 2000 kanji characters you have to know
to be reasonably literate. Both China and Japan have tried to simplify
that character set for centuries and have gotten it down to four or five thousand though the exact count isn't known.
I can't imagine...
In article <mnumuiF7n72U5@mid.individual.net>,
rbowman <bowman@montana.com> wrote:
On Sun, 16 Nov 2025 09:49:10 -0500, Chris Ahlstrom wrote:
The Natural Philosopher wrote this post by blinking in Morse code:
On 16/11/2025 05:11, Ted Nolan <tednolan> wrote:
In article <10fasl6$3p4r1$3@dont-email.me>,
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On 15 Nov 2025 18:48:51 GMT, Ted Nolan <tednolan> wrote:
I find it takes a lot of munging to get vim to *really* work like vi.
To me, that sounds like someone saying "it takes a lot of munging
to get a Trabant to *really* work like a Morris Minor". I can't
imagine myself wanting to use either.
Well, squids & kids, but my fingers do vi automatically. Anything
else not so much.
I find that depressing.
I used to have to write reams of code in 'vi'. Horrible
I still do. (Though it is vim).
Back to my original statement that most people who say they use vi are using vim and would be very unhappy with vi.
I would not. Lack of utf-8 would be an issue for some things, but
mostly not.
On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:
It amazes me that computers can handle Chinese. Not only display, but
keyboards.
https://www.youtube.com/watch?v=iWi-9LJ4dg4
Japanese is as bad. There are over 2000 kanji characters you have to know
to be reasonably literate. Both China and Japan have tried to simplify
that character set for centuries and have gotten it down to four or five thousand though the exact count isn't known.
I can't imagine...
8 is the One True TS!
On 2025-11-23 03:17, rbowman wrote:
On Sun, 23 Nov 2025 00:23:57 +0100, Carlos E.R. wrote:
It amazes me that computers can handle Chinese. Not only display, but
keyboards.
https://www.youtube.com/watch?v=iWi-9LJ4dg4
Quite curious, thanks.
Japanese is as bad. There are over 2000 kanji characters you have to know
to be reasonably literate. Both China and Japan have tried to simplify
that character set for centuries and have gotten it down to four or five
thousand though the exact count isn't known.
I can't imagine...
Back in the 19th Century some Japanese educators advocated moving
completely to English but that would mean giving up on the language
of their ancestors and that was a step too far.
In alt.folklore.computers Eric Pozharski <apple.universe@posteo.net>
wrote:
with <akjvulxcnk.ln2@Telcontar.valinor> Carlos E.R. wrote:
On 2025-11-18 20:04, Johnny Billquist wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
But with the transmission you have to transmit first what charset
you are going to use, and then you are limited by it, and the
recipient must have the same map, and be able to use it. Perhaps he
has to use his own map instead.
If only there was some arrangement to make it work. And RFC2047
readily offers some. And that would be a nail for UTF-8 coffin.
Each ISO code page is (was???) supposed to have an escape sequence to
switch to that code page. There is (was???) a standard (ISO 2022???)
that outlined how switching was supposed to work.
IIUC Emacs Mule used this scheme (possibly modified). AFAIK they
dumped it in favour of UTF-8.
Instead we have UTF-8. It's a shame.
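Python happens to still ship an ISO 2022 codec, so the switching the post describes can be seen directly; a small sketch (the exact escape bytes depend on which character sets the text needs):

    # ISO 2022 embeds escape sequences that switch character sets mid-stream.
    data = "Hello こんにちは".encode("iso2022_jp")
    print(data)
    # Something like b'Hello \x1b$B...\x1b(B' -- ESC $ B switches to JIS X 0208,
    # ESC ( B switches back to ASCII.
    print(data.decode("iso2022_jp"))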
On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-18 21:29, Eli the Bearded wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ¤
Yeah. Sorry. That came in 8859-15.
Elijah
------
likes utf-8 better than iso-8859-$WHATEVER
That don't even make sense. UTF-8 is just a way to encode large integers
in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually is.
But I guess what you actually mean is that you like Unicode better than
8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but
sometimes not.
Well, a big part of the reason is that human writing systems across
the globe are, in fact, quite an impressive mess from an engineering
point of view, mostly not being properly designed and all that. ;-)
It's a trainwreck, but now we're stuck with it. :(
Just because there was a problem it don't follow that Unicode was a good solution.
At least it sorta mostly kinda works for a somewhat wide range of
languages and scripts and you can have different scripts (latin,
cyrillic, arabic and others) in the same text. Which beats having to
figure out which code page to use for which text by quite a margin.
I _have_ been through the mess of "US ASCII works, good luck with
anything beyond that" that was text processing on e.g. MS-DOS (and
variants) and early Windows. And my native language (German) uses
essentially US-ASCII plus only a small number of letters outside of
that to begin with. Imagine if your native script has _no_ overlap
with that.
On 2025-11-22 17:55, Johnny Billquist wrote:
On 2025-11-18 21:29, Eli the Bearded wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic
currency placeholder at 0xA5: ¤
Yeah. Sorry. That came in 8859-15.
Elijah
------
likes utf-8 better than iso-8859-$WHATEVER
That don't even make sense. UTF-8 is just a way to encode large
integers in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string
actually is.
But I guess what you actually mean is that you like Unicode better
than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but
sometimes not.
It's a trainwreck, but now we're stuck with it. :(
Encode large integers? No.
On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
On 2025-11-22 20:25, Carlos E.R. wrote:
On 2025-11-22 17:55, Johnny Billquist wrote:
On 2025-11-18 21:29, Eli the Bearded wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the
generic
currency placeholder at 0xA5: ¤
Yeah. Sorry. That came in 8859-15.
Elijah
------
likes utf-8 better than iso-8859-$WHATEVER
That don't even make sense. UTF-8 is just a way to encode large
integers in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string
actually is.
But I guess what you actually mean is that you like Unicode better
than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate
codepoint for units or prefixes, but sometimes using normal ASCII for
them, and then you have sometimes different codepoints because of
colors, but sometimes not.
It's a trainwreck, but now we're stuck with it. :(
Encode large integers? No.
Ok. Call it "encode Unicode" then if that makes you happier. And Unicode codepoints can be described as integers (in fact, they are, which is why
you see U+nnnn, where nnnn is a hex value, for codepoints), and have a
range of roughly 2^20 (up to U+10FFFF).
UTF-8 isn't defining any characters, just defining a way to represent Unicode characters using a variable number of 8-bit bytes.
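A minimal Python sketch of that distinction, using only the stock codecs: the codepoint is just a number, and UTF-8 is one of several ways to serialize it into bytes.

    # A codepoint is an integer; UTF-8/16/32 are just byte serializations of it.
    ch = "€"
    cp = ord(ch)
    print(f"U+{cp:04X}", cp)                 # U+20AC 8364
    print(ch.encode("utf-8"))                # b'\xe2\x82\xac'  (3 bytes)
    print(ch.encode("utf-16-le"))            # b'\xac\x20'      (one 16-bit unit)
    print(ch.encode("utf-32-le"))            # b'\xac\x20\x00\x00'
    print(len("naïve"), len("naïve".encode("utf-8")))   # 5 codepoints, 6 bytes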
On 2025-11-25 21:05, Lawrence D?Oliveiro wrote:
On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.
Lawrence D'Oliveiro wrote:
Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.
And 8 bit ascii letters are also numbers representing characters. That's
how computers work.
Johnny Billquist <bqt@softjar.se> writes:
Lawrence D'Oliveiro wrote:
Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.
UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.
UTF-32 loses that advantage, and that and its endianness-dependence make
it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in Unicode-aware ways.
UTF-16 has neither advantage. Upgrading from ASCII can't be done
compatibly with existing applications, and you don't even get the
fixed-width encoding of UTF-32 in return. It's the worst of all possible
worlds. About the only thing to recommend it is that it can be the most
compact representation in certain contexts.
On 2025-11-22 19:20, Alexander Schreiber wrote:
Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-18 21:29, Eli the Bearded wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the generic currency placeholder at 0xA5: ¤
Yeah. Sorry. That came in 8859-15.
Elijah
------
likes utf-8 better than iso-8859-$WHATEVER
That don't even make sense. UTF-8 is just a way to encode large integers
in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string actually is.
But I guess what you actually mean is that you like Unicode better than
8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate codepoint
for units or prefixes, but sometimes using normal ASCII for them, and
then you have sometimes different codepoints because of colors, but
sometimes not.
Well, a big part of the reason is that human writing systems across
the globe are, in fact, quite an impressive mess from an engineering
point of view, mostly not being properly designed and all that. ;-)
I know. But the Unicode wreck can't be blamed on the human writing
system "mess". It created one completely on its own.
It's a trainwreck, but now we're stuck with it. :(
Just because there was a problem it don't follow that Unicode was a good solution.
At least it sorta mostly kinda works for a somewhat wide range of
languages and scripts and you can have different scripts (latin,
cyrillic, arabic and others) in the same text. Which beats having to
figure out which code page to use for which text by quite a margin.
I _have_ been through the mess of "US ASCII works, good luck with
anything beyond that" that was text processing on e.g. MS-DOS (and
variants) and early Windows. And my native language (German) uses
essentially US-ASCII plus only a small number of letters outside of
that to begin with. Imagine if your native script has _no_ overlap
with that.
Johnny Billquist <bqt@softjar.se> writes:
Lawrence D?Oliveiro wrote:
Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.
UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.
UTF-32 loses that advantage, and that and its endianness-dependence make
it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in Unicode-aware ways.
UTF-16 has neither advantage. Upgrading from ASCII can't be done
compatibly with existing applications, and you don't even get the
fixed-width encoding of UTF-32 in return. It's the worst of all possible worlds.
Richard Kettlewell <invalid@invalid.invalid> wrote:
Johnny Billquist <bqt@softjar.se> writes:
Lawrence D?Oliveiro wrote:
Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.
UTF-8 has the advantage that we can still use byte-oriented string representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.
UTF-32 loses that advantage, and that and its endianness-dependence make
it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in Unicode-aware ways.
UTF-16 has neither advantage. Upgrading from ASCII can't be done
compatibly with existing applications, and you don't even get the fixed-width encoding of UTF-32 in return. It's the worst of all possible worlds.
Thus making it the _perfect_ choice of encoding for Microsoft Windows.
To be fair, Windows NT was an early adopter of Unicode and at the time
that meant 16 bits per character (UCS-2).
UTF-16 has neither advantage. Upgrading from ASCII can't be done
compatibly with existing applications, and you don't even get the
fixed-width encoding of UTF-32 in return. It's the worst of all possible
worlds.
Thus making it the _perfect_ choice of encoding for Microsoft Windows.
SCNR,
Alex.
In article <slrn10ik3ub.2dppt.als@mordor.angband.thangorodrim.de>, als@usenet.thangorodrim.de says...
Richard Kettlewell <invalid@invalid.invalid> wrote:
Johnny Billquist <bqt@softjar.se> writes:
Lawrence D?Oliveiro wrote:
Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.
UTF-8 has the advantage that we can still use byte-oriented string
representations, in programming languages, file formats, network
protocols etc. The upgrade from ASCII is easy.
UTF-32 loses that advantage, and that and its endianness-dependence make >>> it a poor choice in most contexts, but in return you get the property
that one code point is one code unit, useful when processing strings in
Unicode-aware ways.
UTF-16 has neither advantage. Upgrading from ASCII can't be done
compatibly with existing applications, and you don't even get the
fixed-width encoding of UTF-32 in return. It's the worst of all possible
worlds.
Thus making it the _perfect_ choice of encoding for Microsoft Windows.
To be fair, Windows NT was an early adopter of Unicode and at the time
that meant 16 bits per character (UCS-2). Betas and SDKs had already
been in developers hands for a year and the first release only a few
months away when UTF-8 was first presented.
Windows NT has supported UTF-8 for a couple of years now, but as it's
still a fairly recent thing there are still some rough edges and I
expect not much uses it yet.
On 2025-11-29 01:13, David Goodwin wrote:
To be fair, Windows NT was an early adopter of Unicode and at the
time that meant 16 bits per character (UCS-2). Betas and SDKs had
already been in developers hands for a year and the first release
only a few months away when UTF-8 was first presented.
Windows NT has supported UTF-8 for a couple of years now, but as it's
still a fairly recent thing there are still some rough edges and I
expect not much uses it yet.
I'm curious. My Thunderbird in Linux uses UTF-8 by default. What would
TB use in Windows 10 or 11?
On 28/11/2025 21:10, Alexander Schreiber wrote:
UTF-16 has neither advantage. Upgrading from ASCII can't be done
compatibly with existing applications, and you don't even get the
fixed-width encoding of UTF-32 in return. It's the worst of all possible
worlds.
Thus making it the _perfect_ choice of encoding for Microsoft Windows.
On 11/29/25 04:20, The Natural Philosopher wrote:
On 28/11/2025 21:10, Alexander Schreiber wrote:
UTF-16 has neither advantage. Upgrading from ASCII can't be done
compatibly with existing applications, and you don't even get the
fixed-width encoding of UTF-32 in return. It's the worst of all
possible worlds.
Thus making it the _perfect_ choice of encoding for Microsoft Windows.
IBM mainframes and System i use UTF-16.
On 29/11/2025 19:45, Peter Flass wrote:
On 11/29/25 04:20, The Natural Philosopher wrote:
On 28/11/2025 21:10, Alexander Schreiber wrote:
UTF-16 has neither advantage. Upgrading from ASCII can't be done
compatibly with existing applications, and you don't even get the
fixed-width encoding of UTF-32 in return. It's the worst of all
possible worlds.
Thus making it the _perfect_ choice of encoding for Microsoft Windows.
IBM mainframes and System i use UTF-16.
I would have thought the font would be a function of software, not
tied to any hardware.
There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...
On 29/11/2025 19:45, Peter Flass wrote:
On 11/29/25 04:20, The Natural Philosopher wrote:I would have thought the font would be a function of software, not
On 28/11/2025 21:10, Alexander Schreiber wrote:
UTF-16 has neither advantage. Upgrading from ASCII can?t be doneThus making it the_perfect_ choice of encoding for Microsoft Windows. >>>>
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all
possible
worlds.
IBM mainframes and System i use UTF-16.
tied to any hardware.
There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...
On 2025-11-30, The Natural Philosopher wrote:
On 29/11/2025 19:45, Peter Flass wrote:
On 11/29/25 04:20, The Natural Philosopher wrote:I would have thought the font would be a function of software, not
On 28/11/2025 21:10, Alexander Schreiber wrote:
UTF-16 has neither advantage. Upgrading from ASCII can?t be doneThus making it the_perfect_ choice of encoding for Microsoft Windows. >>>>>
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all
possible
worlds.
IBM mainframes and System i use UTF-16.
tied to any hardware.
There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...
This is not about fonts, this is about encodings.
Python takes a different approach. Its internal string
representation dynamically picks 8, 16 or 32 bits depending on the
string contents, with UTF-8 created on demand and cached.
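That behaviour (PEP 393, the "flexible string representation") can be poked at from CPython itself; a rough sketch, with the caveat that exact byte counts vary by interpreter version and build:

    import sys

    # 1, 2 or 4 bytes per character depending on the widest character present.
    # Header overhead differs a little between the layouts, so the figures
    # printed here are only approximate.
    base = sys.getsizeof("")
    for sample in ("a" * 100, "é" * 100, "€" * 100, "\U0001F600" * 100):
        per_char = (sys.getsizeof(sample) - base) / 100
        print(repr(sample[0]), "->", per_char, "bytes per character (roughly)")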
On 30/11/2025 12:29, Nuno Silva wrote:
On 2025-11-30, The Natural Philosopher wrote:
On 29/11/2025 19:45, Peter Flass wrote:
On 11/29/25 04:20, The Natural Philosopher wrote:I would have thought the font would be a function of software, not
On 28/11/2025 21:10, Alexander Schreiber wrote:
UTF-16 has neither advantage. Upgrading from ASCII can?t be done >>>>>>> compatibly with existing applications, and you don?t even get the >>>>>>> fixed-width encoding of UTF-32 in return. It?s the worst of allThus making it the_perfect_ choice of encoding for Microsoft Windows. >>>>>>
possible
worlds.
IBM mainframes and System i use UTF-16.
tied to any hardware.
There is probably an argument for either 32 or 64 bit characters these
days. Same as integers grew from 16 to 64 bit...
This is not about fonts, this is about encodings.
Well fonts that contain UTF-8 etc....
of WCHAR and TCHAR based on a compiler flag, and
so forth it's a lot of fun.
UTF-8 is a code point (character number) encoding. A way to store the "numbers" that reference which font glyph to display on disk/in
memory/on the wire/etc.
In article <10ght0p$gnag$2@dont-email.me>, rich@example.invalid says...
UTF-8 is a code point (character number) encoding. A way to store the
"numbers" that reference which font glyph to display on disk/in
memory/on the wire/etc.
And to further complicate matters, what looks like a single character (grapheme) to the user may be encoded as multiple code points combined together. So even if you're using UTF-32, characters are still variable length.
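A quick Python illustration of that, counting codepoints only (no particular grapheme-segmentation library assumed):

    # What a user sees as one "character" can be several codepoints, so even
    # fixed-width UTF-32 doesn't give you one unit per visible character.
    import unicodedata

    flag = "\U0001F1F8\U0001F1EA"   # regional indicators S + E, renders as one flag
    accented = "e\u0301"            # e + COMBINING ACUTE ACCENT, renders as é

    for s in (flag, accented):
        print(repr(s), "codepoints:", len(s),
              [unicodedata.name(c, "?") for c in s])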
Richard Kettlewell wrote:
Python takes a different approach. Its internal string
representation dynamically picks 8, 16 or 32 bits depending on the
string contents, with UTF-8 created on demand and cached.
Its 'str' type (immutable) is nominally UTF-32.
On 11/29/25 04:20, The Natural Philosopher wrote:
On 28/11/2025 21:10, Alexander Schreiber wrote:
UTF-16 has neither advantage. Upgrading from ASCII can?t be doneThus making it the_perfect_ choice of encoding for Microsoft Windows.
compatibly with existing applications, and you don?t even get the
fixed-width encoding of UTF-32 in return. It?s the worst of all possible >>>> worlds.
IBM mainframes and System i use UTF-16.
Johnny Billquist <bqt@softjar.se> wrote:
Just because there was a problem it don't follow that Unicode was a good
solution.
I'm not claiming it is a good solution, but it is the solution we ended up with that reasonably covers a lot of the problem space. Given that:
- it covers a wide and very irregular problem space
- it
- it is, due to the problem scope, a design by committee
ending with a solution being a bit of a mess is hardly avoidable.
It has the property of "working well enough most of the time", which is already a big impediment to anyone spending the time, money and brains
in order to:
- come up with a New And Improved Design That Surely Has No Warts
- establish it as the new standard
Honestly: not happening.
On 2025-11-27 20:02, Johnny Billquist wrote:
On 2025-11-22 20:25, Carlos E.R. wrote:
On 2025-11-22 17:55, Johnny Billquist wrote:
On 2025-11-18 21:29, Eli the Bearded wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D?Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?? or thoseOf course you could.
curly quotes.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the
generic
currency placeholder at 0xA5: ?
Yeah. Sorry. That came in 8859-15.
ElijahThat don't even make sense. UTF-8 is just a way to encode large
------
likes utf-8 better than iso-8859-$WHATEVER
integers in a variable sequence of 8-bit bytes.
Which of course makes it a mess to figure out how long a string
actually is.
But I guess what you actually mean is that you like Unicode better
than 8859-whatever.
I couldn't disagree more. Endless ways to represent the exact same
character, and weird things like sometimes having a separate
codepoint for units or prefixes, but sometimes using normal ASCII
for them, and then you have sometimes different codepoints because
of colors, but sometimes not.
It's a trainwreck, but now we're stuck with it. :(
Encode large integers? No.
Ok. Call it "encode Unicode" then if that makes you happier. And
Unicode codepoints can be described as integers (in fact, they are,
which is why you see U+nnnn, where nnnn is a hex value, for
codepoints), and have a range of roughly 2^20 (up to U+10FFFF).
UTF-8 isn't defining any characters, just defining a way to represent
Unicode characters using a variable number of 8-bit bytes.
And 8 bit ascii letters are also numbers representing characters. That's
how computers work.
Johnny Billquist <bqt@softjar.se> writes:
On 2025-11-25 21:05, Lawrence D?Oliveiro wrote:
On Tue, 25 Nov 2025 10:26:31 +0000, Eric Pozharski wrote:
Instead we have UTF-8. It's a shame.
Could be worse. Could be UTF-16. (*Cough* Microsoft Windows *Cough*)
In which way is it worse? It's the same character set, just encoded
using one or two 16-bit values instead of 1-4 8-bit values. You still
need to extract the actual Unicode value out of that encoding before
showing anything and vice versa.
Because the endianness can vary, and thus UTF-16 requires a BOM.
UTF-16 should have been a non-starter.
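What that looks like in practice, as a small Python sketch (the plain "utf-16" codec emits a BOM and uses the machine's byte order; the -le/-be variants do not):

    # UTF-16 needs a byte-order mark (or out-of-band knowledge of endianness);
    # UTF-8 has only one byte order, so no BOM is required.
    s = "€"
    print(s.encode("utf-16"))      # b'\xff\xfe\xac\x20' on little-endian: BOM + data
    print(s.encode("utf-16-le"))   # b'\xac\x20'
    print(s.encode("utf-16-be"))   # b'\x20\xac'
    print(s.encode("utf-8"))       # b'\xe2\x82\xac' -- same bytes everywhere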
On 2025-11-28 22:08, Alexander Schreiber wrote:
Johnny Billquist <bqt@softjar.se> wrote:
Just because there was a problem it don't follow that Unicode was a good >>> solution.
I'm not claiming it is a good solution, but it is the solution we
ended up
with that reasonably covers a lot of the problem space. Given that:
- it covers a wide and very irregular problem space
- it
- it is, due to the problem scope, a design by committee
ending with a solution being a bit of a mess is hardly avoidable.
It has the property of "working well enough most of the time", which is
already a big impediment to anyone spending the time, money and brains
in order to:
- come up with a New And Improved Design That Surely Has No Warts
- establish it as the new standard
Honestly: not happening.
I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.
A big part of the problem is that Unicode don't even seem to have known
what problem it was supposed to solve.
No? The problem is that ASCII only represent the USA view of the alphabet.
On 2025-11-28 22:08, Alexander Schreiber wrote:
Johnny Billquist <bqt@softjar.se> wrote:
Just because there was a problem it don't follow that Unicode was a good >>> solution.
I'm not claiming it is a good solution, but it is the solution we
ended up
with that reasonably covers a lot of the problem space. Given that:
- it covers a wide and very irregular problem space
- it
- it is, due to the problem scope, a design by committee
ending with a solution being bit of a mess is hardly avoidable.
It has the property of "working well enough most of the time", which is
already a big impediment to anyone spending the time, money and brains
in order to:
- come up with a New And Improved Design That Surely Has No Warts
- establish it as the new standard
Honestly: not happening.
I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.
A big part of the problem is that Unicode don't even seem to have known
what problem it was supposed to solve. Was it about representing
different characters that have different meanings? Was it about
representing same characters but with different visual effects? Was it supposed to be some kind of generic system to modify characters through
some clever system design?
As it is, it's sortof all of these, but none of them properly.
And it makes it a hellhole to deal with.
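For illustration, a small Python sketch of the homoglyph trick described above (the domain is made up; only the comparison matters):

    # "paypal" with a Cyrillic "а" looks the same in many fonts but is a
    # different string -- the basis of homograph/phishing domains.
    import unicodedata

    real = "paypal.example"
    fake = "p\u0430ypal.example"     # U+0430 CYRILLIC SMALL LETTER A

    print(real == fake)              # False
    print([unicodedata.name(c) for c in fake[:2]])
    # ['LATIN SMALL LETTER P', 'CYRILLIC SMALL LETTER A']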
I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.
On 03/12/2025 12:56, Carlos E.R. wrote:
No? The problem is that ASCII only represent the USA view of the
alphabet.
The problem is that ASCII only represent the USA view of *ONE* alphabet.
Þat is þe problem...
And, worse, many writing methods do not use alphabets...
I know that Unicode is here to stay. Said as much before. But it has introduced a whole range of problems that people tend to pretend don't exist. The most immediate one coming to my mind are all kind of scammers creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.
A big part of the problem is that Unicode don't even seem to have known
what problem is was supposed to solve. Was it about representing
different characters that have different meanings? Was it about
representing same characters but with different visual effects? Was it supposed to be some kind of generic system to modify characters through
some clever system design?
On 2025-11-27 20:16, Carlos E.R. wrote:
And 8 bit ascii letters are also numbers representing characters. That's
how computers work.
Right. So both ASCII and Unicode use numbers to represent characters.
Note that UTF-8 didn't get mentioned in that sentence.
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
I know that Unicode is here to stay. Said as much before. But it has
introduced a whole range of problems that people tend to pretend don't
exist. The most immediate one coming to my mind are all kind of scammers
creating fake domains to phish stuff. Using known, trusted company
names, but letters replaced by things that look visually equivalent, but
actually are other characters, and then through those domains fool
people to give information, such as passwords, account numbers, money,
and god knows what else.
As opposed to scammers posting as J0HNNY BILLQUIST, or Johnny Bi11quist,
or JOHNNY BILLQUlST in ordinary ASCII. More alphabets compound the
problem, sure, but it was always there.
A big part of the problem is that Unicode don't even seem to have known
what problem is was supposed to solve. Was it about representing
different characters that have different meanings? Was it about
representing same characters but with different visual effects? Was it
supposed to be some kind of generic system to modify characters through
some clever system design?
It's pretty much never about "visual effects" although there are semantic differences to some visually similar characters. Math is a big offender
in wanting ℤ meaning something different than Z or 𝐙. But you could argue that Japanese style "fullwidth" Ｚ is a visual effect.
I would say the problem Unicode is trying to solve, albeit with some inconsistency, is the communication of all written languages in a standardized system of encoding.
Agreed. I see only one issue clearly limited to UTF-8. In most of the actual writing systems the characters are displayed from left to right, others
from right to left and, to my knowledge only old scripts, in
boustrophedon.
Of course, the rendering isn't considered by the encoding. It's the
purpose of the font. I choose fonts which don't make me think about
the character written. The 0 and O don't have to be similar. Likewise 1
and l and I can be easily differentiated. If it's not the case on your computer and if that matters, change the font, not the encoding.