On 2025-11-18 20:04, Johnny Billquist wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
On 16 Nov 2025 20:19:12 GMT, Ted Nolan <tednolan> wrote:
Lack of utf-8 would be an issue for some things, but mostly not.
Without UTF-8, you could not have ??? or ??? or ?ñ? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
But when transmitting you first have to say which charset you are going
to use, and then you are limited by it, and the recipient must have the
same map and be able to use it. Perhaps he has to use his own map
instead.
ISO 8859-1 ("Latin 1") is a special case. No mapping table is required
for conversion to Unicode, because all ISO 8859-1 codepoints map 1:1
onto the first 256 Unicode codepoints. This means any UTF can be applied
directly to ISO 8859-1 codepoints.
The MIME declaration "ISO-8859-1" includes the C0 and C1 control characters.
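
A minimal Python sketch of that 1:1 property (the example byte values
are my own choice, not from the post above):

latin1_bytes = bytes([0x41, 0xE9, 0xF1, 0xA4])   # 'A', 'é', 'ñ', '¤' in Latin-1

# Each Latin-1 byte value *is* the Unicode code point, so no table is needed.
text = ''.join(chr(b) for b in latin1_bytes)
assert text == latin1_bytes.decode('iso-8859-1')

# Any UTF can then be applied directly to those code points.
print(text.encode('utf-8'))      # b'A\xc3\xa9\xc3\xb1\xc2\xa4'
print(text.encode('utf-16-le'))  # b'A\x00\xe9\x00\xf1\x00\xa4\x00'
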
charset="utf-ebcdic" is an encoding using variable lengths for all of
the codepoints in the Unicode character set. In UTF-EBCDIC an encoding
very similar to UTF-8 encodes Unicode codepoints five bits at a time
into EBCDIC. Codepoints that are under 160 are encoded in a single octet
and codepoints above 159 are encoded in multiple octets, all with the
high bit set. Only the C1 control characters are native high-bit-set EBCDIC.
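
For illustration, a rough sketch of the UTF-8-Mod ("I8") intermediate
step described in Unicode Technical Report #16, limited here to code
points below U+4000 and omitting the final translation of I8 byte values
into EBCDIC byte values:

def utf8_mod_encode(cp: int) -> bytes:
    """Sketch of the UTF-8-Mod (I8) step of UTF-EBCDIC (UTR #16).
    Trailing bytes carry five data bits each (101xxxxx)."""
    if cp < 0xA0:                       # 0..159: one byte, value = code point
        return bytes([cp])
    if cp < 0x400:                      # two bytes: 110yyyyy 101xxxxx
        return bytes([0xC0 | (cp >> 5), 0xA0 | (cp & 0x1F)])
    if cp < 0x4000:                     # three bytes: 1110zzzz 101yyyyy 101xxxxx
        return bytes([0xE0 | (cp >> 10),
                      0xA0 | ((cp >> 5) & 0x1F),
                      0xA0 | (cp & 0x1F)])
    raise ValueError("sketch only handles code points below U+4000")

print(utf8_mod_encode(ord('z')))    # b'z'        -- below 160, single octet
print(utf8_mod_encode(0x00E9))      # b'\xc7\xa9' -- 'é', two octets, high bit set
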
On 11/19/25 19:09, Eli the Bearded wrote:
charset="utf-ebcdic" is an encoding using variable lengths for all of
the codepoints in the Unicode character set. In UTF-EBCDIC an
encoding very similar to UTF-8 encodes Unicode codepoints five bits
at a time into EBCDIC. Codepoints that are under 160 are encoded in a
single octet and codepoints above 159 are encoded in multiple octets
all with the high bit set. Only the C1 control characters are native
high-bit-set EBCDIC.
That sounds like a particularly bad choice. Above 159 includes
lowercase s-z, all uppercase, and all numerics. Under 160 are only
lowercase a-r and specials. Personally I'd have chosen 128 and above
as single bytes, possibly biased (i.e. covering all alphabetics and
numerics), and 0-127 as multiple bytes (special characters).
There are no good choices involving EBCDIC.
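
Those code values are easy to check; a quick sketch using Python's
cp037 codec as a representative EBCDIC code page:

# Where EBCDIC (code page 037) puts letters and digits, relative to 160.
for ch in 'a', 'r', 's', 'z', 'A', 'Z', '0', '9':
    value = ch.encode('cp037')[0]
    side = 'below 160' if value < 160 else '160 or above'
    print(f"{ch!r} -> 0x{value:02X} ({side})")
# 'a'..'r' land below 160; 's'..'z', all uppercase and the digits land above.
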
On 20/11/2025 08:47, Richard Kettlewell wrote:
There are no good choices involving EBCDIC.
ROFLMAO....
On 2025-11-21, Stéphane CARPENTIER <sc@fiat-linux.fr> wrote:
On 18-11-2025, Eli the Bearded <*@eli.users.panix.com> wrote:
In comp.os.linux.misc, Johnny Billquist <bqt@softjar.se> wrote:
On 2025-11-16 21:59, Lawrence D'Oliveiro wrote:
Without UTF-8, you could not have ??? or ??? or ?ñ? or those curly quotes.
Of course you could.
They exist just fine in Latin-1 (hmm, maybe not the quotes...).
The Latin-1 I know does not have a Euro symbol. It does have the
generic currency placeholder at 0xA4: ¤.
They created latin9 from latin1 to add the € symbol.
I thought that was Latin-15.
Niklas
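
A quick check of where the Euro ended up, using Python's iso-8859-15
codec (the example is mine, not from the posts above):

# ISO 8859-15 is officially "Latin-9"; the 15 is the ISO part number,
# hence the confusion. It replaced the 0xA4 currency sign with the Euro.
print(b'\xa4'.decode('iso-8859-1'))    # ¤  (generic currency sign)
print(b'\xa4'.decode('iso-8859-15'))   # €
print('€'.encode('iso-8859-15'))       # b'\xa4'
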
You don't have to go very far from there to find ones that were a little
harder to deal with ...
It amazes me that computers can handle Chinese. Not only display, but
keyboards.
According to Carlos E.R. <robin_listas@es.invalid>:
You don't have to go very far from there to find ones that were a little
harder to deal with ...
It amazes me that computers can handle Chinese. Not only display, but
keyboards.
Actually, there aren't Chinese keyboards. While there were some impressive attempts at electromechanical Chinese typewriters in the 20th c., these days the way one types Chinese is to type the pinyin transliteration and the
input software figures out the characters. When there are multiple characters
with the same pinyin it can usually tell from context which one makes sense, or if need be it'll pop up a question box and the user picks the correct one.
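
A toy sketch of that pinyin-to-candidate lookup, with an invented
three-entry table; real input methods use large dictionaries plus
context and frequency models:

# Tiny, made-up candidate table keyed by pinyin syllable.
CANDIDATES = {
    'ma':  ['妈', '马', '吗', '码'],   # several characters share the syllable "ma"
    'ni':  ['你', '尼', '泥'],
    'hao': ['好', '号', '毫'],
}

def suggest(pinyin):
    """Return the candidate characters for one pinyin syllable."""
    return CANDIDATES.get(pinyin, [])

# Typing "ni hao": the software offers candidates and the user (or the
# context model) picks one from each list.
for syllable in ('ni', 'hao'):
    print(syllable, suggest(syllable))
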
Japanese has two phonetic alphabets, hiragana and katakana, so that's
what people type, with a similar scheme turning them into kanji
characters.
Displaying Chinese and Japanese is relatively straightforward since
there are Unicode code points for all of the characters that are in
common use, known as the CJK Unified Ideographs. But Chinese has a lot
of obscure rarely used characters and there is a huge backlog of them
still proposed to be added to Unicode.
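
For instance, the everyday characters all sit in the CJK Unified
Ideographs block, where Unicode names them algorithmically by code
point; a small check with Python's unicodedata:

import unicodedata

for ch in '中', '漢', '字':
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
# U+6F22 CJK UNIFIED IDEOGRAPH-6F22
# U+5B57 CJK UNIFIED IDEOGRAPH-5B57
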
If you are interested in this topic, read this excellent book:
https://en.wikipedia.org/wiki/Kingdom_of_Characters
Originally Japanese was written with Chinese characters, but the
pronunciation changed.
Japanese has two phonetic alphabets, hiragana and katakana, so that's
what people type, with a similar scheme turning them into kanji
characters.
Yes, but the 2000 kanji are essential to be considered literate. To add
to the fun, the kanji may be used in various ways to indicate the
desired pronunciation, and whether a word is an adaptation of a word
not found in the Japanese language; these are shown as superscripts set
above the first character.
A big part of the problem is that Unicode doesn't even seem to have
known what problem it was supposed to solve. Was it about representing
different characters that have different meanings? Was it about
representing the same characters but with different visual effects? Was
it supposed to be some kind of generic system to modify characters
through some clever system design?
According to Johnny Billquist <bqt@softjar.se>:
A big part of the problem is that Unicode doesn't even seem to have
known what problem it was supposed to solve. Was it about
representing different characters that have different meanings? Was
it about representing the same characters but with different visual
effects? Was it supposed to be some kind of generic system to modify
characters through some clever system design?
Nope.
Unicode is a typesetting language. Its goal is to represent every
written language that people use, and it does that quite well. When
you're setting type, the goal is to make the result look correct, and
you do not care how you do that. For example, I am old enough to
remember manual typewriters that only had the digits 2 through 9,
because you used lowercase "l" and capital "O" for digits 1 and 0.
That was fine; on those typewriters they looked the same.
The problem is that we want to represent identifiers in a unique way,
which both means that there is only one way to represent a particular identifier, and that there aren't two representations that look the
same. It shouldn't be surprising that Unicode doesn't do either of
those, so we have been coming up with kludges for the past decade to
try and fake it.
The reason we use Unicode is that while it sucks for identifiers, all
of the alternatives are even worse.
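
A small Python illustration of those two failure modes, with made-up
identifier strings: one identifier that has two different encodings
(where normalization is the usual kludge), and two different
identifiers that render identically (where normalization does not help):

import unicodedata

# Failure mode 1: the same visible identifier, two different encodings.
composed   = 'caf\u00e9'    # é as a single code point
decomposed = 'cafe\u0301'   # e followed by a combining acute accent
print(composed == decomposed)                          # False
print(unicodedata.normalize('NFC', composed) ==
      unicodedata.normalize('NFC', decomposed))        # True -- the kludge

# Failure mode 2: two different identifiers that look the same.
latin    = 'payp\u0061l'    # Latin small letter a
cyrillic = 'payp\u0430l'    # Cyrillic small letter a, visually identical
print(latin == cyrillic)                               # False
print(unicodedata.normalize('NFKC', latin) ==
      unicodedata.normalize('NFKC', cyrillic))         # still False
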