50% Spanish or German, 50% Chinese
Eduard Selleslagh
edsel at glo.be
Wed Jul 7 10:35:36 UTC 1999
[snip]
>At the risk of beating an undead horse, i would very much like it
>reiterated that not all of us have software capable of recognizing the
>means by which some lucky subscribers' software can encode diacritics
>and other expanded character sets. And worse, some of us have software
>that recognizes such codes, but interprets them quite differently from
>the way they are intended.
[Ed Selleslagh]
I know of no software (e.g. Windows, text processors, browsers, e-mail
programs) that is incapable of handling the full 256 characters (using
8 bits) of code page 437 or 850, unless you are using a 1970s IBM
mainframe.
[ Moderator's comment:
Try a 1990s XKL mainframe. One that uses 7-bit ASCII natively, and barfs
on anything else. Like the one running this mailing list.
--rma ]
It is simply a matter of settings which you can change easily. Unless... See
below.
>For instance, the software we have here at SCU automatically converts
>all such codes into representations of various Chinese characters. The
>result is that i have recently received many postings including extended
>and richly-exemplified discussions of Spanish vocabulary, and most of
>the words in question have been printed half in Spanish and half in
>Chinese, to the point that i have been unable to make any sense out of
>the posting at all, and have regretfully decided i must automatically
>dump & ignore the whole discussion.
[Ed]
In English-speaking countries a number of servers, but far from all, use
only 7-bit encoding, i.e. only 128 characters.
In the case of 8-bit transmission, switching locally (on your own PC) to the
right character set (code page) will solve all the problems (e.g. in
Windows: 'international settings'). You are clearly using another code page,
in which only the first 128 characters are recognized as classic ASCII. BUT:
languages like Chinese usually use 16-bit characters; I guess your PC is set
for a 16-bit (256 x 256 = 65,536 characters) code page, the first 128 of
which are reserved for the classic ASCII characters, as in all non-Latin
code pages (e.g. Greek, Cyrillic, etc.).
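
To make the mismatch described above concrete, here is a minimal Python
sketch of how the same bytes can come out differently depending on the code
page the receiving software assumes. The sample text, the choice of CP850 on
the sending side, and the choice of Big5 as the Chinese encoding are all
assumptions made for illustration; nothing in this thread says which
encodings are actually involved.

```python
# Hypothetical sample; any Spanish text with accented letters would do.
text = "año, señor, canción"

# Assumption: the sender's software writes one byte per character in CP850.
raw = text.encode("cp850")


def decode_as(data, codec):
    """Decode bytes under a given code page, marking undecodable bytes."""
    return data.decode(codec, errors="replace")


# The same bytes, read under different assumptions about the code page.
for codec in ("cp850", "cp437", "latin-1", "big5"):
    print(f"{codec:8} -> {decode_as(raw, codec)}")
```

Under these assumptions, CP850 and CP437 both reproduce the original text,
Latin-1 shows stray symbols in place of the accented letters, and a
double-byte decoder such as Big5 can swallow an accented byte together with
the letter that follows it, which would give the "half Spanish, half
Chinese" effect.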
Apparently, the *.xkl.com server correctly transmits 8-bit (256) codes,
which can be recognized correctly on your own PC using the right settings,
since I receive the diacritics correctly, using code page 850 (437 works
equally well).
[ Moderator's comment:
No, the server here transmits only 7-bit ASCII. MIME-quoted-printable
messages, which use only 7-bit ASCII, are translated at the *receiving* end
into 8-bit (which I cannot read with any mail program available to me, to
check on this).
Further, you are assuming a Windows system, with your discussion of "code
pages" and the like. Neither Macintosh nor Unix systems agree with Windows
on 8-bit conventions. Therefore, it is unfriendly to use 8-bit characters
in mailing list messages.
--rma ]
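
As an illustration of the quoted-printable point above, here is a small
Python sketch using the standard-library quopri module. It shows 8-bit text
being carried as pure 7-bit ASCII and turned back into 8-bit bytes only at
the receiving end. The sample sentence and the choice of ISO-8859-1
(Latin-1) as the character set are assumptions for the example, not
something stated in the thread.

```python
import quopri

# Hypothetical sample text with German umlauts.
text = "Müller sagt: schön"

# Assumption: the sender's mail program uses ISO-8859-1 (one byte per char).
raw8 = text.encode("latin-1")

# Quoted-printable rewrites every byte above 127 as "=XX" (two hex digits),
# so the encoded message contains only 7-bit ASCII.
qp = quopri.encodestring(raw8)
print(qp)
print("7-bit clean:", all(b < 128 for b in qp))

# The receiving mail program reverses the step and gets the 8-bit bytes back.
restored = quopri.decodestring(qp).decode("latin-1")
print(restored == text)
```

A 7-bit server in the middle only ever sees the encoded form; whether the
reader then sees the umlauts depends on the receiving program decoding the
message with the same character set the sender used.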
Of course, there may be some people on this list who depend on a local
(somewhat old-fashioned) server that only transmits 7 bits, in which case
there is no solution to the problem.
[ Moderator's comment:
Your moderator, for instance.
--rma ]
It is not obvious that setting the right code page would eliminate your
'Chinese problem', because the 16-bit setting creates a problem on a deeper
level than the one among European-type PCs (actually their software).
>I've recently begun noticing similar problems with postings in German.
[Ed]
The solution is the same for all Western European languages (code page 437
or 850).
[ moderator snip ]
>[ Moderator's comment:
> I have pointed this out in the past, and been roundly excoriated for my
> point of view, taken somehow to be "English-only". I will once again
> suggest that we adopt a modified TeX-like accent-writing system, in which
> the accent (in the typographical sense, which includes umlaut/diaeresis/
> trema and the like) is written next to the character affected. (In TeX
> systems, it must precede, but I think that context can disambiguate for
> human readers.) Should I send out a list of the TeX conventions, for those
> unused to them?
> --rma ]
[Ed]
The list would be very welcome.
However, there is a problem: on most non-US keyboards, either the diacritics
are on dead keys (How do you write a diacritic without a letter under it? I
tried the dead key plus Spacebar: it works) or the letters with diacritics
are normal keys (Maybe you have to use ALT+number codes?).
[ Moderator's response:
The diacritics used in TeX are 7-bit ASCII characters. I will post a list of
them shortly. Let's get caught up on the backlog first.
--rma ]
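
For readers who want to see how such 7-bit notation can be turned back into
accented letters mechanically, here is a rough Python sketch. It covers only
five standard TeX accent commands (\' \` \^ \" \~) in their usual position
before the letter; the function name tex_to_unicode is made up for this
example, and the modified accent-next-to-letter scheme proposed above is not
handled.

```python
import re
import unicodedata

# Five standard TeX accent commands (all 7-bit ASCII), mapped to the
# corresponding Unicode combining marks.
TEX_ACCENTS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # umlaut / diaeresis / trema
    "~": "\u0303",  # tilde
}

# Matches a backslash, one accent character, then a letter that is either
# bare (\'e) or wrapped in braces (\'{e}).
_ACCENT = re.compile(r"""\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))""")


def tex_to_unicode(s):
    """Replace TeX accent sequences with precomposed accented letters."""
    def repl(m):
        letter = m.group(2) or m.group(3)
        combined = letter + TEX_ACCENTS[m.group(1)]
        # NFC merges the letter and the combining mark where possible.
        return unicodedata.normalize("NFC", combined)
    return _ACCENT.sub(repl, s)


# The input lines below contain nothing outside 7-bit ASCII.
for sample in (r"se\~nor", r'sch\"on', r"caf\'{e}", r"\^etre"):
    print(sample, "->", tex_to_unicode(sample))
```

Other TeX accent commands, such as the cedilla, are left out to keep the
sketch short.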