email Cyrillic problems

Richard Robin rrobin at gwu.edu
Wed Oct 20 21:20:59 UTC 1999


I'm posting to the list. Just flame me if you think this should have been sent
individually.
===============
"where does the encoding(?) get lost?  Is it like the fall of the yers, and I
really do not want to know ?"

You're not suggesting that you didn't want to learn about the fall of the
yers?(!)

Seriously, though, it sounds like you're the victim of the war of the encodings.
Most Cyrillicized Windows machines default to WindowsCyrillic (cp1251) for
web browsing and for many (but not all) applications. The most recent Microsoft
software products (Word 97 and Word 2000, for example) do their best to turn
WindowsCyrillic 1251 into Unicode (where every alphabetic character in the world
gets its own unique code). But some e-mail programs (Netscape, for one) insist
on turning everything into KOI8.
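To make the "war of the encodings" concrete, here is a small Python sketch (my own illustration, not something any of these programs runs). It encodes the same Russian word in cp1251 and in KOI8-R and lists its Unicode code points. It relies on Python 3's built-in "cp1251" and "koi8_r" codecs. The helper name byte_forms is hypothetical.

```python
def byte_forms(word: str) -> dict:
    """Return the hex bytes of `word` in two legacy Cyrillic encodings,
    plus its Unicode code points, so the three can be compared."""
    return {
        "cp1251": word.encode("cp1251").hex(" "),  # Windows Cyrillic
        "koi8_r": word.encode("koi8_r").hex(" "),  # KOI8-R
        "unicode": " ".join(f"U+{ord(ch):04X}" for ch in word),
    }


if __name__ == "__main__":
    for name, value in byte_forms("мир").items():  # Russian "world"
        print(f"{name:8} {value}")
    # cp1251   ec e8 f0
    # koi8_r   cd c9 d2
    # unicode  U+043C U+0438 U+0440
```

The same three letters travel as entirely different bytes. That is why a program has to know, or guess, which table the sender used.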

Here's what often happens. I write you an e-mail. I use Netscape, which I have
set to use WinCyrillic 1251. Let's say that you too use Netscape, but you have
set it to use KOI8. Guess what -- No problem! Why? Because Netscape thinks it's
smarter than both of us. It KNOWS that God meant e-mail to be written in KOI8,
and so it automatically switches my WinCyrillic 1251 message to KOI8 even before
it reaches your mailbox. And then, to take advantage of the Unicode fonts on
your machine (Times New Roman, Arial, and Courier), it switches the KOI8 to
Unicode so that you can read it on your machine without having to download a
bunch of amateurish-looking KOI8 fonts. No hassle, right? Right.... until, that
is...

I switch e-mail programs to ... oh, say, Eudora, which I have again set to use
WindowsCyrillic 1251. Now I send you the same message. And while Netscape may
think that it's smarter than both of us, it still believes that my incoming
message has already been converted to KOI8 and that it only has to convert it
from KOI8 to Unicode so that it can be read. But... uh-oh... the message was
never converted. Netscape is treating WindowsCyrillic 1251 text as if it were
KOI8, AND IT DOESN'T KNOW IT. That's why you get lots of meaningless characters
(usually Cyrillic uppercase).
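Here is a rough Python sketch of the two scenarios above. It rests on my own simplifying assumptions: the mail program is modeled as two hypothetical functions, transcode_in_transit and display_as_koi8. The first converts the sender's cp1251 to KOI8 on the way out. The second decodes whatever arrives as KOI8 for display. This is not how Netscape was actually built. It only reproduces the byte-level effect, including the "usually Cyrillic uppercase" garbage.

```python
def transcode_in_transit(raw: bytes, sender_charset: str) -> bytes:
    """Hypothetical 'smart' sender: re-encode the message body as KOI8-R."""
    return raw.decode(sender_charset).encode("koi8_r")


def display_as_koi8(raw: bytes) -> str:
    """Hypothetical receiver that assumes every incoming body is KOI8-R
    and converts it to Unicode for display."""
    return raw.decode("koi8_r")


message = "привет"  # Russian "hello"
body_1251 = message.encode("cp1251")

# Scenario 1: Netscape to Netscape. The body is converted to KOI8 before
# it arrives, so decoding it as KOI8 gives back the original text.
print(display_as_koi8(transcode_in_transit(body_1251, "cp1251")))  # привет

# Scenario 2: Eudora sends the raw cp1251 bytes. The receiver still decodes
# them as KOI8, and lowercase cp1251 letters land on uppercase KOI8 slots.
garbled = display_as_koi8(body_1251)
print(garbled)  # ОПХБЕР

# The damage is reversible: undo the wrong decoding, then apply the right one.
print(garbled.encode("koi8_r").decode("cp1251"))  # привет
```

The last line is essentially what Dave does by hand with his attachments: he reinterprets the same bytes under the right table.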

On the other hand, Netscape doesn't touch attachments (unless they are HTML
in-line attachments, and then watch out!). Netscape's meddling with encodings
is mostly limited to the body of the e-mail itself.

That's what may be happening. Similar things happen with various e-mail programs
on different platforms. There are a number of band-aids for this and similar
problems. Take a look at http://www.gwu.edu/~slavic/gw-cyrillic/cyrilize.htm for
a complete discussion of Cyrillic and e-mail on PCs.
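One common kind of band-aid simply guesses the encoding. The Python sketch below is my own illustration, not the tool described on the page above. It assumes the message is ordinary Russian prose, which is mostly lowercase, and picks whichever decoding yields more lowercase Cyrillic letters. The names guess_cyrillic_charset and lowercase_cyrillic_count are hypothetical. The heuristic can misfire on very short or all-caps messages.

```python
CANDIDATES = ("cp1251", "koi8_r")


def lowercase_cyrillic_count(text: str) -> int:
    """Count lowercase Russian letters (а..я plus ё)."""
    return sum(1 for ch in text if "а" <= ch <= "я" or ch == "ё")


def guess_cyrillic_charset(raw: bytes) -> str:
    """Guess whether `raw` is cp1251 or KOI8-R.

    Decoding with the wrong table turns most lowercase letters into
    uppercase ones, so the candidate that yields more lowercase
    Cyrillic is probably the right one. Ties go to the first candidate.
    """
    # errors="replace" because cp1251 leaves byte 0x98 undefined.
    return max(
        CANDIDATES,
        key=lambda cs: lowercase_cyrillic_count(raw.decode(cs, errors="replace")),
    )


if __name__ == "__main__":
    text = "Привет, как дела?"  # Russian "Hi, how are you?"
    for charset in CANDIDATES:
        raw = text.encode(charset)
        print(charset, "->", guess_cyrillic_charset(raw))
```

Real detectors use letter-frequency statistics rather than a single case count. The idea is the same, though: let the bytes tell you which table makes sense.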

I would add that Outlook Express (which I don't use) gives you more options in
setting your e-mail encoding, but that means that it "lets" you set it wrong.
Netscape appears to give you a choice of e-mail encodings, but it works behind
the scenes to remove that choice as described above. That's foolproof, as long
as your correspondent is a Netscape user.

For Macs, I looked at http://www.courses.fas.harvard.edu/fl/, which Laurel
Mittenthal suggested earlier today. It looked pretty promising.


David Paul Brokaw wrote:

> Dear SEELangers,
>
> I have forgotten some of the information that is the basis of the discussion
> that has been taking place on this list.  Why can I open and read an
> attachment to an e-mail written in Cyrillic and an html file attachment in
> an e-mail written in Cyrillic, but I cannot read the text of an e-mail that
> has been written in Cyrillic.  There is no problem reading the text of a
> website written in Cyrillic and writing an e-mail in Cyrillic.  Why can I
> read the attachment in Cyrillic from a given person when I cannot read the
> text of an e-mail in Cyrillic from the same person?
>
> When I receive an attachment in Cyrillic, all I have to do is open the
> attachment, highlight the text, and set the language to Ukrainian or
> Russian, the two languages I use at work.  The text switches from something
> that looks like curse words in the comic strips to Cyrillic.  When I try the
> same technique with the text of an e-mail, the text just laughs at me.  If
> you listen closely enough, you can hear the giggles.
>
> I assume that something is lost when the information is sent.  But if the
> information is in KOI8 or 1251 (CP1251?) when it is sent, where does the
> encoding(?) get lost?  Is it like the fall of the yers, and I really do not
> want to know ?
>
> I thank you for any assistance and apologize for any inconvenience.
>
> Dave Brokaw, Office Manager
> Cincinnati-Kharkiv Sister City Project

--
Richard Robin - http://gwis2.circ.gwu.edu/~rrobin
German and Slavic Dept.
The George Washington University
WASHINGTON, DC 20052
Can read HTML mail.
Читаю по-русски в любой кодировке.
Chitayu po-russki v lyuboi kodirovke. (I read Russian in any encoding.)