Why no Cyrillic?

Paul B. Gallagher paulbg at PBG-TRANSLATIONS.COM
Thu Feb 5 02:15:52 UTC 2009


Bill Leidy wrote:

> Hello, I'd like to add a few words about problems with Cyrillic in 
> e-mails. I get the SEELANGS in digest form, and very often the
> Cyrillic comes out in equal signs and hexadecimal numbers as you see
> below. I think this has something to do with the variety of default
> encodings people use or perhaps how the SEELANGS compiles the digest
> and chooses an encoding for the entire e-mail. Anyway, no matter how
> I change the character encoding in Mozilla Thunderbird, I can't fix
> the row of hexadecimal into something readable. Как жаль!
> 
> So, unless I'm doing something wrong on my end, you can see how
> Cyrillic has a tendency to not come out correctly, even on Slavic
> mailing lists when delivered in digest form.

The most obvious problem with a digest is that several messages in 
different encodings must be assembled into one message in a single 
encoding. This one-size-fits-all requirement means that any of the 
original messages encoded differently from the choice made for the 
digest will be garbled.
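
To make that concrete, here is a minimal Python sketch. The two "messages" are made up for illustration, and the choice of windows-1251 as the digest's single encoding is an assumption, not a statement about what the SEELANGS software actually does.

```python
# Two hypothetical messages carrying the same word in different Cyrillic encodings.
word = "жаль"
msg_koi8 = word.encode("koi8_r")
msg_1251 = word.encode("cp1251")

# Assumption: the digest labels everything as windows-1251,
# so a reader's mail program decodes both byte strings that way.
digest_charset = "cp1251"
shown_1251 = msg_1251.decode(digest_charset)
shown_koi8 = msg_koi8.decode(digest_charset)

print(shown_1251)  # matches the original word
print(shown_koi8)  # still Cyrillic letters, but the wrong ones
```

The bytes of the KOI8-R message are untouched; only the label is wrong, and that is enough to make it unreadable.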

If everyone standardized on a single encoding (for example, Unicode in 
its UTF-8 form, which can represent virtually every writing system in 
use), and the digest were also encoded the same way, your problem would 
disappear. Unfortunately, there are still many subscribers who can't or 
won't use Unicode, so the digest receives messages in a variety of 
encodings. I suspect (though I'm not enough of an expert to be sure) that 
if the digest were forced to use Unicode, and each message were converted 
from its declared encoding, the various other encodings would be 
rendered correctly. But of course malformed messages (e.g., Cyrillic 
incorrectly sent as Western) would not be fixed.
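
A rough sketch of what such a Unicode digest might do, in Python. The function name `build_digest` and the (bytes, charset) message shape are hypothetical, chosen only for illustration; real list software would read the charset from each message's headers.

```python
def build_digest(messages):
    """Combine (raw_bytes, declared_charset) pairs into one UTF-8 digest.

    The pair format is a hypothetical stand-in for real message objects.
    """
    parts = []
    for raw, charset in messages:
        # Decode each message using its own declared charset...
        parts.append(raw.decode(charset, errors="replace"))
    # ...then encode the combined text once, as UTF-8.
    return "\n".join(parts).encode("utf-8")


phrase = "Как жаль!"
messages = [
    (phrase.encode("koi8_r"), "koi8_r"),
    (phrase.encode("cp1251"), "cp1251"),
    (phrase.encode("utf-8"), "utf-8"),
    # Malformed: Cyrillic bytes declared as Western. Conversion cannot fix this.
    (phrase.encode("cp1251"), "latin-1"),
]

lines = build_digest(messages).decode("utf-8").split("\n")
for line in lines:
    print(line)
```

The first three lines come out identical and readable; the fourth stays garbled, because the digest has no way to know its label was wrong.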

In the case described in your first paragraph, Richard Robin's original 
Unicode message was rendered in Western in the digest, and thereby 
garbled, and then garbled a second time when converted to 
"quoted-printable" form (hence all the equal signs and hexadecimal 
numbers). Since you've shown you can handle Unicode, I'm sure you 
could've read the original message if it had not gone through the digest 
process. By the same token, Susan Bauckus could've read it too, if she 
had not displayed a Unicode message in Western.
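
For the curious, the equal signs can be undone by hand. This is a small Python sketch using the standard `quopri` module; the sample string is my own quoted-printable rendering of "Как жаль!" in UTF-8, not text taken from the digest in question.

```python
import quopri

# Quoted-printable writes each non-ASCII byte as "=" plus two hex digits.
encoded = b"=D0=9A=D0=B0=D0=BA =D0=B6=D0=B0=D0=BB=D1=8C!"

# Step 1: turn the "=XX" sequences back into raw bytes.
raw = quopri.decodestring(encoded)

# Step 2: interpret those bytes in the encoding the sender actually used
# (assumed here to be UTF-8).
text = raw.decode("utf-8")
print(text)
```

Step 2 only works if you pick the right encoding; decoding the same bytes as Western gives the familiar garbage.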

An easy way to recognize Unicode (UTF-8) Cyrillic displayed as Western 
is that it has twice as many characters as expected, and almost every 
other character is "Ð" ("D" with a bar through it) or "Ñ" ("N" with a 
tilde).
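
The pattern is easy to reproduce, and to reverse, in Python. In this sketch I use Latin-1 as a stand-in for "Western"; a real mail program is more likely to use windows-1252, which shows the same "Ð" and "Ñ" but differs for some of the other bytes.

```python
text = "Как жаль!"

# Simulate a UTF-8 message displayed as Western (Latin-1 assumed).
garbled = text.encode("utf-8").decode("latin-1")

# Each Cyrillic letter occupies two bytes in UTF-8, so it becomes two characters.
cyrillic_letters = sum(1 for ch in text if "\u0400" <= ch <= "\u04ff")
extra_chars = len(garbled) - len(text)

# The first byte of each pair is 0xD0 or 0xD1, shown as "Ð" or "Ñ".
lead_chars = garbled.count("Ð") + garbled.count("Ñ")

# If nothing was lost along the way, the damage is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```

The repair step works only when every byte survived the trip; once a program has replaced unknown bytes with question marks, the original text is gone.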

-- 
War doesn't determine who's right, just who's left.
--
Paul B. Gallagher
pbg translations, inc.
"Russian Translations That Read Like Originals"
http://pbg-translations.com
