Non-ASC-II [was: Re: Firstmention.com; Texas Proverbs]

Chris F Waigl chris at LASCRIBE.NET
Fri Aug 24 08:13:17 UTC 2007


Alice Faber wrote:
> The problem is individual email software, not just what the list
> tolerates.

The problem, these days, is largely the list software.  Most modern
email software on just about any platform is quite capable of displaying
even utf-8 (the appropriate Unicode encoding).

> For instance, I see some posts with non-ASCII characters, but
> not others. Depending on how the posts reach me, I might see a ?, a
> square, or just *nothing*.

What happens is that list members send in pieces of email that are
correctly encoded for email. These may be posts that contain the
characters directly (to simplify a bit), or that encode the content
specially as quoted-printable (these posts littered with "=2D" and
similar codes) or base64 (the solid blocks of gibberish we sometimes get).
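
To make the two transfer encodings concrete, here is a small sketch using only Python's standard library (quopri and base64). The sample word is just an illustration, and I am assuming the sender's software produced utf-8 bytes.

```python
import base64
import quopri

# Assumption for illustration: the sender's client encoded the text as utf-8.
raw = "résumé".encode("utf-8")

# Quoted-printable: ASCII stays readable, other bytes become "=XX" hex codes.
qp = quopri.encodestring(raw)
print(qp)                            # b'r=C3=A9sum=C3=A9'

# "=2D" is simply a hyphen written in this notation.
print(quopri.decodestring(b"=2D"))   # b'-'

# Base64: the whole body becomes a solid block of letters and digits.
b64 = base64.b64encode(raw)
print(b64)

# Both encodings are fully reversible, as long as you know which one was used.
print(quopri.decodestring(qp).decode("utf-8"))
print(base64.b64decode(b64).decode("utf-8"))
```

Neither form is damaged text. It only looks that way when the receiving program has not been told to decode it.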

What the list software does, however, is to strip away the lines in the
headers of the original submission that indicate which encoding is used,
thus sending out broken email. It is bad form, anyway, to send mass
email without these headers, even if the content is in us-ascii.
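
The effect can be reproduced with Python's standard email package. This is only a simulation under my own assumptions (a base64-encoded utf-8 test message); I have not seen the list software's code, so the header removal below stands in for whatever it actually does.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

body = "résumé, España, bokmål"

# A correctly encoded submission, as a list member's mail client might send it.
submission = EmailMessage()
submission["Subject"] = "test"
submission.set_content(body, charset="utf-8", cte="base64")
wire = submission.as_bytes()

# With its headers intact, the message decodes back to the original text.
intact = message_from_bytes(wire, policy=policy.default)
print(intact.get_content().strip())

# Simulate the list software dropping the headers that name the encoding.
stripped = message_from_bytes(wire, policy=policy.default)
del stripped["Content-Type"]
del stripped["Content-Transfer-Encoding"]

# Nothing now says "base64" or "utf-8", so all that is left is a block of letters.
print(stripped.get_content_charset())   # None
print(stripped.get_payload().strip())
```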

Now it's up to the individual email program to try to recover from this
situation. Some have algorithms to detect, for example, base64 -- and
therefore some of us never see these solid blocks of characters.
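
A recovery heuristic of this kind might look roughly like the sketch below. The function name guess_base64 and the rules it applies are my own invention for illustration, not taken from any particular mail program, and a short all-letters message could fool it.

```python
import base64
import binascii
import re

# One line of base64: only the base64 alphabet, with optional "=" padding.
BASE64_LINE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")

def guess_base64(body: str):
    """Return the decoded bytes if the body looks like base64, else None.

    Hypothetical heuristic: every line must use only the base64 alphabet
    and the total length must be a multiple of four.
    """
    lines = [line.strip() for line in body.strip().splitlines()]
    if not lines or not all(BASE64_LINE.match(line) for line in lines):
        return None
    joined = "".join(lines)
    if len(joined) % 4 != 0:
        return None
    try:
        return base64.b64decode(joined, validate=True)
    except binascii.Error:
        return None

block = base64.b64encode("résumé".encode("utf-8")).decode("ascii")
print(guess_base64(block))             # the original utf-8 bytes
print(guess_base64("Hello, world!"))   # None: spaces and punctuation rule it out
```

Even when the guess succeeds, the program still has to guess the character set of the decoded bytes.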

For example, take the easiest case of non-ascii: Western European
(ISO-8859-1) characters such as those used in Spanish, French, German
etc. Most of the email programs used by this group will be set to this
character set as a default. These subscribers will therefore be able to
see the correct display if I type, for example, résumé, España or bokmål.

However, a list member in Greece or an Arabic-speaking country, whose
email client may be set to ISO-8859-7 (Greek) or ISO-8859-6 (Arabic),
would see Greek or Arabic characters in place of the accented Western
European ones. Also, if your email program is set to a default of
us-ascii, you can change it to ISO-8859-1 (Western European) without
losing anything, because ISO-8859-1 contains all of us-ascii unchanged.
<http://no.wikipedia.org/wiki/Arag%C3%B3n>
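
The effect of the client's default character set is easy to check in Python, whose standard codecs include all three ISO-8859 sets mentioned above. The sample word is mine; the point is that the bytes stay the same and only the reading changes.

```python
# The bytes as sent, without any label saying which character set they use.
raw = "España".encode("iso-8859-1")      # the ñ travels as the single byte 0xF1

# A client defaulting to Western European reads them as intended.
as_latin1 = raw.decode("iso-8859-1")
print(as_latin1)                         # España

# A client defaulting to Greek turns the same byte into a Greek letter.
as_greek = raw.decode("iso-8859-7")
print(as_greek)                          # Espaρa

# A client defaulting to Arabic shows something else again.
as_arabic = raw.decode("iso-8859-6", errors="replace")
print(as_arabic)

# Pure us-ascii text is byte-for-byte identical in ISO-8859-1,
# so switching the default from us-ascii to ISO-8859-1 loses nothing.
plain = "Texas Proverbs"
print(plain.encode("ascii") == plain.encode("iso-8859-1"))   # True
```
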
> The only thing the ADS-L can do, possibly, is
> to transmit the characters and provide an indication of what character
> set is being used.
>
Most of the time, ADS-L does transmit the characters but removes the
"recipe" for reading them from the headers of the original email when
sending it on to the list. There is no such thing as plain text --
without the "recipe", reading the bytes is nothing more than guesswork.
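
For contrast, here is a sketch of a relay step that passes the "recipe" along untouched. The relay function and the example.org addresses are hypothetical; this is not the ADS-L software, only an illustration with Python's standard email package of what keeping the headers looks like.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def relay(original: bytes, list_address: str) -> bytes:
    """Re-address a submission while leaving its MIME headers alone (hypothetical)."""
    msg = message_from_bytes(original, policy=policy.default)
    del msg["To"]
    msg["To"] = list_address
    # MIME-Version, Content-Type and Content-Transfer-Encoding pass through as sent.
    return msg.as_bytes()

text = "IPA: [ˈɹɛzəmeɪ], Hebrew: שלום"

post = EmailMessage()
post["To"] = "list-submit@example.org"
post.set_content(text, charset="utf-8", cte="quoted-printable")

delivered = message_from_bytes(
    relay(post.as_bytes(), "subscribers@example.org"), policy=policy.default
)
print(delivered["Content-Type"])          # still names the utf-8 charset
print(delivered.get_content().strip())    # decodes without any guessing
```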

If the list owners need help finding, altering and testing the relevant
parts of the list software, I'd be happy to help. (I solve problems with
email software for a living.) If it is a Perl script, it's not hard to
do and I've done it before.

I, for one, think that occasional IPA or non-Latin characters (a quoted
Hebrew word, a Chinese character, something from Old English...) would
be very useful on this particular list.

Chris Waigl

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


