Decoding in XXI Cent

Chris F Waigl chris at LASCRIBE.NET
Tue Feb 6 01:45:03 UTC 2007


sagehen wrote:
>
> Perhaps easier than running down whose Latin was involved in the faeces et
> urinam, would be deciphering the "=?iso-2022-jp?B?" (Quotation marks
> added) with which about 15% of my spam is headed.  It may be in the sender
> or the subject heading, but it is always the first group of characters & is
> followed by more varied gibberish.  I think it has turned up half-a-dozen
> times already, today.  The time stamp is sometimes -0500 but can be
> nearly anything. Lots of identical spam does show up, but I don't think
> anything else comes close to the number of repetitions of this (?) phrase.
> AM
>
>

I expect this is rather far out in the off-topic area for ADS-L, but as
this question falls into my area of professional expertise, allow me to
explain: What you are seeing is the correct way to indicate the
character set and encoding of a message header, such as the subject or
the sender, as laid down in RFC 2047
(http://tools.ietf.org/html/rfc2047). In your example, "iso-2022-jp"
names a Japanese character set and "B" says that the text that follows
is Base64-encoded. A somewhat less technical explanation can be found,
for example, here:
http://www.johanvanmol.org/content/view/34/37/1/3/ (where it says "Mail
Clients").
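
For the curious, here is a minimal sketch in Python of how a mail program
might turn such a header back into readable text. It relies on the standard
library's email.header.decode_header; the helper name decode_mime_header and
the sample subject are my own inventions for illustration, not anything taken
from an actual spam message.

```python
import base64
from email.header import decode_header


def decode_mime_header(raw: str) -> str:
    """Turn an RFC 2047 header value into readable text (hypothetical helper)."""
    pieces = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus the charset they declared.
            pieces.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # Plain, unencoded header text is returned as str.
            pieces.append(chunk)
    return "".join(pieces)


# Build a sample header the way a Japanese mail client might.
# The subject text is an invented example.
payload = base64.b64encode("こんにちは".encode("iso-2022-jp")).decode("ascii")
sample = "=?iso-2022-jp?B?" + payload + "?="

print(sample)                      # the "gibberish" as it travels in the header
print(decode_mime_header(sample))  # the text a capable mail client would show
```

A mail client that does not know the named character set, or does not
implement RFC 2047 at all, can only show the raw form, which is presumably
what you are seeing.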

In a nutshell, the information about what character set / encoding a
message is in is contained in one of the message headers. The problem
then arises of where to put the information about the charset of the
message headers themselves: when the mail application reads the headers,
it doesn't yet have this information at its disposal. So this ugly
kludge is what people came up with: each encoded piece of a header
carries its own label.
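
To show where that label sits, here is a rough sketch that takes one such
group of characters apart by hand. RFC 2047 gives it the shape
=?charset?encoding?text?=, where the encoding is B (Base64) or Q (a variant
of quoted-printable). The function names below are hypothetical, and the
sketch handles a single encoded word only, not a whole header.

```python
import base64
import quopri
import re

# One RFC 2047 encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def split_encoded_word(word: str):
    """Return (charset, encoding, payload) for a single encoded word."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an RFC 2047 encoded word: %r" % word)
    charset, encoding, payload = match.groups()
    return charset, encoding.upper(), payload


def decode_encoded_word(word: str) -> str:
    """Decode a single encoded word using the label it carries."""
    charset, encoding, payload = split_encoded_word(word)
    if encoding == "B":
        raw = base64.b64decode(payload)
    else:
        # Q encoding: =XX is a hex byte, and "_" stands for a space.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    return raw.decode(charset)


word = "=?iso-2022-jp?B?GyRCJDMkcyRLJEEkTxsoQg==?="
print(split_encoded_word(word))
print(decode_encoded_word(word))
```

The point of the design is that the reader of the header needs no outside
information: the character set and the transfer encoding travel inside the
header value itself, in plain ASCII.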

Spam filtering on this information has been tried, but I've usually been
very unhappy with it. For a short while, messages in English were much
more likely to land in my spam folder than those I received in
French or German.

Chris Waigl

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


