special characters in CLAN

Brian MacWhinney macw at cmu.edu
Thu May 15 15:47:56 UTC 2003


Dear Margaret,
  The files on the old CD-ROM are not in Unicode, so putting them into
Unicode with Arial will not produce good results.  For English, Dutch,
and a few other languages, the difference between Unicode and the
earlier ANSI encoding is seldom noticeable.  Japanese and some Chinese
files are lucky too, because the old encodings were standardized and
still work in Unicode.  But for French, Swedish, Spanish, etc., there
were never any standards, so the move to Unicode forces changes.
In late 2002, I reformatted all of the data on the server to Unicode.
That is why you see the discrepancy.  To know for sure whether a file
is in Unicode, open it in Notepad or Word (not CLAN).  Look at the
first line, which gives the @Font information.  If it mentions Unicode,
then the file should be in Unicode and should read correctly when
displayed in the Arial Unicode font.  If there is an @Font header that
does not mention Unicode, then the file is not in Unicode.  If there is
no @Font header, then the file is also probably not in Unicode.  Also,
there may be an @UTF8 header.  If that header is present, then the file
is in Unicode.
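
The header checks described above can be sketched in Python.  This is
only an illustration, not part of CLAN: the function names, the choice
to scan the first 10 lines, and the three result strings are all
assumptions made for the example.  The file is read as Latin-1 so that
non-Unicode files can still be inspected without decoding errors.

```python
from pathlib import Path


def classify_header(lines, max_lines=10):
    """Apply the @UTF8 / @Font rules to the first few lines of a file.

    max_lines is an assumed cutoff; the message only says the @Font
    information is on the first line.
    """
    head = [line.strip() for line in lines[:max_lines]]

    # An @UTF8 header means the file is in Unicode.
    if any("@UTF8" in line for line in head):
        return "unicode"

    font_lines = [line for line in head if "@Font" in line]
    if font_lines:
        # An @Font header that mentions Unicode means Unicode;
        # an @Font header that does not means not Unicode.
        if any("unicode" in line.lower() for line in font_lines):
            return "unicode"
        return "not unicode"

    # No @Font header at all: probably not Unicode.
    return "probably not unicode"


def classify_file(path):
    """Read a file leniently and classify it by its header lines."""
    # Latin-1 maps every byte to a character, so this never fails,
    # and the ASCII header names survive unchanged.
    text = Path(path).read_bytes().decode("latin-1")
    return classify_header(text.splitlines())
```

Calling classify_file on a transcript returns one of the three strings,
which mirror the "is", "is not", and "probably not" cases above.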
   If you have non-Unicode files and wish to convert them to Unicode, you
need to run the CP2UTF program.  If necessary, I could discuss the use
of that program in another message.
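
For readers who want to see what such a conversion involves, here is a
minimal Python sketch.  It is not CP2UTF and does not reproduce its
behavior; it only shows the general idea of decoding bytes from a known
legacy code page and re-encoding them as UTF-8.  The function names and
the default of Windows code page 1252 are assumptions; the correct
source encoding depends on how each file was originally created.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from an assumed legacy encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src, dst, source_encoding: str = "cp1252"):
    """Write a UTF-8 copy of src to dst, leaving src untouched."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))
```

If the wrong source encoding is chosen, the accented characters will
come out wrong, which is why the original encoding has to be known
before converting.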
   I realize that this whole shift to Unicode can be a bit confusing.
However, once we have all made the shift uniformly, I think you will
find that it makes a lot of sense.

--Brian MacWhinney


