words with accents
Leonid Spektor
spektor at andrew.cmu.edu
Mon Feb 12 18:52:58 UTC 2007
Dear CLAN community,
It looks like there is a lot of confusion about Unicode CLAN, that is
clanxu.sit and clanwinu.exe. Both of those versions will be able to open your
data files and convert them to Unicode format, if necessary. But neither one
of them will be able to run CLAN programs correctly on your old, non-Unicode
data files.
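To illustrate why accented characters can vanish, here is a small Python sketch. It is only an illustration of the symptom, not CLAN's actual code, and it assumes the old data file was saved in the Windows cp1252 encoding, which may not match your files.

```python
# Illustration only: this mimics the symptom, it is not how CLAN is implemented.
# Assumption: the old, non-Unicode file was saved as Windows cp1252.
legacy_bytes = "Während der Gründung".encode("cp1252")

# Bytes such as 0xE4 ("ä" in cp1252) are not valid UTF-8 on their own,
# so a lenient UTF-8 reader silently discards them.
as_seen_by_unicode_tool = legacy_bytes.decode("utf-8", errors="ignore")

# Reading the same bytes with the legacy encoding recovers the text,
# which can then be saved again as UTF-8.
recovered = legacy_bytes.decode("cp1252")
utf8_bytes = recovered.encode("utf-8")

print(as_seen_by_unicode_tool)
print(recovered)
```

The first printed line has lost its umlauts, while the second keeps them, which is the same pattern as in the kwal example quoted below.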
If you are using a PC, then please open your data file with Notepad.
If you are using a Mac, then please use TextEdit.
Next, check for a "@UTF8" line at the top of the data file.
If this line is missing, then you need to run the cp2utf conversion program on
your data files. If you have just a few files and/or you want to convert
them by hand, then you can just open each data file with Unicode CLAN, make
a small change and then save it. That will convert your data file(s) to
Unicode.
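If you would rather not open each file by hand, the "@UTF8" check can be sketched in Python. This is not part of CLAN; the function names and the file name sample.cha are made up for illustration, and the sketch only looks at the first line of the file.

```python
import codecs


def header_present(first_line: bytes) -> bool:
    """Return True if the given first line of a file starts with @UTF8."""
    # Some editors put a UTF-8 byte-order mark in front of the text; skip it.
    if first_line.startswith(codecs.BOM_UTF8):
        first_line = first_line[len(codecs.BOM_UTF8):]
    return first_line.startswith(b"@UTF8")


def file_has_utf8_header(path) -> bool:
    """Read only the first line of the file, as raw bytes, and test it."""
    # Binary mode means no guess about the file's encoding is needed.
    with open(path, "rb") as f:
        return header_present(f.readline())
```

For example, file_has_utf8_header("sample.cha") would return False for a file that still needs to be converted.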
If you are constantly using textin to convert files to CHAT format, then
you will need to run cp2utf on the output of the textin program.
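If textin produces many files, a rough way to see which ones still need cp2utf is to list those without a "@UTF8" first line. The Python sketch below is only an aid for finding such files and does not do the conversion itself; the function name, the folder argument and the *.cha pattern are assumptions for illustration.

```python
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark that some editors add


def files_missing_header(folder, pattern="*.cha"):
    """Return the names of files in folder whose first line is not @UTF8."""
    missing = []
    for path in sorted(Path(folder).glob(pattern)):
        # Read raw bytes so that no encoding has to be guessed.
        with open(path, "rb") as f:
            first_line = f.readline()
        if first_line.startswith(BOM):
            first_line = first_line[len(BOM):]
        if not first_line.startswith(b"@UTF8"):
            missing.append(path.name)
    return missing
```

Any file named in the returned list is a candidate for cp2utf.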
If you choose to continue using Unicode CLAN on non-Unicode data, then you
will be stuck with the only option described in a message below.
All of the data on the CHILDES web site has already been converted to
Unicode, so if you are experiencing a problem with that data, then please
let us know so that we can fix the problem.
Leonid.
On 12-02-07 04:18, "Castle Sinicrope" <castles at hawaii.edu> wrote:
> Dear CLAN community,
>
> I have also experienced similar problems with umlaut and double s characters
> in my German data. These characters disappear in the CLAN output window when
> I run freq or kwal. However, when I output my results to a separate file and
> view the file outside CLAN, the characters are present:
>
> Example kwal for Vereinigten Staaten
>
> 1. Output in CLAN window:
> *TXT: Whrend der Grndung der Vereinigten Staaten war eine der grten
> Probleme die Reprsentation der Staaten.
>
> 2. Output outside CLAN window (using Notepad):
> *TXT: Während der Gründung der Vereinigten Staaten war eine der größten
> Probleme die Repräsentation der Staaten.
>
> I have encountered the same difficulty when converting plain text (.txt)
> files to .cha files using the textin command. In the converted .cha files,
> umlauts and the double s are missing. For recent data, I have added these
> characters manually, and these manually added characters do show up in the
> kwal results. Adding manually, however, does not seem like the correct
> approach.
>
> If anyone has any suggestions or helpful hints, I would be very grateful.
>
> Thank you for your time,
> Castle Sinicrope
>
>
> On 2/2/07, Fernanda Gonçalves <frg at uevora.pt> wrote:
>>
>> Dear Brian,
>>
>> I have a problem when running programs like freq or combo on my files: the
>> words that are written with accents (like "está" or "óculos") appear in the
>> output files with those characters omitted (like "est" or "culos").
>> Can it be because I'm using the current version of CLAN and the files were
>> written with a previous version?
>> With that previous version, it didn't occur.
>>
>> Thank you for your attention.
>>
>> Best wishes.
>>
>> Fernanda Gonçalves
>>