[Corpora-List] European Constitution in parallel
Andrius Utka
a.utka at hmf.vdu.lt
Mon Apr 25 11:57:51 UTC 2005
Dear Joerg,
As far as I know, Lithuanian uses ISO 8859-13. I am not sure about Latvian.
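One quick way to check is to list the characters a candidate 8-bit encoding cannot represent. Below is a minimal Python sketch, not a tested recipe for the EUconst data: the helper name unencodable and the two sample strings are my own illustrations, and real corpus text may contain punctuation (for example the low opening quotation mark) that behaves differently from the plain alphabet.

```python
def unencodable(text, codec):
    """Return the sorted list of characters in `text` that `codec` cannot encode."""
    missing = set()
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)


# Illustrative samples only: the letters with diacritics of each alphabet.
sample_lt = "ąčęėįšųūž ĄČĘĖĮŠŲŪŽ"
sample_lv = "āčēģīķļņšūž ĀČĒĢĪĶĻŅŠŪŽ"

# Python ships codecs for both ISO 8859-4 and ISO 8859-13.
for codec in ("iso8859_4", "iso8859_13"):
    print(codec, "lt:", unencodable(sample_lt, codec),
          "lv:", unencodable(sample_lv, codec))
```

Running the same function over the actual UTF-8 files, instead of the sample strings, should show exactly which characters get lost in the conversion.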
Best,
Andrius
>
>follow-up ....
>
>I just realized that there are some additional problems with character
>encodings. Latvian and Lithuanian should be supported by
>ISO-8859-4 according to information I found. However, I ran into serious
>trouble when converting from UTF-8 to ISO for these languages. Did the
>alphabet change recently or is the ISO standard just useless?
>
>Now, I changed the Latvian and Lithuanian texts from the EUconst corpus
>to UTF-8 in the CWB index. It looks good, but it is difficult to query
>for diacritics. Check:
>http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lt
>http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lv
>
>Let me know if there is an 8-bit code that can be (is) used for these
>two languages.
>
>
>Jörg
>
>***********/\/\/\/\/\/\/\/\/\/\/\************************************
>** Jörg Tiedemann tiedeman at let.rug.nl **
>** Alfa-Informatica http://www.let.rug.nl/~tiedeman **
>** Rijksuniversiteit Groningen Harmoniegebouw, room 1311-429 **
>** Oude Kijk in 't Jatstraat 26 phone: +31 (0)50-363 5935 **
>** 9712 EK Groningen fax: +31 (0)50-363 6855 **
>*************************************/\/\/\/\/\/\/\/\/\/\/\**********