[Corpora-List] European Constitution in parallel

Andrius Utka a.utka at hmf.vdu.lt
Mon Apr 25 11:57:51 UTC 2005


Dear Joerg,
As far as I know, Lithuanian uses ISO 8859-13. I am not sure about Latvian.
Best,
Andrius
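
A quick way to check which 8-bit code can hold a given text is to try
encoding each character and list the ones that fail. The following is only
a minimal sketch in Python: the helper name unencodable and the sample
string are made up for illustration, and the codec names are Python's own
aliases for ISO 8859-4 and ISO 8859-13. The sample includes the low and
high quotation marks, on the assumption that such punctuation, and not only
the letters, may occur in the corpus text.

```python
def unencodable(text, encoding):
    """Return the distinct characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Letters with diacritics used in the two alphabets.
LITHUANIAN = "ąčęėįšųūžĄČĘĖĮŠŲŪŽ"
LATVIAN = "āčēģīķļņšūžĀČĒĢĪĶĻŅŠŪŽ"

# Hypothetical sample: the letters plus low/high quotation marks.
SAMPLE = LITHUANIAN + LATVIAN + "\u201e\u201c"

if __name__ == "__main__":
    for codec in ("iso8859_4", "iso8859_13"):
        print(codec, unencodable(SAMPLE, codec))
```

Running the same check over the real corpus files, instead of the sample
string, would show exactly which characters break a UTF-8 to ISO conversion.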

>
>follow-up ....
>
>I just realized that there are some additional problems with character
>encodings. Latvian and Lithuanian should be supported by
>ISO-8859-4 according to information I found. However, I ran into serious
>trouble when converting from UTF-8 to ISO for these languages. Did the
>alphabet change recently or is the ISO standard just useless?
>
>Now, I changed the Latvian and Lithuanian texts from the EUconst corpus
>to UTF-8 in the CWB index. It looks good, but it is difficult to query for
>diacritics. Check:
>http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lt
>http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lv
>
>Let me know if there is an 8-bit code that can be (is) used for these
>two languages.
>
>
>Jörg
>
>***********/\/\/\/\/\/\/\/\/\/\/\************************************
>**  Jörg Tiedemann                 tiedeman at let.rug.nl             **
>**  Alfa-Informatica               http://www.let.rug.nl/~tiedeman **
>**  Rijksuniversiteit Groningen     Harmoniegebouw, room 1311-429  **
>**  Oude Kijk in 't Jatstraat 26    phone: +31 (0)50-363 5935      **
>**  9712 EK Groningen               fax:   +31 (0)50-363 6855      **
>*************************************/\/\/\/\/\/\/\/\/\/\/\**********
>
>
>


