[Corpora-List] European Constitution in parallel
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Mon Apr 25 11:06:41 UTC 2005
Would it not be possible to make the corpus available in Unicode?
Surely that would be the best solution, especially since you're saving
it in an XML format.
But many thanks for this effort -- what a great resource!
Joerg Tiedemann wrote:
>follow-up ....
>
>I just realized that there are some additional problems with character
>encodings. Latvian and Lithuanian should be supported by
>ISO-8859-4 according to information I found. However, I got serious
>trouble when converting from UTF-8 to ISO for these languages. Did the
>alphabet change recently or is the ISO standard just useless?
>
>Now, I changed the Latvian and Lithuanian texts from the EUconst corpus to
>UTF-8 in the CWB index. Looks good but is difficult to query for
>diacritics. Check:
>http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lt
>http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lv
>
>Let me know if there is an 8-bit code that can be (is) used for these
>two languages.
>
>
>Jörg
>
>***********/\/\/\/\/\/\/\/\/\/\/\************************************
>** Jörg Tiedemann tiedeman at let.rug.nl **
>** Alfa-Informatica http://www.let.rug.nl/~tiedeman **
>** Rijksuniversiteit Groningen Harmoniegebouw, room 1311-429 **
>** Oude Kijk in 't Jatstraat 26 phone: +31 (0)50-363 5935 **
>** 9712 EK Groningen fax: +31 (0)50-363 6855 **
>*************************************/\/\/\/\/\/\/\/\/\/\/\**********
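
One way to narrow down the conversion trouble described above is to list exactly which characters a given 8-bit encoding cannot represent. The following is only a minimal sketch in Python: the helper name `unencodable` and the sample string are hypothetical, and the candidate codecs (ISO-8859-13 and Windows-1257, both 8-bit encodings aimed at the Baltic languages) are assumptions about what might be worth trying, not something tested against the EUconst files.

```python
def unencodable(text, encoding):
    """Return the sorted distinct characters of `text` that `encoding` cannot represent."""
    missing = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # This character has no code point in the target encoding.
            missing.add(ch)
    return sorted(missing)


# Hypothetical sample: Lithuanian and Latvian letters plus typographic
# quotation marks and a dash, which often appear in typeset legal text.
sample = "„Ąžuolas“ – ąčęėįšųūž āčēģīķļņšūž"

# Candidate encodings (an assumption, see above); UTF-8 is the baseline.
for codec in ("iso8859_4", "iso8859_13", "cp1257", "utf-8"):
    print(codec, unencodable(sample, codec))
```

Running the same check over the real corpus text would show whether the failures come from the letters of the alphabets themselves or from punctuation such as quotation marks and dashes.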