[Corpora-List] European Constitution in parallel
Joerg Tiedemann
tiedeman at let.rug.nl
Thu Apr 28 17:08:00 UTC 2005
Thanks for your reply.
I changed the encoding for the CWB indices to ISO-8859-13. I hope it
worked. Maybe you could have a quick look if you have some time (the
OPUS query interface).
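A quick way to see which characters get lost in such a conversion is to try encoding them one by one. The following is only a minimal sketch in Python, not what was actually run on the corpus: the helper name `unencodable` and the sample strings are made up for illustration, and `iso8859_4` / `iso8859_13` are the names of Python's built-in codecs. It reports the characters a given 8-bit encoding cannot represent (typographic quotes such as „ are one example that ISO-8859-13 covers but ISO-8859-4 does not).

```python
def unencodable(text, encoding):
    """Return the sorted distinct characters of `text` that `encoding` cannot represent."""
    missing = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)


# Hypothetical sample strings (NOT taken from the EUconst corpus).
samples = {
    "lt": "„Ąžuolas“ ąčęėįšųūž",
    "lv": "„Ozols“ āčēģīķļņšūž",
}

for lang, text in samples.items():
    for enc in ("iso8859_4", "iso8859_13"):
        # An empty list means every character survives the conversion.
        print(lang, enc, unencodable(text, enc))
```

Running the same check over the real UTF-8 files would show whether the trouble comes from the letters themselves or from punctuation in the text.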
Thanks for your help!
Jörg
***********/\/\/\/\/\/\/\/\/\/\/\************************************
** Jörg Tiedemann tiedeman at let.rug.nl **
** Alfa-Informatica http://www.let.rug.nl/~tiedeman **
** Rijksuniversiteit Groningen Harmoniegebouw, room 1311-429 **
** Oude Kijk in 't Jatstraat 26 phone: +31 (0)50-363 5935 **
** 9712 EK Groningen fax: +31 (0)50-363 6855 **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********
On Mon, 25 Apr 2005, Andrius Utka wrote:
> Dear Joerg,
> As far as I know Lithuanian uses ISO 8859-13. Not sure about Latvian.
> Best,
> Andrius
>
> >
> >follow-up ....
> >
> >I just realized that there are some additional problems with character
> >encodings. Latvian and Lithuanian should be supported by
> >ISO-8859-4 according to information I found. However, I got serious
> >trouble when converting from UTF-8 to ISO for these languages. Did the
> >alphabet change recently or is the ISO standard just useless?
> >
> >Now, I changed the Latvian and Lithuanian texts from the EUconst corpus
> >to UTF-8 in the CWB index. Looks good but is difficult to query for
> >diacritics. Check:
> >http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lt
> >http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lv
> >
> >Let me know if there is an 8-bit code that can be (is) used for these
> >2 languages.
> >
> >
> >Jörg
> >