[Corpora-List] European Constitution in parallel

Joerg Tiedemann tiedeman at let.rug.nl
Mon Apr 25 11:12:01 UTC 2005


sorry for the confusion. the corpus is available in unicode. all xml-files 
are in utf8. it's only in the Corpus Workbench (CWB) that I tried to use other 
encoding standards (CWB doesn't know unicode and utf8 would make it 
difficult to type a query with some diacritics).

but everything that you download from
http://logos.uio.no/opus/EUconst.html
is in utf-8
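
For anyone converting the utf-8 files to an 8-bit encoding themselves, here is
a minimal sketch in Python of how one might list the characters a candidate
encoding cannot represent before converting. The codec names are Python's
standard aliases; the helper name unencodable and the sample string are
illustrative assumptions and not part of the corpus tools.

```python
# Sketch: report which characters of a text a given 8-bit codec cannot encode.
# The candidate list is an assumption: three encodings commonly associated
# with Baltic languages, under the names Python's codecs module uses.
CANDIDATES = ["iso8859_4", "iso8859_13", "cp1257"]


def unencodable(text, codec):
    """Return the sorted list of distinct characters in text that codec cannot encode."""
    bad = set()
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.add(ch)
    return sorted(bad)


if __name__ == "__main__":
    # Illustrative sample only: some Latvian and Lithuanian letters plus
    # low and high typographic quotation marks.
    sample = "āčēģīķļņšūž ąęėįųū \u201e\u201c"
    for codec in CANDIDATES:
        print(codec, unencodable(sample, codec))
```

An empty list for a codec means every character of the sample survives the
conversion; anything listed would be lost or would raise an error.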

joerg


On Mon, 25 Apr 2005, Lou Burnard wrote:

> Would it not be possible to make the corpus available in Unicode?
> 
> Surely that would be the best solution, especially since you're saving 
> it in an XML format.
> 
> But many thanks for this effort -- what a great resource!
> 
> 
> 
> Joerg Tiedemann wrote:
> 
> >follow-up ....
> >
> >I just realized that there are some additional problems with character 
> >encodings. Latvian and Lithuanian should be supported by 
> >ISO-8859-4 according to information I found. However, I got serious 
> >trouble when converting from UTF-8 to ISO for these languages. Did the 
> >alphabet change recently or is the ISO standard just useless?
> >
> >Now, I changed the Latvian and Lithuanian texts from the EUconst corpus to 
> >UTF-8 in the CWB index. Looks good but is difficult to query for 
> >diacritics. Check:
> >http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lt
> >http://logos.uio.no/cgi-bin/opus/opuscqp.pl?corpus=EUconst;lang=lv
> >
> >Let me know if there is an 8-bit code that can be (is) used for these 
> >two languages.
> >
> >
> >Jörg
> >
> >***********/\/\/\/\/\/\/\/\/\/\/\************************************
> >**  Jörg Tiedemann                 tiedeman at let.rug.nl             **
> >**  Alfa-Informatica               http://www.let.rug.nl/~tiedeman **  
> >**  Rijksuniversiteit Groningen     Harmoniegebouw, room 1311-429  **
> >**  Oude Kijk in 't Jatstraat 26    phone: +31 (0)50-363 5935      **
> >**  9712 EK Groningen               fax:   +31 (0)50-363 6855      **
> >*************************************/\/\/\/\/\/\/\/\/\/\/\**********


