[Corpora-List] BILINGUAL PARALLEL CORPORA

Joerg Tiedemann tiedeman at let.rug.nl
Tue Nov 14 11:14:36 UTC 2006



Look also at OPUS:
http://logos.uio.no/opus/

It's free and sentence-aligned (but the alignments have not been manually corrected).
The cleanest parts are probably:

the OpenOffice manuals (http://logos.uio.no/opus/oo.html)
and the EU constitution (http://logos.uio.no/opus/EUconst.html).
EuroParl is also part of OPUS (tokenized and aligned in XML).

The search interface is on-line again:
http://logos.uio.no/cgi-bin/opus/opuscqp.pl

The interface uses the Corpus Workbench (CWB) from IMS Stuttgart
(ftp://ftp.ims.uni-stuttgart.de/pub/outgoing/cwb-beta/index.html).
You can get the CGI script for the search interface if you like.

Uplug (http://sourceforge.net/projects/uplug) also includes web-based
tools for processing parallel corpora. Have a look at the demo at
http://www.let.rug.nl/~tiedeman/uplug-demo/



Jörg

***********/\/\/\/\/\/\/\/\/\/\/\************************************
**  Jörg Tiedemann                 tiedeman at let.rug.nl             **
**  Alfa-Informatica               http://www.let.rug.nl/~tiedeman **  
**  Rijksuniversiteit Groningen     Harmoniegebouw, room 1311-429  **
**  Oude Kijk in 't Jatstraat 26    phone: +31 (0)50-363 5935      **
**  9712 EK Groningen               fax:   +31 (0)50-363 6855      **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********

On Sun, 12 Nov 2006, JLDLME wrote:

> Dear Corpora-List members,
>    
>   I have three questions...
>    
>   Does anyone know of a freely available, sentence-aligned bilingual corpus covering several languages, in particular Nordic (Finnish, Norwegian, etc.) or Romance languages (Spanish, Italian, etc.), for bilingual studies?
>    
>   My second question is: what would the requirements be for an online/desktop software tool that covers the whole process of building a parallel corpus?
>    
>   Finally, would you consider one million words (in both languages) a large or a small bilingual corpus?
>    
>   Any help will be appreciated.
>    
>    
>   Regards,
>    
>    
>   J. L. DeLucca (somewhere in Spain)

