[Corpora-List] BILINGUAL PARALLEL CORPORA
Joerg Tiedemann
tiedeman at let.rug.nl
Tue Nov 14 11:14:36 UTC 2006
Look also at OPUS:
http://logos.uio.no/opus/
It's free and sentence-aligned (but the alignments have not been manually corrected).
The cleanest parts are probably:
OpenOffice manuals (http://logos.uio.no/opus/oo.html)
and the EU constitution (http://logos.uio.no/opus/EUconst.html)
EuroParl is also part of OPUS (tokenized and aligned, in XML).
The search interface is on-line again:
http://logos.uio.no/cgi-bin/opus/opuscqp.pl
The interface uses the Corpus Workbench (CWB) from IMS Stuttgart
(ftp://ftp.ims.uni-stuttgart.de/pub/outgoing/cwb-beta/index.html).
You can get the CGI script for the search interface if you like.
There are also web-based tools for processing parallel corpora in Uplug
(http://sourceforge.net/projects/uplug)
Look at the demo at http://www.let.rug.nl/~tiedeman/uplug-demo/
Jörg
***********/\/\/\/\/\/\/\/\/\/\/\************************************
** Jörg Tiedemann tiedeman at let.rug.nl **
** Alfa-Informatica http://www.let.rug.nl/~tiedeman **
** Rijksuniversiteit Groningen Harmoniegebouw, room 1311-429 **
** Oude Kijk in 't Jatstraat 26 phone: +31 (0)50-363 5935 **
** 9712 EK Groningen fax: +31 (0)50-363 6855 **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********
On Sun, 12 Nov 2006, JLDLME wrote:
> Dear Corpora-List members,
>
> I have three questions...
>
> Does anyone know of a freely available, sentence-aligned bilingual corpus covering several languages, particularly Scandinavian languages (Finnish, Norwegian, etc.) or Latin languages (Spanish, Italian, etc.), for bilingual studies?
>
> My second question is: what would be the requirements for creating an online/desktop software tool that covers the whole process of building a parallel corpus?
>
> Finally, would you consider one million words (in both languages) a large or a small bilingual corpus?
>
> Any help will be appreciated.
>
>
> Regards,
>
>
> J. L. DeLucca (somewhere in Spain)