[Corpora-List] Parallel corpora that are directly searchable on the web?

Iñaki San Vicente Roncal i.sanvicente at elhuyar.com
Mon Nov 8 15:46:03 UTC 2010


http://corpus.consumer.es/corpus/ - Eroski Consumer magazine corpus
includes parallel texts in the Spanish, Basque, Galician and Catalan
languages, approximately 2.5M words per language (3.7M in the case of
Spanish).


Regards,
Iñaki.


.......................................................



Iñaki San Vicente
Hizkuntza Zerbitzuak - I+G+B
Elhuyar Fundazioa
Zelai Haundi, 3
Osinalde industrialdea
20170 Usurbil
tel.: 943363040
www.elhuyar.org






2010/11/8 Wallace Chen <juiching.chen at gmail.com>
>
> Any links for the English-Chinese pair? Thanks!
>
> Wallace Chen
> Monterey Institute of International Studies
> U.S.A.
>
> On 11/8/2010 6:38 AM, R.M.Salkie at bton.ac.uk wrote:
>
> I have found these web sites where you can search for a word or phrase in one language and the output is a bilingual or multilingual concordance.
>
>
>
> Can anyone add to the list?
>
>
>
> Thanks. – Raphael Salkie, School of Humanities, University of Brighton, UK.
>
>
>
>
>
> 1. CABAL: Un concordancier en ligne pour la linguistique contrastive
>
> http://cabal.rezo.net/ (University of Poitiers)
>
> English, French
>
> About 200 articles are currently online (roughly 400,000 words). Most are taken from Le Monde diplomatique and date from 1998 to December 2003.
>
>
>
> 2.  The CLUVI corpus:
>
> http://sli.uvigo.es/CLUVI/index_en.html
>
> English, French, Spanish, Galician
>
> Corpus:  UNESCO Corpus of English-Galician-French-Spanish scientific-technical divulgation
>
>
>
> 3. German(-English) parallel corpora (Europarl and German News)
>
> http://corpus.leeds.ac.uk/paraquery.html
>
> English, German
>
>
>
> 4. WebTCE (Translation Corpus Explorer)
>
> http://khnt.hit.uib.no/webtce.htm
>
> English, German, French, Spanish, Norwegian, Danish
>
>
>
> 5. EVROKORPUS Parallel corpora
>
> http://evrokorpus.gov.si/index.php?jezik=angl
>
> 223 million words. English, French, German, Italian, Slovene and Spanish. Searches must involve Slovene and one other language.
>
>
>
> 6. TERMACOR terminology and corpus
>
> http://evrokorpus.gov.si/k2/index.php?jezik=angl
>
> 98 million words in 22 European Languages.  EU Commission data.
>
>
>
> 7. COMPARA Portuguese-English parallel corpus
>
> http://www.linguateca.pt/COMPARA/
>
> Three million words.
>
> Portuguese, English
>
>
>
> 8. Termsearch
>
> http://www.termsearch.info/ or a faster interface at:
>
> http://www.bible-study-in-geneva.info/termsearch/
>
> English, French, Russian
>
> Major international treaties, conventions, agreements, etc. 792 documents.
>
>
>
> 9. English-Inuktitut Parallel Corpus
>
> http://www.inuktitutcomputing.ca/NunavutHansard/en/
>
> 3.5 million words (of English), 1.5 million words of Inuktitut
>
> English, Inuktitut (an Inuit Language of North-Eastern Canada)
>
>
>
> 10. English-Russian Parallel Corpus
>
> http://ruscorpora.ru/search-para.html
>
> English, Russian, (some German?)
>
> Interface only in Russian.
>
> About 9 million words
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
