[Corpora-List] Parallel corpora that are directly searchable on the web?

William H Fletcher fletcher at usna.edu
Mon Nov 8 15:05:49 UTC 2010


http://www.linguee.de/ “Das Web als Wörterbuch” (“The Web as a Dictionary”)
is based on 100M bilingual online texts (yes, total texts, not total words),
primarily German / English, but also French, Portuguese and Spanish.  The
quality of the translations varies, but unverified translations are marked
as such.  Search terms and their translations are highlighted in the results.

 

http://WeBiText.com has searchable parallel texts in 30 different languages,
primarily from EU and other government sources.  Unlike Linguee, this site
appears to match exact wordforms only.

 

Regards,

Bill Fletcher

 

  _____  

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
R.M.Salkie at bton.ac.uk
Sent: Monday, November 08, 2010 9:39 AM
To: corpora at uib.no
Subject: [Corpora-List] Parallel corpora that are directly searchable on
the web?

 

I have found these web sites where you can search for a word or phrase in
one language and the output is a bilingual or multilingual concordance.

 

Can anyone add to the list?

 

Thanks. – Raphael Salkie, School of Humanities, University of Brighton, UK.

 

 

1. CABAL: Un concordancier en ligne pour la linguistique contrastive
(an online concordancer for contrastive linguistics)

http://cabal.rezo.net/ (University of Poitiers)

English, French

About 200 articles are currently online (roughly 400,000 words). Most come
from Le Monde diplomatique and date from 1998 to December 2003.

 

2.  The CLUVI corpus:

http://sli.uvigo.es/CLUVI/index_en.html

English, French, Spanish, Galician

Corpus:  UNESCO Corpus of English-Galician-French-Spanish
scientific-technical divulgation

 

3. German(-English) parallel corpora (Europarl and German News)

http://corpus.leeds.ac.uk/paraquery.html

English, German

 

4. WebTCE (Translation Corpus Explorer)

http://khnt.hit.uib.no/webtce.htm

English, German, French, Spanish, Norwegian, Danish

 

5. EVROKORPUS Parallel corpora

http://evrokorpus.gov.si/index.php?jezik=angl

223 million words. English, French, German, Italian, Slovene and Spanish.
Searches must involve Slovene and one other language.

 

6. TERMACOR terminology and corpus 

http://evrokorpus.gov.si/k2/index.php?jezik=angl

98 million words in 22 European Languages.  EU Commission data.

 

7. COMPARA Portuguese-English parallel corpus

http://www.linguateca.pt/COMPARA/ 

Three million words.

Portuguese, English

 

8. Termsearch

http://www.termsearch.info/ or a faster interface at:

http://www.bible-study-in-geneva.info/termsearch/

English, French, Russian

Major international treaties, conventions, agreements, etc. 792 documents.

 

9. English-Inuktitut Parallel Corpus

http://www.inuktitutcomputing.ca/NunavutHansard/en/ 

3.5 million words (of English), 1.5 million words of Inuktitut

English, Inuktitut (an Inuit language of north-eastern Canada)

 

10. English-Russian Parallel Corpus

http://ruscorpora.ru/search-para.html

English, Russian, (some German?)

Interface only in Russian.

About 9 million words

 

