[Corpora-List] English-French parallel corpus?

Chris Callison-Burch callison-burch at ed.ac.uk
Thu Jan 19 17:04:02 UTC 2006


Dear Oana,

You might consider constructing a parallel corpus of French novels  
and their translations into English using public domain texts from  
Project Gutenberg.  As I see it, there are two advantages of doing  
this.  Firstly, the text would be quite different from the  
parliamentary domain represented by the Canadian Hansard and  
Europarl.  Secondly, novels often have multiple translations, which  
you could potentially use with automatic MT evaluation metrics that  
take advantage of multiple reference translations.

Here's an example to get you started:

Madame Bovary in the original French:
	http://www.gutenberg.org/files/14155/14155-8.txt

Translated into English:
	http://www.gutenberg.org/dirs/etext00/mbova11.txt

Also, here are two additional English translations that Regina  
Barzilay used in her PhD thesis on paraphrasing with monolingual  
parallel corpora:
	http://people.csail.mit.edu/regina/par/bovary1.txt
	http://people.csail.mit.edu/regina/par/bovary3.txt
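
To make the point about multiple reference translations concrete, here is a rough Python sketch of clipped (modified) n-gram precision, the building block that BLEU-style metrics compute against several references. It is only an illustration and not a full metric: there is no brevity penalty and no combination of n-gram orders. The function names and the example sentences are invented for this sketch and are not taken from any of the texts above.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n=1):
    """Modified n-gram precision of a candidate against several references.

    Each candidate n-gram is credited at most as many times as it occurs
    in whichever single reference contains it most often.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    if not cand_counts:
        return 0.0
    # For every n-gram, the maximum count seen in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.lower().split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


# Made-up sentences for illustration only (hypothetical data).
references = [
    "she went up to her room",
    "she climbed to her bedroom",
]
candidate = "she climbed up to her room"

one_ref = clipped_precision(candidate, references[:1])
two_refs = clipped_precision(candidate, references)
print(one_ref, two_refs)
```

With only the first reference, "climbed" goes unmatched; adding the second reference credits it, so an acceptable alternative wording is no longer penalised. Whitespace tokenisation is a simplifying assumption here, and real text would need proper tokenisation and sentence alignment first.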

Yours,
Chris Callison-Burch



On Jan 19, 2006, at 4:10 PM, ofrun083 at uottawa.ca wrote:

>
>
>     Hello All,
>
>   My name is Oana, and I am an MSc student at the University of Ottawa
> working in the field of NLP and ML.
>
>   I am currently working on a project for French and English, and I am
> looking for a parallel corpus other than Hansard and Europarl. I am
> interested in parallel text that covers any domain other than the
> ones in Hansard and Europarl.
>
>   Thank you for your help,
>     Oana
>


