[Corpora-List] English-French parallel corpus?
Chris Callison-Burch
callison-burch at ed.ac.uk
Thu Jan 19 17:04:02 UTC 2006
Dear Oana,
You might consider constructing a parallel corpus of French novels
and their translations into English using public domain texts from
Project Gutenberg. As I see it, there are two advantages of doing
this. Firstly, the text would be quite different from the
parliamentary domain represented by the Canadian Hansard and
Europarl. Secondly, novels often have multiple translations, which
you could potentially use with automatic MT evaluation metrics that
take advantage of multiple reference translations.
Here's an example to get you started:
Madame Bovary in the original French:
http://www.gutenberg.org/files/14155/14155-8.txt
Translated into English:
http://www.gutenberg.org/dirs/etext00/mbova11.txt
Also, here are two additional English translations that Regina
Barzilay used in her PhD thesis on paraphrasing with monolingual
parallel corpora:
http://people.csail.mit.edu/regina/par/bovary1.txt
http://people.csail.mit.edu/regina/par/bovary3.txt
Yours,
Chris Callison-Burch
On Jan 19, 2006, at 4:10 PM, ofrun083 at uottawa.ca wrote:
>
>
> Hello All,
>
> My name is Oana, and I am an M.Sc. student at the University of Ottawa
> working in the field of NLP and ML.
>
> I am currently working on a project for French and English, and I am
> looking for a parallel corpus other than Hansard and EuroParl. I am
> interested in parallel text that covers domains, any at all, other
> than those of Hansard and EuroParl.
>
> Thank you for your help,
> Oana
>