[Corpora-List] Release of a German-English Parallel Corpus

manaal faruqui manaalfar at gmail.com
Mon May 14 15:12:22 UTC 2012


Hello everyone,

We would like to announce the distribution of a German-English parallel corpus
of 18th/19th century literary texts.

The corpus has been constructed from a total of 106 public-domain novels and
stories, mostly 19th-century texts collected from the Project Gutenberg
website. The texts are available for research purposes (see the website for
details).

The texts are segmented into paragraphs, sentences and words, are aligned
at the sentence level, and are POS-tagged and lemmatized in both languages.

Furthermore, the German sentences are labeled with T/V (formality) information
derived from their pronouns, and these labels have been copied onto the English
side. See our paper (Manaal Faruqui and Sebastian Pado, "Towards a model of
formal and informal address in English", presented at EACL-2012) for details.

For the corpus and more information, please see
http://www.nlpado.de/~sebastian/data/tv_data.shtml

Regards,
Manaal Faruqui


Manaal Faruqui | Final Year Dual Degree | Computer Science and Engg | IIT Kharagpur

Website: http://cse.iitkgp.ac.in/~manaalf
Mobile: +91-9932900944