[Corpora-List] Release of a German-English Parallel Corpus
manaal faruqui
manaalfar at gmail.com
Mon May 14 15:12:22 UTC 2012
Hello everyone,
We would like to announce the distribution of a German-English parallel corpus
of 18th/19th century literary texts.
The corpus has been constructed from a total of 106 public-domain novels and
stories, mostly 19th-century texts collected from the Project Gutenberg
website. The texts are available for research purposes (see the website for
details).
The texts are segmented into paragraphs, sentences and words, are aligned
at the sentence level, and are POS-tagged and lemmatized in both languages.
Furthermore, the German sentences are labeled with T/V (formality) information
derived from their pronouns, and these labels have been copied onto the aligned
English sentences. See our paper (Manaal Faruqui and Sebastian Pado, "Towards a
model of formal and informal address in English", presented at EACL-2012) for
details.
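As a rough illustration of the idea only (this is not the procedure from the
paper), the labeling and copying step could be sketched as below. The pronoun
lists, the function names label_german and project_labels, and the rule of
skipping the sentence-initial token are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately small pronoun lists (assumptions for this sketch).
T_PRONOUNS = {"du", "dich", "dir", "dein", "deine", "deinen", "deinem", "deiner", "deines"}
# Formal pronouns are matched case-sensitively: capitalized "Sie" vs. "sie" (she/they).
V_PRONOUNS = {"Sie", "Ihnen", "Ihr", "Ihre", "Ihren", "Ihrem", "Ihrer"}


def label_german(sentence):
    """Return "T", "V", or None for a German sentence, based on pronouns only."""
    tokens = re.findall(r"\w+", sentence)
    has_t = any(tok.lower() in T_PRONOUNS for tok in tokens)
    # Skip the first token: sentence-initial capitalization makes "Sie" ambiguous.
    has_v = any(tok in V_PRONOUNS for tok in tokens[1:])
    if has_t and not has_v:
        return "T"
    if has_v and not has_t:
        return "V"
    return None  # no evidence, or conflicting evidence


def project_labels(aligned_pairs):
    """Copy each German sentence's label onto its aligned English sentence."""
    return [(english, label_german(german)) for german, english in aligned_pairs]


if __name__ == "__main__":
    pairs = [
        ("Kannst du mir helfen?", "Can you help me?"),
        ("Können Sie mir helfen?", "Can you help me?"),
        ("Sie ging nach Hause.", "She went home."),
    ]
    for english, label in project_labels(pairs):
        print(label, english)
```

Sentences with no pronoun evidence, or with conflicting evidence, are left
unlabeled in this sketch rather than guessed.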
For the corpus, and more information, please see
http://www.nlpado.de/~sebastian/data/tv_data.shtml.
Regards,
Manaal Faruqui
Manaal Faruqui | Final Year Dual Degree | Computer Science and Engg | IIT
Kharagpur
Website: http://cse.iitkgp.ac.in/~manaalf
Mobile: +91-9932900944