12.526, Sum: Corpora English and German
The LINGUIST Network
linguist at linguistlist.org
Sun Feb 25 22:21:30 UTC 2001
LINGUIST List: Vol-12-526. Sun Feb 25 2001. ISSN: 1068-4875.
Subject: 12.526, Sum: Corpora English and German
Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
Andrew Carnie, U. of Arizona <carnie at linguistlist.org>
Reviews (reviews at linguistlist.org):
Simin Karimi, U. of Arizona
Terence Langendoen, U. of Arizona
Editors (linguist at linguistlist.org):
Karen Milligan, WSU            Naomi Ogasawara, EMU
Lydia Grebenyova, EMU          Jody Huellmantel, WSU
James Yuells, WSU              Michael Appleby, EMU
Marie Klopfenstein, WSU        Ljuba Veselinova, Stockholm U.
Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
Home Page: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Lydia Grebenyova <lydia at linguistlist.org>
=================================Directory=================================
1)
Date: Wed, 21 Feb 2001 05:12:33 -0600
From: "Frank Oswalt" <f_oswalt at hotmail.com>
Subject: Corpora English and German
-------------------------------- Message 1 -------------------------------
For Query: Linguist 11.1877
Howdy y'all,
A long while back I asked for information on German and English corpora
that are tagged for grammatical functions, as well as on accessible
parallel English-German corpora. Here is a summary of the replies I got.
ENGLISH GRAMMATICALLY TAGGED CORPORA
Joybrato Mukherjee (j.mukherjee at uni-bonn.de) drew my attention to the
International Corpus of English, which can be ordered at the following
website (which also allows you to download a very nice demo version):
http://www.ucl.ac.uk/english-usage/ice/
GERMAN GRAMMATICALLY TAGGED CORPORA
George Smith (george at bloomfield.phil1.uni-potsdam.de) drew my attention to
the NEGRA and TIGER projects, which can be reached via the following
websites:
http://www.coli.uni-sb.de/sfb378/negra-corpus/
http://www.coli.uni-sb.de/cl/projects/tiger/
PARALLEL CORPORA GERMAN-ENGLISH
Anatol Stefanowitsch (anatol at rice.edu) drew my attention to a small
web-accessible parallel corpus at the University of Chemnitz:
http://www.tu-chemnitz.de/phil/InternetGrammar/
Some people have their own collections of parallel texts, which they may or
may not be willing to share with others (there may be copyright issues
here).
The two who agreed to be mentioned here are:
- Raphael Salkie (R.M.Salkie at bton.ac.uk), who has a collection of parallel
texts from websites, literature, manuals, EU documents, political writing,
and speeches, coming to about 800,000 words in each language.
- Anatol Stefanowitsch, who has a small collection of parallel texts from
news magazines (about 15,000 words), and who is in the process of
assembling a larger parallel corpus of narrative writing.
VARIOUS
Martin Frost (Martin at sinequa.com) drew my attention to the following
websites:
http://www.mpi.nl/world/tg/corpora/corpora.html
http://www.ifi.unizh.ch/CL
http://www.ims.uni-stuttgart.de/projekte/corplex/
http://www.icp.grenet.fr/ELRA/fr/cata/tabtext.html
Thanks also to Klaus Abels, Petra Steiner, and Monika Budde for other
helpful hints.
Take care now,
Frank Oswalt
---------------------------------------------------------------------------