12.526, Sum: Corpora English and German

The LINGUIST Network linguist at linguistlist.org
Sun Feb 25 22:21:30 UTC 2001


LINGUIST List:  Vol-12-526. Sun Feb 25 2001. ISSN: 1068-4875.

Subject: 12.526, Sum: Corpora English and German

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	Lydia Grebenyova, EMU		Jody Huellmantel, WSU
	James Yuells, WSU		Michael Appleby, EMU
	Marie Klopfenstein, WSU		Ljuba Veselinova, Stockholm U.

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Lydia Grebenyova <lydia at linguistlist.org>

=================================Directory=================================

1)
Date:  Wed, 21 Feb 2001 05:12:33 -0600
From:  "Frank Oswalt" <f_oswalt at hotmail.com>
Subject:  Corpora English and German

-------------------------------- Message 1 -------------------------------

Date:  Wed, 21 Feb 2001 05:12:33 -0600
From:  "Frank Oswalt" <f_oswalt at hotmail.com>
Subject:  Corpora English and German

For Query: Linguist 11.1877

Howdy y'all,

A long while back I asked for information on German and English corpora
that are tagged for grammatical functions, as well as for accessible
parallel English-German corpora. Here is a summary of the replies I got.


ENGLISH GRAMMATICALLY TAGGED CORPORA

Joybrato Mukherjee (j.mukherjee at uni-bonn.de) drew my attention to the
International Corpus of English, which can be ordered at the following
website (which also allows you to download a very nice demo version):

   http://www.ucl.ac.uk/english-usage/ice/


GERMAN GRAMMATICALLY TAGGED CORPORA

George Smith (george at bloomfield.phil1.uni-potsdam.de) drew my attention to
the NEGRA and TIGER projects, which can be reached via the following
websites:

  http://www.coli.uni-sb.de/sfb378/negra-corpus/
  http://www.coli.uni-sb.de/cl/projects/tiger/


PARALLEL CORPORA GERMAN-ENGLISH

Anatol Stefanowitsch (anatol at rice.edu) drew my attention to a small
web-accessible parallel corpus at the University of Chemnitz:

   http://www.tu-chemnitz.de/phil/InternetGrammar/

Some people have their own collections of parallel texts, which they may or
may not be willing to share with others (there may be copyright issues
here). The two who agreed to be mentioned here are:
- Raphael Salkie (R.M.Salkie at bton.ac.uk), who has a collection of parallel
  texts from websites, literature, manuals, EU documents, political writing,
  and speeches, coming to about 800,000 words in each language.
- Anatol Stefanowitsch, who has a small collection of parallel texts from
  news magazines (about 15,000 words), and who is in the process of
  assembling a larger parallel corpus of narrative writing.


VARIOUS

Martin Frost (Martin at sinequa.com) drew my attention to the following
websites:

  http://www.mpi.nl/world/tg/corpora/corpora.html
  http://www.ifi.unizh.ch/CL
  http://www.ims.uni-stuttgart.de/projekte/corplex/
  http://www.icp.grenet.fr/ELRA/fr/cata/tabtext.html

Thanks also to Klaus Abels, Petra Steiner, and Monika Budde for other
helpful hints.

Take care now,
Frank Oswalt






