12.2303, FYI: Gsearch Corpus Query, German Treebank Sampler

LINGUIST List linguist at linguistlist.org
Wed Sep 19 16:02:14 UTC 2001


LINGUIST List:  Vol-12-2303. Wed Sep 19 2001. ISSN: 1068-4875.

Subject: 12.2303, FYI: Gsearch Corpus Query, German Treebank Sampler

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	Jody Huellmantel, WSU		James Yuells, WSU
	Michael Appleby, EMU		Marie Klopfenstein, WSU
	Ljuba Veselinova, Stockholm U.	Heather Taylor-Loring, EMU
	Dina Kapetangianni, EMU		Richard Harvey, EMU
	Karolina Owczarzak, EMU		Renee Galvis, WSU

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.



Editor for this issue: Karen Milligan <karen at linguistlist.org>

=================================Directory=================================

1)
Date:  Fri, 14 Sep 2001 12:15:42 +0200 (MET DST)
From:  Frank Keller <keller at CoLi.Uni-SB.DE>
Subject:  Available for download: Gsearch Corpus Query System

2)
Date:  Fri, 14 Sep 2001 08:51:02 +0200
From:  "TIGER corpus team" <tigercorpus at ims.uni-stuttgart.de>
Subject:  German treebank sampler

-------------------------------- Message 1 -------------------------------

Date:  Fri, 14 Sep 2001 12:15:42 +0200 (MET DST)
From:  Frank Keller <keller at CoLi.Uni-SB.DE>
Subject:  Available for download: Gsearch Corpus Query System

---------------------------
GSEARCH CORPUS QUERY SYSTEM
---------------------------

We are pleased to announce the immediate availability of Gsearch 2.06,
free of charge for research purposes.

The Gsearch corpus query system allows the selection of sentences by
syntactic criteria from text corpora, even when these corpora contain
no prior syntactic markup. This is achieved by means of a fast chart
parser, which takes as input a grammar and a search expression
specified by the user.
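The chart-parsing idea can be illustrated with a small bottom-up (CKY-style) recognizer that runs over part-of-speech tags and selects the sentences in which a query category spans some stretch of words. This is a hedged sketch only: the toy grammar encoding, the tag names, and the functions build_chart, matches, and search are hypothetical and do not reflect Gsearch's actual grammar or query syntax, and Gsearch's own parser may work differently.

```python
from itertools import product

# Hypothetical toy grammar (NOT Gsearch's grammar format).
# BINARY maps a pair of child categories to the parent categories they form;
# UNARY maps a POS tag to the preterminal categories it can realize.
BINARY = {
    ("Det", "N"): {"NP"},
    ("Adj", "N"): {"N"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}
UNARY = {"DT": {"Det"}, "NN": {"N"}, "JJ": {"Adj"}, "VB": {"V"}}


def build_chart(tags):
    """Fill a CKY chart: chart[i][j] holds the categories spanning tags[i:j]."""
    n = len(tags)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Width-1 spans come straight from the tagger output.
    for i, tag in enumerate(tags):
        chart[i][i + 1] |= UNARY.get(tag, set())
    # Wider spans are built from every split point k of two smaller spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((b, c), set())
    return chart


def matches(tags, target):
    """Return every (start, end) span over which `target` can be built."""
    chart = build_chart(tags)
    n = len(tags)
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)
            if target in chart[i][j]]


def search(corpus, target):
    """Select sentences (lists of (word, tag) pairs) containing a `target` span."""
    return [sent for sent in corpus if matches([t for _, t in sent], target)]
```

For a tagged sentence such as "the old dog sleeps" (DT JJ NN VB), search(corpus, "NP") keeps the sentence because an NP spans "the old dog", even though the corpus itself carries no syntactic markup; a sentence like "dogs bark" (NN VB) is not selected under this toy grammar.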

Among the major features of Gsearch are:

* runs under Solaris, Linux, and MacOS X;

* simple to install, based on GNU automake/autoconf;

* supports standard corpora (including BNC, Brown, Susanne, WSJ,
  Frankfurter Rundschau, Negra);

* can be easily extended to new corpora;

* supports standard taggers (LT POS, TnT);

* interfaces with external linguistic resources such as WordNet;

* outputs syntax trees in SGML, but also interfaces with external
  visualization tools (Viewtree, Thistle);

* comes with a tool for random sampling of Gsearch output.

For more information about Gsearch, and to download the latest
version, please visit:

http://www.hcrc.ed.ac.uk/gsearch/

Bug reports and suggestions for enhancements should be sent to:

gsearch-dev at cogsci.ed.ac.uk

Sincerely,

Gsearch Development Team
Martin Corley, University of Edinburgh
Frank Keller, Saarland University


-------------------------------- Message 2 -------------------------------

Date:  Fri, 14 Sep 2001 08:51:02 +0200
From:  "TIGER corpus team" <tigercorpus at ims.uni-stuttgart.de>
Subject:  German treebank sampler


The TIGER German treebank sampler has been released!

A large syntactically annotated corpus of German newspaper text is
under construction in the TIGER project, with project partners in
Saarbruecken, Potsdam, and Stuttgart. In order to get feedback from the
research community, the TIGER project team has released a sampler of
the TIGER corpus:

	http://www.ims.uni-stuttgart.de/projekte/TIGER/

The TIGER corpus is annotated with 'syntax graphs', a generalization of
syntax trees, in order to account for phenomena involving
discontinuous constituents. For example:
- long-distance dependencies are encoded by crossing edges;
- coreference in coordination is represented by 'secondary edges'.
More details of the annotation scheme are available online, where you can
also explore the TIGER corpus sampler interactively.
	
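How crossing edges capture a discontinuous constituent can be sketched with a minimal node structure in which a phrase dominates terminal positions that need not be adjacent. This is a hedged illustration only: the Node class, the functions terminal_yield and is_discontinuous, and the simplified analysis of the example sentence are hypothetical and are not the TIGER export format or the official TIGER annotation of that sentence.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A nonterminal in a hypothetical syntax graph (not the TIGER file format)."""
    label: str
    # Children are terminal positions (ints) or other Nodes. Because a node's
    # terminals need not be adjacent, its edges may cross in a drawing.
    children: List[Union[int, "Node"]] = field(default_factory=list)


def terminal_yield(node: Node) -> List[int]:
    """Collect the sorted terminal positions dominated by `node`."""
    positions = []
    for child in node.children:
        if isinstance(child, Node):
            positions.extend(terminal_yield(child))
        else:
            positions.append(child)
    return sorted(positions)


def is_discontinuous(node: Node) -> bool:
    """True if the node's terminals do not form one contiguous block."""
    ys = terminal_yield(node)
    if not ys:
        return False
    return ys != list(range(ys[0], ys[-1] + 1))


# Simplified, hypothetical analysis of "Darueber muss nachgedacht werden"
# (tokens 0..3): the VP "Darueber ... nachgedacht werden" is interrupted
# by the finite verb "muss" (1), which attaches directly to S.
vp_inner = Node("VP", [0, 2])
vp = Node("VP", [vp_inner, 3])
s = Node("S", [vp, 1])
```

Here the VP dominates tokens 0, 2, and 3 but not token 1, so is_discontinuous(vp) is true, while the sentence node as a whole still covers a contiguous span. A plain tree drawn over the word order could not attach the VP this way without crossing the edge to "muss".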
	---
	The TIGER project team.
	Department of Computational Linguistics, Saarland University
	Institut fuer Germanistik, University of Potsdam
	Department of Natural Language Processing (IMS), University of Stuttgart
	email: tigercorpus at ims.uni-stuttgart.de

---------------------------------------------------------------------------
LINGUIST List: Vol-12-2303


