Corpora: German treebank sampler and Portuguese treebank

Santos Diana Diana.Santos at informatics.sintef.no
Thu Sep 13 11:32:08 UTC 2001


Dear TIGER team,

Thank you for the wealth of information you put on-line in connection with
the TIGER project. 

In this connection, we would also like to inform you that there is an ongoing
project to create a treebank for Portuguese, the Floresta
Sintá(c)tica project,
http://cgi.portugues.mct.pt/treebank/PaginaFloresta.html (a joint project
between VISL http://visl.sdu.dk/ and the Computational Processing
of Portuguese project http://www.portugues.mct.pt/).

From our pages, we also make available a sampler and documentation on the
general and specific linguistic options taken. We have also developed a
special querying tool for the syntactically annotated trees, on top of the
IMS Corpus Workbench developed by the IMS at the University of Stuttgart.
This is work in progress, but it can be tested at
http://cgi.portugues.mct.pt/treebank/ProcuraArvores.html

I should also mention that the Computational Processing of Portuguese project
gives access to several million (automatically) syntactically annotated
words through the AC/DC project (again in joint work with VISL), at
http://cgi.portugues.mct.pt/acesso.
 
We would therefore be grateful if you updated your "related links" page
accordingly. 

I take this opportunity to also inform the corpora community at large,
although most of the Web pages referred to are (so far) only in Portuguese.

Best greetings,
Diana (for the Floresta and AC/DC teams)
************************************************************************
Diana Santos                    Computational processing of Portuguese
SINTEF Telecom & Informatics    Tel. (direct line) +47 22 06 73 12
Forskningsveien 1               Tel. +47 22 06 73 00
Box 124 Blindern                Fax. +47 22 06 73 50
N-0314 Oslo                     Email: Diana.Santos at informatics.sintef.no
Norway                          http://www.portugues.mct.pt/
************************************************************************


> -----Original Message-----
> From: TIGER corpus team [mailto:tigercorpus at ims.uni-stuttgart.de]
> Sent: 12. september 2001 10:46
> To: corpora at hd.uib.no
> Subject: Corpora: German treebank sampler
> 
> 
> 
> The TIGER German treebank sampler has been released!
> ----------------------------------------------------
> 
> A large syntactically annotated corpus of German newspaper text
> is under construction in the TIGER project - with project partners
> in Saarbruecken, Potsdam, and Stuttgart.
> 
> In order to get feedback from the research community, the TIGER project
> team has released a sampler of the TIGER corpus:
> 
> http://www.ims.uni-stuttgart.de/projekte/TIGER/
> 
> The TIGER corpus is annotated with 'syntax graphs', a generalization of
> syntax trees, in order to be able to account for phenomena involving
> discontinuous constituents. E.g.
> - long distance dependencies are encoded by crossing edges
> - coreference in coordination is represented by 'secondary edges'
> More details of the annotation scheme are available online, where you can
> also explore the TIGER corpus sampler interactively.
> 
> ---
> The TIGER project team.
> Department of Computational Linguistics, Saarland University
> Institut fuer Germanistik, University of Potsdam
> Department of Natural Language Processing (IMS), University of Stuttgart
> email: tigercorpus at ims.uni-stuttgart.de


