[Corpora-List] Treebank annotation tools

Joakim Nivre nivre at msi.vxu.se
Thu Aug 23 15:37:21 UTC 2007


Dear list members,

I am interested in tools for treebank annotation that specifically support 
the process of manually correcting automatically parsed sentences. Here 
are some of the requirements that I would like such a tool to meet:

* Search: The tool should enable annotators to search for (complex) 
  patterns involving both the primary linguistic data and various levels 
  of annotation. (Examples: NPs within NPs, NPs headed by a proper name, 
  NPs headed by "picture", NPs as subjects, verbs with two subjects.)

* Display: The tool should be able to display annotated sentences 
  graphically, in particular as the result of a search query.

* Editing: The tool should enable annotators to edit the annotation in a 
  "flexible and efficient manner", preferably by direct manipulation of 
  graphically displayed annotation.

* Validation: The tool should validate that the edited annotation conforms 
  to the formal specifications of the annotation scheme. At a minimum, this 
  means that only valid annotation categories ("tags") are used, but it is 
  desirable that more global and/or structural constraints can also be 
  expressed and validated. (Examples: Every word must have a part-of-speech 
  tag, every phrase must have a head, every dependency graph must have a 
  unique root or must be projective.)

* Documentation: The tool should support documentation of the annotation 
  process, such as time stamping of edits, information about which parts of 
  an annotation have been checked and validated, statistics on editing 
  operations, etc.

* Standards: The tool should support the use of (well-documented) 
  standards for corpus annotation (TEI, (X)CES, LAF, ...) or allow the 
  user to define such standards in, e.g., XML.

* Interfaces: The tool should interface flexibly with other tools involved 
  in the treebank development process, in particular taggers and parsers 
  used for automatic annotation. 

* Specificity: The tool should have tailor-made support for treebank 
  annotation, possibly at the expense of not supporting linguistic 
  annotation of arbitrary complexity.
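To illustrate the kind of structural validation described under Validation, here is a minimal Python sketch. It assumes a hypothetical CoNLL-style representation in which each token records its form, POS tag, head index (1-based, with 0 standing for an artificial root) and dependency label. The names `Token`, `dominates` and `validate_sentence` are illustrative only and are not taken from any existing annotation tool. It checks the local constraints (valid tags and labels, every word has a POS tag) and the global ones (the head links form a tree, there is a unique root, every arc is projective).

```python
from typing import List, NamedTuple, Optional, Set


class Token(NamedTuple):
    # Hypothetical token representation, assumed for illustration only.
    form: str
    pos: Optional[str]  # part-of-speech tag, None if missing
    head: int           # 1-based index of the head token; 0 = artificial root
    deprel: str         # dependency label


def dominates(tokens: List[Token], h: int, k: int) -> bool:
    """True if token h is an ancestor of token k (0 is the artificial root)."""
    while k != 0:
        k = tokens[k - 1].head
        if k == h:
            return True
    return False


def validate_sentence(tokens: List[Token], pos_tags: Set[str], deprels: Set[str],
                      require_projective: bool = True) -> List[str]:
    """Return a list of constraint violations; an empty list means valid."""
    errors: List[str] = []
    n = len(tokens)
    # Local constraints: valid tags and labels, every word has a POS tag,
    # and every head index points to another token or to the root.
    for i, tok in enumerate(tokens, start=1):
        if tok.pos is None:
            errors.append(f"token {i}: missing part-of-speech tag")
        elif tok.pos not in pos_tags:
            errors.append(f"token {i}: invalid POS tag {tok.pos!r}")
        if tok.deprel not in deprels:
            errors.append(f"token {i}: invalid label {tok.deprel!r}")
        if not 0 <= tok.head <= n or tok.head == i:
            errors.append(f"token {i}: invalid head {tok.head}")
    if errors:
        return errors  # the structural checks below assume valid heads

    # Structural constraint: the head links must form a tree (no cycles).
    for i in range(1, n + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen:
                return [f"token {i}: head chain contains a cycle"]
            seen.add(j)
            j = tokens[j - 1].head

    # Structural constraint: exactly one token is attached to the root.
    roots = [i for i, tok in enumerate(tokens, start=1) if tok.head == 0]
    if len(roots) != 1:
        errors.append(f"expected a unique root, found {len(roots)}")

    # Structural constraint: an arc h -> d is projective if h dominates
    # every token strictly between h and d.
    if require_projective:
        for d, tok in enumerate(tokens, start=1):
            h = tok.head
            if any(not dominates(tokens, h, k)
                   for k in range(min(h, d) + 1, max(h, d))):
                errors.append(f"arc {h} -> {d} is non-projective")
    return errors


# Usage: a small, valid dependency tree for "John saw Mary".
TAGS = {"NOUN", "VERB"}
LABELS = {"root", "nsubj", "obj", "dep"}
sentence = [Token("John", "NOUN", 2, "nsubj"),
            Token("saw", "VERB", 0, "root"),
            Token("Mary", "NOUN", 2, "obj")]
print(validate_sentence(sentence, TAGS, LABELS))  # → []
```

Returning a list of messages rather than raising on the first violation lets an editing tool show all problems in a sentence at once. Projectivity is an optional check here, since many treebanks deliberately allow non-projective arcs.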

I am familiar with a number of tools that satisfy many of these 
requirements (TrEd, DTAG, LFG Parsebanker, ...), as well as some tools 
that take care of a subset of the functions (such as TigerSearch for 
Search and Display, or Annotate for Editing), but I would like to know 
more about the tools that people are using in various treebank projects 
and the extent to which they satisfy some version of these requirements. 
I would also be interested to hear about other requirements that people 
want their tools to meet.

I will be happy to post a summary to the list if people send their replies 
to me off-line. 

Best,
Joakim Nivre

==================================================================
Joakim Nivre

Växjö University		Uppsala University
School of Mathematics		Department of Linguistics
and Systems Engineering		and Philology
SE-35195 Växjö			Box 635, SE-75126 Uppsala

Tel: +46 470 708992		Tel: +46 18 4717009
Fax: +46 470 84004		Fax: +46 18 4711094
E-mail: nivre at msi.vxu.se	E-mail: joakim.nivre at lingfil.uu.se

URL: http://www.msi.vxu.se/users/nivre
==================================================================
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
