[Corpora-List] Treebank annotation tools
Joakim Nivre
nivre at msi.vxu.se
Thu Aug 23 15:37:21 UTC 2007
Dear list members,
I am interested in tools for treebank annotation that specifically support
the process of manually correcting automatically parsed sentences. Here
are some of the requirements that I would like such a tool to meet:
* Search: The tool should enable annotators to search for (complex)
patterns involving both the primary linguistic data and various levels
of annotation. (Examples: NPs within NPs, NPs headed by a proper name,
NPs headed by "picture", NPs as subjects, verbs with two subjects.)
* Display: The tool should be able to display annotated sentences
graphically, in particular as the result of a search query.
* Editing: The tool should enable annotators to edit the annotation in a
"flexible and efficient manner", preferably by direct manipulation of
graphically displayed annotation.
* Validation: The tool should validate that the edited annotation conforms
to the formal specifications of the annotation scheme. Minimally, this
should imply that only valid annotation categories ("tags") are used,
but it is desirable that more global and/or structural constraints can
also be expressed and validated. (Examples: Every word must have a
part-of-speech tag, every phrase must have a head, every dependency
graph must have a unique root or must be projective.)
* Documentation: The tool should support documentation of the annotation
process, such as time stamping of edits, information about which parts of
an annotation have been checked and validated, statistics on editing
operations, etc.
* Standards: The tool should support the use of (well-documented)
standards for corpus annotation (TEI, (X)CES, LAF, ...) or allow the
user to define such standards in, e.g., XML.
* Interfaces: The tool should interface flexibly with other tools involved
in the treebank development process, in particular taggers and parsers
used for automatic annotation.
* Specificity: The tool should have tailor-made support for treebank
annotation, possibly at the expense of not supporting linguistic
annotation of arbitrary complexity.
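To make the Validation requirement more concrete, here is a minimal Python sketch of the structural checks it lists for dependency graphs: only valid tags, a part-of-speech tag on every word, a unique root, no cycles, and projectivity. The Token record, the tag sets and the function names are hypothetical and chosen for illustration; they are not taken from TrEd, DTAG or any other tool mentioned here, and phrase-structure constraints such as "every phrase must have a head" are not covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical tag sets for illustration only; a real tool would read them
# from the formal specification of the annotation scheme.
VALID_POS = {"DET", "NOUN", "PROPN", "VERB", "ADJ", "ADP", "PRON", "PUNCT"}
VALID_DEPREL = {"root", "nsubj", "obj", "det", "amod", "case", "nmod", "punct"}


@dataclass
class Token:
    id: int             # 1-based position in the sentence
    form: str           # word form
    pos: Optional[str]  # part-of-speech tag (None = missing)
    head: int           # id of the head token; 0 = artificial root
    deprel: str         # dependency label


def ancestors(heads: Dict[int, int], k: int) -> Set[int]:
    """All nodes dominating k, including the artificial root 0 (assumes no cycles)."""
    result: Set[int] = set()
    while k != 0:
        k = heads[k]
        result.add(k)
    return result


def validate(tokens: List[Token]) -> List[str]:
    """Return a list of error messages; an empty list means no violations found."""
    n = len(tokens)
    if [t.id for t in tokens] != list(range(1, n + 1)):
        return ["token ids must be 1..n in sentence order"]
    heads = {t.id: t.head for t in tokens}
    errors: List[str] = []

    # Local constraints: valid tags, every word has a POS tag, heads in range.
    bad_head = False
    for t in tokens:
        if t.pos is None:
            errors.append(f"token {t.id} has no part-of-speech tag")
        elif t.pos not in VALID_POS:
            errors.append(f"token {t.id} has invalid POS tag {t.pos!r}")
        if t.deprel not in VALID_DEPREL:
            errors.append(f"token {t.id} has invalid label {t.deprel!r}")
        if not 0 <= t.head <= n or t.head == t.id:
            errors.append(f"token {t.id} has invalid head {t.head}")
            bad_head = True
    if bad_head:
        return errors  # the structural checks below need well-formed heads

    # Global constraint: exactly one word attached to the artificial root.
    roots = [t.id for t in tokens if t.head == 0]
    if len(roots) != 1:
        errors.append(f"expected a unique root, found {len(roots)}")

    # Global constraint: following heads from any word must reach 0 (no cycles).
    for t in tokens:
        k, steps = t.id, 0
        while k != 0 and steps <= n:
            k, steps = heads[k], steps + 1
        if k != 0:
            errors.append(f"token {t.id} is on or leads into a cycle")
            return errors  # projectivity is undefined without a tree

    # Global constraint: projectivity. An arc h -> d is projective if h
    # dominates every word strictly between h and d.
    for t in tokens:
        lo, hi = sorted((t.head, t.id))
        for k in range(lo + 1, hi):
            if t.head not in ancestors(heads, k):
                errors.append(f"arc {t.head} -> {t.id} is non-projective")
                break
    return errors


sentence = [
    Token(1, "John", "PROPN", 2, "nsubj"),
    Token(2, "saw", "VERB", 0, "root"),
    Token(3, "Mary", "PROPN", 2, "obj"),
]
print(validate(sentence))  # []
```

Projectivity is tested here directly from its definition. For a well-formed tree, checking that no two arcs cross (counting the arc from the artificial root) would give the same result and is the more common way to state it.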
I am familiar with a number of tools that satisfy many of these
requirements (TrEd, DTAG, LFG Parsebanker, ...), as well as some tools
that take care of a subset of the functions (such as TigerSearch for
Search and Display, or Annotate for Editing), but I would like to know
more about the tools that people are using in various treebank projects
and the extent to which they satisfy some version of these requirements.
I would also be interested to hear about other requirements that people
want their tools to meet.
I will be happy to post a summary to the list if people send their replies
to me off-list.
Best,
Joakim Nivre
==================================================================
Joakim Nivre
Växjö University
School of Mathematics and Systems Engineering
SE-35195 Växjö
Tel: +46 470 708992
Fax: +46 470 84004
E-mail: nivre at msi.vxu.se
URL: http://www.msi.vxu.se/users/nivre

Uppsala University
Department of Linguistics and Philology
Box 635, SE-75126 Uppsala
Tel: +46 18 4717009
Fax: +46 18 4711094
E-mail: joakim.nivre at lingfil.uu.se
==================================================================
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora