[Corpora-List] Treebank annotation tools

Joakim Nivre nivre at msi.vxu.se
Wed Aug 29 15:54:51 UTC 2007


Hi Yannick,

Thanks for reminding me of this tool and also for taking the time to go 
through the list of requirements.

Best,
Joakim

On Wed, 29 Aug 2007, Yannick Versley wrote:

> Hello Joakim,
> 
> I'm not sure if the XCDG tool created in Wolfgang Menzel's group in Hamburg 
> would fit your bill since it has always been a bit cumbersome to install, and 
> is tied to a specific formalism for implementing dependency grammars (WCDG).
> But it has some unique features that I feel are worth mentioning, especially 
> the really nice integration into the grammar.
> 
> > * Search: The tool should enable annotators to search for (complex) 
> >   patterns involving both the primary linguistic data and various levels 
> >   of annotation. (Examples: NPs within NPs, NPs headed by a proper name, 
> >   NPs headed by "picture", NPs as subjects, verbs with two subjects.)
> As far as I know, Xcdg does not support search directly, but there was a 
> separate search engine that would use CDG formulae (i.e., specification over 
> edges and lexical entries, with existential operators thrown in) and that 
> could be used to feed the results into an Xcdg display somehow.
> > * Display: The tool should be able to display annotated sentences 
> >   graphically, in particular as the result of a search query.
> See the screenshot at
> http://nats-www.informatik.uni-hamburg.de/view/CDG/ScreenShots
> > * Editing: The tool should enable annotators to edit the annotation in a 
> >   "flexible and efficient manner", preferably by direct manipulation of 
> >   graphically displayed annotation.
> You can change labels, drag edges around and select appropriate lexical 
> entries. Most convenient is right-clicking on an edge or a label, which 
> makes the tool change the parent or the label to what it thinks is the 
> right one (according to the grammar).
> > * Validation: The tool should validate that the edited annotation conforms 
> >   to the formal specifications of the annotation scheme. Minimally, this 
> >   should imply that only valid annotation categories ("tags") are used, 
> >   but it is desirable that also more global and/or structural constraints 
> >   can be expressed and validated. (Examples: Every word must have a 
> >   part-of-speech tag, every phrase must have a head, every dependency 
> >   graph must have a unique root or must be projective.)
> Since the annotation tool is tightly integrated with the grammar, all 
> invariants encoded in the grammar are also checked by the editor (i.e., 
> projectivity in the cases where it is required, acyclicity, verb arguments).
> Unlike in LFG annotation environments, it is still possible to create 
> structures disfavored or disallowed by the grammar, which is very useful 
> when you run into a construct that isn't covered by your grammar.
> > * Documentation: The tool should support documentation of the annotation 
> >   process, such as time stamping of edits, information about what parts of 
> >   an annotation have been checked and validated, statistics on editing 
> >   operations, etc. 
> Currently nonexistent, I think.
> > * Standards: The tool should support the use of (well-documented) 
> >   standards for corpus annotation (TEI, (X)CES, LAF, ...) or allow the 
> >   user to define such standards in, e.g., XML.
> There is a proprietary (but simple and well-documented) annotation format, 
> which also has an XML variant. It's still pretty much nonstandard.
> > * Interfaces: The tool should interface flexibly with other tools involved 
> >   in the treebank development process, in particular taggers and parsers 
> >   used for automatic annotation. 
> The whole point of Xcdg is the integration of the parser: the parser and any 
> components it uses, including the PP attacher and POS tagger, are well 
> integrated, and you can view the violated grammar constraints and thus gain 
> insight into why the parser would prefer one parse over the other.
> You can also parse sentences inside Xcdg, although this is somewhat pointless.
> > * Specificity: The tool should have tailor-made support for treebank 
> >   annotation, possibly at the expense of not supporting linguistic 
> >   annotation of arbitrary complexity.
> Xcdg is not useful for anything beyond treebank development. The screenshots 
> show some screens for editing hierarchies, but as far as I know, this is not 
> really used by anybody.
> In terms of treebank development, Xcdg has been used to annotate a large 
> German dependency treebank covering multiple genres (I don't have exact 
> figures at hand, but I think the whole thing has more sentences than Tiger...)
> 
> There is an Xcdg manual available on the page
> http://nats-www.informatik.uni-hamburg.de/view/CDG/CdgManuals
> and there is useful support through a mailing list (I think it was the
> cdg at nats.informatik.uni-hamburg.de one).
> 
> Best,
> Yannick
> -- 
> Yannick Versley
> Seminar für Sprachwissenschaft, Abt. Computerlinguistik
> Wilhelmstr. 19, 72074 Tübingen
> Tel.: (07071) 29 77352
> 
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
> 

==================================================================
Joakim Nivre

Växjö University		Uppsala University
School of Mathematics		Department of Linguistics
and Systems Engineering		and Philology
SE-35195 Växjö			Box 635, SE-75126 Uppsala

Tel: +46 470 708992		Tel: +46 18 4717009
Fax: +46 470 84004		Fax: +46 18 4711094
E-mail: nivre at msi.vxu.se	E-mail: joakim.nivre at lingfil.uu.se

URL: http://www.msi.vxu.se/users/nivre
==================================================================
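
As an illustration of the validation requirement discussed in the thread
(every word has a part-of-speech tag, the dependency graph has a unique root,
is acyclic and, where required, projective), here is a minimal sketch in
Python. It is not how Xcdg or WCDG implement their checks. The Token layout,
the validate function and the tagset argument are all hypothetical names
chosen for the example, and projectivity is tested as "no crossing arcs" with
an artificial root assumed at position 0.

```python
from typing import List, NamedTuple, Optional, Set


class Token(NamedTuple):
    """One word of a dependency-annotated sentence (hypothetical layout)."""
    form: str
    pos: Optional[str]  # part-of-speech tag, None if not yet annotated
    head: int           # 1-based index of the head word, 0 = artificial root
    label: str          # dependency label


def validate(tokens: List[Token], tagset: Set[str]) -> List[str]:
    """Return a list of human-readable violations (empty if none found)."""
    errors = []
    n = len(tokens)
    heads_ok = True

    # Local checks: valid tags and head indices that point inside the sentence.
    for i, t in enumerate(tokens, start=1):
        if t.pos is None:
            errors.append(f"token {i} ({t.form}): missing POS tag")
        elif t.pos not in tagset:
            errors.append(f"token {i} ({t.form}): unknown POS tag {t.pos}")
        if t.head < 0 or t.head > n or t.head == i:
            errors.append(f"token {i} ({t.form}): invalid head {t.head}")
            heads_ok = False
    if not heads_ok:
        return errors  # structural checks below need usable head indices

    # Unique root: exactly one word attached to the artificial root.
    roots = [i for i, t in enumerate(tokens, start=1) if t.head == 0]
    if len(roots) != 1:
        errors.append(f"expected exactly one root, found {len(roots)}")

    # Acyclicity: following heads from any word must reach the root
    # within n steps; otherwise the walk is trapped in a cycle.
    for i in range(1, n + 1):
        node, steps = i, 0
        while node != 0 and steps <= n:
            node = tokens[node - 1].head
            steps += 1
        if node != 0:
            errors.append(f"token {i}: head chain never reaches the root")

    # Projectivity (as "no crossing arcs", artificial root at position 0).
    arcs = [(min(i, t.head), max(i, t.head))
            for i, t in enumerate(tokens, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:
                errors.append(f"arcs ({a},{b}) and ({c},{d}) cross")
    return errors


if __name__ == "__main__":
    sentence = [
        Token("John", "NNP", 2, "SUBJ"),
        Token("saw", "VBD", 0, "ROOT"),
        Token("Mary", "NNP", 2, "OBJ"),
    ]
    print(validate(sentence, {"NNP", "VBD"}))  # prints []
```

Returning a list of violations rather than raising on the first one mirrors
the point made in the thread: an annotator may need to keep a structure that
the grammar disallows, so a checker of this kind should report problems
without refusing the annotation.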