[Corpora-List] Annotation Tool for German corpora/NE recognition task

Christopher Walker chwalker at ldc.upenn.edu
Tue Oct 17 17:24:31 UTC 2006


Hi,

| > What I am really missing is:
| > - a good tool to annotate some documents quickly, i.e. with 
| > information about : toponym, first and surname, and other 
| > NE's. This, to get an idea
| > (prec.+recall) about the quality of my models.

The LDC ACE toolkit may also satisfy your needs:

  http://projects.ldc.upenn.edu/ace/tools/2005Toolkit.html

It is highly customized to ACE annotation, but can be modified
via an XML config file to suit a number of similar needs --
including NE annotation, with co-reference. We find this tool
to be easier to use (at scale for raw, untagged data) than either
Callisto or WordFreak, but I am not familiar with GATE.

Note that the output is in .ag.xml format, but the package includes
scripts to convert it to the latest ACE Pilot Format (.apf.xml).
These would need to be modified for the new tagset, but would
work nonetheless.  If you're interested in the infrastructure,
I also have a few Perl scripts that generate tabular output.
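
On the precision/recall question: once the gold and system annotations
are both in tabular form, exact-match scoring takes only a few lines.
Below is a minimal Python sketch, not part of the toolkit. The
four-column tab-separated layout (document id, start offset, end offset,
entity type) and the names read_mentions and precision_recall are
assumptions for illustration; the actual output of the Perl scripts may
differ.

```python
import csv
import io


def read_mentions(tsv_text):
    """Parse rows of doc_id, start, end, type (assumed layout) into a set."""
    mentions = set()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        doc_id, start, end, etype = row
        mentions.add((doc_id, int(start), int(end), etype))
    return mentions


def precision_recall(gold, predicted):
    """Exact-match scoring: a mention counts only if span and type both agree."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data: three gold mentions, two system mentions, one exact match.
gold = read_mentions("doc1\t0\t6\tPER\ndoc1\t20\t26\tLOC\ndoc1\t40\t47\tLOC\n")
pred = read_mentions("doc1\t0\t6\tPER\ndoc1\t20\t26\tPER\n")
p, r = precision_recall(gold, pred)
print(f"precision={p:.2f} recall={r:.2f}")
```

Exact matching is the strictest choice; a mention with the right span
but the wrong type counts as both a false positive and a miss.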

-Christopher.

---------------------------------------
Christopher R. Walker, Project Manager
Automatic Content Extraction (ACE) &
Less-Commonly Taught Languages (LCTL)
LDC Annotation Lab
chwalker at ldc.upenn.edu
215.898.0946
---------------------------------------