[Corpora-List] Annotation layers: missing reference

Yannick Versley yversley at gmail.com
Fri Nov 19 12:48:21 UTC 2010


When you mention ATLAS, it's probably worth noting that this model of
having stand-off annotations of various kinds was already common in various
systems for information extraction, such as FASTUS and GATE.
(NLP, and information extraction in particular, needed multiple layers
of annotation quite early, since layers were the way to merge the output
of different NLP components.)
NITE and ATLAS are attempts at making the complexity of
having multiple annotation layers more manageable (or palatable)
than the approach taken for, e.g., MUC (where both NER and
coreference information are annotated on the same base data, but
distributed as joint SGML).
Other people probably know more about this, but the Alembic Workbench
(from 1997, no less) lets you create stand-off annotations separately
from the document itself (and its markup):
http://www.timeml.org/site/terqas/alembic/AWB-ptf-format.html
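
To make the stand-off idea concrete, here is a minimal sketch in Python.
It is not the format of ATLAS, NITE, GATE, or Alembic; the names
BASE_TEXT, LAYERS, Span, span_text, and cross_layer_overlaps are all
hypothetical and chosen only for illustration. The base text is never
modified, and each layer (here NER and coreference) is a separate list
of character-offset spans pointing into it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A stand-off annotation: offsets into the base text plus a label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


# The base data stays untouched; annotations only refer to it by offset.
BASE_TEXT = "Ms. Smith joined Acme in 2009. She left in 2010."

# Each layer could come from a different component (hypothetical output).
LAYERS = {
    "ner": [Span(4, 9, "PERSON"), Span(17, 21, "ORG")],
    "coref": [Span(0, 9, "chain1"), Span(31, 34, "chain1")],
}


def span_text(text, span):
    """Return the stretch of base text that a span covers."""
    return text[span.start:span.end]


def cross_layer_overlaps(text, layers, first, second):
    """Pair up spans from two layers whose offset ranges overlap."""
    pairs = []
    for a in layers[first]:
        for b in layers[second]:
            if a.start < b.end and b.start < a.end:
                pairs.append((span_text(text, a), span_text(text, b)))
    return pairs


if __name__ == "__main__":
    for name, spans in LAYERS.items():
        print(name, [(span_text(BASE_TEXT, s), s.label) for s in spans])
    print(cross_layer_overlaps(BASE_TEXT, LAYERS, "ner", "coref"))
```

Because the layers only share offsets, a new layer can be added or an
old one replaced without touching the others, which is what makes
merging the output of independent components straightforward.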

Best,
Yannick

On Fri, Nov 19, 2010 at 12:56 PM, Piotr Bański <bansp at o2.pl> wrote:

> Don't forget about ATLAS (Steven Bird, Mark Liberman; [1], look around
> 1999/2000). It is also necessary to mention the work by Nancy Ide since
> the Corpus Encoding Standard[2], then, in the context of ISO TC 37 SC 4,
> mostly with Laurent Romary and Keith Suderman [3]. Some attempt at
> putting order into these notions has been made in Goecke et al., 2010 [4].
>
> Good luck,
>
>  Piotr
>
> [1]: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/l/Liberman:Mark.html
> [2]: http://www.cs.vassar.edu/CES/
> [3]: http://www.cs.vassar.edu/~ide/pubs.html
> [4]: Goecke, D., Metzing, D., Lüngen, H., Stührenberg, M., Witt, A.
> (2010). Different views on markup: Distinguishing levels and layers. In
> Linguistic modeling of information and markup languages. Contributions
> to language technology. Springer Netherlands, pp. 1–21.
>
> On 2010-11-18 14:06, Philippe Blache wrote:
> > Hi Karen,
> > I don't know whether the idea comes from there, but it belongs to the NXT
> > data model:
> >
> > J. Carletta, S. Evert, U. Heid, J. Kilgour, J. Robertson, H. Voormann
> > "The NITE XML Toolkit: Flexible annotation for multimodal language data"
> > Behavior Research Methods, Instruments, & Computers 2003, 35 (3), 353-363
> >
> >
> > Philippe
> >
> >
> > On 18 Nov 2010 at 11:04, Karen Fort wrote:
> >
> >> Dear members,
> >>
> >> I'm looking for a reference on annotation layers.
> >> Can somebody tell me when the "idea" of annotation layers appeared?
> >>
> >> I suppose it came from the speech community, but I cannot find a clear
> >> reference on that.
> >>
> >> Thank you for your help!
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>