[Corpora-List] RE : Annotation layers: missing reference

Jason Eisner jason at cs.jhu.edu
Fri Nov 19 13:42:35 UTC 2010


On Fri, Nov 19, 2010 at 7:30 AM, FORT, Karen <Karen.FORT at inist.fr> wrote:

> As for NLP as such, the question seems more difficult to answer, but for
> now, I'd tend to think, like you do, that Bird & Liberman were the first to
> introduce it in the field, don't you think so?
>

The idea certainly predates the Bird & Liberman reference (around 1999/2000)
that Piotr mentions.

Although this was a bit before my time, I believe that the term "standoff
markup" or "standoff annotation" was introduced by the DARPA TIPSTER program
that started in 1991.  This scheme stores the document in one file and
various annotation layers in other files.  These annotations may be produced
manually or automatically.
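To make the general idea concrete, here is a minimal Python sketch of standoff annotation. It assumes character-offset spans and JSON-serialized layers; the `Annotation` class and the `save_layer`/`load_layer`/`resolve` helpers are hypothetical, and this is not the actual TIPSTER encoding. The base text is stored once and never modified, while each layer (serialized to a string standing in for its own file) points into it by offsets.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first character of the span
    end: int    # offset just past the last character (Python slice convention)
    label: str


def save_layer(annotations):
    """Serialize one layer; in a real system this string would be its own file."""
    return json.dumps([asdict(a) for a in annotations])


def load_layer(serialized):
    """Rebuild Annotation objects from a serialized layer."""
    return [Annotation(**d) for d in json.loads(serialized)]


def resolve(document, layer):
    """Pair each annotation's label with the text span it points into."""
    return [(a.label, document[a.start:a.end]) for a in layer]


# The base document: stored once, never edited by any annotation layer.
document = "Piotr mentions Bird and Liberman."

# Two independent layers over the same untouched base text.
pos_layer = save_layer([
    Annotation(0, 5, "NNP"),    # "Piotr"
    Annotation(6, 14, "VBZ"),   # "mentions"
])
entity_layer = save_layer([
    Annotation(0, 5, "PERSON"),    # "Piotr"
    Annotation(15, 19, "PERSON"),  # "Bird"
    Annotation(24, 32, "PERSON"),  # "Liberman"
])

if __name__ == "__main__":
    for label, text in resolve(document, load_layer(entity_layer)):
        print(label, text)
```

Because the layers only reference offsets, a new layer (manual or automatic) can be added without touching the document or the other layers; the trade-off is that any edit to the base text invalidates the offsets.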

I am not sure who was responsible for the general idea or for the specific
encodings used in TIPSTER --
http://www.fas.org/irp/program/process/tipster.htm suggests that
standardization began in 1994, but the ideas may have been in use in
individual TIPSTER projects before that.

Hamish Cunningham and the folks behind the GATE architecture may know more;
GATE implements the DARPA scheme, I believe.

Thompson & McKelvie (1997) is well-cited for explaining how to encode
standoff markup in SGML or XML:
http://www.ltg.ed.ac.uk/~ht/sgmleu97.html

cheers, jason
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

