formal analysis of annotation systems

Brian MacWhinney macw at cmu.edu
Wed Mar 3 17:56:33 UTC 1999


Dear Info-CHILDES,

  Steven Bird and Mark Liberman have examined at least 12 transcription
and annotation systems, including CHAT, and constructed a formal
representation that captures the relations in all of them.  Some of
these systems go down to the nitty-gritty level of autosegmental
representation; others focus more on the word level.  The result of
their formal analysis is that all of these systems can be represented
as "annotation graphs", which have the shape of DAGs (directed acyclic
graphs).  Along the way, they make some important observations about
the linkage of codes to time and about hierarchical relations between
codes and words.  Here is the information about the article and its
address on the web.
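The summary above describes annotation graphs only in outline. As a rough illustration, and not Bird and Liberman's own formalism or code, here is a minimal Python sketch. It assumes that an annotation graph is a set of nodes, each optionally anchored to a time offset in seconds, connected by arcs that carry a type (such as "word" or "phrase") and a label. The class and method names (`AnnotationGraph`, `add_arc`, `times_consistent`, `arcs_of_type`) are hypothetical. The sketch checks two properties suggested by the summary: the graph must be acyclic, and time anchors must never decrease along any path, even when that path passes through unanchored nodes.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AnnotationGraph:
    # Hypothetical structure: node id -> time offset in seconds (None = unanchored).
    times: Dict[str, Optional[float]] = field(default_factory=dict)
    # Arcs as (source, target, type, label) tuples.
    arcs: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add_node(self, node: str, time: Optional[float] = None) -> None:
        self.times[node] = time

    def add_arc(self, src: str, dst: str, kind: str, label: str) -> None:
        # Nodes not yet declared are added as unanchored (existing anchors are kept).
        self.times.setdefault(src, None)
        self.times.setdefault(dst, None)
        self.arcs.append((src, dst, kind, label))

    def topological_order(self) -> List[str]:
        """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
        indegree = {n: 0 for n in self.times}
        succ = defaultdict(list)
        for src, dst, _, _ in self.arcs:
            succ[src].append(dst)
            indegree[dst] += 1
        queue = deque(n for n, d in indegree.items() if d == 0)
        order = []
        while queue:
            n = queue.popleft()
            order.append(n)
            for m in succ[n]:
                indegree[m] -= 1
                if indegree[m] == 0:
                    queue.append(m)
        if len(order) != len(self.times):
            raise ValueError("annotation graph contains a cycle")
        return order

    def times_consistent(self) -> bool:
        """True if no anchored node is earlier than an anchored node that precedes it on some path."""
        preds = defaultdict(list)
        for src, dst, _, _ in self.arcs:
            preds[dst].append(src)
        latest: Dict[str, Optional[float]] = {}  # latest anchored time on any path reaching the node
        for n in self.topological_order():
            bound = max((latest[p] for p in preds[n] if latest[p] is not None), default=None)
            t = self.times[n]
            if t is not None and bound is not None and t < bound:
                return False
            latest[n] = t if t is not None else bound
        return True

    def arcs_of_type(self, kind: str) -> List[Tuple[str, str, str]]:
        return [(s, d, lab) for s, d, k, lab in self.arcs if k == kind]


# Usage: two words, with an unmeasured boundary between them, and a phrase arc
# spanning both words (one assumed way to express a code-to-word hierarchy).
g = AnnotationGraph()
g.add_node("n0", 0.00)
g.add_node("n1")            # unanchored: boundary time not measured
g.add_node("n2", 0.41)
g.add_arc("n0", "n1", "word", "the")
g.add_arc("n1", "n2", "word", "cat")
g.add_arc("n0", "n2", "phrase", "NP")
print(g.topological_order())  # ['n0', 'n1', 'n2']
print(g.times_consistent())   # True
```

Leaving some nodes unanchored is what lets one structure hold both time-aligned transcriptions and purely textual, word-level codes; the time check only constrains nodes that actually carry a time offset.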

--Brian MacWhinney



A Formal Framework for Linguistic Annotation
Steven Bird & Mark Liberman

Abstract

`Linguistic annotation' covers any descriptive or analytic notations
applied to raw language data. The basic data may be in the form of
time functions - audio, video and/or physiological recordings - or it
may be textual. The added notations may include transcriptions of all
sorts (from phonetic features to discourse structures), part-of-speech
and sense tagging, syntactic analysis, `named entity' identification,
co-reference annotation, and so on. While there are several ongoing
efforts to provide formats and tools for such annotations and to
publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the
extent they exist, have focussed on file formats. This paper focuses
instead on the logical structure of linguistic annotations. We survey
a wide variety of existing annotation formats and demonstrate a common
conceptual core, the annotation graph. This provides a formal
framework for constructing, maintaining and searching linguistic
annotations, while remaining consistent with many alternative data
structures and file formats.

49pp, download from: [http://xxx.lanl.gov/abs/cs.CL/9903003]
Formats: PDF (336kb), Postscript (161kb), DVI (134kb), LaTeX (112kb)

For an online survey and extensive links, visit the
Linguistic Annotations Page: [http://www.ldc.upenn.edu/annotation]

@TechReport{BirdLiberman99,
  author={Steven Bird and Mark Liberman},
  title={A Formal Framework for Linguistic Annotation},
  institution={Department of Computer and Information Science,
    University of Pennsylvania},
  year=1999,
  number={MS-CIS-99-01},
  note={[xxx.lanl.gov/abs/cs.CL/9903003]}
}

Please send comments to: sb at ldc.upenn.edu, myl at ldc.upenn.edu

Regards,
Steven Bird & Mark Liberman

--
Steven.Bird at ldc.upenn.edu  http://www.ldc.upenn.edu/sb
Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
Linguistic Data Consortium, University of Pennsylvania
3615 Market St, Suite 200, Philadelphia, PA 19104-2608