Corpora: Large Corpora and Annotation Standards
Nancy M. Ide
ide at cs.vassar.edu
Thu Mar 30 22:27:47 UTC 2000
Large Corpora and Annotation Standards
http://www.cs.vassar.edu/~ide/ANLP-NAACL2000.html
Held in conjunction with ANLP/NAACL'00
Seattle, Washington
4 May 2000, 1-6pm
This meeting is intended to bring together researchers and
developers from a variety of domains in text, speech,
video, etc., to look broadly at the technical issues that
bear on the development of software systems and standards
for the annotation and exploitation of linguistic
resources. The goal is to lay the groundwork for the
definition of a data and system architecture to support
corpus annotation and exploitation that can be widely
adopted within the community.
Among the issues to be addressed are:
- layered data architectures
- system architectures for distributed databases
- support for plurality of annotation schemes
- impact and use of XML/XSL
- support for multimedia, including speech and video
- tools for creating, annotating, querying, and accessing
corpora
- mechanisms for linkage of annotation and primary
data
- applicability of semi-structured data models, search
and query systems, etc.
- evaluation/validation of systems and annotations
The motivation for this meeting is the American National
Corpus (ANC) effort, which should begin corpus creation
within the year. We anticipate that the ANC will provide a
significant resource for natural language processing, and
we therefore seek to identify state-of-the-art methods for
its creation, annotation, and exploitation. Also, because the
ANC will be a national and freely available resource, its
data and system architecture is likely to become a de facto
standard. We therefore hope to draw together leading
researchers and developers to establish a basis for the
design of a system to support the creation and use of the
ANC.
Provisional Program
Overview of the American National Corpus Effort
    Nancy Ide and Catherine Macleod
Searching Linguistically Annotated Corpora
    Chris Brew
Considerations for Large Corpus Annotation: Intercoder Reliability
    Rebecca Bruce and Janyce Wiebe
The XML Framework and Its Implications for Large Corpus Access
    Nancy Ide
The ATLAS System
    John Henderson
Annotation Standards and Their Impact on Large Corpus Development
    Nicoletta Calzolari
A Framework for Multi-level Linguistic Annotation
    Patrice Lopez and Laurent Romary
Discussion: Requirements for the ANC
A related workshop will be held at the LREC conference on
May 29-30, 2000. See http://www.cs.vassar.edu/~ide/anc/lrec.html.
Organizer:
Nancy Ide
Professor and Chair
Department of Computer Science
Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 914 437-5988 Fax: +1 914 437-7498
ide at cs.vassar.edu