Corpora: Call for participation: meeting on annotation and software standards

Nancy M. Ide ide at cs.vassar.edu
Thu Feb 10 17:10:00 UTC 2000


                   *** CALL FOR PARTICIPATION ***

            Large Corpus Annotation and Software Standards


           Post-conference session held in conjunction with
                           ANLP/NAACL'00
          Thursday, May 4, 2000, 1-6pm, Seattle, Washington


This meeting is intended to bring together researchers and developers
from a variety of domains in text, speech, video, etc., to look
broadly at the technical issues that bear on the development of
software systems and standards for the annotation and exploitation of
linguistic resources. The goal is to lay the groundwork for the
definition of a data and system architecture to support corpus
annotation and exploitation that can be widely adopted within the
community.

Among the issues to be addressed are:

     o layered data architectures
     o system architectures for distributed databases
     o support for plurality of annotation schemes
     o impact and use of XML/XSL
     o support for multimedia, including speech and video
     o tools for creation, annotation, query and access of corpora
     o mechanisms for linkage of annotation and primary data
     o applicability of semi-structured data models, search and query
       systems, etc.
     o evaluation/validation of systems and annotations

The motivation for this meeting is the American National Corpus (ANC)
effort, which will begin corpus creation within the year. We
anticipate that the ANC will provide a significant resource for
natural language processing, and we therefore seek to identify
state-of-the-art methods for its creation, annotation, and
exploitation. Also, because the ANC will be a national and freely
available resource, its data and system architecture is likely to
become a de facto standard. We therefore hope to draw together leading
researchers and developers to establish a basis for the design of a
system to support the creation and use of the ANC.

At present, the format of the meeting is open, and we invite
suggestions for topics, presentations, etc. Those interested should
contact ide at cs.vassar.edu before April 1, 2000.



Organizer:

Nancy Ide
Department of Computer Science
Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 914 437-5988 Fax: +1 914 437-7498
ide at cs.vassar.edu


NOTE: A Birds-of-a-feather meeting for those interested in the American
National Corpus effort will be held immediately following the discussion.

---------------------------------------------------------------------
A related workshop will be held at the LREC conference on May 29-30,
2000; see http://www.cs.vassar.edu/~ide/anc/lrec.html for information.


