XML at OSU

Thu Apr 20 23:32:33 UTC 2000

----------------------------Original message----------------------------
Announcing a 5-day workshop associated with "Spoken Language in Context:
Methods and Models", July 3-7, 2000 (see http://ling.ohio-state.edu/SU2000
for further information)

XML and Linguistic Annotation

Chris Brew
Department of Linguistics
Ohio State University

Corpora of spoken and written language are crucial to much of linguistics,
providing both quantitative and qualitative data which informs and grounds
our work. Much of the material which is available is raw text, but this is
complemented by a substantial and increasing number of annotated corpora.
It is important to ensure that such annotated corpora are reliable,
re-usable and maximally informative, but it is not immediately obvious how
this is to be achieved, not least because the corpus data often stimulates
research which was not envisaged at the time that the data was collected.

XML (the eXtensible Markup Language) provides a standardized vehicle for the
generation, processing and exchange of arbitrary structured data,
including, but not limited to, texts marked up with linguistic information.
Many, but by no means all, corpus creation initiatives have chosen to adopt
the XML route. This means that researchers who want to use (and perhaps
add to) the products of these efforts need to understand something of what
XML is and how it can be used. Non-linguistic applications of XML will be
covered only tangentially.
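
As a rough illustration of what "texts marked up with linguistic
information" can look like, the following is a minimal sketch rather than
material from the workshop itself. The element names (s, w) and the "pos"
attribute are assumptions invented for this example and do not come from any
particular corpus standard. The sketch reads one annotated sentence with
Python's standard xml.etree.ElementTree module and recovers each word
together with its part-of-speech tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation scheme (an assumption for this sketch):
# <s> wraps a sentence, <w> wraps a word, and the "pos" attribute
# carries a part-of-speech tag.
SAMPLE = """
<s id="s1">
  <w pos="DT">the</w>
  <w pos="NN">corpus</w>
  <w pos="VBZ">grows</w>
</s>
"""


def tagged_words(xml_text):
    """Return (word, pos) pairs for every <w> element in the sentence."""
    root = ET.fromstring(xml_text.strip())
    return [(w.text, w.get("pos")) for w in root.iter("w")]


pairs = tagged_words(SAMPLE)
for word, pos in pairs:
    print(word, pos)
```

Because the annotation lives in attributes and elements rather than in the
running text, a later project could add further layers (for example a lemma
attribute) without disturbing tools that only read the existing ones.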

This workshop introduces XML as a means for creating and using linguistic
annotations, gives hands-on experience of both corpus annotation and
corpus use, and discusses its strengths and weaknesses as a research tool.
There will be five 105-minute sessions, one per day, spread over a week,
along with practical sessions covering the use of text and speech data.
Students should expect to spend approximately 60 minutes per day on the
practicals. The only prerequisite is a very basic training in any of the
language sciences. The workshop should therefore be accessible to all
participants in "Spoken Language in Context: Methods and Models".