[Corpora-List] Final call: Treebanks for spoken language, NoDaLiDa05
Janne Bondi Johannessen
j.b.johannessen at ilf.uio.no
Mon Feb 7 15:04:01 UTC 2005
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
FINAL CALL FOR PAPERS
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NODALIDA 2005: http://phon.joensuu.fi/nodalida2005/
SPECIAL SESSION ON TREEBANKS:
http://www.hf.uio.no/tekstlab/treebank_workshop
SPECIAL SESSION ON TREEBANKS FOR
SPOKEN LANGUAGE AND DISCOURSE
JOENSUU, FINLAND, THURSDAY MAY 19, 2005
ORGANIZED BY THE NORDIC TREEBANK NETWORK
Treebanks are a language resource that provides annotations of
natural languages at various levels: at the morpheme level, the
word level, the phrase level, the discourse level, and the level
of functor-argument structure. Treebanks have become crucially
important for the development of data-driven approaches to natural
language processing, human language technologies, grammar
extraction and linguistic research in general.
Existing spoken language treebanks include the Switchboard section
of the Penn Treebank [1], and the CHRISTINE [2] and ICE-GB [3]
treebanks for English; the VERBMOBIL [4] treebanks for English,
German, and Japanese; and the CGN [5] treebank for Dutch. Existing
discourse treebanks include the English RST Corpus [6] and the
Penn Discourse Treebank [7]. The DAMSL project [8] and the
Gothenburg Dialogue Coding Schemas [9] address the problem of
annotating dialogues with speech act relations between utterances.
The special NODALIDA session on treebanks aims to provide a forum
where researchers and advanced students with an interest in
treebanks can exchange ideas, in particular on how to extend
treebanks from syntactic annotations of written language to
treebanks that also include annotations of the structure of spoken
language with respect to syntax, discourse structure, and/or
speech acts.
TOPICS OF INTEREST
Invited speaker: Bonnie Webber, University of Edinburgh.
We invite submission of papers on topics relevant to treebanks
in general, and spoken language and discourse treebanks in
particular, including but not limited to:
* design principles and annotation schemes for annotating
spoken language and discourse treebanks with respect to
syntax, discourse structure, and/or speech acts;
* automatic tools for creating spoken language and discourse
treebanks, and how to adapt tools designed for creating
written language treebanks to spoken language and discourse;
* comparing spoken language and discourse annotations with
written language annotations, and identifying the most
important challenges in spoken language and discourse
annotation.
While we particularly encourage submissions on spoken language
and discourse treebanks, we also encourage submissions on other
treebank topics.
SUBMISSIONS
We invite extended abstracts (approximately 1500 words) describing
existing research connected to the topics of the special session.
Submissions are non-anonymous and should include: title;
author(s); affiliation(s); and contact author's e-mail address,
postal address, telephone and fax numbers.
Abstracts should be sent to: mtk at id.cbs.dk
Each presentation at the workshop will be allotted 30 minutes (20
minutes for the talk and 10 minutes for questions and
discussion). The final version of an accepted paper may not
exceed 12 A4 pages.
A SAMPLE SPOKEN LANGUAGE AND DISCOURSE TREEBANK
We strongly encourage participants and speakers in the special
session on spoken language and discourse treebanks to contribute
a small sample treebank, which should preferably:
* be based on a small corpus of spontaneous spoken dialogue
consisting of 500-1500 words in any language;
* contain English glosses to ensure that the treebank is
accessible to a wider audience;
* include annotations of discourse relations, speech acts, or
similar relations that connect sentences and utterances made
by different speakers into larger units;
* contain annotated examples of overlapping dialogue,
including utterances where one speaker completes an
utterance started by another speaker.
The sample treebank should be submitted by sending the following
three files to mtk at id.cbs.dk before 20th February 2005:
* a plain text abstract of 50-200 words that briefly describes
how the sample treebank was created, possibly with
hyperlinks to more detailed information about the treebank;
* a PDF file containing a human-readable visualization of the
treebank;
* optionally, the source files for the sample treebank,
preferably encoded in TIGER-XML format.
The sample treebanks will be made publicly available before the
NODALIDA conference.
IMPORTANT DATES
Deadline for submission of abstracts and treebank samples
to the treebank session: February 20, 2005
Notification of acceptance: March 25, 2005
Special session on treebanks: Thursday, May 19, 2005
Final version of paper for proceedings: June 20, 2005
PROCEEDINGS
Papers presented at the workshop will be invited to appear in the
workshop proceedings (after a reviewing process).
PROGRAM COMMITTEE
Matthias Trautner Kromann (mtk at id.cbs.dk)
Peter Juel Henrichsen (pjuel at id.cbs.dk)
Janne Bondi Johannessen (jannebj at ilf.uio.no)
IMPORTANT WEBSITES:
SPECIAL TREEBANK SESSION: http://www.hf.uio.no/tekstlab/treebank_workshop
NORDIC TREEBANK NETWORK: http://w3.msi.vxu.se/~nivre/research/nt.html
NODALIDA: http://phon.joensuu.fi/nodalida2005/
LINKS
[1] http://www.cis.upenn.edu/~treebank/home.html
[2] http://www.grsampson.net/RChristine.html
[3] http://www.ucl.ac.uk/english-usage/ice-gb
[4] http://verbmobil.dfki.de/cgi-bin/verbmobil/htbin/doc-access.cgi
[5] http://lands.let.kun.nl/cgn/doc_English/topics/version_1.0/annot/syntax/info.htm
[6] http://www.isi.edu/~marcu/discourse
[7] http://www.cis.upenn.edu/~pdtb
[8] http://www.cs.rochester.edu/research/cisd/resources/damsl/
[9] http://www.ling.gu.se/~jens/publications/docs076-100/093.pdf