[Corpora-List] Call: Spoken corpus processing
Jean Veronis
Jean.Veronis at up.univ-mrs.fr
Thu Oct 30 06:42:20 UTC 2003
Call for papers
Journal "Traitement Automatique des Langues"
(T.A.L.)
"Spoken corpus processing"
*
* *
Special issue edited by Jean Véronis (Université de Provence)
Submission deadline: 15 March 2004
http://www.atala.org/tal/appel-corpus-oral.html
Context
-------
Several hundred million words of written texts are available for research and
the World Wide Web pushes this limit further every day. At the same time, very
few spontaneous spoken language corpora are available, although they are of
prime importance for linguistic studies and the development of spoken
language technologies.
A number of spoken corpora have been completed for English (British National
Corpus, Santa Barbara Corpus of American English, Corpus CANCODE, etc.) and
some are being developed for other languages (Corpus Gesproken Nederlands,
Corpus of Spoken Israeli Hebrew, Corpus of Spoken Portuguese, etc.), but the
transcription and annotation of spoken corpora is a very expensive process.
In the last decade, many annotation and processing tools have been developed
for written corpora, but equivalent tools for spoken data are very far from
being available. Of course, phonetic institutes have developed sophisticated
tools for the analysis of laboratory speech, but their applicability to
spontaneous data is not immediate, given the different corpus sizes, the
different types of analyses, and the phenomena that are specific to
spontaneous speech (high variability, disfluencies, non-canonical syntax,
etc.).
Objectives
----------
The present issue of "Traitement Automatique des Langues" (T.A.L.) aims at
assessing the state of the art in technologies for processing spoken corpora,
and at answering questions such as how far techniques developed for written
corpora can be adapted, or to what degree techniques developed for laboratory
speech can be reused.
Topics of interest (non-exhaustive list):
- tools for transcription
- phonetisation
- segmentation (pauses, speaker turns) and text-to-sound alignment
- identification of hesitations, repeats, disfluencies and other phenomena
specific to spontaneous speech
- prosodic tagging
- morpho-syntactic tagging
- chunking and shallow parsing
- other analysis levels (anaphora, etc.)
- search and exploitation tools
External Reviewers
------------------
Jean-Yves Antoine (Université de Bretagne Sud)
Claude Barras (LIMSI-CNRS)
Frédéric Béchet (Université d'Avignon)
Edouard Geoffrois (DGA)
Dafydd Gibbon (Universität Bielefeld)
Daniel Hirst (LPL-CNRS)
Amy Isard (University of Edinburgh)
Joaquim Llisterri (Universitat Autònoma de Barcelona)
Philippe Martin (Université Paris VII)
Joakim Nivre (Växjö Universitet)
Nelleke Oostdijk (Katholieke Universiteit Nijmegen)
Geoffrey Sampson (University of Sussex)
Antonio Moreno Sandoval (Universidad Autónoma de Madrid)
François Yvon (ENST)
Format
------
Papers (25 pages maximum) may be submitted in Word or LaTeX (in the latter
case, please provide a PDF file). The publisher's style sheets are available
at:
http://tal.e-revues.com/appel.jsp
Language
--------
Papers may be written in French or, for non-French-speaking authors only, in
English.
Schedule
--------
The submission deadline is March 15th, 2004. Authors intending to submit a
paper should contact Jean Véronis <Jean.Veronis at up.univ-mrs.fr>.
Articles will be reviewed by a member of the editorial board of the journal
and by two external reviewers chosen by the editors of the special issue.
Editorial board decisions and referees' reports will be transmitted to the
authors by May 31st, 2004.
Final versions of accepted papers will be required by July 1st, 2004.
Publication is planned for the end of 2004.
Submission
----------
Submissions (25 pages maximum, following the publisher's style sheet) should
be sent electronically to:
Jean Véronis <Jean.Veronis at up.univ-mrs.fr>