[Corpora-List] Reminder: CFP Spoken corpora

Jean Veronis Jean.Veronis at up.univ-mrs.fr
Wed Feb 18 13:01:03 UTC 2004


                 Journal "Traitement Automatique des Langues"
                                  (T.A.L.)

                         "Spoken corpus processing"

          Special issue edited by Jean Véronis (Université de Provence)

                    Submission deadline: 15 March 2004

                         http://www.atala.org/tal/




CONTEXT

Several hundred million words of written texts are available for research and 
the World Wide Web pushes this limit further every day. At the same time, very 
few spontaneous spoken language corpora are available, although they are of 
prime importance for linguistic studies and the development of spoken 
language technologies.

A number of spoken corpora have been completed for English (British National 
Corpus, Santa Barbara Corpus of American English, Corpus CANCODE, etc.) and 
some are being developed for other languages (Corpus Gesproken Nederlands, 
Corpus of Spoken Israeli Hebrew, Corpus of Spoken Portuguese, etc.), but the 
transcription and annotation of spoken corpora is a very expensive process. 
In the last decade, many annotation and processing tools have been developed 
for written corpora, but equivalent tools for spoken data are very far from 
being available. Of course, phonetic institutes have developed sophisticated 
tools for the analysis of laboratory speech, but their applicability to 
spontaneous data is not immediate, given the different corpus sizes, the 
different types of analyses, and the phenomena that are specific to 
spontaneous speech (high variability, disfluencies, non-canonical syntax, 
etc.).

OBJECTIVES

The present issue of "Traitement Automatique des Langues" (T.A.L.) aims to 
assess the state of the art in technologies for processing spoken corpora, 
and to answer questions such as how well techniques developed for written 
corpora can be adapted, and how far techniques developed for laboratory 
speech can be reused.

Topics of interest (non-exhaustive list):

   - tools for transcription
   - phonetisation
   - segmentation (pauses, speaker turns) and text-to-sound alignment
   - identification of hesitations, repeats, disfluencies and other phenomena
     specific to spontaneous speech
   - prosodic tagging
   - morpho-syntactic tagging
   - chunking and shallow parsing
   - other analysis levels (anaphora, etc.)
   - search and exploitation tools

EXTERNAL REVIEWERS

     Jean-Yves Antoine (Université de Bretagne Sud)
     Claude Barras (LIMSI-CNRS)
     Frédéric Béchet (Université d'Avignon)
     Edouard Geoffrois (DGA)
     Dafydd Gibbon (Universität Bielefeld)
     Daniel Hirst (LPL-CNRS)
     Amy Isard (University of Edinburgh)
     Joaquim Llisterri (Universitat Autònoma de Barcelona)
     Philippe Martin (Université Paris VII)
     Joakim Nivre (Växjö Universitet)
     Nelleke Oostdijk (Katholieke Universiteit Nijmegen)
     Geoffrey Sampson (University of Sussex)
     Antonio Moreno Sandoval (Universidad Autónoma de Madrid)
     François Yvon (ENST)

FORMAT

Papers (25 pages maximum) may be submitted in Word or LaTeX (in the latter 
case, please provide a PDF file). The publisher's style sheets are available 
at:

http://tal.e-revues.com/appel.jsp

LANGUAGE

Papers may be written in French or, for non-French-speaking authors only, 
in English.

SCHEDULE

The submission deadline is March 15th, 2004.

Articles will be reviewed by a member of the editorial board of the journal 
and by two external reviewers chosen by the editors of the special issue. 
Editorial board decisions and referees' reports will be transmitted to the 
authors by May 31st, 2004.

Final versions of accepted papers will be required by July 1st, 2004. 
Publication is planned for the end of 2004.

SUBMISSION

Submissions (25 pages maximum, following the publisher's style sheet) should 
be sent electronically to:

   Jean Véronis <Jean.Veronis at up.univ-mrs.fr>


