Call: International Workshop on Spoken Language Translation (IWSLT 2006)
Thierry Hamon
thierry.hamon at LIPN.UNIV-PARIS13.FR
Fri Jun 23 20:24:34 UTC 2006
Date: Fri, 23 Jun 2006 16:57:41 +0200
From: ELDA <info at elda.org>
Message-ID: <449C0165.4030801 at elda.org>
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
International Workshop on Spoken Language Translation (IWSLT 2006)
-- Evaluation Campaign on Spoken Language Translation --
Second Call for Participants / Papers
November 27-28, 2006
Kyoto, Japan
http://www.slc.atr.jp/IWSLT2006
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Spoken language translation technologies attempt to cross the language
barriers between people with different native languages who each want
to converse in their mother tongue. Spoken language translation has to
deal with the problems of both automatic speech recognition (ASR) and
machine translation (MT).
One of the prominent research activities in spoken language
translation is the work being conducted by the Consortium for Speech
Translation Advanced Research (C-STAR III), which is an international
partnership of research laboratories engaged in automatic translation
of spoken language. Current members include ATR (Japan), CAS (China),
CLIPS (France), CMU (USA), ETRI (Korea), ITC-irst (Italy), and UKA
(Germany).
A multilingual speech corpus of tourism-related sentences (BTEC*) has
been created by the C-STAR members. Parts of this corpus were already
used in previous IWSLT workshops, which focused on the evaluation of
MT results based on text input (http://www.slc.atr.jp/IWSLT2004) and
on the translation of ASR output (word lattices, N-best lists) using
read speech as input (http://penance.is.cs.cmu.edu/iwslt2005). The
full BTEC* corpus consists of 160K of sentence-aligned text data, and
parts of the corpus will be provided to all evaluation campaign
participants for training purposes.
In this workshop, we focus on the translation of spontaneous speech,
which includes ill-formed utterances caused by grammatical errors,
incomplete sentences, and redundant expressions. The impact of
spontaneity on the performance of ASR and MT systems, as well as the
robustness of state-of-the-art MT engines to speech recognition
errors, will be investigated in detail.
Two types of submissions are invited:
1) participants in the evaluation campaign of spoken language
translation technologies. Each participant in the evaluation
campaign is requested to submit a paper describing the utilized
ASR and MT systems and to report results using the provided test
data.
2) technical papers on related issues.
An overview of the evaluation campaign is as follows:
=== Evaluation Campaign
Theme:
* Spontaneous speech translation
Translation Directions:
* Arabic/Chinese/Italian/Japanese into English (AE, CE, IE, JE)
Input Conditions:
* Speech (audio)
* ASR Output (word lattice or N-best list)
* Cleaned Transcripts (text)
Supplied Resources:
* training corpus:
o AE, IE:
+ 20,000 sentence pairs of BTEC*
+ three develop sets (3x500 sentence pairs, 16
multiple references)
o CE, JE:
+ 40,000 sentence pairs of BTEC*
+ three develop sets (3x500 sentence pairs, 16
multiple references)
* develop corpus:
o speech data, word lattices, N-best lists of 500 input
sentences with 7 reference translations for each
translation direction and input condition
* test corpus:
o speech data, word lattices, N-best lists of 500 input
sentences for each translation direction and input
condition
=> word segmentations will be provided according to the output of
the provided ASR engines
Data Tracks:
The results of past IWSLT workshops showed that the number of BTEC*
sentence pairs used for training strongly affects the performance
of the MT systems on the given task. However, only C-STAR partners
have access to the full BTEC* corpus. To allow a fair comparison
between the systems, we decided to distinguish the following two
data tracks:
* Open Data Track ("open" for everyone :->)
o no restrictions on training data of ASR engines
o any resources, besides the full BTEC* corpus and
proprietary data, can be used as the training data of MT
engines. Concerning the BTEC* corpus and proprietary
data, only the Supplied Resources (see above) are allowed
to be used for training purposes.
* C-STAR Data Track
o no restrictions on training data of ASR engines
o any resources (including the full BTEC* corpus and
proprietary data) can be used as the training data of MT
engines.
Evaluation Specification:
* ASR output
o (automatic) WER
* MT output
o (automatic) BLEU(*), NIST, METEOR
o (subjective) fluency(*), adequacy(*)
-> systems will be ranked according to the metrics marked '(*)'
-> human assessment will be carried out for the top-10 systems
(according to the BLEU metric) of the Chinese-to-English
Open Data Track (ASR Output condition).
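For readers unfamiliar with the WER metric listed above for ASR
output, the following is a minimal sketch of how word error rate is
commonly computed: the word-level edit distance (substitutions,
deletions, insertions) between a reference transcript and a
recognition hypothesis, divided by the number of reference words. It
is an illustration only, not the official scoring tool of the
campaign. The function name word_error_rate and the whitespace
tokenization are assumptions made for this example; the campaign's
actual text normalization and segmentation may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption for this sketch: tokens are separated by whitespace and
    compared exactly (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("where is the station", "where is station"))
```

Because the denominator is the reference length, a hypothesis with
many inserted words can yield a WER above 1.0.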
=== Technical Papers
The workshop also invites technical papers related to spoken language
translation.
Possible topics include, but are not limited to:
* Spontaneous speech translation
* Domain and language portability
* MT using comparable and non-parallel corpora
* Phrase alignment algorithms
* MT decoding algorithms
* MT evaluation measures
=== Important Dates
+ Evaluation Campaign
April 7, 2006 -- System Registration Open
May 12, 2006 -- Training Corpus Release
June 30, 2006 -- Develop Corpus Release
August 7, 2006 -- Test Corpus Release [00:01 JST]
August 9, 2006 -- Result Submission Due [23:59 JST]
September 15, 2006 -- Result Feedback to Participants
September 29, 2006 -- Paper Submission Due
October 14, 2006 -- Notification of Acceptance
October 27, 2006 -- Camera-ready Submission Due
- system registrations will be accepted until release of
test corpus
- late result submissions will be treated as unofficial
result submissions
+ Technical Papers
September 15, 2006 -- Paper Submission Due [23:59 JST]
October 17, 2006 -- Notification of Acceptance
October 27, 2006 -- Camera-ready Submission Due
=== Application / Submission Guidelines / Updated Information
+ available at http://www.slc.atr.jp/IWSLT2006
=== Organizers
+ Satoshi Nakamura (ATR, Japan; Chair)
+ Herve Blanchon (CLIPS, France)
+ Gianni Lazzari (ITC-irst, Italy)
+ Youngjik Lee (ETRI, Korea)
+ Alex Waibel (CMU, USA / UKA, Germany)
+ Bo Xu (CAS, China)
=== Program Committee
+ Michael Paul (ATR, Japan; Evaluation Campaign Chair)
+ Marcello Federico (ITC-irst, Italy; Technical Paper Chair)
+ Nicola Bertoldi (ITC-irst, Italy)
+ Christian Boitet (CLIPS, France)
+ Genichiro Kikui (NTT, Japan)
+ Kevin Knight (ISI, USA)
+ Philipp Koehn (Univ. of Edinburgh, UK)
+ Sadao Kurohashi (Univ. of Tokyo, Japan)
+ Young-Suk Lee (IBM, USA)
+ Jose B. Marino (UPC, Spain)
+ Arul Menezes (Microsoft, USA)
+ Masaaki Nagata (NTT, Japan)
+ Hermann Ney (RWTH, Germany)
+ Seung-Shin Oh (ETRI, Korea)
+ Wade Shen (MIT, USA)
+ Stephan Vogel (CMU, USA)
+ Andy Way (Dublin City University, Ireland)
+ Chengqing Zong (CAS, China)
=== Local Arrangements
+ Genichiro Kikui (NTT, Japan)
=== Conference Venue
+ Paruru Plaza Kyoto (right in front of Kyoto Station)
=== Supporting Organizations
+ Advanced Telecommunication Research Institute International (ATR)
+ Association for Computational Linguistics (ACL)
+ Center for the Evaluation of Language and Communication Technologies
(Celct)
+ European Language Resources Association (ELRA)
+ International Speech Communication Association (ISCA)
=== Contact
Michael Paul
e-mail: michael.paul at atr.jp
ATR Spoken Language Communication Research Laboratories
2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288 Japan
=== References
+ IWSLT 2005 (http://penance.is.cs.cmu.edu/iwslt2005)
+ IWSLT 2004 (http://www.slc.atr.jp/IWSLT2004)
+ C-STAR (http://www.c-star.org/)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version :
Archives : http://listserv.linguistlist.org/archives/ln.html
http://liste.cines.fr/info/ln
The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------