Call: International Workshop on Spoken Language Translation (IWSLT 2006)

Fri Jun 23 20:24:34 UTC 2006

Date: Fri, 23 Jun 2006 16:57:41 +0200
From: ELDA <info at elda.org>
X-url: http://www.slc.atr.jp/IWSLT2006
X-url: http://penance.is.cs.cmu.edu/iwslt2005
X-url: http://www.slc.atr.jp/IWSLT2004
X-url: http://www.c-star.org/

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

  International Workshop on Spoken Language Translation (IWSLT 2006)
       -- Evaluation Campaign on Spoken Language Translation --

                Second Call for Participants / Papers

                         November 27-28, 2006
                             Kyoto, Japan

                   http://www.slc.atr.jp/IWSLT2006

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Spoken language translation technologies attempt to bridge the language
barrier between people with different native languages who each want
to converse in their mother tongue. Spoken language translation must
address the problems of both automatic speech recognition (ASR) and
machine translation (MT).

One of the prominent research activities in spoken language
translation is the work being conducted by the Consortium for Speech
Translation Advanced Research (C-STAR III), which is an international
partnership of research laboratories engaged in automatic translation
of spoken language. Current members include ATR (Japan), CAS (China),
CLIPS (France), CMU (USA), ETRI (Korea), ITC-irst (Italy), and UKA
(Germany).

A multilingual speech corpus of tourism-related sentences (BTEC*) has
been created by the C-STAR members, and parts of this corpus have
already been used in previous IWSLT workshops, which focused on the
evaluation of MT results based on text input
(http://www.slc.atr.jp/IWSLT2004) and on the translation of ASR output
(word lattices, N-best lists) using read speech as input
(http://penance.is.cs.cmu.edu/iwslt2005). The full BTEC* corpus
consists of 160K sentences of sentence-aligned text data, and parts of
the corpus will be provided to all evaluation campaign participants
for training purposes.

In this workshop, we focus on the translation of spontaneous speech,
which includes ill-formed utterances caused by grammatical errors,
incomplete sentences, and redundant expressions. The impact of
spontaneity on the performance of ASR and MT systems, as well as the
robustness of state-of-the-art MT engines to speech recognition
errors, will be investigated in detail.

Two types of submissions are invited:

 1) papers by participants in the evaluation campaign of spoken
    language translation technologies. Each participant in the
    evaluation campaign is requested to submit a paper describing the
    ASR and MT systems used and reporting results on the provided test
    data.

 2) technical papers on related issues.

An overview of the evaluation campaign is as follows:

=== Evaluation Campaign

Theme:

    * Spontaneous speech translation

Translation Directions:

    * Arabic/Chinese/Italian/Japanese into English (AE, CE, IE, JE)

Input Conditions:

    * Speech (audio)
    * ASR Output (word lattice or N-best list)
    * Cleaned Transcripts (text)

Supplied Resources:

    * training corpus:
          o AE, IE:
                + 20,000 sentence pairs of BTEC*
                + three develop sets (3x500 sentence pairs, 16
                  multiple references)
          o CE, JE:
                + 40,000 sentence pairs of BTEC*
                + three develop sets (3x500 sentence pairs, 16
                  multiple references)

    * develop corpus:
          o speech data, word lattices, N-best lists of 500 input
            sentences with 7 reference translations for each
            translation direction and input condition

    * test corpus:
          o speech data, word lattices, N-best lists of 500 input
            sentences for each translation direction and input
            condition

  => word segmentations will be provided according to the output of
     the provided ASR engines

Data Tracks:

    Results from past IWSLT workshops showed that the number of BTEC*
    sentence pairs used for training largely affects the performance
    of the MT systems on the given task. However, only C-STAR partners
    have access to the full BTEC* corpus. To allow a fair comparison
    between the systems, we distinguish the following two data tracks:

    * Open Data Track ("open" for everyone :->)
          o no restrictions on training data of ASR engines
          o any resources except the full BTEC* corpus and
            proprietary data can be used as training data for MT
            engines. As for the BTEC* corpus and proprietary data,
            only the Supplied Resources (see above) may be used for
            training purposes.

    * C-STAR Data Track
          o no restrictions on training data of ASR engines
          o any resources (including the full BTEC* corpus and
            proprietary data) can be used as the training data of MT
            engines.

Evaluation Specification:

    * ASR output
          o (automatic) WER

    * MT output
          o (automatic) BLEU(*), NIST, METEOR
          o (subjective) fluency(*), adequacy(*)

     -> systems will be ranked according to the metrics marked '(*)'
     -> human assessment will be carried out for the top-10 systems
        (according to the BLEU metric) of the Chinese-to-English
        Open Data Track (ASR Output condition).

=== Technical Papers

The workshop also invites technical papers related to spoken language
translation.

Possible topics include, but are not limited to:

    * Spontaneous speech translation
    * Domain and language portability
    * MT using comparable and non-parallel corpora
    * Phrase alignment algorithms
    * MT decoding algorithms
    * MT evaluation measures

=== Important Dates

  + Evaluation Campaign

        April  7, 2006 -- System Registration Open
          May 12, 2006 -- Training Corpus Release
         June 30, 2006 -- Develop Corpus Release
       August  7, 2006 -- Test Corpus Release [00:01 JST]
       August  9, 2006 -- Result Submission Due [23:59 JST]
    September 15, 2006 -- Result Feedback to Participants
    September 29, 2006 -- Paper Submission Due
      October 14, 2006 -- Notification of Acceptance
      October 27, 2006 -- Camera-ready Submission Due

     - system registrations will be accepted until release of
       test corpus
     - late result submissions will be treated as unofficial
       result submissions

  + Technical Papers

    September 15, 2006 -- Paper Submission Due [23:59 JST]
      October 17, 2006 -- Notification of Acceptance
      October 27, 2006 -- Camera-ready Submission Due

=== Application / Submission Guidelines / Updated Information

  + available at http://www.slc.atr.jp/IWSLT2006

=== Organizers

  + Satoshi Nakamura (ATR, Japan; Chair)
  + Herve Blanchon (CLIPS, France)
  + Gianni Lazzari (ITC-irst, Italy)
  + Youngjik Lee (ETRI, Korea)
  + Alex Waibel (CMU, USA / UKA, Germany)
  + Bo Xu (CAS, China)

=== Program Committee

  + Michael Paul (ATR, Japan; Evaluation Campaign Chair)
  + Marcello Federico (ITC-irst, Italy; Technical Paper Chair)
  + Nicola Bertoldi (ITC-irst, Italy)
  + Christian Boitet (CLIPS, France)
  + Genichiro Kikui (NTT, Japan)
  + Kevin Knight (ISI, USA)
  + Phillip Koehn (Univ. of Edinburgh, UK)
  + Sadao Kurohashi (Univ. of Tokyo, Japan)
  + Young-Suk Lee (IBM, USA)
  + Jose B. Marino (UPC, Spain)
  + Arul Menezes (Microsoft, USA)
  + Masaaki Nagata (NTT, Japan)
  + Hermann Ney (RWTH, Germany)
  + Seung-Shin Oh (ETRI, Korea)
  + Wade Shen (MIT, USA)
  + Stephan Vogel (CMU, USA)
  + Andy Way (Dublin City University, Ireland)
  + Chengqing Zong (CAS, China)

=== Local Arrangements

  + Genichiro Kikui (NTT, Japan)

=== Conference Venue

  + Paruru Plaza Kyoto (right in front of Kyoto Station)

=== Supporting Organizations

  + Advanced Telecommunication Research Institute International (ATR)
  + Association for Computational Linguistics (ACL)
  + Center for the Evaluation of Language and Communication Technologies (CELCT)
  + European Language Resources Association (ELRA)
  + International Speech Communication Association (ISCA)

=== Contact

  Michael Paul    
  e-mail: michael.paul at atr.jp
  ATR Spoken Language Communication Research Laboratories
  2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288 Japan

=== References

  + IWSLT 2005 (http://penance.is.cs.cmu.edu/iwslt2005)
  + IWSLT 2004 (http://www.slc.atr.jp/IWSLT2004)
  + C-STAR (http://www.c-star.org/)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
