Arabic-L:LING:Linguistic pre-processing for MT Final CFP

Wed May 20 17:27:19 UTC 2009

------------------------------------------------------------------------
Arabic-L: Wed 20 May 2009
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>

-------------------------Directory------------------------------------

1) Subject: Linguistic pre-processing for MT Final CFP

-------------------------Messages-----------------------------------
1)
Date: 20 May 2009
From: Priscilla Rasmussen <rasmusse at ptd.net>
Subject: Linguistic pre-processing for MT Final CFP

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* FINAL CALL FOR PAPERS *

Workshop on
Linguistic pre-processing for MT

Paper submission deadline:   May 8, 2009

August 30, 2009
Machine Translation Summit XII
Ottawa, Ontario, Canada

We invite proposals for presentation at the Workshop on Linguistic
pre-processing for MT, being held in conjunction with MT Summit XII.

WORKSHOP DESCRIPTION
Input for MT varies significantly in terms of spelling, terminology,  
word order phenomena, dialects, and sentence types, even within the  
same language. With user-generated content, this variability increases  
enormously.  MT systems, and NLP systems generally, cannot effectively
cover all of this variability -- usually because they are built
to deal with professionally written technical or journalistic texts.
Robust and reliable systems for mapping highly variable, uncontrolled  
writing into more consistent, tractable, "controlled" sentences will  
improve MT, search, and other NLP tasks.  Current approaches to this  
problem include manually pre-editing the input texts -- as discussed,
for example, in the series of CLAW workshops -- and/or expanding the
coverage of MT systems.

One alternative approach is to pre-process or normalize the input  
automatically before MT.  Translation of subtitles for television  
(Flanagan, 2006), non-fluent speech, low-quality OCR, and non-standard  
writing from limited-proficiency writers are only some of the  
application scenarios that require automatic linguistic pre-processing  
to improve MT output. For example, Callison-Burch (2007) showed that  
substitution of lexical paraphrases improved MT output. Xu & Seneff  
(2008) and Collins, Koehn & Kucerova (2005) re-arranged word order to  
improve the performance of a statistical MT system. Yet another
alternative approach is to produce a linguistically "enriched" input
in the form of lattices, trees, markup, etc., and allow for final
interpretation later in the translation pipeline and/or with a direct
feedback capability to force emergent behavior. Some approaches may
even call into question the need for a strict, linear processing  
pipeline and may employ adaptive, iterative, or self-learning methods.

Common to all of these alternatives is the strategy of deploying  
significant linguistic and non-linguistic knowledge before translation  
itself occurs. This raises many questions about which kinds of  
knowledge have the biggest impact on translation, which can be  
automated most reliably and robustly, and which are most
cost-effective and scalable.

This workshop aims to compare and contrast some of the various  
techniques and approaches to these kinds of linguistic pre-processing  
for MT. The workshop will consist of a set of papers that will be  
selected by peer review.

IMPORTANT DATES

Paper submission deadline:    May 8, 2009
Notification of acceptance:   June 12, 2009
Camera-ready submissions:     July 10, 2009

WORKSHOP TOPICS

We welcome submissions about the main theme of this workshop. Specific  
topics include but are not limited to:
* Paraphrase generation
* Syntactic reordering
* Lexical/terminological substitution
* Error detection and automatic correction
* Processing user-generated content
* Monolingual MT
* Confidence scoring
* Self-learning and adaptability

SUBMISSION REQUIREMENTS

Papers should not have been presented elsewhere or be under
consideration for publication elsewhere, and should not identify the
author(s). They should emphasize completed work rather than intended
work. Each paper will be anonymously reviewed by the program committee.

Papers must be submitted in PDF format to mike [at] mikedillinger  
[dot] com by midnight on the due date. Submissions should be in
English. The papers should be attached to an email indicating contact
information for the author(s) and the paper’s title. Papers should not
exceed 8 pages including references and tables, and should follow the  
formatting guidelines posted at the MT Summit web site.

CONTACT INFORMATION

For further information, contact the organizing committee at mike [at]  
mikedillinger [dot] com

ORGANIZING COMMITTEE

Mike Dillinger, Translation Optimization Partners (Primary Contact)

PROGRAM COMMITTEE
* Alon Lavie (CMU)
* Farzad Ehsani (Fluential Inc)
* Hassan Sawaf (Apptek)
* Jörg Schütz (Bioloom Group)
* Philipp Koehn (U Edinburgh)

--------------------------------------------------------------------------
End of Arabic-L:  20 May 2009