<HTML><HEAD></HEAD>

<BODY dir=ltr>

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV>

<DIV 

style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>=================================================================</DIV>

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV> </DIV>

<DIV>LAST CALL FOR PAPERS</DIV>

<DIV> </DIV>

<DIV>Journal of Natural Language Engineering</DIV>

<DIV> </DIV>

<DIV>Special Issue on “Machine Translation Using Comparable Corpora”</DIV>

<DIV> </DIV>

<DIV><A 

href="http://comparable.limsi.fr/jnle-bucc2015/">http://comparable.limsi.fr/jnle-bucc2015/</A></DIV>

<DIV> </DIV>

<DIV>=================================================================</DIV>

<DIV> </DIV>

<DIV>Statistical machine translation based on parallel corpora has been very 

successful. For example, the major search engines' translation systems, which 

are used by millions of people, primarily use this approach, and it has 

made it possible to add new language pairs in a fraction of the time that 

would be required with more traditional rule-based methods.</DIV>

<DIV> </DIV>

<DIV>In contrast, research on machine translation using comparable corpora is 

still at an earlier stage. Comparable corpora can be defined as monolingual 

corpora covering roughly the same subject area in different languages but 

without being exact translations of each other.</DIV>

<DIV> </DIV>

<DIV>However, despite its tremendous success, the use of parallel corpora in MT 

has a number of drawbacks:</DIV>

<DIV> </DIV>

<DIV>1) It has been shown that translated language is somewhat different from 

original language; for example, Klebanov &amp; Flor showed that "associative 

texture" is lost in translation.</DIV>

<DIV> </DIV>

<DIV>2) As they require translation, parallel corpora will always be a far 

scarcer resource than comparable corpora. This is a severe drawback for a number 

of reasons:</DIV>

<DIV> </DIV>

<DIV>a) Among the roughly 7000 languages of the world, of which 600 have a 

written form, the vast majority are of the "low resource" type.</DIV>

<DIV> </DIV>

<DIV>b) The number of possible language pairs increases with the square of the 

number of languages. When using parallel corpora, one bitext is needed for each 

language pair. When using comparable corpora, one monolingual corpus per 

language suffices.</DIV>

<DIV> </DIV>

<DIV>c) For improved translation quality, translation systems specialized for 

particular genres and domains are desirable. But it is far more difficult to 

acquire appropriate parallel training corpora than comparable ones.</DIV>

<DIV> </DIV>

<DIV>d) As language evolves over time, the training corpora should be updated on 

a regular basis. Again, this is more difficult in the parallel case.</DIV>

<DIV> </DIV>

<DIV>For such reasons it would be a big step forward if it were possible to base 

statistical machine translation on comparable rather than on parallel corpora: 

The acquisition of training data would be far easier, and the unnatural 

"translation bias" (source language shining through) within the training data 

could be avoided.</DIV>

<DIV> </DIV>

<DIV>But is there any evidence that this is possible? Motivation for using 

comparable corpora in MT research comes from a cognitive perspective: Experience 

shows that people who have learned a second language completely independently 

of their mother tongue can nevertheless translate between the languages. That 

is, human performance shows that there must be a way to bridge the gap between 

languages which does not rely on parallel data. Using parallel data for MT is of 

course a nice shortcut. But avoiding this shortcut by doing MT based on 

comparable corpora may well be a key to a better understanding of human 

translation, and to better MT quality.</DIV>

<DIV> </DIV>

<DIV>Work on comparable corpora in the context of MT has been ongoing for almost 

20 years. It has turned out that this is a very hard problem to solve, but as it 

is among the grand challenges in multilingual NLP, interest has steadily 

increased. Apart from the increase in publications, this can be seen in the 

considerable number of research projects (such as ACCURAT, HyghTra, and TTC) 

which are fully or partially devoted to MT using comparable corpora. Given also 

the success of the workshop series on “Building and Using Comparable Corpora” 

(BUCC), which is now in its 8th year, and following the publication of a related 

book (<A 

href="http://www.springer.com/computer/ai/book/978-3-642-20127-1">http://www.springer.com/computer/ai/book/978-3-642-20127-1</A>), 

we think that it is now time to devote a journal special issue to this field. It 

is meant to bundle the latest top-class research, make it available to everybody 

working in the field, and at the same time give an overview of the state of the 

art to all interested researchers.</DIV>

<DIV> </DIV>

<DIV><BR>TOPICS OF INTEREST</DIV>

<DIV> </DIV>

<DIV>We solicit contributions including but not limited to the following 

topics:</DIV>

<DIV> </DIV>

<DIV>• Comparable corpora based MT systems (CCMTs)<BR>• Architectures for 

CCMTs<BR>• CCMTs for less-resourced languages<BR>• CCMTs for less-resourced 

domains<BR>• CCMTs dealing with morphologically rich languages<BR>• CCMTs for 

spoken translation<BR>• Applications of CCMTs<BR>• CCMT evaluation<BR>• Open 

source CCMT systems<BR>• Hybrid systems combining SMT and CCMT<BR>• Hybrid 

systems combining rule-based MT and CCMT<BR>• Enhancing phrase-based SMT using 

comparable corpora<BR>• Expanding phrase tables using comparable corpora<BR>• 

Comparable corpora based processing tools/kits for MT<BR>• Methods for mining 

comparable corpora from the Web<BR>• Applying Harris' distributional hypothesis 

to comparable corpora<BR>• Induction of morphological, grammatical, and 

translation rules from comparable corpora<BR>• Machine learning techniques using 

comparable corpora<BR>• Parallel corpora vs. pairs of non-parallel monolingual 

corpora<BR>• Extraction of parallel segments or paraphrases from comparable 

corpora<BR>• Extraction of bilingual and multilingual translations of single 

words and multi-word expressions, proper names, and named 

entities from comparable corpora</DIV>

<DIV> </DIV>

<DIV><BR>IMPORTANT DATES<BR> <BR>December 1, 2014: Paper submission 

deadline<BR>February 1, 2015: Notification<BR>May 1, 2015: Deadline for revised 

papers<BR>July 1, 2015: Final notification<BR>September 1, 2015: Final paper 

due</DIV>

<DIV> </DIV>

<DIV><BR>FURTHER INFORMATION</DIV>

<DIV> </DIV>

<DIV>Further details and updates can be found here: <BR><A 

href="http://comparable.limsi.fr/jnle-bucc2015/">http://comparable.limsi.fr/jnle-bucc2015/</A></DIV>

<DIV> </DIV>

<DIV>Please use the following e-mail address to contact the guest editors: 

<BR>jnle.bucc (at) limsi (dot) fr</DIV>

<DIV> </DIV>

<DIV><BR>GUEST EDITORS</DIV>

<DIV> </DIV>

<DIV>Reinhard Rapp, University of Mainz (Germany)<BR>Serge Sharoff, University 

of Leeds (UK)<BR>Pierre Zweigenbaum, LIMSI, CNRS (France)</DIV>

<DIV> </DIV>

<DIV><BR>GUEST EDITORIAL BOARD</DIV>

<DIV> </DIV>

<DIV>Ahmet Aker (University of Sheffield, UK)<BR>Marianna Apidianaki (LIMSI, 

CNRS, Orsay, France)<BR>Nuria Bel (Universitat Pompeu Fabra, Barcelona, 

Spain)<BR>Dhouha Bouamor (Trooclick, Paris, France)<BR>Ken Church (IBM Watson 

Research Center, Yorktown Heights, NY, USA)<BR>Beatrice Daille (Université de 

Nantes, France)<BR>Silvia Hansen-Schirra (Universität Mainz, Germany)<BR>Amir 

Hazem (Université de Nantes, France)<BR>Kevin Knight (University of Southern 

California, ISI, USA)<BR>Philipp Koehn (Johns Hopkins University, Baltimore, MD, 

USA)<BR>Tomas Mikolov (Facebook, Menlo Park, CA, USA)<BR>Emmanuel Morin 

(Université de Nantes, France)<BR>Uwe Quasthoff (Universität Leipzig, 

Germany)<BR>Reinhard Rapp (Universität Mainz, Germany)<BR>Serge Sharoff 

(University of Leeds, UK)<BR>Inguna Skadina (Tilde and Liepaja University, 

Latvia)<BR>Marko Tadic (University of Zagreb, Croatia)<BR>George Tambouratzis 

(Institute for Language and Speech Processing, Athens, Greece)<BR>Benjamin Tsou 

(The Hong Kong Institute of Education, China)<BR>Stephan Vogel (Qatar Computing 

Research Institute, Doha, Qatar)<BR>Yorick Wilks (Florida Institute of Human and 

Machine Cognition, Ocala, USA)<BR>Pierre Zweigenbaum (LIMSI, CNRS, France)</DIV>

<DIV> </DIV></DIV></DIV></DIV></DIV></DIV></BODY></HTML>