<HTML><HEAD></HEAD>

<BODY dir=ltr>

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV 

style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV>

<DIV 

style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>

<DIV 

style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>

<DIV 

style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>We 
apologize for multiple postings.<BR>Please distribute to interested 
colleagues.</DIV>

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV> </DIV>

<DIV>============================================================</DIV>

<DIV>             

DEADLINE EXTENSION AND JOURNAL SPECIAL ISSUE 

<DIV>============================================================</DIV></DIV>

<DIV> </DIV>

<DIV>  7th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA</DIV>

<DIV> </DIV>

<DIV>  Building Resources for Machine Translation Research</DIV>

<DIV> </DIV>

<DIV>  <A 

href="http://comparable.limsi.fr/bucc2014/">http://comparable.limsi.fr/bucc2014/</A></DIV>

<DIV> </DIV>

<DIV>  May 27, 2014<BR>  Co-located with LREC 2014<BR>  Harpa 

Conference Centre, Reykjavik (Iceland)</DIV>

<DIV> </DIV>

<DIV>  EXTENDED DEADLINE FOR PAPERS: February 23, 2014<BR>  <A 

href="https://www.softconf.com/lrec2014/BUCC2014/">https://www.softconf.com/lrec2014/BUCC2014/</A></DIV>

<DIV> </DIV>

<DIV><BR>  *** INVITED SPEAKER ***</DIV>

<DIV> </DIV>

<DIV>  Chris Callison-Burch (University of Pennsylvania)</DIV>

<DIV> </DIV>

<DIV>============================================================</DIV>

<DIV> </DIV>

<DIV>MOTIVATION</DIV>

<DIV> </DIV>

<DIV>In the language engineering and linguistics communities, research<BR>on 
comparable corpora has been driven by two main motivations. In<BR>language 
engineering, it is chiefly motivated by the<BR>need to use 
comparable corpora as training data for statistical<BR>Natural Language 
Processing applications such as statistical machine<BR>translation or 
cross-lingual retrieval. In linguistics, by contrast,<BR>comparable 
corpora are of interest in themselves because they make<BR>inter-linguistic 
discoveries and comparisons possible. It is generally<BR>accepted in both communities 
that comparable corpora are documents in<BR>one or several languages that are 
comparable in content and form to<BR>various degrees and along various dimensions. We believe 
that the linguistic<BR>definitions and observations related to comparable 
corpora can improve<BR>methods for mining such corpora for statistical NLP 
applications. It is<BR>therefore of great interest to bring together builders 
and users of<BR>such corpora.</DIV>

<DIV> </DIV>

<DIV>The scarcity of parallel corpora has motivated research on<BR>the 
use of comparable corpora: pairs of monolingual corpora selected<BR>according to 
the same set of criteria, but in different languages<BR>or language varieties. 
Non-parallel yet comparable corpora help overcome<BR>this limitation of parallel 
corpora, since sources of original,<BR>monolingual texts are much more abundant 
than translated texts.<BR>However, by their very nature, comparable corpora make mining translations 
much more<BR>challenging than parallel corpora do. 
What<BR>constitutes a good comparable corpus, for a given task or per 
se,<BR>also requires specific attention: while the definition of a 
parallel<BR>corpus is fairly straightforward, building a non-parallel 
corpus<BR>requires control over the selection of source texts in both 
languages.</DIV>

<DIV> </DIV>

<DIV>Parallel corpora are a key resource as training data for 
statistical<BR>machine translation and for building or extending bilingual 
lexicons<BR>and terminologies. However, beyond a few language pairs such 
as<BR>English-French or English-Chinese and a few contexts such 
as<BR>parliamentary debates or legal texts, they remain a scarce 
resource,<BR>despite the development of automated methods to collect parallel 
corpora<BR>from the Web. To illustrate these issues in a practical setting, 
this<BR>year's special focus will be on</DIV>

<DIV> </DIV>

<DIV>    Building Resources for Machine Translation 

Research</DIV>

<DIV> </DIV>

<DIV>This special topic aims to address the need for:<BR>(1) Machine Translation 
training and testing data, such as spoken or<BR>written monolingual, comparable 
or parallel data collections, and<BR>(2) methods and tools for collecting, 
annotating, and verifying<BR>MT data, such as Web crawling, crowdsourcing, tools 
for language<BR>experts, and tools for finding MT data in comparable corpora.</DIV>

<DIV> </DIV>

<DIV><BR>TOPICS</DIV>

<DIV> </DIV>

<DIV>We solicit contributions including but not limited to the following 

topics:</DIV>

<DIV> </DIV>

<DIV>Topics related to the special theme:<BR>  * Methods and tools for 

collecting and processing MT data,<BR>        

including crowdsourcing<BR>  * Methods and tools for quality 

control<BR>  * Tools for efficient annotation<BR>  * Bilingual term 

and named entity collections<BR>  * Multilingual treebanks, wordnets, 

propbanks, etc.<BR>  * Comparable corpora with annotated parallel 
units<BR>  * Comparable corpora for under-resourced languages and 

specific domains<BR>  * Multilingual corpora with rich 

annotations:<BR>        POS tags, NEs, 

dependencies, semantic roles, etc.<BR>  * Data for special applications: 

patent translation, movie<BR>        

subtitles, MOOCs, meetings, chat-rooms, social media, etc.<BR>  * Legal 

issues with collecting and redistributing 

data<BR>        and generating 

derivatives</DIV>

<DIV> </DIV>

<DIV>Building comparable corpora:<BR>  * Human translations<BR>  * 

Automatic and semi-automatic methods<BR>  * Methods to mine parallel and 

non-parallel corpora from the Web<BR>  * Tools and criteria to evaluate the 

comparability of corpora<BR>  * Parallel vs non-parallel corpora, 

monolingual corpora<BR>  * Rare and minority languages, across language 

families<BR>  * Multi-media/multi-modal comparable corpora</DIV>

<DIV> </DIV>

<DIV>Applications of comparable corpora:<BR>  * Human 

translations<BR>  * Language learning<BR>  * Cross-language 

information retrieval & document categorization<BR>  * Bilingual 

projections<BR>  * Machine translation<BR>  * Writing assistance</DIV>

<DIV> </DIV>

<DIV>Mining from comparable corpora:<BR>  * Extraction of parallel segments 

or paraphrases from comparable corpora<BR>  * Extraction of bilingual and 

multilingual translations of single 

words<BR>        and multi-word expressions; 

proper names, named entities, etc.</DIV>

<DIV> </DIV>

<DIV><BR>IMPORTANT DATES</DIV>

<DIV> </DIV>

<DIV>   February 23, 2014    Deadline for submission of 

papers (extended)<BR>      March 10, 

2014    Notification of 

acceptance<BR>      March 27, 2014    

Camera-ready papers due<BR>         May 

27, 2014   Workshop date</DIV>

<DIV> </DIV>

<DIV><BR>SUBMISSION INFORMATION</DIV>

<DIV> </DIV>

<DIV>Papers should follow the LREC main conference formatting details (to 
be<BR>announced on the conference website <A 
href="http://lrec2014.lrec-conf.org/en/">http://lrec2014.lrec-conf.org/en/</A>)<BR>and 
should be submitted as a PDF file via the START workshop manager 

at<BR>  <A 

href="https://www.softconf.com/lrec2014/BUCC2014/">https://www.softconf.com/lrec2014/BUCC2014/</A></DIV>

<DIV> </DIV>

<DIV>Contributions can be short or long papers. Short paper submissions 
must<BR>describe original and unpublished work and must not exceed six 
(6)<BR>pages. Characteristics of short papers include: a small, 
focused<BR>contribution; work in progress; a negative result; an opinion 
piece;<BR>or an interesting application nugget. Long paper submissions 
must<BR>describe substantial, original, completed and unpublished work 
and must not<BR>exceed ten (10) pages.</DIV>

<DIV> </DIV>

<DIV>Reviewing will be double-blind, so papers should not reveal 
the<BR>authors' identities. Accepted papers will be published in the 
workshop<BR>proceedings.</DIV>

<DIV> </DIV>

<DIV>Double submission policy: parallel submission to other meetings 
or<BR>publications is possible, but the workshop organizers must be 
notified<BR>immediately.</DIV>

<DIV> </DIV>

<DIV>When submitting a paper from the START page, authors will be asked 
to<BR>provide essential information about resources (in a broad sense,<BR>i.e. 
also technologies, standards, evaluation kits, etc.) that have<BR>been used for 
the work described in the paper or are a new result of<BR>their research. 
Moreover, ELRA encourages all LREC authors to share<BR>the described language resources (LRs: data, 
tools, services, etc.) to enable their<BR>reuse and the replicability of experiments, 
including evaluation experiments.</DIV>

<DIV> </DIV>

<DIV><BR>JOURNAL SPECIAL ISSUE</DIV>

<DIV> </DIV>

<DIV>Authors of selected papers will be encouraged to submit 
substantially<BR>extended versions of their manuscripts to an upcoming special 
issue<BR>on &ldquo;Machine Translation Using Comparable Corpora&rdquo; of the Journal<BR>of 
Natural Language Engineering.</DIV>

<DIV> </DIV>

<DIV><BR>ORGANISERS</DIV>

<DIV> </DIV>

<DIV>  Pierre Zweigenbaum, LIMSI, CNRS, Orsay (France)<BR>  Ahmet 

Aker, University of Sheffield (UK)<BR>  Serge Sharoff, University of Leeds 

(UK)<BR>  Stephan Vogel, QCRI (Qatar)<BR>  Reinhard Rapp, Universities 

of Mainz (Germany) and Aix-Marseille (France)</DIV>

<DIV> </DIV>

<DIV><BR>CONTACT</DIV>

<DIV> </DIV>

<DIV>  Pierre Zweigenbaum:  pz (at) limsi (dot) fr</DIV>

<DIV> </DIV>

<DIV><BR>SCIENTIFIC COMMITTEE</DIV>

<DIV> </DIV>

<DIV>  * Ahmet Aker (University of Sheffield, UK)<BR>
  * Srinivas Bangalore (AT&amp;T Labs, US)<BR>
  * Caroline Barri&egrave;re (CRIM, Montr&eacute;al, Canada)<BR>
  * Chris Biemann (TU Darmstadt, Germany)<BR>
  * Herv&eacute; D&eacute;jean (Xerox Research Centre Europe, Grenoble, France)<BR>
  * Kurt Eberle (Lingenio, Heidelberg, Germany)<BR>
  * Andreas Eisele (European Commission, Luxembourg)<BR>
  * &Eacute;ric Gaussier (Universit&eacute; Joseph Fourier, Grenoble, France)<BR>
  * Gregory Grefenstette (INRIA, Saclay, France)<BR>
  * Silvia Hansen-Schirra (University of Mainz, Germany)<BR>
  * Hitoshi Isahara (Toyohashi University of Technology, Japan)<BR>
  * Kyo Kageura (University of Tokyo, Japan)<BR>
  * Adam Kilgarriff (Lexical Computing Ltd, UK)<BR>
  * Natalie K&uuml;bler (Universit&eacute; Paris Diderot, France)<BR>
  * Philippe Langlais (Universit&eacute; de Montr&eacute;al, Canada)<BR>
  * Michael Mohler (Language Computer Corp., US)<BR>
  * Emmanuel Morin (Universit&eacute; de Nantes, France)<BR>
  * Dragos Stefan Munteanu (Language Weaver, Inc., US)<BR>
  * Lene Offersgaard (University of Copenhagen, Denmark)<BR>
  * Ted Pedersen (University of Minnesota, Duluth, US)<BR>
  * Reinhard Rapp (Universit&eacute; Aix-Marseille, France)<BR>
  * Sujith Ravi (Google, Mountain View, US)<BR>
  * Serge Sharoff (University of Leeds, UK)<BR>
  * Michel Simard (National Research Council Canada)<BR>
  * Richard Sproat (OGI School of Science &amp; Technology, US)<BR>
  * Tim Van de Cruys (IRIT-CNRS, Toulouse, France)<BR>
  * Stephan Vogel (QCRI, Qatar)<BR>
  * Guillaume Wisniewski (Universit&eacute; Paris Sud &amp; LIMSI-CNRS, Orsay, France)<BR>
  * Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)</DIV>

<DIV> </DIV>

<DIV> </DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></BODY></HTML>