<HTML><HEAD></HEAD>

<BODY dir=ltr>

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV>

<DIV 

style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>We 

apologize for multiple postings.</DIV>

<DIV dir=ltr>

<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">

<DIV>Please distribute to interested colleagues </DIV>

<DIV><BR>============================================================</DIV>

<DIV> </DIV>

<DIV>  1st Call for Papers</DIV>

<DIV><BR>  7th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA</DIV>

<DIV> </DIV>

<DIV>  Building Resources for Machine Translation Research</DIV>

<DIV> </DIV>

<DIV>  <A 

href="http://comparable.limsi.fr/bucc2014/">http://comparable.limsi.fr/bucc2014/</A></DIV>

<DIV> </DIV>

<DIV>  May 27, 2014<BR>  Co-located with LREC 2014<BR>  Harpa 

Conference Centre, Reykjavik (Iceland) </DIV>

<DIV> </DIV>

<DIV>  DEADLINE FOR PAPERS: February 10, 2014<BR>  <A 

href="https://www.softconf.com/lrec2014/BUCC2014/">https://www.softconf.com/lrec2014/BUCC2014/</A></DIV>

<DIV> </DIV>

<DIV>============================================================</DIV>

<DIV> </DIV>

<DIV>MOTIVATION</DIV>

<DIV> </DIV>

<DIV>In the language engineering and the linguistics communities, research<BR>in 

comparable corpora has been motivated by two main reasons. In<BR>language 

engineering, on the one hand, it is chiefly motivated by the<BR>need to use 

comparable corpora as training data for statistical<BR>Natural Language 

Processing applications such as statistical machine<BR>translation or 

cross-lingual retrieval. In linguistics, on the other<BR>hand, comparable 

corpora are of interest in themselves because they make<BR>inter-linguistic 

discoveries and comparisons possible. It is generally<BR>accepted in both communities 

that comparable corpora are documents in<BR>one or several languages that are 

comparable in content and form in<BR>various degrees and dimensions. We believe 

that the linguistic<BR>definitions and observations related to comparable 

corpora can improve<BR>methods to mine such corpora for applications of 

statistical NLP. As<BR>such, it is of great interest to bring together builders 

and users of<BR>such corpora.<BR> <BR>The scarcity of parallel corpora has 

motivated research concerning<BR>the use of comparable corpora: pairs of 

monolingual corpora selected<BR>according to the same set of criteria, but in 

different languages<BR>or language varieties. Non-parallel yet comparable 

corpora overcome<BR>the two limitations of parallel corpora (few language pairs, few contexts), since sources for 

original,<BR>monolingual texts are much more abundant than translated 

texts.<BR>However, because of their nature, mining translations in 

comparable<BR>corpora is much more challenging than in parallel corpora. 

What<BR>constitutes a good comparable corpus, for a given task or per 

se,<BR>also requires specific attention: while the definition of a 

parallel<BR>corpus is fairly straightforward, building a non-parallel 

corpus<BR>requires control over the selection of source texts in both 

languages.</DIV>

<DIV> </DIV>

<DIV>Parallel corpora are a key resource as training data for 

statistical<BR>machine translation, and for building or extending bilingual 

lexicons<BR>and terminologies. However, beyond a few language pairs such 

as<BR>English-French or English-Chinese and a few contexts such 

as<BR>parliamentary debates or legal texts, they remain a scarce 

resource,<BR>despite the creation of automated methods to collect parallel 

corpora<BR>from the Web. To exemplify such issues in a practical setting, 

this<BR>year's special focus will be on</DIV>

<DIV> </DIV>

<DIV>    Building Resources for Machine Translation 

Research</DIV>

<DIV> </DIV>

<DIV>This special topic aims to address the need for:<BR>(1) Machine Translation 

training and testing data such as spoken or<BR>written monolingual, comparable 

or parallel data collections, and<BR>(2) methods and tools used for collecting, 

annotating, and verifying<BR>MT data such as Web crawling, crowdsourcing, tools 

for language<BR>experts and for finding MT data in comparable corpora.</DIV>

<DIV> </DIV>

<DIV><BR>TOPICS</DIV>

<DIV> </DIV>

<DIV>We solicit contributions including but not limited to the following 

topics:</DIV>

<DIV> </DIV>

<DIV>Topics related to the special theme:<BR>  * Methods and tools for 

collecting and processing MT data,<BR>        

including crowdsourcing<BR>  * Methods and tools for quality 

control<BR>  * Tools for efficient annotation<BR>  * Bilingual term 

and named entity collections<BR>  * Multilingual treebanks, wordnets, 

propbanks, etc.<BR>  * Comparable corpora with parallel units 

annotated<BR>  * Comparable corpora for under-resourced languages and 

specific domains<BR>  * Multilingual corpora with rich 

annotations:<BR>        POS tags, NEs, 

dependencies, semantic roles, etc.<BR>  * Data for special applications: 

patent translation, movie<BR>        

subtitles, MOOCs, meetings, chat-rooms, social media, etc.<BR>  * Legal 

issues with collecting and redistributing 

data<BR>        and generating 

derivatives</DIV>

<DIV> </DIV>

<DIV>Building comparable corpora:<BR>  * Human translations<BR>  * 

Automatic and semi-automatic methods<BR>  * Methods to mine parallel and 

non-parallel corpora from the Web<BR>  * Tools and criteria to evaluate the 

comparability of corpora<BR>  * Parallel vs non-parallel corpora, 

monolingual corpora<BR>  * Rare and minority languages, across language 

families<BR>  * Multi-media/multi-modal comparable corpora</DIV>

<DIV> </DIV>

<DIV>Applications of comparable corpora:<BR>  * Human 

translations<BR>  * Language learning<BR>  * Cross-language 

information retrieval & document categorization<BR>  * Bilingual 

projections<BR>  * Machine translation<BR>  * Writing assistance</DIV>

<DIV> </DIV>

<DIV>Mining from comparable corpora:<BR>  * Extraction of parallel segments 

or paraphrases from comparable corpora<BR>  * Extraction of bilingual and 

multilingual translations of single 

words<BR>        and multi-word expressions; 

proper names, named entities, etc.</DIV>

<DIV> </DIV>

<DIV><BR>IMPORTANT DATES<BR> <BR>  February 10, 2014    

Deadline for submission of full papers<BR>      March 

10, 2014    Notification of 

acceptance<BR>      March 27, 2014    

Camera-ready papers due<BR>         May 

27, 2014    Workshop date<BR> </DIV>

<DIV> </DIV>

<DIV>SUBMISSION INFORMATION</DIV>

<DIV> </DIV>

<DIV>Papers should follow the LREC main conference formatting details (to 

be<BR>announced on the conference website <A 

href="http://lrec2014.lrec-conf.org/en/">http://lrec2014.lrec-conf.org/en/</A> 

)<BR>and should be submitted as a PDF-file via the START workshop manager 

at<BR>  <A 

href="https://www.softconf.com/lrec2014/BUCC2014/">https://www.softconf.com/lrec2014/BUCC2014/</A></DIV>

<DIV> </DIV>

<DIV>Contributions can be short or long papers. Short paper submissions 

must<BR>describe original and unpublished work without exceeding six 

(6)<BR>pages. Characteristics of short papers include: a small, 

focused<BR>contribution; work in progress; a negative result; an opinion 

piece;<BR>an interesting application nugget. Long paper submissions 

must<BR>describe substantial, original, completed and unpublished work 

without<BR>exceeding ten (10) pages.</DIV>

<DIV> </DIV>

<DIV>Reviewing will be double blind, so the papers should not reveal 

the<BR>authors' identity. Accepted papers will be published in the 

workshop<BR>proceedings.<BR> <BR>Double submission policy: Parallel 

submission to other meetings or<BR>publications is possible, but the 

workshop organizers must be<BR>notified immediately.<BR> <BR>When submitting 

a paper from the START page, authors will be asked to<BR>provide essential 

information about resources (in a broad sense,<BR>i.e. also technologies, 

standards, evaluation kits, etc.) that have<BR>been used for the work described 

in the paper or are a new result of<BR>their research.  Moreover, ELRA 

encourages all LREC authors to share<BR>the described LRs (data, tools, 

services, etc.), to enable their<BR>reuse, replicability of experiments, 

including evaluation ones, etc.</DIV>

<DIV> </DIV>

<DIV>For further information, please contact<BR>    Pierre 

Zweigenbaum pz (at) limsi (dot) fr</DIV>

<DIV> </DIV>

<DIV><BR>ORGANISERS<BR> <BR>  Pierre Zweigenbaum, LIMSI, CNRS, Orsay 

(France)<BR>  Ahmet Aker, University of Sheffield (UK)<BR>  Serge 

Sharoff, University of Leeds (UK)<BR>  Stephan Vogel, QCRI 

(Qatar)<BR>  Reinhard Rapp, Universities of Mainz (Germany) and 

Aix-Marseille (France)</DIV>

<DIV> </DIV>

<DIV><BR>SCIENTIFIC COMMITTEE</DIV>

<DIV> </DIV>

<DIV>  * Ahmet Aker (University of Sheffield, UK)<BR>  * Srinivas 

Bangalore (AT&T Labs, US)<BR>  * Caroline Barrière (CRIM, Montréal, 

Canada)<BR>  * Chris Biemann (TU Darmstadt, Germany)<BR>  * Hervé 

Déjean (Xerox Research Centre Europe, Grenoble, France)<BR>  * Kurt Eberle 

(Lingenio, Heidelberg, Germany)<BR>  * Andreas Eisele (European Commission, 

Luxembourg)<BR>  * Éric Gaussier (Université Joseph Fourier, Grenoble, 

France)<BR>  * Gregory Grefenstette (INRIA, Saclay, France)<BR>  * 

Silvia Hansen-Schirra (University of Mainz, Germany)<BR>  * Hitoshi Isahara 

(Toyohashi University of Technology)<BR>  * Kyo Kageura (University of 

Tokyo, Japan)<BR>  * Adam Kilgarriff (Lexical Computing Ltd, UK)<BR>  

* Natalie Kübler (Université Paris Diderot, France)<BR>  * Philippe 

Langlais (Université de Montréal, Canada)<BR>  * Michael Mohler (Language 

Computer Corp., US)<BR>  * Emmanuel Morin (Université de Nantes, 

France)<BR>  * Dragos Stefan Munteanu (Language Weaver, Inc., US)<BR>  

* Lene Offersgaard (University of Copenhagen, Denmark)<BR>  * Ted Pedersen 

(University of Minnesota, Duluth, US)<BR>  * Reinhard Rapp (Université 

Aix-Marseille, France)<BR>  * Sujith Ravi (Google, US)<BR>  * Serge 

Sharoff (University of Leeds, UK)<BR>  * Michel Simard (National Research 

Council Canada)<BR>  * Richard Sproat (OGI School of Science & 

Technology, US)<BR>  * Tim Van de Cruys (IRIT-CNRS, Toulouse, 

France)<BR>  * Stephan Vogel (QCRI, Qatar)<BR>  * Guillaume Wisniewski 

(Université Paris Sud & LIMSI-CNRS, Orsay, France)<BR>  * Pierre 

Zweigenbaum (LIMSI-CNRS, Orsay, 

France)<BR></DIV></DIV></DIV></DIV></DIV></DIV></BODY></HTML>