<HTML><HEAD></HEAD>
<BODY dir=ltr>
<DIV dir=ltr>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">
<DIV>
<DIV
style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>We
apologize for multiple postings.</DIV>
<DIV dir=ltr>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">
<DIV>Please distribute to interested colleagues </DIV>
<DIV><BR>============================================================</DIV>
<DIV> </DIV>
<DIV> 1st Call for Papers</DIV>
<DIV><BR> 7th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA</DIV>
<DIV> </DIV>
<DIV> Building Resources for Machine Translation Research</DIV>
<DIV> </DIV>
<DIV> <A
href="http://comparable.limsi.fr/bucc2014/">http://comparable.limsi.fr/bucc2014/</A></DIV>
<DIV> </DIV>
<DIV> May 27, 2014<BR> Co-located with LREC 2014<BR> Harpa
Conference Centre, Reykjavik (Iceland) </DIV>
<DIV> </DIV>
<DIV> DEADLINE FOR PAPERS: February 10, 2014<BR> <A
href="https://www.softconf.com/lrec2014/BUCC2014/">https://www.softconf.com/lrec2014/BUCC2014/</A></DIV>
<DIV> </DIV>
<DIV>============================================================</DIV>
<DIV> </DIV>
<DIV>MOTIVATION</DIV>
<DIV> </DIV>
<DIV>In the language engineering and the linguistics communities, research<BR>in
comparable corpora has been motivated by two main reasons. In<BR>language
engineering, on the one hand, it is chiefly motivated by the<BR>need to use
comparable corpora as training data for statistical<BR>Natural Language
Processing applications such as statistical machine<BR>translation or
cross-lingual retrieval. In linguistics, on the other<BR>hand, comparable
corpora are of interest in themselves because they make<BR>possible inter-linguistic
discoveries and comparisons. It is generally<BR>accepted in both communities
that comparable corpora are documents in<BR>one or several languages that are
comparable in content and form to<BR>various degrees and along various dimensions. We believe
that the linguistic<BR>definitions and observations related to comparable
corpora can improve<BR>methods to mine such corpora for applications of
statistical NLP. As<BR>such, it is of great interest to bring together builders
and users of<BR>such corpora.<BR> <BR>The scarcity of parallel corpora has
motivated research concerning<BR>the use of comparable corpora: pairs of
monolingual corpora selected<BR>according to the same set of criteria, but in
different languages<BR>or language varieties. Non-parallel yet comparable
corpora overcome<BR>the two limitations of parallel corpora (coverage of only a few<BR>language pairs and a few contexts), since sources of
original,<BR>monolingual texts are much more abundant than sources of translated
texts.<BR>However, because of their nature, mining translations from
comparable<BR>corpora is much more challenging than mining them from parallel corpora.
What<BR>constitutes a good comparable corpus, for a given task or per
se,<BR>also requires specific attention: while the definition of a
parallel<BR>corpus is fairly straightforward, building a non-parallel
corpus<BR>requires control over the selection of source texts in both
languages.</DIV>
<DIV> </DIV>
<DIV>Parallel corpora are a key resource as training data for
statistical<BR>machine translation, and for building or extending bilingual
lexicons<BR>and terminologies. However, beyond a few language pairs such
as<BR>English-French or English-Chinese and a few contexts such
as<BR>parliamentary debates or legal texts, they remain a scarce
resource,<BR>despite the creation of automated methods to collect parallel
corpora<BR>from the Web. To exemplify such issues in a practical setting,
this<BR>year's special focus will be on</DIV>
<DIV> </DIV>
<DIV> Building Resources for Machine Translation
Research</DIV>
<DIV> </DIV>
<DIV>This special topic aims to address the need for:<BR>(1) Machine Translation
training and testing data such as spoken or<BR>written monolingual, comparable
or parallel data collections, and<BR>(2) methods and tools for collecting,
annotating, and verifying<BR>MT data, such as Web crawling, crowdsourcing, tools
for language<BR>experts, and tools for finding MT data in comparable corpora.</DIV>
<DIV> </DIV>
<DIV><BR>TOPICS</DIV>
<DIV> </DIV>
<DIV>We solicit contributions including but not limited to the following
topics:</DIV>
<DIV> </DIV>
<DIV>Topics related to the special theme:<BR> * Methods and tools for
collecting and processing MT data,<BR>
including crowdsourcing<BR> * Methods and tools for quality
control<BR> * Tools for efficient annotation<BR> * Bilingual term
and named entity collections<BR> * Multilingual treebanks, wordnets,
propbanks, etc.<BR> * Comparable corpora with parallel units
annotated<BR> * Comparable corpora for under-resourced languages and
specific domains<BR> * Multilingual corpora with rich
annotations:<BR> POS tags, NEs,
dependencies, semantic roles, etc.<BR> * Data for special applications:
patent translation, movie<BR>
subtitles, MOOCs, meetings, chat-rooms, social media, etc.<BR> * Legal
issues with collecting and redistributing
data<BR> and generating
derivatives</DIV>
<DIV> </DIV>
<DIV>Building comparable corpora:<BR> * Human translations<BR> *
Automatic and semi-automatic methods<BR> * Methods to mine parallel and
non-parallel corpora from the Web<BR> * Tools and criteria to evaluate the
comparability of corpora<BR> * Parallel vs non-parallel corpora,
monolingual corpora<BR> * Rare and minority languages, across language
families<BR> * Multi-media/multi-modal comparable corpora</DIV>
<DIV> </DIV>
<DIV>Applications of comparable corpora:<BR> * Human
translations<BR> * Language learning<BR> * Cross-language
information retrieval & document categorization<BR> * Bilingual
projections<BR> * Machine translation<BR> * Writing assistance</DIV>
<DIV> </DIV>
<DIV>Mining from comparable corpora:<BR> * Extraction of parallel segments
or paraphrases from comparable corpora<BR> * Extraction of bilingual and
multilingual translations of single
words<BR> and multi-word expressions;
proper names, named entities, etc.</DIV>
<DIV> </DIV>
<DIV><BR>IMPORTANT DATES<BR> <BR> February 10, 2014
Deadline for submission of full papers<BR> March
10, 2014 Notification of
acceptance<BR> March 27, 2014
Camera-ready papers due<BR> May
27, 2014 Workshop date<BR> </DIV>
<DIV> </DIV>
<DIV>SUBMISSION INFORMATION</DIV>
<DIV> </DIV>
<DIV>Papers should follow the LREC main conference formatting details (to
be<BR>announced on the conference website <A
href="http://lrec2014.lrec-conf.org/en/">http://lrec2014.lrec-conf.org/en/</A>
)<BR>and should be submitted as a PDF file via the START workshop manager
at<BR> <A
href="https://www.softconf.com/lrec2014/BUCC2014/">https://www.softconf.com/lrec2014/BUCC2014/</A></DIV>
<DIV> </DIV>
<DIV>Contributions can be short or long papers. Short paper submissions
must<BR>describe original and unpublished work without exceeding six
(6)<BR>pages. Characteristics of short papers include: a small,
focused<BR>contribution; work in progress; a negative result; an opinion
piece;<BR>an interesting application nugget. Long paper submissions
must<BR>describe substantial, original, completed and unpublished work
without<BR>exceeding ten (10) pages.</DIV>
<DIV> </DIV>
<DIV>Reviewing will be double blind, so the papers should not reveal
the<BR>authors' identity. Accepted papers will be published in the
workshop<BR>proceedings.<BR> <BR>Double submission policy: Parallel
submission to other meetings or<BR>publications is possible, but the<BR>workshop organizers must be notified immediately.<BR> <BR>When submitting
a paper from the START page, authors will be asked to<BR>provide essential
information about resources (in a broad sense,<BR>i.e. also technologies,
standards, evaluation kits, etc.) that have<BR>been used for the work described
in the paper or are a new result of<BR>their research. Moreover, ELRA
encourages all LREC authors to share<BR>the described language resources (data, tools,
services, etc.) to enable their<BR>reuse and the replicability of experiments,
including evaluation experiments.</DIV>
<DIV> </DIV>
<DIV>For further information, please contact<BR> Pierre
Zweigenbaum pz (at) limsi (dot) fr</DIV>
<DIV> </DIV>
<DIV><BR>ORGANISERS<BR> <BR> Pierre Zweigenbaum, LIMSI, CNRS, Orsay
(France)<BR> Ahmet Aker, University of Sheffield (UK)<BR> Serge
Sharoff, University of Leeds (UK)<BR> Stephan Vogel, QCRI
(Qatar)<BR> Reinhard Rapp, Universities of Mainz (Germany) and
Aix-Marseille (France)</DIV>
<DIV> </DIV>
<DIV><BR>SCIENTIFIC COMMITTEE</DIV>
<DIV> </DIV>
<DIV> * Ahmet Aker, University of Sheffield (UK)<BR> * Srinivas
Bangalore (AT&T Labs, US)<BR> * Caroline Barrière (CRIM, Montréal,
Canada)<BR> * Chris Biemann (TU Darmstadt, Germany)<BR> * Hervé
Déjean (Xerox Research Centre Europe, Grenoble, France)<BR> * Kurt Eberle
(Lingenio, Heidelberg, Germany)<BR> * Andreas Eisele (European Commission,
Luxembourg)<BR> * Éric Gaussier (Université Joseph Fourier, Grenoble,
France)<BR> * Gregory Grefenstette (INRIA, Saclay, France)<BR> *
Silvia Hansen-Schirra (University of Mainz, Germany)<BR> * Hitoshi Isahara
(Toyohashi University of Technology)<BR> * Kyo Kageura (University of
Tokyo, Japan)<BR> * Adam Kilgarriff (Lexical Computing Ltd, UK)<BR>
* Natalie Kübler (Université Paris Diderot, France)<BR> * Philippe
Langlais (Université de Montréal, Canada)<BR> * Michael Mohler (Language
Computer Corp., US)<BR> * Emmanuel Morin (Université de Nantes,
France)<BR> * Dragos Stefan Munteanu (Language Weaver, Inc., US)<BR>
* Lene Offersgaard (University of Copenhagen, Denmark)<BR> * Ted Pedersen
(University of Minnesota, Duluth, US)<BR> * Reinhard Rapp (Université
Aix-Marseille, France)<BR> * Sujith Ravi (Google, US)<BR> * Serge
Sharoff (University of Leeds, UK)<BR> * Michel Simard (National Research
Council Canada)<BR> * Richard Sproat (OGI School of Science &
Technology, US)<BR> * Tim Van de Cruys (IRIT-CNRS, Toulouse,
France)<BR> * Stephan Vogel, QCRI (Qatar)<BR> * Guillaume Wisniewski
(Université Paris Sud & LIMSI-CNRS, Orsay, France)<BR> * Pierre
Zweigenbaum (LIMSI-CNRS, Orsay,
France)<BR></DIV></DIV></DIV></DIV></DIV></DIV></BODY></HTML>