<HTML><HEAD></HEAD>
<BODY dir=ltr>
<DIV dir=ltr>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">
<DIV
style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>
<DIV dir=ltr>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">
<DIV>
<DIV
style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>
<DIV
style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>
<DIV
style='FONT-SIZE: small; TEXT-DECORATION: none; FONT-FAMILY: "Calibri"; FONT-WEIGHT: normal; COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline'>We
apologize for multiple postings<BR>Please distribute to interested
colleagues</DIV>
<DIV dir=ltr>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">
<DIV dir=ltr>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">
<DIV dir=ltr>
<DIV style="FONT-SIZE: 12pt; FONT-FAMILY: 'Calibri'; COLOR: #000000">
<DIV> </DIV>
<DIV>============================================================</DIV>
<DIV>
DEADLINE EXTENSION AND JOURNAL SPECIAL ISSUE
<DIV>============================================================</DIV></DIV>
<DIV> </DIV>
<DIV> 7th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA</DIV>
<DIV> </DIV>
<DIV> Building Resources for Machine Translation Research</DIV>
<DIV> </DIV>
<DIV> <A
href="http://comparable.limsi.fr/bucc2014/">http://comparable.limsi.fr/bucc2014/</A></DIV>
<DIV> </DIV>
<DIV> May 27, 2014<BR> Co-located with LREC 2014<BR> Harpa
Conference Centre, Reykjavik (Iceland)</DIV>
<DIV> </DIV>
<DIV> EXTENDED DEADLINE FOR PAPERS: February 23, 2014<BR> <A
href="https://www.softconf.com/lrec2014/BUCC2014/">https://www.softconf.com/lrec2014/BUCC2014/</A></DIV>
<DIV> </DIV>
<DIV><BR> *** INVITED SPEAKER ***</DIV>
<DIV> </DIV>
<DIV> Chris Callison-Burch (University of Pennsylvania)</DIV>
<DIV> </DIV>
<DIV>============================================================</DIV>
<DIV> </DIV>
<DIV>MOTIVATION</DIV>
<DIV> </DIV>
<DIV>In the language engineering and the linguistics communities, research<BR>on
comparable corpora has had two main motivations. In<BR>language
engineering, on the one hand, it is chiefly motivated by the<BR>need to use
comparable corpora as training data for statistical<BR>Natural Language
Processing applications such as statistical machine<BR>translation or
cross-lingual retrieval. In linguistics, on the other<BR>hand, comparable
corpora are of interest in themselves because they make<BR>possible inter-linguistic
discoveries and comparisons. It is generally<BR>accepted in both communities
that comparable corpora are documents in<BR>one or several languages that are
comparable in content and form to<BR>various degrees and along various dimensions. We believe
that the linguistic<BR>definitions and observations related to comparable
corpora can improve<BR>methods to mine such corpora for applications of
statistical NLP. As<BR>such, it is of great interest to bring together builders
and users of<BR>such corpora.</DIV>
<DIV> </DIV>
<DIV>The scarcity of parallel corpora has motivated research concerning<BR>the
use of comparable corpora: pairs of monolingual corpora selected<BR>according to
the same set of criteria, but in different languages<BR>or language varieties.
Non-parallel yet comparable corpora overcome<BR>the two limitations of parallel
corpora, since sources for original,<BR>monolingual texts are much more abundant
than translated texts.<BR>However, because of their nature, mining translations
in comparable<BR>corpora is much more challenging than in parallel corpora.
What<BR>constitutes a good comparable corpus, for a given task or per
se,<BR>also requires specific attention: while the definition of a
parallel<BR>corpus is fairly straightforward, building a non-parallel
corpus<BR>requires control over the selection of source texts in both
languages.</DIV>
<DIV> </DIV>
<DIV>Parallel corpora are a key resource as training data for
statistical<BR>machine translation, and for building or extending bilingual
lexicons<BR>and terminologies. However, beyond a few language pairs such
as<BR>English- French or English-Chinese and a few contexts such
as<BR>parliamentary debates or legal texts, they remain a scarce
resource,<BR>despite the creation of automated methods to collect parallel
corpora<BR>from the Web. To exemplify such issues in a practical setting,
this<BR>year's special focus will be on</DIV>
<DIV> </DIV>
<DIV> Building Resources for Machine Translation
Research</DIV>
<DIV> </DIV>
<DIV>This special topic aims to address the need for:<BR>(1) Machine Translation
training and testing data, such as spoken or<BR>written monolingual, comparable
or parallel data collections, and<BR>(2) methods and tools for collecting,
annotating, and verifying<BR>MT data, such as Web crawling, crowdsourcing, and tools
for language<BR>experts and for finding MT data in comparable corpora.</DIV>
<DIV> </DIV>
<DIV><BR>TOPICS</DIV>
<DIV> </DIV>
<DIV>We solicit contributions including but not limited to the following
topics:</DIV>
<DIV> </DIV>
<DIV>Topics related to the special theme:<BR> * Methods and tools for
collecting and processing MT data,<BR>
including crowdsourcing<BR> * Methods and tools for quality
control<BR> * Tools for efficient annotation<BR> * Bilingual term
and named entity collections<BR> * Multilingual treebanks, wordnets,
propbanks, etc.<BR> * Comparable corpora with parallel units
annotated<BR> * Comparable corpora for under-resourced languages and
specific domains<BR> * Multilingual corpora with rich
annotations:<BR> POS tags, NEs,
dependencies, semantic roles, etc.<BR> * Data for special applications:
patent translation, movie<BR>
subtitles, MOOCs, meetings, chat-rooms, social media, etc.<BR> * Legal
issues with collecting and redistributing
data<BR> and generating
derivatives</DIV>
<DIV> </DIV>
<DIV>Building comparable corpora:<BR> * Human translations<BR> *
Automatic and semi-automatic methods<BR> * Methods to mine parallel and
non-parallel corpora from the Web<BR> * Tools and criteria to evaluate the
comparability of corpora<BR> * Parallel vs non-parallel corpora,
monolingual corpora<BR> * Rare and minority languages, across language
families<BR> * Multi-media/multi-modal comparable corpora</DIV>
<DIV> </DIV>
<DIV>Applications of comparable corpora:<BR> * Human
translations<BR> * Language learning<BR> * Cross-language
information retrieval & document categorization<BR> * Bilingual
projections<BR> * Machine translation<BR> * Writing assistance</DIV>
<DIV> </DIV>
<DIV>Mining from comparable corpora:<BR> * Extraction of parallel segments
or paraphrases from comparable corpora<BR> * Extraction of bilingual and
multilingual translations of single
words<BR> and multi-word expressions;
proper names, named entities, etc.</DIV>
<DIV> </DIV>
<DIV><BR>IMPORTANT DATES</DIV>
<DIV> </DIV>
<DIV> February 23, 2014 Deadline for submission of
papers (extended)<BR> March 10,
2014 Notification of
acceptance<BR> March 27, 2014
Camera-ready papers due<BR> May
27, 2014 Workshop date</DIV>
<DIV> </DIV>
<DIV><BR>SUBMISSION INFORMATION</DIV>
<DIV> </DIV>
<DIV>Papers should follow the LREC main conference formatting details (to
be<BR>announced on the conference website <A
href="http://lrec2014.lrec-conf.org/en/">http://lrec2014.lrec-conf.org/en/</A>)<BR>and
should be submitted as a PDF file via the START workshop manager
at<BR> <A
href="https://www.softconf.com/lrec2014/BUCC2014/">https://www.softconf.com/lrec2014/BUCC2014/</A></DIV>
<DIV> </DIV>
<DIV>Contributions can be short or long papers. Short paper submissions
must<BR>describe original and unpublished work without exceeding six
(6)<BR>pages. Characteristics of short papers include: a small,
focused<BR>contribution; work in progress; a negative result; an opinion
piece;<BR>an interesting application nugget. Long paper submissions
must<BR>describe substantial, original, completed and unpublished work
without<BR>exceeding ten (10) pages.</DIV>
<DIV> </DIV>
<DIV>Reviewing will be double-blind, so papers should not reveal
the<BR>authors' identity. Accepted papers will be published in the
workshop<BR>proceedings.</DIV>
<DIV> </DIV>
<DIV>Double submission policy: Parallel submission to other meetings
or<BR>publications is possible, but the workshop organizers must be
notified<BR>immediately.</DIV>
<DIV> </DIV>
<DIV>When submitting a paper from the START page, authors will be asked
to<BR>provide essential information about resources (in a broad sense,<BR>i.e.
also technologies, standards, evaluation kits, etc.) that have<BR>been used for
the work described in the paper or are a new result of<BR>their research.
Moreover, ELRA encourages all LREC authors to share<BR>the described language resources (data,
tools, services, etc.) to enable their<BR>reuse and the replicability of experiments,
including evaluations.</DIV>
<DIV> </DIV>
<DIV><BR>JOURNAL SPECIAL ISSUE</DIV>
<DIV> </DIV>
<DIV>Authors of selected papers will be encouraged to submit
substantially<BR>extended versions of their manuscripts to an upcoming special
issue<BR>on ‘Machine Translation Using Comparable Corpora’ of the Journal<BR>of
Natural Language Engineering.</DIV>
<DIV> </DIV>
<DIV><BR>ORGANISERS</DIV>
<DIV> </DIV>
<DIV> Pierre Zweigenbaum, LIMSI, CNRS, Orsay (France)<BR> Ahmet
Aker, University of Sheffield (UK)<BR> Serge Sharoff, University of Leeds
(UK)<BR> Stephan Vogel, QCRI (Qatar)<BR> Reinhard Rapp, Universities
of Mainz (Germany) and Aix-Marseille (France)</DIV>
<DIV> </DIV>
<DIV><BR>CONTACT</DIV>
<DIV> </DIV>
<DIV> Pierre Zweigenbaum: pz (at) limsi (dot) fr</DIV>
<DIV> </DIV>
<DIV><BR>SCIENTIFIC COMMITTEE</DIV>
<DIV> </DIV>
<DIV> * Ahmet Aker, University of Sheffield (UK)<BR> * Srinivas
Bangalore (AT&T Labs, US)<BR> * Caroline Barrière (CRIM, Montréal,
Canada)<BR> * Chris Biemann (TU Darmstadt, Germany)<BR> * Hervé
Déjean (Xerox Research Centre Europe, Grenoble, France)<BR> * Kurt Eberle
(Lingenio, Heidelberg, Germany)<BR> * Andreas Eisele (European Commission,
Luxembourg)<BR> * Éric Gaussier (Université Joseph Fourier, Grenoble,
France)<BR> * Gregory Grefenstette (INRIA, Saclay, France)<BR> *
Silvia Hansen-Schirra (University of Mainz, Germany)<BR> * Hitoshi Isahara
(Toyohashi University of Technology)<BR> * Kyo Kageura (University of
Tokyo, Japan)<BR> * Adam Kilgarriff (Lexical Computing Ltd, UK)<BR>
* Natalie Kübler (Université Paris Diderot, France)<BR> * Philippe
Langlais (Université de Montréal, Canada)<BR> * Michael Mohler (Language
Computer Corp., US)<BR> * Emmanuel Morin (Université de Nantes,
France)<BR> * Dragos Stefan Munteanu (Language Weaver, Inc., US)<BR>
* Lene Offersgaard (University of Copenhagen, Denmark)<BR> * Ted Pedersen
(University of Minnesota, Duluth, US)<BR> * Reinhard Rapp (Université
Aix-Marseille, France)<BR> * Sujith Ravi (Google, Mountain View,
US)<BR> * Serge Sharoff (University of Leeds, UK)<BR> * Michel
Simard (National Research Council Canada)<BR> * Richard Sproat (OGI School
of Science & Technology, US)<BR> * Tim Van de Cruys (IRIT-CNRS,
Toulouse, France)<BR> * Stephan Vogel (QCRI, Qatar)<BR> * Guillaume
Wisniewski (Université Paris Sud & LIMSI-CNRS, Orsay, France)<BR> *
Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)</DIV>
<DIV> </DIV>
<DIV> </DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></DIV></BODY></HTML>