<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<small><small><span class="c10">Call for Papers: LREC 2012 Workshop</span></small></small> <br>
<p><span class="c7 c10">Challenges in the management of large
corpora</span></p>
<p class="c1 c0"><span></span></p>
<p class="c2 c0"><span>We live in an age in which the well-known maxim
that “the only thing better than data is more data” no longer
sets unattainable goals. Creating extremely large
corpora is no longer a challenge, given the proven methods that
lie behind, e.g., applying the Web-as-Corpus approach or utilizing
Google's n-gram collection. Indeed, the challenge has now shifted
towards dealing with the large amounts of primary data and much
larger amounts of annotation data. On the one hand, this
challenge concerns finding new (corpus-) linguistic
methodologies that can make use of such </span><span class="c7">extremely
large corpora</span><span> e.g. in order to investigate rare
phenomena involving multiple lexical items or to find and
represent fine-grained sub-regularities; on the other hand, some
fundamental technical methods and strategies are being called
into question. These include e.g. successful curation of the
data, management of collections that span multiple volumes or
that are distributed across several centres, methods to clean
the data of non-linguistic intrusions or duplicates, as well
as automatic annotation methods or innovative corpus
architectures that maximise the usefulness of data or allow it to
be searched and analyzed efficiently. The new tasks also include
collaborative manual annotation and methods to manage it, as
well as new challenges for the statistical analysis of such data
and metadata.</span></p>
<p class="c1 c0"><span></span></p>
<p class="c2 c0"><span>The half-day workshop on “Challenges in the
management of large corpora” aims to bring together leading
researchers in the fields of Language Resource creation and
Corpus Linguistics for an intensive
exchange of expertise, results and ideas.</span></p>
<p class="c1 c0"><span></span></p>
<p class="c0"><span>We invite submissions dealing with:</span></p>
<ol class="c5">
<li class="c3 c0"><span>building tools for all aspects of
management of very large corpora,</span></li>
<li class="c3 c0"><span>dealing with large data sets (file system
architecture, database architecture), </span></li>
<li class="c3 c0"><span>dealing with heavily annotated corpora,</span></li>
<li class="c3 c0"><span>managing multiple and concurrent
annotation layers,</span></li>
<li class="c3 c0"><span>use of annotation standards for large data
sets,</span></li>
<li class="c3 c0"><span>issues of interoperability and
tool-chaining</span><span>,</span></li>
<li class="c3 c0"><span>crowdsourcing for large data sets,</span></li>
<li class="c0 c3"><span>quality control of annotations in large
data sets,</span></li>
<li class="c3 c0"><span>analytic tools used in research
infrastructure initiatives, such as the Common Language
Resources and Technology Infrastructure (CLARIN),</span></li>
<li class="c3 c0"><span>dealing with corpora physically
distributed over different </span><span>locations,</span></li>
<li class="c3 c0"><span>managing metadata for extremely large
corpus collections,</span></li>
<li class="c3 c0"><span>efficient user interfaces,</span></li>
<li class="c3 c0"><span>effective querying of large corpora with
multiple annotation layers</span><span>,</span></li>
<li class="c3 c0"><span>“bringing the code to the data” as a
strategy for dealing with IPR restrictions,</span></li>
<li class="c3 c0"><span>open-source software and open-data corpora
strategies,</span></li>
<li class="c3 c0"><span>other issues that arise in the context of
management of large datasets.</span></li>
</ol>
<p class="c0 c1"><span></span></p>
<p class="c0"><span>Current information is available at: </span><span
class="c4"><a class="c8"
href="http://corpora.ids-mannheim.de/cmlc.html">http://corpora.ids-mannheim.de/cmlc.html</a></span><span> </span></p>
<h2 class="c0"><span>Abstract submission</span></h2>
<p class="c0 c2"><span>We invite extended abstracts (</span><span>1500
to 2000 words</span><span>) for 20+10 minute presentations, as
well as posters and demos. All abstracts must be submitted
via the START Conference Manager, </span><span>available at </span><span
class="c4"><a class="c8"
href="https://www.softconf.com/lrec2012/LargeCorpora2012/">https://www.softconf.com/lrec2012/LargeCorpora2012/</a></span><span>.</span></p>
<p class="c1 c0"><span></span></p>
<p class="c2 c0"><span>Please note: when submitting a contribution
via START, authors will be asked to provide essential
information about resources (in a broad sense, i.e. also
technologies, standards, evaluation kits, etc.) that have been
used for the work described in the contribution or are a new
result of their research. For further information on this new
initiative, please refer to </span><span
class="c4"><a class="c8"
href="http://www.lrec-conf.org/lrec2012/?LRE-Map-2012">http://www.lrec-conf.org/lrec2012/?LRE-Map-2012</a></span></p>
<h2 class="c0"><span>Important dates</span></h2>
<p class="c0"><span>Workshop</span><span>: 22 May 2012, afternoon
session.</span></p>
<p class="c0"><span>Deadline for submission of extended abstracts:
February 15.</span></p>
<p class="c0"><span>Notification of acceptance: February 29.</span></p>
<p class="c0"><span>Submission of full, camera-ready papers: March
23.</span></p>
<h2 class="c0"><span>Venue</span></h2>
<p class="c2 c0"><span>The workshop will take place at the
conference venue, the Lütfi Kirdar Istanbul Exhibition and
Congress Centre. Further details will be available in due course
from the conference homepage.</span></p>
<h2 class="c0"><span>Organizing Committee</span></h2>
<p class="c0"><span>The workshop is co-organized by the following
three institutions:</span></p>
<h4 class="c0"><span>Institut für Deutsche Sprache, Mannheim</span></h4>
<p class="c0"><span>Piotr Bański, Marc Kupietz, Andreas Witt</span></p>
<h4 class="c0"><span>Institute for Language Information and
Technology, Eastern Michigan University</span></h4>
<p class="c0"><span>Helen Aristar-Dry, Anthony Aristar, Damir Ćavar</span></p>
<h4 class="c0"><span>ICAR Laboratory, Lyon University</span></h4>
<p class="c0"><span>Serge Heiden</span></p>
<h2 class="c0"><span>Programme Committee</span></h2>
<p class="c0"><span>Núria Bel (</span><span>Universitat Pompeu
Fabra)</span></p>
<p class="c0"><span>Mark Davies (Brigham Young University)</span></p>
<p class="c0"><span>Stefanie Dipper (Ruhr-Universität Bochum)</span></p>
<p class="c0"><span>Tomaž Erjavec (</span><span>Jožef Stefan
Institute</span><span>)</span></p>
<p class="c0"><span>Stefan Evert (Technische Universität Darmstadt)</span></p>
<p class="c0"><span>Alexander Geyken (Berlin-Brandenburgische
Akademie der Wissenschaften)</span></p>
<p class="c0"><span>Andrew Hardie (University of Lancaster)</span></p>
<p class="c0"><span>Nancy Ide (Vassar College)</span></p>
<p class="c0"><span>Sandra Kübler (Indiana University)</span></p>
<p class="c0"><span>Martin Mueller (Northwestern University)</span></p>
<p class="c0"><span>Mark Olsen (University of Chicago)</span></p>
<p class="c0"><span>Adam Przepiórkowski (Polish Academy of Sciences,
University of Warsaw)</span></p>
<p class="c0"><span>Reinhard Rapp (Johannes Gutenberg-Universität
Mainz, University of Leeds)</span></p>
<p class="c0"><span>Laurent Romary (INRIA, Humboldt-Universität zu
Berlin)</span></p>
<p class="c0"><span>Serge Sharoff (University of Leeds)</span></p>
<p class="c0"><span>Pavel Straňák (Charles University in Prague)</span></p>
<p class="c0"><span>Amir Zeldes (Humboldt-Universität zu Berlin)</span></p>
<p class="c1 c0"><span></span></p>
<p class="c0"><span class="c11">Workshop homepage</span><span>: </span><span
class="c4"><a class="c8"
href="http://corpora.ids-mannheim.de/cmlc.html">http://corpora.ids-mannheim.de/cmlc.html</a></span><span> </span></p>
</body>
</html>