<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    [Apologies for multiple postings]<small><small><span class="c10"><br>

          <br>

          Call for Papers: LREC 2012 Workshop</span></small></small> <br>

    <p><span class="c7 c10">Challenges in the management of large
        corpora</span></p>


    <p class="c1 c0"><span></span></p>

    <p class="c2 c0"><span>We live in an age in which the well-known
        maxim that “the only thing better than data is more data” no
        longer sets unattainable goals. Creating extremely large
        corpora is no longer a challenge, given the proven methods
        behind, for example, the Web-as-Corpus approach or the use of
        Google's n-gram collection. Instead, the challenge has shifted
        towards dealing with large amounts of primary data and much
        larger amounts of annotation data. On the one hand, this
        challenge concerns finding new (corpus-)linguistic
        methodologies that can make use of such </span><span class="c7">extremely
        large corpora</span><span>, e.g. in order to investigate rare
        phenomena involving multiple lexical items or to find and
        represent fine-grained sub-regularities; on the other hand, some
        fundamental technical methods and strategies are being called
        into question. These include successful curation of the data,
        management of collections that span multiple volumes or that
        are distributed across several centres, methods for cleaning
        the data of non-linguistic intrusions or duplicates, as well as
        automatic annotation methods and innovative corpus
        architectures that maximise the usefulness of the data or allow
        it to be searched and analysed efficiently. Among the new tasks
        are also collaborative manual annotation and methods for
        managing it, as well as new challenges for the statistical
        analysis of such data and metadata.</span></p>

    <p class="c1 c0"><span></span></p>

    <p class="c2 c0"><span>The half-day workshop on “Challenges in the
        management of large corpora” aims to bring together leading
        researchers in the fields of language resource creation and
        corpus linguistics for an intensive exchange of expertise,
        results and ideas.</span></p>

    <p class="c1 c0"><span></span></p>

    <p class="c0"><span>We invite submissions dealing with:</span></p>

    <ol class="c5">

      <li class="c3 c0"><span>building tools for all aspects of

          management of very large corpora,</span></li>

      <li class="c3 c0"><span>dealing with large data sets (file system
          architecture, database architecture),</span></li>

      <li class="c3 c0"><span>dealing with heavily annotated corpora,</span></li>

      <li class="c3 c0"><span>managing multiple and concurrent

          annotation layers,</span></li>

      <li class="c3 c0"><span>use of annotation standards for large data

          sets,</span></li>

      <li class="c3 c0"><span>issues of interoperability and

          tool-chaining</span><span>,</span></li>

      <li class="c3 c0"><span>crowdsourcing for large data sets,</span></li>

      <li class="c0 c3"><span>quality control of annotations in large

          data sets,</span></li>

      <li class="c3 c0"><span>analytic tools used in research
          infrastructure initiatives, such as the Common Language
          Resource and Technology Infrastructure (CLARIN),</span></li>

      <li class="c3 c0"><span>dealing with corpora physically

          distributed over different </span><span>locations,</span></li>

      <li class="c3 c0"><span>managing metadata for extremely large

          corpus collections,</span></li>

      <li class="c3 c0"><span>efficient user interfaces,</span></li>

      <li class="c3 c0"><span>effective querying of large corpora with

          multiple annotation layers</span><span>,</span></li>

      <li class="c3 c0"><span>“bringing the code to the data” as a
          strategy for dealing with IPR restrictions,</span></li>

      <li class="c3 c0"><span>open-source software and open-data corpora

          strategies,</span></li>

      <li class="c3 c0"><span>other issues that arise in the context of

          management of large datasets.</span></li>

    </ol>

    <p class="c0 c1"><span></span></p>

    <p class="c0"><span>Current information is available at: </span><span

        class="c4"><a class="c8"

          href="http://corpora.ids-mannheim.de/cmlc.html">http://corpora.ids-mannheim.de/cmlc.html</a></span><span> </span></p>

    <h2 class="c0"><span>Abstract submission</span></h2>

    <p class="c0 c2"><span>We invite extended abstracts (</span><span>1500
        to 2000 words</span><span>) for 20+10 minute presentations, as
        well as for posters and demos. All abstracts must be submitted
        via the START Conference Manager, </span><span>available at </span><span
        class="c4"><a class="c8"
          href="https://www.softconf.com/lrec2012/LargeCorpora2012/">https://www.softconf.com/lrec2012/LargeCorpora2012/</a></span><span>.</span></p>

    <p class="c1 c0"><span></span></p>

    <p class="c2 c0"><span>Please note: when submitting a contribution
        via START, authors will be asked to provide essential
        information about resources (in a broad sense, i.e. also
        technologies, standards, evaluation kits, etc.) that were used
        for the work described in the contribution or that are a new
        result of their research. For further information on this new
        initiative, please refer to </span><span
        class="c4"><a class="c8"
          href="http://www.lrec-conf.org/lrec2012/?LRE-Map-2012">http://www.lrec-conf.org/lrec2012/?LRE-Map-2012</a></span><span>.</span></p>

    <h2 class="c0"><span>Important dates</span></h2>

    <p class="c0"><span>Workshop</span><span>: 22 May 2012, afternoon

        session.</span></p>

    <p class="c0"><span>Deadline for submission of extended abstracts:

        February 15.</span></p>

    <p class="c0"><span>Notification of acceptance: February 29.</span></p>

    <p class="c0"><span>Submission of full, camera-ready papers: March

        23.</span></p>

    <h2 class="c0"><span>Venue</span></h2>

    <p class="c2 c0"><span>The workshop will take place at the
        conference venue, the Lütfi Kirdar Istanbul Exhibition and
        Congress Centre. Further details will be available in due
        course from the conference homepage.</span></p>

    <h2 class="c0"><span>Organizing Committee</span></h2>

    <p class="c0"><span>The workshop is co-organized by the following

        three institutions:</span></p>

    <h4 class="c0"><span>Institut für Deutsche Sprache, Mannheim</span></h4>

    <p class="c0"><span>Piotr Bański, Marc Kupietz, Andreas Witt</span></p>

    <h4 class="c0"><span>Institute for Language Information and

        Technology, Eastern Michigan University</span></h4>

    <p class="c0"><span>Helen Aristar-Dry, Anthony Aristar, Damir Ćavar</span></p>

    <h4 class="c0"><span>ICAR Laboratory, Lyon University</span></h4>

    <p class="c0"><span>Serge Heiden</span></p>

    <h2 class="c0"><span>Programme Committee</span></h2>

    <p class="c0"><span>Núria Bel (</span><span>Universitat Pompeu

        Fabra)</span></p>

    <p class="c0"><span>Mark Davies (Brigham Young University)</span></p>

    <p class="c0"><span>Stefanie Dipper (Ruhr-Universität Bochum)</span></p>

    <p class="c0"><span>Tomaž Erjavec (</span><span>Jožef Stefan

        Institute</span><span>)</span></p>

    <p class="c0"><span>Stefan Evert (Technische Universität Darmstadt)</span></p>

    <p class="c0"><span>Alexander Geyken (Berlin-Brandenburgische

        Akademie der Wissenschaften)</span></p>

    <p class="c0"><span>Andrew Hardie (University of Lancaster)</span></p>

    <p class="c0"><span>Nancy Ide (Vassar College)</span></p>

    <p class="c0"><span>Sandra Kübler (Indiana University)</span></p>

    <p class="c0"><span>Martin Mueller (Northwestern University)</span></p>

    <p class="c0"><span>Mark Olsen (University of Chicago)</span></p>

    <p class="c0"><span>Adam Przepiórkowski (Polish Academy of Sciences,

        University of Warsaw)</span></p>

    <p class="c0"><span>Reinhard Rapp (Johannes Gutenberg-Universität

        Mainz, University of Leeds)</span></p>

    <p class="c0"><span>Laurent Romary (INRIA, Humboldt-Universität zu

        Berlin)</span></p>

    <p class="c0"><span>Serge Sharoff (University of Leeds)</span></p>

    <p class="c0"><span>Pavel Straňák (Charles University in Prague)</span></p>

    <p class="c0"><span>Amir Zeldes (Humboldt-Universität zu Berlin)</span></p>

    <p class="c1 c0"><span></span></p>

    <p class="c0"><span class="c11">Workshop homepage</span><span>: </span><span

        class="c4"><a class="c8"

          href="http://corpora.ids-mannheim.de/cmlc.html">http://corpora.ids-mannheim.de/cmlc.html</a></span><span> </span></p>

    <div>

      <p class="c1 c0 c9"><span></span></p>

    </div>

    <br>

  </body>

</html>