[Corpora-List] Call for Participation: Challenges in the Management of Large Corpora (@LREC-2012)

Piotr Bański bansp at o2.pl
Mon Mar 19 15:31:11 UTC 2012


[Apologies for multiple postings]

Call for Participation: LREC 2012 Workshop

CHALLENGES IN THE MANAGEMENT OF LARGE CORPORA
---------------------------------------------

Please note: March 21 is the early-bird registration deadline


We live in an age in which the well-known maxim that “the only thing
better than data is more data” no longer sets unattainable goals.
Creating extremely large corpora is no longer a challenge, given the
proven methods behind e.g. the Web-as-Corpus approach or the use of
Google's n-gram collection. Indeed, the challenge has now shifted
towards dealing with large amounts of primary data and much larger
amounts of annotation data. On the one hand, this
challenge concerns finding new (corpus-) linguistic methodologies that
can make use of such /extremely large corpora/, e.g. in order to
investigate rare phenomena involving multiple lexical items or to find
and represent fine-grained sub-regularities; on the other hand, some
fundamental technical methods and strategies are being called into
question. These include e.g. successful curation of the data, management
of collections that span multiple volumes or that are distributed across
several centres, methods to clean the data of non-linguistic intrusions
or duplicates, as well as automatic annotation methods or innovative
corpus architectures that maximise the usefulness of the data or allow
it to be searched and analysed efficiently. Among the new tasks are also
collaborative manual annotation and methods to manage it, as well as new
challenges for the statistical analysis of such data and metadata.

The half-day workshop on “Challenges in the management of large corpora”
aims to bring together leading researchers in the fields of Language
Resource creation and Corpus Linguistics for an intensive exchange of
expertise, results and ideas.


Current information is available at:
http://corpora.ids-mannheim.de/cmlc.html


Keynote Speaker
---------------

Nancy Ide (Vassar College), title TBA


Accepted submissions
--------------------

* The AAC Container. Managing Text Resources for Text Studies,
  Hanno Biber and Evelyn Breiteneder

* Creating and Managing a large annotated parallel corpora of Indian languages,
  Ritesh Kumar, Pinkey Nainwani, Girish Nath Jha and Shiv Bhusan Kaushik

* Introducing the CLARIN-NL Data Curation Service,
  Nelleke Oostdijk and Henk van den Heuvel

* Efficient N-gram Language Modeling for Billion Word Web-Corpora,
  Lars Bungum and Björn Gambäck

* Evaluating DBMS-based access strategies to very large multi-layer corpora,
  Roman Schneider

* Dependency Bank,
  Hans Martin Lehmann and Gerold Schneider

* Large Mailing List Corpora: Management, Annotation and Repository,
  Damir Ćavar, Helen Aristar-Dry and Anthony Aristar


Important dates
---------------

* Deadline for early-bird registration: March 21.
( http://www.lrec-conf.org/lrec2012/?-Registration- )

* Workshop: May 22, 2 pm - 6.30 pm.


Venue
-----

The workshop will take place at the Conference venue, the Lütfi Kirdar
Istanbul Exhibition and Congress Centre. Further details will be
available in due course from the LREC homepage.


Organizing Committee
--------------------

The workshop is co-organized by the following three institutions:

* Institut für Deutsche Sprache, Mannheim *

	Piotr Bański, Marc Kupietz, Andreas Witt


* Institute for Language Information and Technology, Eastern Michigan
University *

	Helen Aristar-Dry, Anthony Aristar, Damir Ćavar


* ICAR Laboratory, Lyon University *

	Serge Heiden


Programme Committee
-------------------

Núria Bel (Universitat Pompeu Fabra)
Mark Davies (Brigham Young University)
Stefanie Dipper (Ruhr-Universität Bochum)
Tomaž Erjavec (Jožef Stefan Institute)
Stefan Evert (Technische Universität Darmstadt)
Alexander Geyken (Berlin-Brandenburgische Akademie der Wissenschaften)
Andrew Hardie (University of Lancaster)
Nancy Ide (Vassar College)
Sandra Kübler (Indiana University)
Martin Mueller (Northwestern University)
Mark Olsen (University of Chicago)
Adam Przepiórkowski (Polish Academy of Sciences, University of Warsaw)
Reinhard Rapp (Johannes Gutenberg-Universität Mainz, University of Leeds)
Laurent Romary (INRIA, Humboldt-Universität zu Berlin)
Pavel Straňák (Charles University in Prague)
Amir Zeldes (Humboldt-Universität zu Berlin)


=> Workshop homepage: http://corpora.ids-mannheim.de/cmlc.html
