23.1397, Confs: Text/Corpus Ling/Turkey

linguist at linguistlist.org
Tue Mar 20 14:32:54 UTC 2012


LINGUIST List: Vol-23-1397. Tue Mar 20 2012. ISSN: 1069 - 4875.

Subject: 23.1397, Confs: Text/Corpus Ling/Turkey

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin-Madison
         Monica Macaulay, U of Wisconsin-Madison
         Rajiv Rao, U of Wisconsin-Madison
         Joseph Salmons, U of Wisconsin-Madison
         Anja Wanner, U of Wisconsin-Madison
         <reviews at linguistlist.org>

Homepage: http://linguistlist.org

The LINGUIST List is a non-profit organization dedicated to providing the
discipline of linguistics with the infrastructure necessary to function in
the digital world. Donate to keep our services freely available!
https://linguistlist.org/donation/donate/donate1.cfm

Editor for this issue: Amy Brunett <brunett at linguistlist.org>
================================================================  

Date: Tue, 20 Mar 2012 10:32:21
From: Piotr Banski [banski at ids-mannheim.de]
Subject: Challenges in the Management of Large Corpora

Challenges in the Management of Large Corpora 
Short Title: CMLC 

Date: 22-May-2012 - 22-May-2012 
Location: Istanbul, Turkey 
Contact: Piotr Banski 
Contact Email: banski at ids-mannheim.de 
Meeting URL: http://corpora.ids-mannheim.de/cmlc.html 

Linguistic Field(s): Text/Corpus Linguistics 

Meeting Description: 

We live in an age where the well-known maxim that 'the only thing better than data is more data' no longer sets unattainable goals. Creating extremely large corpora is no longer a challenge, given the proven methods behind, e.g., the Web-as-Corpus approach or the use of Google's n-gram collection. Instead, the challenge has shifted towards dealing with large amounts of primary data and much larger amounts of annotation data. On the one hand, this challenge concerns finding new (corpus-)linguistic methodologies that can make use of such /extremely large corpora/, e.g. in order to investigate rare phenomena involving multiple lexical items or to find and represent fine-grained sub-regularities; on the other hand, some fundamental technical methods and strategies are being called into question. These include, e.g., successful curation of the data, management of collections that span multiple volumes or are distributed across several centres, methods for cleaning the data of non-linguistic intrusions or duplicates, as well as automatic annotation methods and innovative corpus architectures that maximise the usefulness of the data or allow it to be searched and analysed efficiently. Among the new tasks are also collaborative manual annotation and methods to manage it, as well as new challenges for the statistical analysis of such data and metadata.

The half-day LREC-2012 workshop on 'Challenges in the Management of Large Corpora' aims to gather leading researchers in the fields of Language Resource creation and Corpus Linguistics for an intensive exchange of expertise, results and ideas.

Venue:

The workshop will take place at the conference venue, the Lütfi Kirdar Istanbul Exhibition and Congress Centre. Further details will be available in due course from the conference homepage.

Keynote Speaker:

Nancy Ide (Vassar College), title TBA

Accepted Submissions:

-The AAC Container. Managing Text Resources for Text Studies,
  Hanno Biber and Evelyn Breiteneder

-Creating and Managing a large annotated parallel corpora of Indian languages,
  Ritesh Kumar, Pinkey Nainwani, Girish Nath Jha and Shiv Bhusan Kaushik

-Introducing the CLARIN-NL Data Curation Service,
  Nelleke Oostdijk and Henk van den Heuvel

-Efficient N-gram Language Modeling for Billion Word Web-Corpora,
  Lars Bungum and Björn Gambäck

-Evaluating DBMS-based access strategies to very large multi-layer corpora,
  Roman Schneider

-Dependency Bank,
  Hans Martin Lehmann and Gerold Schneider

-Large Mailing List Corpora: Management, Annotation and Repository,
  Damir Ćavar, Helen Aristar-Dry and Anthony Aristar

Important Dates:

Deadline for early-bird registration: March 21.
( http://www.lrec-conf.org/lrec2012/?-Registration- )

Workshop: May 22, 2 pm - 6:30 pm.






