Dear all,<br>

<br>

on behalf of the CREDISLAS organising committee, please find below details<br>

of the LREC workshop that may be of interest to the list.<br>

<br>

Regards,<br>

Sara<br><br><br><div class="gmail_quote">======================================================================================<br>

<br>

Workshop on<br>

<br>

CREATING CROSS-LANGUAGE RESOURCES FOR DISCONNECTED LANGUAGES AND STYLES<br>

<br>

Co-located with LREC 2012 (<a href="http://www.lrec-conf.org/lrec2012/" target="_blank">http://www.lrec-conf.org/lrec2012/</a>)<br>

Istanbul, Turkey<br>

May 27, 2012 (afternoon session)<br>

<br>

Deadline for paper submissions: February 26, 2012<br>

<br>

<a href="http://www-lium.univ-lemans.fr/credislas2012" target="_blank">http://www-lium.univ-lemans.fr/credislas2012</a><br>

<br>

======================================================================================<br>

<br>

This half-day workshop aims to develop strategies and share
experiences on creating resources that reduce the linguistic gap
between language pairs for which cross-language resources are
scarce. Although this situation has most commonly been addressed
for minority languages, which are themselves resource-scarce,
it is also an important issue in other situations, such as:
majority languages that, because of their cultural, historical and/or
geographical disconnection, have few cross-language resources
between them (Chinese and Spanish being a good example); or
single languages in which new communication trends and styles lack
cross-language resources linking them to the main formal language
(e.g., chat-speak communication and the formal language).<br>


<br>

Current computational and data storage capabilities have favoured the
proliferation of data-driven and statistical approaches in natural
language processing and computational linguistics. Empirical evidence
from a large number of cases and applications has shown how the
availability of appropriate datasets can boost the performance of
processing methods and analysis techniques. In this scenario, the
availability of data has come to play a fundamental role. At the same
time, both the diversity of languages and the emergence of new
communication media and stylistic trends are responsible for the
scarcity of resources for some specific tasks and
applications. Accordingly, this workshop focuses its
attention on those specific applications or cases in which data
scarcity is a limiting problem for data-driven approaches. This
includes the following three situations:<br>


<br>

Minority Languages, for which the scarcity of resources is a consequence of
the minority status of the language itself. In this case, attention is
focused on the development of both monolingual and cross-lingual
resources. Examples in this category include Basque, Pashto and
Haitian Creole.<br>


<br>

Disconnected Languages, for which a large amount of monolingual
resources is available, but for which, due to cultural, historical and/or
geographical reasons, cross-language resources are scarce. Examples
in this category include language pairs such as Chinese and
Spanish, Russian and Portuguese, and Arabic and Japanese.<br>


<br>

New Language Styles, which represent different communication forms or
emerging stylistic trends in languages for which existing resources
are of little practical use. This case includes the typical examples of
tweets and chat-speak communication, as well as other informal forms of
communication, in many languages.<br>


<br>

The main topics of interest for this workshop include, but are not limited to, the following:<br>

<br>

 * Construction and collection of monolingual resources<br>

 * Construction and collection of cross-language resources<br>

 * Annotation guidelines and evaluation<br>

 * Automatic extraction of linguistic resources<br>

 * Automatic annotation of linguistic resources<br>

 * Use of crowdsourcing for generating and annotating resources<br>

 * Use of pivot languages for bridging unconnected languages<br>

 * Methods to adapt existing resources to new domains and styles<br>

 * Generation of resources for informal communication styles<br>

 * Evaluation of monolingual resources: tasks and protocols<br>

 * Evaluation of cross-language resources: tasks and protocols<br>

<br>

SUBMISSION INSTRUCTIONS<br>

<br>

Authors are invited to submit papers on original and previously unpublished work. Formatting should<br>

follow the LREC 2012 specifications (see <a href="http://www.lrec-conf.org/lrec2012/?Authors-Kit" target="_blank">http://www.lrec-conf.org/lrec2012/?Authors-Kit</a>)<br>

using the LaTeX or MS Word style files (available for download at <a href="http://www.lrec-conf.org/lrec2012/?Download-Templates,178" target="_blank">http://www.lrec-conf.org/lrec2012/?Download-Templates,178</a>).<br>


<br>

Submission is electronic in PDF format using the START submission system at<br>

<br>

<a href="https://www.softconf.com/lrec2012/CREDISLAS2012/" target="_blank">https://www.softconf.com/lrec2012/CREDISLAS2012/</a><br>

<br>

Double submission policy: Parallel submissions to other meetings or publications are permitted, but<br>

must be notified immediately to the workshop contact person (see below).<br>

<br>

Authors of accepted papers will be invited to present their research at the workshop.<br>

The workshop papers will be part of the LREC proceedings and published on the web site of LREC 2012 before the conference.<br>

<br>

IMPORTANT DATES<br>

<br>

February 26, 2012: Paper submissions due<br>

March 16, 2012: Notification of acceptance<br>

March 30, 2012: Camera ready papers due<br>

May 27, 2012: Workshop in Istanbul (afternoon session)<br>

<br>

ORGANIZERS<br>

<br>

Contact person: Patrik Lambert (e-mail: <a href="mailto:patrik.lambert@lium.univ-lemans.fr" target="_blank">patrik.lambert@lium.univ-lemans.fr</a>)<br>

<br>

Patrik Lambert (University of Le Mans),<br>

Marta R. Costa-jussà (Barcelona Media Innovation Center),<br>

Rafael E. Banchs (Institute for Infocomm Research)<br>

<br>

PROGRAMME COMMITTEE<br>

<br>

Iñaki Alegria, University of the Basque Country, Spain<br>

Marianna Apidianaki, LIMSI-CNRS, Orsay, France<br>

Victoria Arranz, ELDA, Paris, France<br>

Jordi Atserias, Yahoo! Research, Barcelona, Spain<br>

Joan Codina, Barcelona Media, Barcelona, Spain<br>

Gareth Jones, Dublin City University, Ireland<br>

Min-Yen Kan, National University of Singapore<br>

Philipp Koehn, University of Edinburgh, UK<br>

Udo Kruschwitz, University of Essex, UK<br>

Yanjun Ma, Baidu Inc., Beijing, China<br>

Sara Morrissey, Dublin City University, Ireland<br>

Maja Popovic, DFKI, Berlin, Germany<br>

Paolo Rosso, Universidad de Valencia, Spain<br>

Marta Recasens, Stanford University, USA<br>

Wade Shen, Massachusetts Institute of Technology, Cambridge, USA<br>

Haifeng Wang, Baidu Inc., Beijing, China<br>

</div>