Call: Language Resources: From Storyboard to Sustainability and LR Lifecycle Management, LREC2010 Workshop

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Tue Jan 5 21:00:51 UTC 2010

Date: Thu, 31 Dec 2009 11:04:17 +0100
From: Helene Mazo <mazo at>
Message-ID: <4B3C7721.9070006 at>

**[Apologies for cross-postings]


Workshop on

*Language Resources: From Storyboard to Sustainability and LR
Lifecycle Management*


To be held in conjunction with the 7th International Language
Resources and Evaluation Conference (LREC 2010)

*23 May 2010, Mediterranean Conference Centre, Valletta, Malta*

/Deadline for submission: 22 February 2010/**




The life of a language resource (LR), from its initial conception and
drafting to its adult phase of active exploitation by the HLT
community, varies considerably. Ensuring that language resources
remain part of a sustainable, enduring living process is a
multi-faceted challenge that calls for well-planned actions against
neglect, to be taken by the different actors participating in the
process. Clearing all IPR issues and applying best practices at
specification and production time are just a few examples of such
actions. Sustainability and lifecycle management should thus be
addressed before embarking on any serious LR production.


When thinking of long-term LRs, a number of aspects come to mind
that are not always taken into account before development. Among
them are /usability/, /accessibility/, /interoperability/ and
/scalability/, which inevitably raise a long list of often-neglected
points that need to be addressed at a very early stage of development.
Looking further into the /portability/ and /scalability/ of a language
resource, one finds a number of dimensions that should be taken into
account to ensure that the resource reaches its adult life in an
active and productive way.


An aspect that is often neglected is the /accessibility/, and thus the
/secured reusability/, of a language resource. Institutions such as
ELRA (European Language resources Association) and LDC (Linguistic
Data Consortium), at the European and American levels respectively, as
well as BAS (Bavarian Archive for Speech Signals) and TST-Centrale
(Flemish-Dutch Human Language Technology Agency), at a
language-specific level, have worked on these aspects for many
years. Through their different activities, they have successfully
implemented a sharing policy that gives users access to already
existing resources. Other emerging programmes, such as CLARIN (Common
Language Resources and Technology Infrastructure), are also looking
into these aspects. Nevertheless, many resources are still developed
without a long-term accessibility plan in place, which makes it
impossible to gain access to them once they are finished. Such a plan
should consider issues such as ownership rights, licensing and types
of use, and should target a wide community from the very beginning.
It also calls for close co-operation among all actors (LR users,
financing bodies, owners, developers and organisations), so that
issues related to the life of an LR are well established, roles and
actors are clearly identified within the cycle, and best practices
are defined for managing the entire LR lifecycle.


We are aware, though, that the ideas presented above are only a
starting point for discussion. We would therefore like to invite the
community to participate in this workshop and share with us their
views on these and other relevant issues of concern. A fruitful
discussion could lead us to new mechanisms for keeping language
resources alive over time, and towards a sustainability model that
guarantees an appropriate and well-defined LR storyboard and
lifecycle management plan in the future.


Among the many issues and topics that may be presented and discussed
during this workshop, we would like to suggest the following:


- Which fields require LRs and what are their respective needs?

- What needs to be part of a LR storyboard? What points are we missing
  in its design?

- General specifications vs. detailed specifications and design

- Annotation frameworks and layers: interoperable at all?

- Should the creation and provision of LRs be included in higher education?

- How to plan for scalable resources?

- Language Resource maintenance and improvement: feasible?

- Sharing language resources: how to plan for it and implement
  it? Logistics of sharing: online vs. offline

- Centralised vs. decentralised, and national vs. international
  management and maintenance of LRs

- What happens when users create updated or derived LRs?

- Sharing language resources: legal issues involved

- Sharing language resources: pricing issues involved, commercial
  vs. non-commercial use

- Do LR actors work in a synchronised manner?

- What should be the roles of the different actors?

- What are the business models and arrangements for IPRs?

- Self-supporting vs. subsidised LR organisations

- Other general problems faced by the community

We solicit papers that address these questions and other related
issues relevant to the workshop.

*Workshop Programme and Audience Addressed*

This full-day workshop is aimed at all those who deal with language
resources at some point in their research or work (LR users,
producers, ...) and all those with an interest in the different
aspects involved, whether in universities, companies or funding
agencies. It aims to be a meeting and discussion point for the many
bottlenecks surrounding the life of a resource that remain to be
addressed by a sustainability plan.


The workshop will feature two invited talks, opening the morning and
afternoon sessions, and presentations of submitted papers, and will
conclude with a round table to brainstorm on the issues raised during
the presentations and the individual discussions. This round table
will be led by a number of experts already experienced in some of the
highlighted problems, in open discussion with the workshop
participants. In short, this workshop is intended to produce a plan
of action towards implementing a sustainability and lifecycle
management plan.


*Invited Speakers*

To be announced on the workshop web site.


*Organising Committee*

Victoria Arranz (Evaluations and Language resources Distribution
Agency (ELDA) / European Language resources Association (ELRA), France)

Khalid Choukri (ELDA - Evaluations and Language resources Distribution
Agency / ELRA - European Language resources Association, France)

Christopher Cieri (LDC - Linguistic Data Consortium, USA)

Laura van Eerten (Flemish-Dutch HLT Agency, Instituut voor Nederlandse
Lexicologie, The Netherlands)

Bente Maegaard (CST, University of Copenhagen, Denmark)

Stelios Piperidis (ILSP -- Institute for Language and Speech
Processing / ELRA - European Language resources Association, France)

Remco van Veenendaal (Flemish-Dutch HLT Agency, Instituut voor
Nederlandse Lexicologie, The Netherlands)


*Programme Committee*

Núria Bel (Institut Universitari de Lingüística Aplicada, Universitat
Pompeu Fabra, Spain)

Nicoletta Calzolari (Istituto di Linguistica Computazionale del CNR
(ILC-CNR) -- Italy)

Jean Carletta (Human Communication Research Centre, School of
Informatics, University of Edinburgh, UK)

Catia Cucchiarini (Nederlandse Taalunie, The Netherlands)

Christoph Draxler (Bavarian Archive for Speech Signals (BAS), Institute of
Phonetics and Speech Processing, Germany)

Maria Gavrilidou (Institute for Language and Speech Processing (ILSP), Greece)

Nancy Ide (Department of Computer Science, Vassar College, USA)

Steven Krauwer (UiL OTS, Utrecht University, The Netherlands)

Asunción Moreno (Universitat Politècnica de Catalunya (UPC), Spain)

Dirk Roorda (Data Archiving and Networked Services, The Netherlands)

Ineke Schuurman (Centre for Computational Linguistics, Catholic
University Leuven, Belgium)

Claudia Soria (Istituto di Linguistica Computazionale del CNR
(ILC-CNR) -- Italy)

Stephanie M. Strassel (Linguistic Data Consortium (LDC), USA)

Andreas Witt (IDS Mannheim, Germany)

Peter Wittenburg (Max Planck Institute for Psycholinguistics, The Netherlands)


*Important dates*

Deadline for abstracts: Monday 22 February 2010

Notification to Authors: Friday 12 March 2010

Submission of Final Version: Sunday 21 March 2010

Workshop: Sunday 23 May 2010



Abstracts should be no longer than 1500 words and should be submitted
in PDF format through the online submission form on START. For further
queries, please contact Victoria Arranz at arranz at or Laura
van Eerten at laura.vaneerten at .

/When submitting a paper through the START page, authors will be
kindly asked to provide relevant information about the resources that
have been used for the work described in their paper or that are the
outcome of their research. For further information on this new
initiative, please refer to

Message distributed by the Langage Naturel mailing list <LN at>
Information, subscription:
English version:
Archives:

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
