[Corpora-List] Call for Papers: Sustainability of Language Resources

Andreas Witt Andreas.Witt at uni-tuebingen.de
Wed Feb 13 16:25:39 UTC 2008


                        2nd Call for Papers
               Sustainability of Language Resources and                 
                Tools for Natural Language Processing

                     --- Deadline extended ---
New deadline for submissions: March 2nd, 2008

Meeting Description:

One of the problems in Natural Language Processing and related fields
is that the sustainability of language resources and of language
technology tools is neglected. The very complex question of how to
ensure or maybe even guarantee sustainability is a multi-faceted one
and depends on different individual subtasks. Several of these tasks
will be addressed by contributions to this workshop.

One of the problems in Natural Language Processing and related fields
is that the sustainability of language resources (e.g., corpora) and
of language technology tools (e.g., annotation or query tools) is
regularly neglected.

This results in, for example, tools whose algorithms and data
structures are poorly documented and whose area of application is
evident only to the people who built the software. Similar issues
arise with regard to language resources: often, these are tailored to
the needs of an individual application or of a project with a very
specific research question. When the project is finished, it becomes
next to impossible (especially for third parties) to gain access to
a resource that may have taken several months or even years to
create.

The very complex question of how to ensure or maybe even guarantee
sustainability is related to several key issues spanning a broad
spectrum across several closely related fields: in the area of
language documentation, seven dimensions of portability (content,
format, discovery, access, citation, preservation, rights) have been
suggested. Another area of research is primarily concerned with
annotation technology, especially the problem of building generic
annotation frameworks as well as representing several different layers
of linguistic annotation referring to one specific set of primary data
by means of standoff annotation. Closely related work deals with the
standardisation of annotation frameworks, especially with regard to
the level of impact a specific linguistic theory has on their
vocabularies and markup grammars. A final area concerns the fostering
of sustainability through specific Software Engineering processes for
Computational Linguistics and Natural Language Processing tools,
applications and resources.

Providing sustainability for linguistic tools and language resources
is becoming increasingly important for the research community. This
is now also acknowledged by funding organisations, which often encourage
research projects to make sure that language resources will still be
accessible and (re-)usable in 10, 15, or 20 years' time.

The problem of ensuring sustainability is a multi-faceted one and depends
on several individual subtasks. Each contribution to this workshop
should address at least one of these tasks. Topics of interest
include, but are not limited to:

- Archiving linguistic data and resources
- Annotation technology, e.g., generic corpus annotation frameworks;
  the relationship of linguistic theories to corpus annotation;
  metadata annotation schemes, and related tools and applications
- Reusability of treebanks, e.g., annotations according to one
  specific linguistic framework should be applicable to NLP tasks
  that are based on different linguistic paradigms
- Sustainability in Software Engineering for Computational
  Linguistics
- Copyright issues, e.g., legal restrictions, copyright of web pages
  (for example, in a web as corpus approach), software patents,
  intellectual property, national and international issues etc.
- Privacy protection, e.g., automatic anonymisation of language data
- Sustainability, maintenance, and adaptability of NLP applications
  and tools, e.g., to new domains, to new linguistic resources, or
  even to new linguistic frameworks or theories
- Querying linguistic data, e.g., the usability and adaptability of
  query interfaces or query toolboxes
- Usability and acceptance of NLP software, e.g., corpus query
  interfaces


Submission Instructions

Submissions should not exceed ten (10) pages, including references. We
strongly recommend the use of the LaTeX style files or Microsoft Word
document template that will be made available on the LREC Conference
Web site. A description of the required format will be made available to
those who are unable to make direct use of these style files.

Submission will be electronic. The only accepted format for submitted
papers is Adobe PDF. The papers must be submitted no later than
March 2nd, 2008. Papers submitted after that date will not be
reviewed. For details of the submission procedure, please consult the
submission webpage reachable via the workshop website.


Important Dates

Deadline for submission of Papers: March 2nd, 2008
Notification of Acceptance:  March 18th, 2008
Deadline for final paper submission: April 2nd, 2008


Organizing Committee

Lou Burnard, Oxford University
Khalid Choukri, ELRA/ELDA
Georg Rehm, Tübingen University
Thomas Schmidt, University of Hamburg
Andreas Witt, Tübingen University


Program Committee

Helen Aristar-Dry, Eastern Michigan University, USA
Jeannine Beeken, Instituut voor Nederlandse Lexicologie, The Netherlands
Jean Carletta, University of Edinburgh, School of Informatics, UK
Dan Cristea, University of Iasi, Romania
Stefanie Dipper, Bochum University, Germany
Jost Gippert, Johann-Wolfgang-Goethe-Universität Frankfurt, Germany
Erhard Hinrichs, Tübingen University, Germany
Marc Kupietz, Institut für Deutsche Sprache Mannheim, Germany
Sandra Kübler, Indiana University, Computational Linguistics, USA
D. Terence Langendoen, NSF, USA
Joakim Nivre, Växjö University & Uppsala University, Sweden
Massimo Poesio, University of Trento, Italy
Kiril Ribarov, Charles University Prague, Czech Republic
Laurent Romary, Max-Planck Digital Library, Germany
Hinrich Schuetze, Stuttgart University, Germany
Serge Sharoff, University of Leeds, UK
Gary F. Simons, SIL International, USA
Manfred Stede, Potsdam University, Germany
Simone Teufel, University of Cambridge, Computer Laboratory, UK
Peter Wittenburg, MPI for Psycholinguistics, Nijmegen, The Netherlands
Martin Wynne, Oxford Text Archive, UK
Heike Zinsmeister, Heidelberg University, Germany


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


