[Corpora-List] Call for Participation: LREC Workshop "Quality assurance and quality measurement for language and speech resources"
Steven Krauwer
steven.krauwer at let.uu.nl
Tue Apr 18 10:31:11 UTC 2006
"Quality assurance and quality measurement
for language and speech resources"
on Saturday, May 27th 2006
in conjunction with
LREC 2006
Genoa, Italy, 24-26 May 2006
Workshop Programme
Saturday, May 27
09:15 09:30 Introduction (Steven Krauwer and Uwe Quasthoff)
09:30 10:20 What is quality (Chris Cieri, invited talk)
10:20 10:40 Validation of third party Spoken and Written Language
Resources - Methods for performing Quick Quality Checks
(Hanne Fersøe, Henk van den Heuvel, Sussi Olsen)
10:40 11:00 Improving the Quality of FrameNet (J. Scheffczyk, M.
11:00 11:30 BREAK
11:30 12:10 Valid Validations: Bare Basics and Proven Procedures
(Henk van den Heuvel, invited talk, in collaboration with
Eric Sanders)
12:10 12:50 Validation of the written part of the Dutch CGN
(provisional title, Hanne Fersøe, invited talk, in
collaboration with Sussi Olsen and Bart Jongejan)
12:50 13:10 Quality control of treebanks: documenting, converting,
patching (Sabine Buchholz, Darren Green)
13:10 13:30 Evaluation of a diachronic text corpus (Mikko Lounela)
13:30 14:30 LUNCH
14:30 14:50 Measuring Monolinguality (Uwe Quasthoff, Chris Biemann)
14:50 15:10 JTaCo & SProUTomat: Automatic Evaluation and Testing of
Multilingual Language Technology Resources and Components
(Christian Bering and Ulrich Schäfer)
15:10 16:20 Panel session (Chris Cieri LDC, Chu-Ren Huang Acad. Sin.,
Takenobu Tokunaga TIT, Khalid Choukri ELDA) [t.b.c.]
16:20 16:30 Winding up & Closing (Steven Krauwer and Uwe Quasthoff)
16:30 17:00 BREAK
Workshop description
The workshop aims at
* bringing together experience with and insights into quality
assurance and measurement for language and speech resources in
a broad sense (including multimodal resources, annotations,
tools, etc.),
* covering both qualitative and quantitative aspects,
* identifying the main tools and strategies,
* analysing the strengths and weaknesses of current practice,
* establishing what can be seen as current best practice,
* reflecting on trends and future needs.
It can be seen as a follow-up to the workshop on speech
resources that took place at LREC 2004, but the scope is wider
as we include both language and speech resources. We feel that
there is a lot to be gained by bringing these communities
together, if only because the speech community seems to have a
longer tradition in resource evaluation than the written
language community.
Quality assurance is an important concern for the provider, the
distributor and the user of language and speech resources alike.
The concept of quality is only meaningful if both the producer
and the user of the resources can rely on the same set of quality
criteria, and if there are effective procedures to check whether
these criteria are met. The universe of possible types of
language resources is huge and evolves over time, and there is no
universal set of qualitative or quantitative criteria and tests
that can be applied to all sorts of resources. In this workshop
we will try to investigate what sorts of criteria, tests and
measures are being used by providers, users and distribution
agencies such as ELRA and LDC, and we will try to distill from
this current practice general recommendations for quality
assurance and measurement for language and speech resources. The
workshop will look at quality assurance and quality measures from
the point of view of the provider, the distributor and the user,
and will explicitly address special problems in connection with
very large corpora, including numerical measures, comparison of
corpora, exchange formats, etc.
Workshop committee
* Steven Krauwer (UU/ELSNET, steven.krauwer at let.uu.nl)
* Uwe Quasthoff (Leipzig, quasthoff at informatik.uni-leipzig.de)
* Simo Goddijn (INL, goddijn at inl.nl)
* Jan Odijk (ELRA/Nuance/UU, jan.odijk at nuance.com)
* Khalid Choukri (ELDA, choukri at elda.org)
* Nicoletta Calzolari (ILC-CNR/WRITE, glottolo at ilc.cnr.it)
* Bente Maegaard (CST, bente at cst.dk)
* Chris Cieri (LDC, ccieri at ldc.upenn.edu)
* Chu-ren Huang (Ac Sin, churen at gate.sinica.edu.tw)
* Takenobu Tokunaga (TIT, take at cl.cs.titech.ac.jp)
* Harald Hoege (Siemens, harald.hoege at siemens.com)
* Henk van den Heuvel (CLST/SPEX, H.vandenHeuvel at let.ru.nl)
* Dafydd Gibbon (Bielefeld, gibbon at spectrum.uni-bielefeld.de)
* Key-Sun Choi (KORTERM, Key-Sun.Choi at kaist.ac.kr)
* Jorg Asmussen (DSL, ja at dsl.dk)
Main contact and further info
* Contact: Steven Krauwer, steven.krauwer at let.uu.nl
* Workshop URL: http://utrecht.elsnet.org/lrec2006qa
* Conference URL: http://www.lrec-conf.org/lrec2006
This workshop is supported by ELSNET and WRITE (the
international coordination committee for written language resources
and evaluation).
Steven Krauwer, ELSNET / UiL OTS, Trans 10, 3512 JK Utrecht, Nederland
phone: +31 30 2536050, fax: +31 30 2536000, email: s.krauwer at let.uu.nl