Corpora: LREC Workshop on meta-descriptions and annotation schemes for multimedia Language Resources

Hamish Cunningham hamish at dcs.shef.ac.uk
Wed Jan 19 15:22:09 UTC 2000


*******************************************************************
*                                                                 *
*                 First EAGLES/ISLE Workshop on                   *
*          Meta-Descriptions and Annotation Schemas for           *
*            Multimodal/Multimedia Language Resources             *
*                                                                 *
*                                                                 *
*               LREC 2000 Pre-Conference Workshop                 *
*                         Athens, Greece                          *
*                                                                 *
*                       29 or 30 May 2000                         *
*                                                                 *
*                       1st Announcement                          *
*                              and                                *
*                       Call for Papers                           *
*                                                                 *
*                                                                 *
*******************************************************************

1. Workshop Outline
===================
Currently, we can identify a number of trends in the community dealing
with multimodal/multimedia language resources:

 - The number of resources is increasing rapidly.
 - Due to multimedia extensions and rich annotations, the structural
   complexity of the resources is reaching new levels.
 - The volume of data to be handled is growing enormously due to
   multimedia extensions, demanding new solutions.
 - Technological developments suggest that more and more of these
   resources will become available on the Internet.

The joint EC/NSF-funded EAGLES/ISLE [1] initiative aims to create
standards and guidelines for natural interactivity and multimodal
language resources (e.g. speech, gesture, facial expressions, manual
languages) that support the creation, use and re-use of, and access
to, such resources. As part of this initiative, the workshop will
address current trends and discuss structures that could simplify the
creation and use of annotated multimodal/multimedia resources, the
search for suitable resources, and access to them, for instance via
the Web. The workshop will address two related areas: annotation
schemas and meta-descriptions for multimodal/multimedia language
resources.

Meta-Descriptions for Multimodal/Multimedia Language Resources (MMLR)
---------------------------------------------------------------------
As in other communities, it is time to bring together the widespread
users of multimedia language resources and start a discussion about
meta-schemas describing these resources. The goal is to associate the
available multimedia language resources with linked meta-descriptions
that form a browsable and searchable universe open to the Internet. A
well-known portal, standardised meta-descriptions and suitable tools
will help users find suitable resources for the task at hand more
easily. This interest unites people from science and industry, as well
as general users who need annotated multimedia resources for
scientific analysis, the training of commercial components, and many
other purposes.

Part of the proposed workshop will be dedicated to discussing the need
for such a universe of linked meta-descriptions, the scope of the
community, and existing work in this area. The nature of the
meta-descriptions must also be discussed extensively, with an emphasis
on questions such as: (1) Which elements describe the various language
resources? (2) Is a minimal schema preferable to a more elaborate one?
(3) How can we achieve flexibility within a standard meta-description?
(4) How can meta-descriptions be derived automatically to make their
creation a feasible task?

The workshop will also discuss whether we can benefit from existing
standards such as Dublin Core from the digital library community,
whether initiatives in the telecommunications and broadcasting
communities are relevant to our goals, and the impact on all these
initiatives of the W3C's move toward a unifying framework, the
Resource Description Framework (RDF).

Annotation Schemas for MMLR
---------------------------
A second part of the workshop will be dedicated to discussing
annotation schemas for multimodal/multimedia language resources. Until
now, the community's experience has been with text-only corpora, based
mostly on orthographic transcriptions (with all their limitations), and
with speech corpora, often associated with a single layer of
orthographic transcription and specifically tailored to the needs of
Automatic Speech Recognition systems. With the increasing power of
computer technology, people are starting to build corpora based on
several video and sound tracks, with rich annotations easily covering
more than 50 layers. These annotations have complex time relationships
and various dependencies between and within layers. It therefore seems
clear that a large number of such complexly structured corpora will
emerge and that the community needs guidelines to restrict their
heterogeneity.

At the LREC conference in Granada we heard about initial projects that
had implemented "Abstract Data Models" for such multimedia corpora
[2]. In the meantime, a broad discussion about the underlying universal
structure for such annotations has also been initiated [3]. A number
of projects in the US and Europe have been, and are being, funded to
develop annotation and exploitation tools that cope with such complex
multimedia databases. To guarantee a high degree of interoperability
and unified access to the resources, it is time for a separate
workshop dedicated to the nature of annotation schemas. Only broad
agreement in this respect will limit the number of access tools needed
to exploit such databases.

The emergence of multimedia on computers has changed traditional
views: direct media access allows us to refer to media time, which
never changes, instead of referring only to transcriptions, which can
be modified and are often inadequate for coding complex time
relationships. However, the workshop will address not only theoretical
matters such as the underlying common structure and abstract data
models, but also questions of suitable representation formats, which
are important for implementation. Formats suitable for open exchange
and long-term archiving will not be the optimal choice for all types of
program access, and vice versa. We expect that modern tools will have
to rely on several co-existing representation formats. We also have to
address how to integrate existing text-based corpora, or corpora that
are later extended step by step with media data.

2. Call for Papers
==================
The workshop will have two consecutive sessions: one will focus on
Internet-accessible Meta-Descriptions of MMLR, the other will be
dedicated to Annotation Schemas for MMLR. This workshop is seen as the
first in a series that will help us understand the complexity of the
problems and the various approaches taken so far. Each session will
open with an invited talk introducing the problem and defining the
scope, and will close with a summary from the organizers. The workshop
will focus on oral contributions and leave ample room for broad
discussion. Papers contributing to either of these two topics are
invited.

Format of Submission
--------------------
Submissions should consist of an extended abstract of about one page
(DIN A4) and a separate title page providing the following
information: official title of the paper; names and affiliations of
the authors; full address of the first author, including phone, fax,
email and URL; required facilities. Only electronic submissions in
ASCII, Word or HTML format will be accepted. Submissions should be
sent to: ISLE-2000 at mpi.nl. Receipt of submissions will be
acknowledged within 3 days. If you do not receive an acknowledgement,
your email may not have arrived.

Proceedings
-----------
The workshop organizers will produce proceedings. Print-ready versions
of the papers must therefore be submitted as Word, PDF or PS files and
should not exceed 5 pages (DIN A4). These final versions must be
submitted electronically to the same email address: ISLE-2000 at mpi.nl.

Important Dates
---------------
Deadline for submissions of papers:             March 17th
Notification of acceptance:                     April 3rd
Final versions of papers for proceedings:       May 12th
Workshop:                                       May 29th afternoon and
                                                30th morning

3. Organizational Issues
========================
Organizers of the workshop
--------------------------
P. Wittenburg, Technical Department, Max-Planck-Institute for
        Psycholinguistics, Nijmegen
D. Roy, Natural Interactive Systems Laboratory, Faculty of Science and
        Engineering, University of Southern Denmark, Odense
H. Cunningham, Department of Computer Science, University of Sheffield


Questions
---------
For all questions regarding the workshop's focus, please use the
email address: ISLE-2000 at mpi.nl
For all questions regarding organisational issues, accommodation
etc., please contact the LREC secretariat: LREC2000 at ilsp.gr

Information
-----------
Information about the workshop such as call, schedule, and program can
be found on the web-page:  http://www.mpi.nl/world/ISLE
Information about the LREC conference can be found on the web-page:
http://www.icp.grenet.fr/ELRA/lrec2000.html

Registration
------------
The registration fee for the workshop is:
        - 120 EURO for those not attending LREC
        - 80 EURO for those attending LREC
Registration and payment are explained on the LREC web-page.

The registration fee includes the proceedings and coffee during the
breaks.

Program Committee
-----------------
N.O. Bernsen (U Odense)
S. Bird (U Penn)
P. Bonhomme (LORIA Nancy)
D. Broeder (MPI Nijmegen)
H. Brugman (MPI Nijmegen)
L. Burnard (U Oxford)
N. Calzolari (ILC Pisa)
K. Choukri (ELRA Paris)
B. Comrie (MPI Leipzig)
H. Cunningham (U Sheffield)
U. Heid (U Stuttgart)
N. Ide (Vassar College)
T. McEnery (U Lancaster)
B. MacWhinney (CMU Pittsburgh)
L. Noldus (Noldus Wageningen)
S. Piperidis (ILSP Athens)
W. Peters (U Sheffield)
L. Romary (LORIA Nancy)
A. Russel (MPI Nijmegen)
D. Roy (U Odense)
D. Slobin (U Berkeley)
S. Steininger (U München)
S. Stromqvist (U Lund)
H. Thompson (HCRC Edinburgh)
Y. Wilks (U Sheffield)
P. Wittenburg (MPI Nijmegen)
A. Zampolli (ILC Pisa)



[1]  International Standards in Language Engineering (ISLE) project,
        funded by the EC and NSF
[2]  see http://www.dcs.shef.ac.uk/~hamish/dalr/
[3]  see http://www.ldc.upenn.edu/annotation/ and
        http://www.ltg.ed.ac.uk


