CFP: First Steps for Language Documentation of Minority Languages

Steven Bird sb at CS.MU.OZ.AU
Fri Dec 19 20:21:02 UTC 2003

                              C A L L    F O R   P A P E R S

             4th International SALTMIL (ISCA SIG) LREC workshop on

         First Steps for Language Documentation of Minority Languages:
                           Computational Linguistic Tools for
                      Morphology, Lexicon and Corpus Compilation

                                24 May 2004, Lisbon, Portugal

Motivation and Aims

The minority or “lesser used” languages of the world are under increasing
pressure from the major languages (especially English), and many of them lack
full political recognition. Some minority languages have been well researched
linguistically, but most have not, and the vast majority do not yet possess
basic speech and language resources (such as text and speech corpora) which
are sufficient to permit research or commercial development of products.

If this situation continues, the minority languages will fall a long
way behind the major languages as regards the availability of commercial
speech and language products. This in turn will accelerate the decline of
those languages that are already struggling to survive. To break this vicious
circle, it is important to encourage the development of basic language
resources as a first step.

The workshop is intended to continue the series of SALTMIL (ISCA SIG) LREC
workshops:
1) "Language Resources for European Minority Languages" (LREC1998) Granada,
Spain.
2) "Developing Language Resources for Minority Languages: Re-usability and
Priorities" (LREC2000) Athens, Greece.
3) "Portability Issues in Human Language Technologies" (LREC2002) Las
Palmas de Gran Canaria, Spain.

The proposed workshop aims to share information on tools and best practice, so
that isolated researchers will not need to start from scratch. An important
aspect will be the forming of personal contacts, which can minimise
duplication of effort. Information on sources of funding for minority
languages will also be presented, and there will be discussion on the
strategic priorities that need to be addressed in this area. There will be a
balance between presentations of existing language resources, and more general
presentations designed to give background information needed by all
researchers present.

One potential means of ameliorating this imbalance in technology resources is
through encouraging research in the portability of human language 
technology for multilingual application.

Topics of Interest

The workshop will focus on the following topics and languages:
     * Existing projects in the field, with the opportunity to share useful 
     * Presentations of existing speech and text databases for minority
       languages, with particular emphasis on software tools that have been
       found useful in their development.
     * Linguistic corpora
     * Automatic Speech Recognition
     * Acoustic modelling
     * Dictionary development
     * Language modelling
     * Natural Language Processing:
     * Computational lexicography
     * Morphology
     * Syntax
     * Machine Translation
     * Information retrieval


The first session of the workshop will consist of invited talks focusing on 
current methodologies for language documentation and computational 
linguistic tools which are available for minority languages. Each invited 
speaker will be asked to comment on the following:
  * how current research relates to minority languages, perhaps indicating 
how they would approach their work within this context
  * which methodologies and tools they find most useful
  * which of those methodologies are defined as portable for different 
  * how these tools could extend the use of the language
  * how this basis could be used in further work on HLT

The second session will be an oral session focusing on programmes and 
initiatives for supporting minority language documentation. The main aim of 
this session is to provide a forum for fostering new contacts among 
researchers working in this area.

Invited speakers

  * Dafydd Gibbon, Univ. Bielefeld.
      "First steps in corpus compilation"
  * Xabier Artola, Ixa group, Univ. of the Basque Country.
      "First steps in lexicon resources"
  * Bojan Petek, University of Ljubljana, Slovenia.
      "Experiences defining a Network of Excellence
       on Portability of Human Language Technologies"
  * Kenneth R. Beesley, Xerox (to be confirmed)
      "First steps in morphology"

Workshop Organizing and Program Committee

Bojan Petek, University of Ljubljana, Slovenia
Julie Berndsen, University College Dublin, Ireland
Oliver Streiter, EURAC; European Academy, Bolzano/Bozen, Italy
Atelach Alemu, Addis Ababa University, Ethiopia
Kepa Sarasola, University of the Basque Country, Donostia


Papers are invited that describe research and development in the area of 
Human Language Technology portability. All contributed papers will be 
presented in poster format. Each submission should include: title; 
author(s); affiliation(s); and contact author's e-mail address, postal 
address, telephone and fax numbers. Abstracts (maximum 500 words, 
plain-text format) should be sent via email to:
  Julie Berndsen Julie.Berndsen at
All contributions (including invited papers) will be printed in the 
workshop proceedings (CD). They also will be published on the SALTMIL website.

Submissions of papers for poster presentations should follow
the same style as regular LREC papers and be no longer than
6000 words. The final details will be published as soon as they become
available.

We allow simultaneous paper submission to the workshop and the LREC
main conference. If a paper is accepted by both the conference and the
workshop, the paper will be presented at the conference, rather than at
the workshop. The author(s) should notify the workshop chair.

Important Dates:

Deadline for workshop abstract submission                11th February 2004
Notification of acceptance                               25th February 2004
Final version of the paper for the workshop proceedings  1st April 2004
Workshop                                                 24 May 2004, morning

Workshop Registration Fees

The registration fees for the workshop are:
  * If you are not attending LREC: 85 Euro
  * If you are attending LREC: 50 Euro
These fees will include a coffee break and the Proceedings of the Workshop.
Registration will be handled by the LREC Secretariat.
