Natural Language Processing for Minority Languages

Thu Feb 6 13:03:29 UTC 2003

NLP for Minority Languages with Few Computational Linguistic Resources

Short Title: NLP Minority Languages
Location: Batz-sur-Mer, France
Date: 14-Jun-2003 - 14-Jun-2003
Call Deadline: 19-Mar-2003

Web Site: http://dev.eurac.edu:8080/taln/workshop.minorities.txt
Contact Person: Oliver Streiter
Meeting Email: ostreiter at eurac.edu

This is a session of the following conference:
2003 Traitement Automatique des Langues Naturelles

Meeting Description:

The goal of the workshop is to get an overview of activities,
methodologies and achievements in the area of Natural Language Processing
of Minority Languages.

Workshop on Natural Language Processing of Minority Languages with Few
Computational Linguistic Resources

BACKGROUND

Over the last few years, minority and small languages have attracted
considerable attention. Projects aiming at the revitalization,
standardization and linguistic normalization of these languages have been
initiated to promote their use and contribute to their survival.
Speakers of smaller languages have become aware that their languages
belong to the world's cultural heritage, and are increasingly inclined to
use their native tongues on a broader scale. The rising number of web
pages in lesser-used languages reflects this trend.

PROBLEM DESCRIPTION

This workshop will approach the problem of minority languages from the
computational point of view. The workshop will focus on minority languages
with few computational linguistic resources, e.g. Occitan, Hakka, Corsican
and Nahuatl, including special cases such as sign languages. Minority
languages with rich computational linguistic resources, such as Catalan,
are not excluded from the workshop, as they may serve as examples of
successful minority languages. Papers on majority languages are equally
welcome if the languages treated face problems similar to those of
minority languages.

The goal of the workshop is to get an overview of activities,
methodologies and achievements in the area of Natural Language Processing
of Minority Languages, in order to promote research in this area and to
enhance the prestige associated with it. Automatic processing of minority
languages has to overcome a number of difficulties which arise from their
special status:

* As these languages have few speakers, there are few native linguists and
even fewer computational linguists. Rule-based approaches to tagging,
parsing, etc. may thus be difficult to apply.

* The scarce financial support these languages receive also seems
virtually to exclude rule-based approaches, given the amount of human
labor these approaches generally require. This problem might be overcome
if computational frameworks developed for other languages can be adopted.

* Corpus-based approaches are only applicable if adequate corpora are
available. However, creating a corpus is time-consuming and expensive, and
requires a linguistically sound design, especially if general-purpose
corpora are to be created.

* Example-based approaches seem more promising in this light, since they
require specific examples rather than general-purpose corpora. Compiling
specific examples also seems easier than writing formal rules. However,
little is known about the feasibility of this paradigm for minority
languages.

* Shallow-knowledge techniques that exploit a specific property of a
language or a language family may already be in use or could be developed.
This, however, may hamper the transfer of such an approach to other
languages: some techniques might work with analytic languages but not with
agglutinative ones, and different writing systems might also prevent a
simple approach from carrying over to another language.

The workshop is expected to stimulate research in this area. We invite
papers concerned with, but not restricted to, the following topics:

TOPICS OF INTEREST

* the relation between NLP and minority language support in general,
* development of specific NLP applications for minority languages,
e.g. tagging, morphological analysis, parsing, information retrieval and
machine translation,
* development of corpora and machine-readable dictionaries for minority
languages,
* presentation of shallow knowledge NLP techniques which could be applied
to minority languages,
* overview studies that describe the state of the art of NLP for the
minority languages of a country, a region or a language type,
* comparative analysis of different NLP approaches to different minority
languages and language types,
* free resources for NLP, their application areas and limitations,
* the requirements for NLP applications for special minority language
groups.

PROGRAM COMMITTEE

Shin-Hsi Chen           National Taiwan University,
                        hh_chen at csie.ntu.edu.tw
Vitelio Herrera         Union Latine, Direction Terminologie et
                        Industries de la Langue, Paris
                        v.herrera at unilat.org
Leonid Iomdin           Russian Academy of Sciences, Moscow,
                        Laboratory of Computer Linguistics
                        iomdin at iitp.ru
Harold Somers           Centre for Computational Linguistics, UMIST
                        Harold.Somers at umist.ac.uk
Oliver Streiter         EURAC, European Academy, Language & Law,
                        ostreiter at eurac.edu
Mathias Stuflesser      SPELL, Servisc de Planificazion y Elaborazion
                        dl Lingaz Ladin,
                        spell-mathias at ladinia.net
Leonhard Voltmer        EURAC, European Academy, Minorities,
                        lvoltmer at eurac.edu
Wolfgang Wölck          University at Buffalo, SUNY,
                        wwolck at acsu.buffalo.edu

IMPORTANT DATES

19.3.2003 Submission deadline
31.3.2003 Notification of acceptance
28.4.2003 Camera ready version

SUBMISSION FORMAT

Submissions should not exceed 10 pages in Times 12 pt, everything
included. For more detailed information in French, see:

http://www.sciences.univ-nantes.fr/irin/taln2003/page/taln_appel.html

Style files can be downloaded from the following addresses:

LaTeX French:
http://www.sciences.univ-nantes.fr/irin/taln2003/doc/StyleLatexTaln03_FR.tgz
LaTeX English:
http://www.sciences.univ-nantes.fr/irin/taln2003/doc/StyleLatexTaln03_EN.tgz
Word French:
http://www.sciences.univ-nantes.fr/irin/taln2003/doc/ModeleTaln2003_FR.dot
Word English:
http://www.sciences.univ-nantes.fr/irin/taln2003/doc/ModeleTaln2003_EN.dot

CONTACT ADDRESS

The contact address for submissions to the workshop and for
further information about the workshop is:

Oliver Streiter
European Academy
Language and Law
mail: ostreiter at eurac.edu
tel:  +39 0471 055 115
fax:  +39 0471 055 199