Ecole: Language Engineering

Philippe Blache pb at
Fri Mar 5 12:20:10 UTC 1999

From: CRL SSLE <ssle at>

The NMSU Computing Research Laboratory Presents

The 1999 Summer School in Language Engineering
June 28-July 9

The 1999 Summer School in Language Engineering is designed for the
practical computational linguist or natural language processing

The program of the school stresses practical needs of application system
builders in such areas as machine translation, information retrieval and
extraction and text summarization. It stresses the broad range of
multilingual aspects of today's language engineering, from the support
for the various writing systems to acquisition of linguistic knowledge
for applications to languages that have not yet been widely studied.

The summer school is organized by the Computing Research Laboratory
(CRL) of New Mexico State University. The instructors, both members of
CRL staff and visiting professors, are all leaders in their respective
areas of expertise.

The school will feature two full weeks of instruction and hands-on
practical studies. The number of students in the school will be small,
keeping a high instructor-to-student ratio. Registrants will be accepted
on a first-come, first-served basis. Preregistration and fees must be
received no later than June 1.

For more information,
please visit our web site at:

Course Descriptions

"Ecological" Issues in Language Engineering
This course will cover issues related to writing systems, encodings,
input and output methods; treatment of punctuation, special characters
and symbols, including mark-up; processing of dates and numbers; and a
variety of issues connected with managing large multilingual collections
of documents featuring different mark-up styles. A number of
computational tools will be introduced and used in practical exercises.

Approaches to Computational Morphology
After a presentation of several approaches to computational morphology,
with example systems for such widely different languages as Spanish,
Persian, Russian and Turkish, this course will concentrate on the
engineering of state-of-the-art morphological analysis and generation
systems, especially for languages other than English. Students will get
hands-on experience with sophisticated development and testing tools
by building a morphological analyzer.

Lexicon Acquisition for NLP I: Morphology and Syntax
This course will describe the process of design and acquisition of
several types of lexicons for NLP systems: lexicons supporting
morphological and syntactic analysis of texts in a language, transfer
lexicons for machine translation and multilingual onomastica (lexicons
of proper names). A number of acquisition interfaces will be used in
practical exercises.

Lexicon Acquisition for NLP II: Ontological Semantics
This course will present the design and acquisition of static knowledge
sources to support analysis of meaning in natural language texts. In
particular, it will cover designing and building ontologies, or world
models, for NLP and lexicons for the support of semantic analysis of
particular languages. Practical exercises will be supported by
interactive acquisition interfaces.

Knowledge Elicitation from Informants
This course will present an environment for eliciting grammatical and
lexical knowledge about a language from a user who knows that language
and English but is not a trained linguist. This kind of environment is a
realistic alternative to experimenting with automatic elicitation of
language knowledge. It combines corpus-based, expectation-based and
failure-driven acquisition of declarative knowledge about a language and
is most useful for the languages for which few computational resources
are available. The design of the acquisition process and system will be
discussed, and the interface, Boas, will be used in practical exercises.

A Survey of Language Engineering Applications
This course will introduce language engineering applications such as
machine translation, information retrieval and extraction, text
summarization and language instruction. The tasks and techniques learned
in the other courses will be put in their context and further
illustrated. The following systems will be presented and available for
laboratory work: the Corelli machine translation environment, the MINDS
information retrieval and summarization system, the URSA cross-language
information retrieval engine, the Oleada language instruction
environment and translator's tool set, the Mikrokosmos machine
translation system and the Expedition environment for configuring
machine translation systems for low-density languages.
