13.2577, Calls: Electronic Dictionaries, Update: Lang Models

Wed Oct 9 16:43:45 UTC 2002

LINGUIST List:  Vol-13-2577. Wed Oct 9 2002. ISSN: 1068-4875.

Subject: 13.2577, Calls: Electronic Dictionaries, Update: Lang Models

Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Consulting Editor:
        Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, Arizona U.
	James Yuells, EMU		Marie Klopfenstein, WSU
	Michael Appleby, EMU		Heather Taylor, EMU
	Ljuba Veselinova, Stockholm U.	Richard John Harvey, EMU
	Dina Kapetangianni, EMU		Renee Galvis, WSU
	Karolina Owczarzak, EMU		Anita Huang, EMU
	Tomoko Okuno, EMU		Steve Moran, EMU
	Lakshmi Narayanan, EMU		Sarah Murray, WSU
	Marisa Ferrara, EMU

Software: Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
          Zhenwei Chen, E. Michigan U. <chen at linguistlist.org>
	  Prashant Nagaraja, E. Michigan U. <prashant at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Marie Klopfenstein <marie at linguistlist.org>
 ==========================================================================

As a matter of policy, LINGUIST discourages the use of abbreviations
or acronyms in conference announcements unless they are explained in
the text.

=================================Directory=================================

1)
Date:  Tue, 8 Oct 2002 13:37:49 +0200
From:  Michael ZOCK <zockm at noos.fr>
Subject:  Call for papers for the special issue of the TAL journal

2)
Date:  Tue, 8 Oct 2002 18:20:18 +0200
From:  Jardino <Michele.Jardino at limsi.fr>
Subject:  deadline extension : special issue of TAL on Language Models

-------------------------------- Message 1 -------------------------------

Date:  Tue, 8 Oct 2002 13:37:49 +0200
From:  Michael ZOCK <zockm at noos.fr>
Subject:  Call for papers for the special issue of the TAL journal

Call for papers for the special issue of the TAL journal

Title: ELECTRONIC DICTIONARIES, FOR MEN, MACHINES OR FOR BOTH?

Submission deadline: 15 December 2002

Guest Editors: Michael Zock (CNRS, LIMSI) & John Carroll (University
of Sussex).

http://www.atala.org/tal/appel-dictionnaires-electroniques.html

---------------------------------------------------------------------------

Dictionaries are the backbone of any NLP system. Understanding and
producing text, translation, summarisation, dialog, indexing or
finding information in a document all require lexical competency
represented in computers as a lexical resource / dictionary.

A good dictionary is characterized by the following features: broad
coverage (number of entries), rich annotation (a lot of information
associated with each entry) and ease of access to the information.

Although electronic dictionaries compare favourably with paper
dictionaries (size, ease of access), they are still far from perfect,
in particular with regard to content and access. Coverage is certainly
not the only criterion for evaluating a dictionary: what is a large
dictionary good for if the data is not easily accessible?

---------------------------------------------------------------------------

The goal of this special issue is to discuss challenges inherent to
the building and use of electronic dictionaries and approaches and
techniques to address them. We welcome work on any of the following
issues:

  o the problem of building a dictionary (method, know-how);
  o types of information to be stored in a dictionary;
  o representation, structuring (indexing) and visualisation of the data;
  o the problem of accessing information (aids for navigation,
    interface, strategies);
  o acquisition of lexical data (corpus), reuse of existing data;
  o coherency checking;
  o problems related to multilinguality;
  o possibilities given to the user or lexicographer to edit entries
    (annotation, updating);
  o the usage of dictionaries by people (learning/teaching; writing)
    and by machines (NLP).

Given the wide spectrum of needs, we welcome work from any of the
following perspectives: linguistics, computer science,
psycholinguistics, language learning, ergonomics, etc., provided the
contribution contains a computational element.

Reviewers
---------------------------------------------------------------------------

- Christian Boitet (GETA, Grenoble)
- Nicoletta Calzolari (CNR, Pisa)
- Christiane Fellbaum (University of Princeton)
- Charles Fillmore (University of Berkeley)
- Ulrich Heid (IMS-CL, University of Stuttgart)
- Jean-Marie Pierrel (ATILF, Nancy)
- Alain Polguere (University of Montreal)
- Thiery Selva (GRELEP, K.U.Leuven, Belgium)
- Gilles Serasset (GETA, Grenoble)
- Monique Slodzian (CRIM, INALCO, Paris)
- Patrick St. Dizier (University of Toulouse)
- Jean Veronis (University of Aix en Provence)
- Piek Vossen (Irion Technologies, Delft, The Netherlands)
- Leo Wanner (University of Stuttgart)

Format
---------------------------------------------------------------------------
Papers (25 pages maximum) may be submitted in Word, Postscript or
PDF. The Hermes style sheets are available at Lavoisier and from the
TAL journal web site (http://www.atala.org/tal/hermes/cons_actes.htm).

Language
---------------------------------------------------------------------------
Papers may be written either in French or in English (English is
accepted from non-French speaking authors only).

Schedule
---------------------------------------------------------------------------
The submission deadline is 15 December 2002. People intending to
submit a paper should contact Michael Zock (zock at limsi.fr) before
October 31st.

Articles will be reviewed by a member of the editorial board of the
journal (http://www.atala.org/tal/redaction.html) and two external
reviewers chosen by the editors of the special issue. Editorial board
decisions and referees' reports will be transmitted to the authors by
March 1st, 2003.

Final versions of accepted papers will be required by June 1st, 2003.
Publication is planned for the summer of 2003.

Submission
---------------------------------------------------------------------------
Submissions (25 pages maximum, following the Hermes style sheet)
should be sent either electronically (zock at limsi.fr), or by surface
mail (five copies) to

Michael Zock
Limsi-CNRS, B.P. 133
F-91403 Orsay-Cedex, FRANCE
---------------------------------------------------------------------------

-------------------------------- Message 2 -------------------------------

Date:  Tue, 8 Oct 2002 18:20:18 +0200
From:  Jardino <Michele.Jardino at limsi.fr>
Subject:  deadline extension : special issue of TAL on Language Models

*******************************************************************************************
Second call for papers (TAL journal): Deadline Extension to October 14, 2002

                    Automated Learning of Language Models

                          Deadline for submission :

                             October 14, 2002

            Issue coordinated by Michèle Jardino (CNRS, LIMSI)
             and Marc El-Beze (LIA, University of Avignon).


Language Models (LM) play a crucial role in Automated Natural Language
Processing systems that deal with real-life (and often very large)
problems. Examples include Speech Recognition, Machine Translation and
Information Retrieval. If we want these systems to adapt to new
applications, or to follow the evolution of user behaviour, we need to
automate the learning of the parameters of the models we use.
Adaptation should occur in advance or in real time. Some applications
do not allow us to build an adequate corpus, either in quantity or in
quality. The richness of Web resources makes it easier to gather
learning data, but in that huge mass we have to separate the wheat
from the chaff effectively.

When asked about the optimal size for a learning corpus, are we
satisfied to answer "The bigger, the better"?

Rather than training one LM on a gigantic learning corpus, would it
not be advisable to fragment this corpus into linguistically coherent
segments, and learn several language models, whose scores might be
combined when doing the test (model mixture)?
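
As a purely illustrative sketch of the model mixture mentioned above
(not a method prescribed by this call), the Python fragment below
trains one add-one smoothed unigram model per corpus segment and
combines them by linear interpolation. The toy segments, the function
names and the equal weights are all assumptions made for the example;
in practice the weights would be estimated on held-out data.

```python
from collections import Counter

def train_unigram(tokens, vocab):
    """Add-one smoothed unigram model over a fixed, shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def mixture_prob(word, models, weights):
    """Linear interpolation: weighted sum of the component probabilities."""
    return sum(lam * m[word] for lam, m in zip(weights, models))

# Two hypothetical "linguistically coherent" segments of a larger corpus.
news = "the market fell the market rose".split()
sport = "the team won the match".split()
vocab = sorted(set(news) | set(sport))

models = [train_unigram(news, vocab), train_unigram(sport, vocab)]
weights = [0.5, 0.5]  # assumed values; they must sum to 1

p_market = mixture_prob("market", models, weights)
total_mass = sum(mixture_prob(w, models, weights) for w in vocab)
```

Because each component is a proper distribution over the shared
vocabulary and the weights sum to 1, the mixture is itself a proper
distribution.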

In the case of n-gram models, what is the optimal value for n? Should
it be fixed or variable?

A larger value allows us to capture linguistic constraints over a
context which goes beyond the mere two preceding words of the classic
trigram. However, increasing n threatens us with serious coverage
problems. What is the best trade-off between these two opposing
constraints? How can we smooth models in order to approximate
phenomena that have not been learned? Which alternatives are to be
chosen, using which more general information (lower-order n-grams,
n-classes)?
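
One common way of drawing on lower-order n-grams is linear
interpolation of a bigram estimate with a unigram estimate. The Python
sketch below is only an illustration of that idea under assumed
settings: the toy corpus, the function names and the interpolation
weight lam = 0.7 are hypothetical, and other smoothing schemes exist.

```python
from collections import Counter

def train(tokens):
    """Collect the counts needed for an interpolated bigram model."""
    unigrams = Counter(tokens)
    contexts = Counter(tokens[:-1])            # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, contexts, bigrams

def interpolated_bigram(prev, word, unigrams, contexts, bigrams, lam=0.7):
    """P(word | prev) as a weighted sum of bigram and unigram estimates."""
    p_uni = unigrams[word] / sum(unigrams.values())
    if contexts[prev] == 0:
        return p_uni                           # unseen context: unigram only
    p_bi = bigrams[(prev, word)] / contexts[prev]
    return lam * p_bi + (1 - lam) * p_uni

tokens = "the cat sat on the mat".split()
unigrams, contexts, bigrams = train(tokens)

seen = interpolated_bigram("the", "cat", unigrams, contexts, bigrams)
unseen = interpolated_bigram("the", "sat", unigrams, contexts, bigrams)
mass = sum(interpolated_bigram("the", w, unigrams, contexts, bigrams)
           for w in unigrams)
```

The unseen bigram ("the", "sat") receives a small non-zero probability
from the unigram term instead of zero. Words outside the training
vocabulary would still get zero here; handling them needs a separate
mechanism.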

Beyond the traditional opposition between numerical and
knowledge-based approaches, there is a consensus about the
introduction of rules into stochastic models, or probability into
grammars, hoping to get the best of both strategies. Hybrid models can
be conceived in several ways, depending on which choices are made
regarding both of their sides, and also, the place where coupling
occurs. Because of discrepancies between the language a grammar
generates, and actually observed syntagms, some researchers decided to
reverse the situation and derive the grammar from observed facts.
However, this method yields disappointing results, since it does not
perform any better than n-gram methods, and is perhaps
inferior. Shouldn't we introduce a good deal of supervision here if
we want to reach this goal?

Topics (non-exhaustive list)

  ------------------------------------------------------------------------

In this special issue, we would like to publish either innovative
papers, or surveys and prospective essays dealing with Language Models
(LM) and the Automated Learning of their parameters, and covering one
of the following subtopics:

   * Language Models and Resources:
        o determination of the adequate lexicon
        o determination of the adequate corpus
   * Topical Models
   * LM with fixed or variable history
   * Probabilistic Grammars
   * Grammatical Inference
   * Hybrid Language Models
   * Static and dynamic adaptation of LMs
   * Dealing with the Unknown
        o Modelling words which do not belong to the vocabulary
        o Methods for smoothing LMs
   * Supervised and unsupervised learning of LMs
        o Automated classification of basic units
        o Introducing linguistic knowledge into LMs
   * Methods for LM learning
        o EM, MMI, others?
   * Evaluation of Language Models
   * Complexity and LM theory

   * Applications:
     - Speech Recognition
     - Machine Translation
     - Information Retrieval

Format

  ------------------------------------------------------------------------

Papers (25 pages maximum) are to be submitted in Word or LaTeX. Style
sheets are available at HERMES: <http://www.hermes-science.com/>.

Language

  ------------------------------------------------------------------------

Articles can be written either in French or in English, but English
will be accepted from non-French speaking authors only.

Deadlines

  ------------------------------------------------------------------------

The submission deadline is October 14, 2002 (extended from October 7).

Articles will be reviewed by a member of the editorial board and two
external reviewers designated by the editors of this issue. Decisions
of the editorial board and referees' reports will be transmitted to
the authors before November 20, 2002.

The final version of the accepted papers will be required by February
20, 2003. Publication is planned during the spring of 2003.

Submission

  ------------------------------------------------------------------------

Submissions must be sent electronically to:

tal.ml at limsi.fr

(an alias for the e-mail addresses of Michele Jardino (jardino at
limsi.fr) and Marc El-Beze (marc.elbeze at lia.univ-avignon.fr)),

or, in paper version (four copies), posted to:

Marc El-Beze Laboratoire d'Informatique
LIA - CERI BP 1228
84 911 AVIGNON CEDEX 9 FRANCE

---------------------------------------------------------------------------
LINGUIST List: Vol-13-2577