[Corpora-List] deadline extension of call for papers : special issue of TAL on Language Models

Véronique GENDNER gendner at limsi.fr
Tue Oct 8 22:54:51 UTC 2002


Dear colleagues,

Could you please circulate widely the deadline extension (Monday, 14 October)
of the Call for Papers for a special issue of the TAL journal on
Language Models (see below).

Thanks in advance
--
Best Regards, Michele Jardino

*****************************************************************************

   Second call for papers (TAL journal): Deadline Extension to October 14, 2002



                     Automated Learning of Language Models



                            Deadline for submission:

                              October 14, 2002

             Issue coordinated by Michèle Jardino (CNRS, LIMSI)
              and Marc El-Beze (LIA, University of Avignon).

---

Language Models (LMs) play a crucial role in the operation of automated Natural
Language Processing systems when real-life, often very large, problems are
being addressed; examples are Speech Recognition, Machine Translation and
Information Retrieval. If we want these systems to adapt to new applications,
or to follow changes in user behaviour, we need to automate the learning of
the parameters of the models we use. Adaptation may take place in advance or
in real time. Some applications do not allow us to build an adequate corpus,
from either a quantitative or a qualitative point of view. The gathering of
training data is made easier by the richness of Web resources, but in that
huge mass we have to separate the wheat from the chaff effectively.


When asked about the optimal size of a training corpus, can we be satisfied
with the answer "the bigger, the better"?

Rather than training a single LM on a gigantic corpus, would it not be
advisable to split this corpus into linguistically coherent segments and learn
several language models, whose scores might be combined at test time
(model mixture)?
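
As an illustration only (not part of the call), a minimal sketch of such a
mixture: two hypothetical topic-specific unigram models, with invented
probabilities, are interpolated with weights that sum to one.

    # Illustrative sketch: mixing two hypothetical topic-specific unigram
    # models with interpolation weights lambda_i; vocabularies and
    # probabilities below are invented for the example.
    sports_lm  = {"match": 0.05,  "team": 0.04,  "bank": 0.001}
    finance_lm = {"match": 0.002, "team": 0.001, "bank": 0.06}

    def mixture_prob(word, models, weights, floor=1e-6):
        # P(word) = sum_i lambda_i * P_i(word); 'floor' stands in for
        # whatever smoothing the component models would actually use.
        return sum(w * m.get(word, floor) for m, w in zip(models, weights))

    print(mixture_prob("bank", [sports_lm, finance_lm], [0.5, 0.5]))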

In the case of n-gram models, what is the optimal value of n? Should it be
fixed or variable?

A larger value allows us to capture linguistic constraints over a context that
goes beyond the mere two preceding words of the classic trigram. However,
increasing n exposes us to serious coverage problems. What is the best
trade-off between these two opposing constraints?
How can we smooth models in order to approximate phenomena that have not
been learned? Which back-off alternatives should be chosen, and on which more
general information should they rely (lower-order n-grams, n-classes)?
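
Purely as an illustration, a minimal sketch of one such smoothing scheme:
linear interpolation of a bigram estimate with the lower-order unigram
estimate (the toy corpus and the weight are invented for the example).

    from collections import Counter

    tokens = "the cat sat on the mat the cat slept".split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    def smoothed_bigram(prev, word, lam=0.7):
        # lam * P(word | prev) + (1 - lam) * P(word): unseen bigrams still
        # receive probability mass through the unigram term.
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        p_uni = unigrams[word] / len(tokens)
        return lam * p_bi + (1 - lam) * p_uni

    print(smoothed_bigram("the", "slept"))  # bigram unseen, probability > 0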

Beyond the traditional opposition between numerical and knowledge-based
approaches, there is a consensus about introducing rules into stochastic
models, or probabilities into grammars, in the hope of getting the best of
both strategies. Hybrid models can be conceived of in several ways, depending
on the choices made on each side and on where the coupling takes place.
Because of discrepancies between the language a grammar generates and the
syntagms actually observed, some researchers decided to reverse the situation
and derive the grammar from the observed facts. However, this method yields
disappointing results, since it does not perform any better than n-gram
methods, and is perhaps inferior. Shouldn't we introduce a good deal of
supervision here, if we want to reach this goal?
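
As an illustration only, a toy sketch of the "probabilities into grammars"
idea: a derivation is scored as the product of invented rule probabilities
(nothing here is prescribed by the call).

    # Toy probabilistic grammar: each (lhs, rhs) rule carries an invented
    # probability; a derivation is scored by multiplying its rules' probabilities.
    rule_probs = {
        ("S",  ("NP", "VP")):   1.0,
        ("NP", ("the", "cat")): 0.4,
        ("NP", ("the", "dog")): 0.6,
        ("VP", ("sleeps",)):    0.7,
        ("VP", ("barks",)):     0.3,
    }

    def derivation_prob(rules):
        p = 1.0
        for rule in rules:
            p *= rule_probs.get(rule, 0.0)
        return p

    print(derivation_prob([("S", ("NP", "VP")),
                           ("NP", ("the", "cat")),
                           ("VP", ("sleeps",))]))  # 0.4 * 0.7 = 0.28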

Topics (non-exhaustive list)

   ------------------------------------------------------------------------

In this special issue, we would like to publish either innovative papers or
surveys and prospective essays dealing with Language Models (LMs) and the
automated learning of their parameters, covering one of the following subtopics:

    * Language Models and Resources:
         o determination of the adequate lexicon
         o determination of the adequate corpus
    * Topical Models
    * LM with fixed or variable history
    * Probabilistic Grammars
    * Grammatical Inference
    * Hybrid Language Models
    * Static and dynamic adaptation of LMs
    * Dealing with the Unknown
         o Modelling words which do not belong to the vocabulary
         o Methods for smoothing LMs
    * Supervised and unsupervised learning of LMs
         o Automated classification of basic units
         o Introducing linguistic knowledge into LMs
    * Methods for LM learning
         o EM, MMI, others?
    * Evaluation of Language Models
    * Complexity and LM theory

    * Applications:
      - Speech Recognition
      - Machine Translation
      - Information Retrieval

Format

   ------------------------------------------------------------------------

Papers (25 pages maximum) are to be submitted in Word or LaTeX. Style sheets
are available from HERMES:
<http://www.hermes-science.com/>.

Language

   ------------------------------------------------------------------------

Articles may be written either in French or in English; English will be
accepted from non-French-speaking authors only.

Deadlines

   ------------------------------------------------------------------------

Submission deadline is October 14, 2002.

Articles will be reviewed by a member of the editorial board and by two
external reviewers designated by the editors of this issue. The decisions of
the editorial board and the referees' reports will be transmitted to the
authors before November 20, 2002.

The final versions of the accepted papers will be required by February 20,
2003. Publication is planned for spring 2003.

Submission

   ------------------------------------------------------------------------

Submissions must be sent electronically to:

tal.ml at limsi.fr

(an alias for the e-mail addresses of Michèle Jardino (jardino at limsi.fr) and
Marc El-Beze (marc.elbeze at lia.univ-avignon.fr)),


or, in paper version (four copies), posted to:

Marc El-Beze
Laboratoire d'Informatique
LIA - CERI
BP 1228
84911 AVIGNON CEDEX 9
FRANCE

-------------------------------------------------------


