13.2532, Calls: Language Models, Applied Linguistics

Fri Oct 4 22:43:19 UTC 2002

LINGUIST List:  Vol-13-2532. Fri Oct 4 2002. ISSN: 1068-4875.

Subject: 13.2532, Calls: Language Models, Applied Linguistics

Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Consulting Editor:
        Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, Arizona U.
	James Yuells, EMU		Marie Klopfenstein, WSU
	Michael Appleby, EMU		Heather Taylor, EMU
	Ljuba Veselinova, Stockholm U.	Richard John Harvey, EMU
	Dina Kapetangianni, EMU		Renee Galvis, WSU
	Karolina Owczarzak, EMU		Anita Huang, EMU
	Tomoko Okuno, EMU		Steve Moran, EMU
	Lakshmi Narayanan, EMU		Sarah Murray, WSU
	Marisa Ferrara, EMU

Software: Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
          Zhenwei Chen, E. Michigan U. <chen at linguistlist.org>
	  Prashant Nagaraja, E. Michigan U. <prashant at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Marie Klopfenstein <marie at linguistlist.org>
 ==========================================================================

As a matter of policy, LINGUIST discourages the use of abbreviations
or acronyms in conference announcements unless they are explained in
the text.

=================================Directory=================================

1)
Date:  Wed, 2 Oct 2002 10:20:39 +0200
From:  Jardino <Michele.Jardino at limsi.fr>
Subject:  special issue of TAL on Language Models

2)
Date:  Fri, 04 Oct 2002 16:45:52 +0200
From:  Ruben Chacon <rchacon at flog.uned.es>
Subject:  Re: Call for papers

-------------------------------- Message 1 -------------------------------

Date:  Wed, 2 Oct 2002 10:20:39 +0200
From:  Jardino <Michele.Jardino at limsi.fr>
Subject:  special issue of TAL on Language Models

************************************************************************
Second call for papers (TAL journal):

                    Automated Learning of Language Models

                          Deadline for submission :

                             October 7, 2002

            Issue coordinated by Michèle Jardino (CNRS, LIMSI),
             and Marc El-Beze (LIA, University of Avignon).


Language Models (LM) play a crucial role in Automated Natural Language
Processing systems that deal with real-life problems (often very large
ones). Examples include Speech Recognition, Machine Translation and
Information Retrieval. If we want these systems to adapt to new
applications, or to follow the evolution of user behaviour, we need to
automate the learning of the parameters of the models we use. Adaptation
should occur in advance or in real time. Some applications do not allow us
to build an adequate corpus, in either quantity or quality. The gathering
of learning data is made easier by the richness of Web resources, but in
that huge mass we have to separate the wheat from the chaff effectively.

When asked about the optimal size for a learning corpus, are we satisfied to
answer "The bigger, the better"?

Rather than training one LM on a gigantic learning corpus, would it not be
advisable to fragment this corpus into linguistically coherent segments, and
learn several language models, whose scores might be combined when doing the
test (model mixture)?
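
As an illustration of the model-mixture idea raised above, here is a minimal
Python sketch, assuming the simplest combination scheme (linear interpolation
of per-segment models). The toy segments, the add-one unigram models and the
fixed weights are all assumptions made for the example; the call itself does
not prescribe any particular method.

```python
from collections import Counter

def train_unigram(tokens, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def mixture_prob(word, models, weights):
    """Linear interpolation: P(w) = sum_k weight_k * P_k(w)."""
    return sum(lam * m[word] for lam, m in zip(weights, models))

# Invented toy segments standing in for "linguistically coherent" sub-corpora.
news = "the market fell the market rose".split()
sport = "the team won the match".split()
vocab = sorted(set(news) | set(sport))

models = [train_unigram(news, vocab), train_unigram(sport, vocab)]
# Assumed weights; they would normally be estimated on held-out data.
weights = [0.7, 0.3]

p = mixture_prob("market", models, weights)
```

Because the weights sum to one and each component is a proper distribution
over the shared vocabulary, the mixture is itself a proper distribution.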

In the case of n-gram models, what is the optimal value for n? Should it be
fixed or variable?

A larger value allows us to capture linguistic constraints over a context
which goes beyond the mere two preceding words of the classic trigram.
However, increasing n threatens us with serious coverage problems. What is
the best trade-off between these two opposing constraints?
How can we smooth models in order to approximate phenomena that have not
been learned? Which alternatives are to be chosen, using which more general
information (lower-order n-grams, n-classes)?
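
One common answer to the smoothing question above is to interpolate a
higher-order estimate with a lower-order one. The following Python sketch
assumes a bigram model interpolated with a unigram model using a fixed
weight; the toy corpus, the function names and the value of `lam` are
illustrative assumptions, not something specified in the call.

```python
from collections import Counter

def train(tokens):
    """Collect the counts needed for bigram and unigram estimates."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # how often each word occurs as a history
    return unigrams, bigrams, contexts

def interpolated_prob(prev, word, model, lam=0.8):
    """P(word | prev) = lam * P_ML(word | prev) + (1 - lam) * P_ML(word)."""
    unigrams, bigrams, contexts = model
    p_uni = unigrams[word] / sum(unigrams.values())
    if contexts[prev] == 0:
        # History never observed: rely on the lower-order estimate alone.
        return p_uni
    p_bi = bigrams[(prev, word)] / contexts[prev]
    return lam * p_bi + (1 - lam) * p_uni

# Invented toy corpus for the example.
tokens = "the cat sat on the mat the cat ran".split()
model = train(tokens)

seen = interpolated_prob("the", "cat", model)    # bigram observed in training
unseen = interpolated_prob("the", "ran", model)  # bigram never observed
```

The unseen bigram still receives a non-zero probability through the unigram
term. Words outside the training vocabulary would still get zero here;
handling them requires a separate mechanism.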

Beyond the traditional opposition between numerical and knowledge-based
approaches, there is a consensus about the introduction of rules into
stochastic models, or probability into grammars, hoping to get the best of
both strategies. Hybrid models can be conceived in several ways, depending
on which choices are made regarding both of their sides, and also, the place
where coupling occurs. Because of discrepancies between the language a
grammar generates, and actually observed syntagms, some researchers decided
to reverse the situation and derive the grammar from observed facts.
However, this method yields disappointing results, since it does not perform
any better than n-gram methods, and is perhaps inferior. Shouldn't we
introduce here a good deal of supervision, if we want to reach this goal?

Topics (non-exhaustive list)

  ------------------------------------------------------------------------

In this special issue, we would like to publish innovative papers, surveys
or prospective essays dealing with Language Models (LM) and the Automated
Learning of their parameters, and covering one of the following subtopics:

   * Language Models and Resources:
        o determination of the adequate lexicon
        o determination of the adequate corpus
   * Topical Models
   * LM with fixed or variable history
   * Probabilistic Grammars
   * Grammatical Inference
   * Hybrid Language Models
   * Static and dynamic adaptation of LMs
   * Dealing with the Unknown
        o Modelling words which do not belong to the vocabulary
        o Methods for smoothing LMs
   * Supervised and unsupervised learning of LMs
        o Automated classification of basic units
        o Introducing linguistic knowledge into LMs
   * Methods for LM learning
        o EM, MMI, others?
   * Evaluation of Language Models
   * Complexity and LM theory

   * Applications:
     - Speech Recognition
     - Machine Translation
     - Information Retrieval

Format

  ------------------------------------------------------------------------

Papers (25 pages maximum) are to be submitted in Word or LaTeX. Style sheets
are available at HERMES:
<http://www.hermes-science.com/>.

Language

  ------------------------------------------------------------------------

Articles can be written either in French or in English, but English will be
accepted only from non-French-speaking authors.

Deadlines

  ------------------------------------------------------------------------

The submission deadline is October 7, 2002.

Articles will be reviewed by a member of the editorial board and two
external reviewers designated by the editors of this issue. The decisions of
the editorial board and the referees' reports will be sent to the authors
before November 20, 2002.

The final versions of accepted papers will be required by February 20,
2003. Publication is planned for the spring of 2003.

Submission

  ------------------------------------------------------------------------

Submissions must be sent electronically to:

tal.ml at limsi.fr

(an alias for the e-mail addresses of Michele Jardino (jardino at limsi.fr)
and Marc El-Beze (marc.elbeze at lia.univ-avignon.fr)),

or, in paper version (four copies), posted to:

Marc El-Beze Laboratoire d'Informatique
LIA - CERI BP 1228
84 911 AVIGNON CEDEX 9 FRANCE

-------------------------------- Message 2 -------------------------------

Date:  Fri, 04 Oct 2002 16:45:52 +0200
From:  Ruben Chacon <rchacon at flog.uned.es>
Subject:  Re: Call for papers

EIGHTH ANNUAL UNIVERSITY OF SEVILLE CONFERENCE
ON APPLIED LINGUISTICS (ELIA)
Age-related factors in L2 acquisition and teaching
University of Seville, Spain
March 13-14, 2003

CALL FOR PAPERS

Seville, Spain, September 2002

Dear colleague,

The Research Group "La Lengua Inglesa en el Ámbito Universitario" of the College of
Languages, Literatures and Linguistics at the University of Seville, Spain, announces
the 8th ELIA Conference, which will be held on March 13-14, 2003. The central theme of
this conference will be Age-related factors in foreign/second language acquisition and
teaching.

 Prominent national and international specialists in the broad domain of applied
linguistics have taken part in the seven previous ELIA conferences. The following
scholars, among others, have made a significant contribution to ELIA's success: Dr.
Kathleen Bardovi-Harlig (Indiana University), Dr. Jasone Cenoz (University of the
Basque Country), Dr. Jenny Thomas (University of Wales at Bangor), Dr. Enrique Alcaraz
(University of Alicante), Dr. Gabriele Kasper (University of Hawai'i at Manoa), Dr.
Guy Cook (University of Reading), Dr. Jane Arnold (University of Seville), Dr. Brian
Tomlinson (Leeds Metropolitan University) and Dr. Ángela Labarca (Georgia Institute
of Technology).

 This year's ELIA will, once again, include the participation of renowned scholars as
plenary speakers, such as Dr. David Singleton (Trinity College, Dublin), Dr. Carme
Muñoz Lahoz (Barcelona University) and Dr. Carmen Pérez Vidal (Pompeu Fabra
University, Barcelona).

 Since the main theme of this year's conference is age-related factors, proposals
based on any aspect of it will be given preference. Nevertheless, proposals based on
other topics may also be admitted as long as they relate to the applied linguistics
fields that are central to ELIA, that is, the teaching, learning/acquisition and
discursive uses of English (either independently or in contact with some other
language). Proposals related to the above-mentioned theme and fields that deal with
Spanish as a target language are also welcome.

 Those interested in taking part as speakers are kindly requested to take the
following points into account:

1. SUBMISSION OF PROPOSALS AND DEADLINE: Application and proposal forms -including a
300 to 500-word abstract and a sequential outline- should be sent to the Secretaría
de ELIA either by regular mail (see below) or electronic mail to elia at siff.us.es
(WORD, RTF, ASCII format) by January 10, 2003.

2. The duration of papers and workshops will be 40 minutes, including 5-10 minutes for
questions and/or comments from the audience. Papers/workshops may be presented either
in English or Spanish. Speakers are requested to devote the first five minutes to
introducing and placing the topic in a wider framework so as to make it easier for the
audience -mostly advanced undergraduate students- to follow the talks.

3. NOTIFICATION OF ACCEPTANCE: Authors will be notified of the acceptance or refusal
of their proposals in writing, by postal mail or e-mail, during the week of January
20-24, 2003.

4. PUBLICATION: A selection of the papers presented will be published in the fifth
edition of the journal Estudios de Lingüística Inglesa Aplicada (ELIA).

For further information, please contact the organizing committee at the following
address:

elia at siff.us.es      or:

Secretaría de ELIA
Departamento de Lengua Inglesa
Facultad de Filología
Universidad de Sevilla
41004 Seville, Spain

Phones:  (34) 954 55 15 46
         (34) 954 55 15 50
         (34) 954 55 11 81
Fax:     (34) 954 55 15 16

Thank you for your interest.

---------------------------------------------------------------------------
LINGUIST List: Vol-13-2532