Call: Information Retrieval and Information Extraction for Less Resourced Languages (IE-IR-LRL)

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Sat Mar 28 11:47:33 UTC 2009

Date: Sat, 28 Mar 2009 07:32:15 +0100
From: "Mikel L. Forcada" <mlf at>
Message-Id: <200903280732.15718.mlf at>



Information Retrieval and Information Extraction for Less Resourced
Languages (IE-IR-LRL)

SEPLN 2009 pre-conference workshop
University of the Basque Country
Donostia-San Sebastián. Monday 7th September 2009
Organised by the SALTMIL Special Interest Group of ISCA

SEPLN 2009:   
Call For Papers:
Paper submission:
Deadline for submission: 8 June 2009

Papers are invited for the above half-day workshop, in the format
outlined below.  Most submitted papers will be presented in poster
form, though some authors may be invited to present in lecture format.


The phenomenal growth of the Internet has led to a situation where, by
some estimates, more than one billion words of text are currently
available. This is far more text than any given person can possibly
process. Hence there is a need for automatic tools to access and
process this mass of textual information. Emerging techniques of this
kind include Information Retrieval (IR), Information Extraction (IE),
and Question Answering (QA).

However, there is a growing concern among researchers about the
situation of languages other than English. Although not all Internet
text is in English, it is clear that non-English languages do not have
the same degree of representation on the Internet. Judging simply by
the number of articles in Wikipedia, English is the only language with
more than 20 percent of the available articles. There then follows a
group of 17 languages with between one and ten percent of the
articles. The remaining 245 languages each have less than one percent
of the articles. Even these low-profile languages are relatively
privileged, as the total number of languages in the world is estimated
to be 6800.

Clearly there is a danger that the gap between high-profile and
low-profile languages on the Internet will continue to increase,
unless tools are developed for the low-profile languages to access
textual information. Hence there is a pressing need to develop basic
language technology software for less-resourced languages as well. In
particular, the priority is to adapt the scope of recently-developed
IE, IR and QA systems so that they can be used also for these
languages. In doing so, several questions will naturally arise, such
as:

    * What problems emerge when faced with languages having different
      linguistic features from the major languages?

    * Which techniques should be promoted in order to get the maximum
      yield from sparse training data?

    * What standards will enable researchers to share tools and
      techniques across several different languages?

    * Which tools are easily re-useable across several unrelated
      languages?

It is hoped that presentations will focus on real-world examples,
rather than purely theoretical discussions of the
questions. Researchers are encouraged to share examples of best
practice -- and also examples where tools have not worked as well as
expected. Also of interest will be cases where the particular features
of a less-resourced language raise a challenge to currently accepted
linguistic models that were based on features of major languages.


Given the context of IR, IE and QA, topics for discussion may include,
but are not limited to:

    *  Information retrieval;
    *  Text and web mining;
    *  Information extraction;
    *  Text summarization;
    *  Term recognition;
    *  Text categorization and clustering;
    *  Question answering;
    *  Re-use of existing IR, IE and QA data;
    *  Interoperability between tools and data;
    *  General speech and language resources for minority languages,
       with particular emphasis on resources for IR, IE and QA.


    * 8 June 2009: Deadline for submission
    * 1 July 2009: Notification
    * 15 July 2009: Final version
    * 7 September 2009: Workshop


    * Kepa Sarasola, University of the Basque Country
    * Mikel Forcada, Universitat d'Alacant, Spain
    * Iñaki Alegria, University of the Basque Country
    * Xabier Arregi, University of the Basque Country
    * Arantza Casillas, University of the Basque Country
    * Briony Williams, Language Technologies Unit, Bangor University,
      Wales, UK


    * Iñaki Alegria, University of the Basque Country
    * Atelach Alemu Argaw, Stockholm University, Sweden
    * Xabier Arregi, University of the Basque Country
    * Jordi Atserias, Barcelona Media (Yahoo! Research Barcelona)
    * Shannon Bischoff, Universidad de Puerto Rico, Puerto Rico
    * Arantza Casillas, University of the Basque Country
    * Mikel Forcada, Universitat d'Alacant, Spain
    * Xavier Gomez Guinovart, University of Vigo
    * Lori Levin, Carnegie Mellon University, USA
    * Climent Nadeu, Universitat Politècnica de Catalunya
    * Jon Patrick, University of Sydney, Australia
    * Juan Antonio Pérez-Ortiz, Universitat d'Alacant, Spain
    * Bojan Petek, University of Ljubljana, Slovenia
    * Kepa Sarasola, University of the Basque Country
    * Oliver Streiter, National University of Kaohsiung, Taiwan
    * Vasudeva Varma, IIIT, Hyderabad, India
    * Briony Williams, Bangor University, Wales, UK


We expect short papers of at most 3500 words (about 4-6 pages) describing
research addressing one of the above topics, to be submitted as PDF
documents by uploading to the following URL:

The final papers should not exceed 6 pages and should adhere to the
stylesheet that will be adopted for the SEPLN Proceedings (to be
announced later on the Conference web site).

Mikel L. Forcada <mlf at>

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version       : 
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
