[lg policy] CFP: Information Retrieval and Information Extraction for less resourced languages

Harold Schiffman hfsclpp at GMAIL.COM
Wed Jul 8 16:19:06 UTC 2009

Forwarded from: saltmil at yahoogroups.com

[Apologies in advance for any multiple postings]

                         Call for participation

           Information Retrieval and Information Extraction
                      for Less Resourced Languages

                  SEPLN 2009 pre-conference workshop
                   University of the Basque Country
             Donostia-San Sebastián. Monday 7th September 2009
         Organised by the SALTMIL Special Interest Group of ISCA

Deadline for early registration: 15th July 2009

Details on how to register:


 09:00 Registration
 09:15 Opening
 09:30 Invited Talk. Lars Borin
 10:30 Papers  (20+5) min.
     1. Information retrieval and extraction in Maltese and Hebrew:
         Issues in creating web-based corpora and lexical tools for
         less-resourced languages.
         Adam Ussishkin, Jerid Francom, Dainon Woudstra
     2. TETEYEQ: Amharic question answering for factoid questions.
         Seid Muhie Yimam, Mulugeta Libsie

 11:20 Coffee break

 11:40 Papers (20+5) min.
     3. Using Wikipedia for Named Entities Translation
         Izaskun Fernandez, Iñaki Alegria, Nerea Ezeiza
     4. Ihardetsi: A Question Answering system for Basque built on
         reused linguistic processors.
         Iñaki Alegria, Olatz Ansa, Xabier Arregi, Arantza Otegi,
         Ander Soraluze

 12:30 Projects (10 min. each)
     1. Babelium Project. Promoting the Use and Learning of Minority Languages
        Juan A. Pereira Varela, Silvia Sanz-Santamaría, Julián
        Gutiérrez Serrano.
     2. A web-based system for multilingual school reports
        David Chan, Dewi Jones, Oggy East
     3. The SALT Cymru Special Interest Group – European Funding
        Encouraging Collaboration Between Academia and Business in
        Wales within the field of Speech and Language Technology.
        Gruffudd Prys
     4. Automated English subtitling of Welsh TV Programmes
         Llio Humphreys
     5. A Dictionary Shell
        Florie Moulin, Laura Laluque, Geróid Ó Néill

 13:20 Panel
      "Less resourced languages and Language technology.
       Short- and medium-term objectives"

 13:45 Closing


The phenomenal growth of the Internet has led to a situation where, by
some estimates, more than one billion words of text are currently
available. This is far more text than any one person can possibly
process. Hence there is a need for automatic tools to access and process
this mass of textual information. Emerging techniques of this kind
include Information Retrieval (IR), Information Extraction (IE), and
Question Answering (QA).

However, there is growing concern among researchers about the
situation of languages other than English. Although not all Internet
text is in English, it is clear that non-English languages do not have
the same degree of representation on the Internet. If we simply count the
number of articles in Wikipedia, English is the only language with more
than 20 percent of the available articles. It is followed by a group of
17 languages with between one and ten percent of the articles. The
remaining 245 languages each have less than one percent of the articles.
Even these low-profile languages are relatively privileged, as the total
number of languages in the world is estimated at 6,800.

Clearly there is a danger that the gap between high-profile and
low-profile languages on the Internet will continue to widen unless
tools for accessing textual information are developed for the
low-profile languages. Hence there is a pressing need to develop basic
language technology software for less-resourced languages as well. In
particular, the priority is to adapt recently developed IE, IR and QA
systems so that they can also be used for these languages. In doing so,
several questions will naturally arise, such as:

     *  What problems emerge when faced with languages having different
       linguistic features from the major languages?
     *  Which techniques should be promoted in order to get the maximum
       yield from sparse training data?
     *  What standards will enable researchers to share tools and
       techniques across several different languages?
     *  Which tools are easily reusable across several unrelated
       languages?

It is hoped that presentations will focus on real-world examples, rather
than purely theoretical discussions of the questions. Researchers are
encouraged to share examples of best practice -- and also examples where
tools have not worked as well as expected. Also of interest will be
cases where the particular features of a less-resourced language raise a
challenge to currently accepted linguistic models that were based on
features of major languages.


     * Kepa Sarasola, University of the Basque Country
     * Mikel Forcada, Universitat d'Alacant
     * Iñaki Alegria, University of the Basque Country
     * Xabier Arregi, University of the Basque Country
     * Arantza Casillas, University of the Basque Country
     * Francis Tyers, Universitat d'Alacant
     * Briony Williams, Language Technologies Unit, Bangor University,
       Wales, UK


The organisers


N.b.: Listing on the lgpolicy-list is merely intended as a service to
its members and implies neither approval, confirmation nor agreement by
the owner or sponsor of the list as to the veracity of a message's contents.
Members who disagree with a message are encouraged to post a rebuttal.
(H. Schiffman, Moderator)

This message came to you by way of the lgpolicy-list mailing list
lgpolicy-list at groups.sas.upenn.edu
To manage your subscription, unsubscribe, or arrange digest format: https://groups.sas.upenn.edu/mailman/listinfo/lgpolicy-list
