Job: Post Doc position at INRIA, Rennes, France

Fri Mar 10 07:18:03 UTC 2006

Date: Thu, 09 Mar 2006 18:03:13 +0100
From: Pascale Sebillot <Pascale.Sebillot at irisa.fr>
Message-ID: <44105FD1.4010305 at irisa.fr>
X-url: http://www.irisa.fr

Post Doc position at INRIA, Rennes, France.

INRIA is offering a one-year postdoctoral position at its laboratory
in Rennes. This laboratory is called IRISA (see http://www.irisa.fr).

Title:  

Text databases and natural-language-processing-based information
retrieval: one step to a merging

Subject:  

In most current search engines (Google, for example), documents and
users' requests are represented by sets of individual terms extracted
from their textual contents. The matching between a document and a
request relies only on comparisons of these character strings, without
taking the richness and flexibility of natural language into account.
Research in information retrieval (IR) therefore tries to integrate
some results from natural language processing (NLP) in order to
improve the performance of search engines.

NLP technologies enrich the representation of text by submitting a
given corpus to empirical analysis. Among its many uses, NLP may resolve
word sense ambiguity, label parts of speech of individual words, or
discover collocations such as "New York" or "stock market".
Additionally, NLP may infer semantic relationships among words via
contextual analysis. While it has been widely speculated that such
information would improve IR performance, devising methods for
importing linguistic data into standard retrieval models remains an
unsolved problem; indeed, one of the main recurrent difficulties is to
find and adapt an IR system model to allow information gleaned from
NLP to inform IR.

This quest for performance also has a second aspect: IR systems have
to be relevant and provide better answers, but they also have to be
fast and respond quickly to users, even when querying huge text
databases. These two requirements currently seem to contradict
each other. Increasing the relevance of search engines implies
integrating linguistic information to improve their ability to
grasp the meaning of documents, and using more sophisticated models
than the current traditional ones (Boolean, vector space,
probabilistic, etc.). But as mentioned above, no effective and efficient
implementation of such models currently exists. On the other hand, very
efficient algorithms for IR on huge text databases have been
developed within the scientific field of text databases (TDB). The
models exploiting linguistic knowledge that are used in these
approaches are, however, oversimplified.

The aim of the post-doctoral research is to find a way to merge the
models best suited to integrating linguistic information with the
best search algorithms from TDB.

A background in information retrieval is preferred; strong knowledge
of database implementation is also desirable.

Contact:
- Laurent Amsaleg (Laurent.Amsaleg at irisa.fr)
- Pascale Sébillot (Pascale.Sebillot at irisa.fr)

-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version          : 
Archives                 : http://listes.cines.fr/wws/arc/ln
                           http://listserv.linguistlist.org/archives/ln.html

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------