Info: Call for participation in the INEX NLP Task 2006 campaign

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Fri Mar 3 12:54:38 UTC 2006

Date: Thu, 02 Mar 2006 08:41:09 +0100
From: Xavier Tannier <tannier at>
Message-ID: <4406A195.7060102 at>


INEX NLP Task, Call for Participation

 Natural Language Interfaces for Information Retrieval
        in XML Documents.


XML Retrieval

Content-oriented XML retrieval has been receiving increasing interest
fuelled by the widespread use of the eXtensible Markup Language (XML),
as a standard document format. The continuous growth in XML data
sources is matched by increasing efforts in the development of XML
retrieval systems, which aim at exploiting the available structural
information in documents to implement a more focused retrieval
strategy and return document components, the so-called XML elements -
instead of complete documents - in response to a user query.

Implementing this more focused retrieval paradigm means that an XML
retrieval system needs not only to find relevant information in the
XML documents, but also to determine the appropriate level of granularity
to be returned to the user. In addition, the relevance of a retrieved
component is dependent on meeting both content and structural

NLP in XML Retrieval

For the third year, the INitiative for the Evaluation of XML Retrieval
(INEX) investigates the idea of using the specifics of XML retrieval
to allow users to address content and structural needs intuitively via
natural language queries.

* As in traditional information retrieval, the user need is loosely
  defined, linguistic variations are frequent, and answers are a ranked
  list of relevant elements.

* As in database querying, structure is important and a simple
  list of keywords cannot be a sufficient query. Structured query
  languages have been developed, but appear to be difficult to use.

* Furthermore, the size of the unit of information is variable and
  elements overlap in the documents.

Therefore developing natural language interfaces for XML-IR is a
separate research domain requiring its own innovative solutions.

The ultimate goal is to design and build software that will analyse,
understand, and generate results in response to queries that humans
express naturally. The primary objective of retrieval would be to
interpret both structural and content constraints of an information
need expressed in a natural language query (as opposed to the rigid
syntax of XPath). The IR system would not only select and rank
suitable documents, but also select the most suitable XML elements within
documents that best satisfy the information need (both accurately and

The 2006 INEX campaign uses the English Wikipedia collection.
Queries may concern any content or structural elements that
can be found in this set of documents, and will be written both
in English and in NEXI, a formal structured query language.

in English: "Find lists of air battles in articles dealing with World War II"
in NEXI:    //article[about(., "World War II")]//list[about(., air battle)]
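
To make the correspondence between the two forms concrete, here is a
minimal Python sketch that assembles a NEXI-style path from a list of
(element, terms) constraints, using the usual about(., terms) form. The
helper name build_nexi is a hypothetical name chosen for illustration;
it is not part of INEX or of any NEXI tool, and real NEXI offers more
than this sketch covers.

```python
def build_nexi(steps):
    """Join (element_name, about_terms) pairs into a NEXI-style path.

    Hypothetical helper for illustration only: each pair holds one
    structural constraint (the element name) and one content
    constraint (the search terms, already formatted as a string).
    """
    parts = []
    for element, terms in steps:
        # Each step becomes //element[about(., terms)]
        parts.append(f"//{element}[about(., {terms})]")
    return "".join(parts)


query = build_nexi([
    ("article", '"World War II"'),  # outer constraint: the article topic
    ("list", "air battle"),         # inner constraint: the element to return
])
print(query)
```

The last step of the path names the element that should be returned
(here, lists), while earlier steps only restrict the context.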

NLP Tasks

There are two distinct tasks in the NLP track in 2006 - NLQ2NEXI and NLQ.

* NLQ2NEXI - a simplified task that does not require participants to
index the collection or to implement a search engine. Instead,
NLQ2NEXI requires the translation of a natural language query,
provided in the <description> element of a topic, into a formal INEX
query. The submissions of all participants will be evaluated by running
the titles on one or more search engines that can operate on NEXI
expressions. The objective is to compare the results obtained with
natural language queries (translated into NEXI) with the results
obtained by the same search engines when using the original NEXI
expressions. This task is designed to allow new participants with NLP
expertise to join the INEX workshop without the need to develop a
search engine.

* NLQ - this task has no restrictions on the use of any NLP technique
to interpret the queries as they appear in the <description> element
of a topic. Here participants are required to submit retrieval runs,
but are free to implement any NLP techniques in their search
engine. The objective is not only to compare different NLP-based
systems, but also to compare the results obtained with natural
language queries with the results obtained with NEXI queries by any
other system in the Ad-hoc track. We wish to test whether natural
language queries are effective alternatives to formal queries and to
quantify the trade-off in performance.
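
As a rough illustration of what the NLQ2NEXI translation step involves,
the following Python sketch handles a single query shape with one
regular expression. The function name nlq_to_nexi and the pattern are
assumptions made for this example only; an actual submission would need
real linguistic analysis, and this toy does no stemming, phrase
detection, or quoting.

```python
import re

# Hypothetical pattern covering only queries shaped like
# "Find <element>s of <terms> in article(s) dealing with <terms>".
PATTERN = re.compile(
    r"^Find (?P<element>\w+?)s? of (?P<inner>.+?) in articles? "
    r"dealing with (?P<outer>.+)$",
    re.IGNORECASE,
)


def nlq_to_nexi(description):
    """Translate one narrow family of English queries into NEXI.

    Returns None when the query does not fit the pattern, so the
    caller can fall back to another strategy.
    """
    match = PATTERN.match(description.strip())
    if match is None:
        return None
    element = match.group("element").lower()
    inner = match.group("inner")
    outer = match.group("outer")
    # Structural constraints become path steps, content constraints
    # become about() clauses; terms are copied through unchanged.
    return f"//article[about(., {outer})]//{element}[about(., {inner})]"


print(nlq_to_nexi(
    "Find lists of air battles in article dealing with World War II"
))
```

A query that does not match the pattern yields None rather than a
guessed expression, which keeps the failure visible to the caller.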

Important Dates
 March 17: Deadline for declaration of intent to participate.
 May 5: Distribution of sets of topics.
 July 14: Deadline for submission of search results.
 December 18-20: Workshop in Schloss Dagstuhl.

Shlomo Geva       s.geva at
Xavier Tannier    tannier at

Message distributed by the Langage Naturel list <LN at>

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)