Call: KBGen Challenge, 2nd Call for Pre-Registration

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Fri Jan 11 21:10:37 UTC 2013

Date: 09 Jan 2013 16:34:34 +0100
From: Claire Gardent <claire.gardent at>
Message-Id: <c98173$503h5j at>

The KBGen Challenge: Generating from Knowledge Bases

Please Note: We welcome comments and feedback on the pre-release data
until February 1st, 2013.

Important Dates

08 December 2012: Registration and Pre-Release Download
01 February 2013: Deadline for feedback and comments on pre-release data
01 March 2013: Release of full KBGen 2013 Task data
Early June 2013: Release of Test Data and Deadline for System Outputs
August 2013: Reporting and discussing results at ENLG

Call for Pre-Registration and Sample Data Release

We invite teams of researchers to pre-register now for the KBGen 2013
Task by filling in the registration form here:

Once registered, teams will be given access to sample data with which to
familiarise themselves with the input representation formats we have
developed. We welcome comments and feedback until February 1st, 2013. The
complete KBGen data will be distributed on March 1st, and the deadline
for submitting system outputs on unseen test data will be in early June
(exact date to be given later). Results and participating systems will
be presented at ENLG in August 2013.

Below we provide a brief overview of the KBGen Task. For more
information, please visit the other pages on the KBGen site.

KBGen Task

The task for participating teams is to develop systems that map the
input representations provided by the KBGen organisers to sentences, and
to submit system outputs for the inputs in the test data set.


The KBGen Task data is derived from the AURA Knowledge Base which was
developed in the context of the HALO Project at SRI International.  This
knowledge base encodes knowledge contained in a college-level biology
textbook. We have processed and adapted this data so that each input
provided by the KBGen task can be verbalised in a single, possibly
complex, sentence. To minimise the amount of engineering required to
participate, we also make available a lexicon mapping the concepts and
relations present in the KBGen data to words.


Submitted system outputs will be evaluated by a variety of automatic
metrics and human-assessed quality criteria.

Input Representations

The input representations are bundles of triples expressing relations
between entities. For the development phase, the data set will consist
of input and output pairs, where each input is associated with one or
more manually produced sentences verbalising this input.

Here is an example of the input-output pairs that we propose:

"The rate of detoxification in the liver cell is directly proportional
to the quantity of smooth endoplasmic reticulum in the liver cell."

    :TRIPLES (
            (|Detoxification19144| |base| |Liver-Cell19145|)
            (|Detoxification19144| |rate| |Rate-Value19132|)
            (|Rate-Value19132| |directly-proportional| |Quantity-Value19135|)
            (|Liver-Cell19145| |has-part| |Smooth-Endoplasmic-Reticulum19149|)
            (|Smooth-Endoplasmic-Reticulum19149| |quantity| |Quantity-Value19135|))
            (|Detoxification19144| |instance-of| |Detoxification|)
            (|Rate-Value19132| |instance-of| |Rate-Value|)
            (|Liver-Cell19145| |instance-of| |Liver-Cell|)
            (|Smooth-Endoplasmic-Reticulum19149| |instance-of| |Smooth-Endoplasmic-Reticulum|)
            (|Quantity-Value19135| |instance-of| |Quantity-Value|))
            (|Detoxification19144| |instance-of| |Event|)
            (|Liver-Cell19145| |instance-of| |Entity|)
            (|Rate-Value19132| |instance-of| |Property-Value|)
            (|Quantity-Value19135| |instance-of| |Property-Value|)
            (|Smooth-Endoplasmic-Reticulum19149| |instance-of| |Entity|)))
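
For teams who want to inspect the sample data programmatically, the triple notation above can be read with a few lines of Python. This is only a minimal sketch: it assumes every triple is written as (|subject| |relation| |object|), as in the example, and the names TRIPLE_RE, parse_triples and split_triples are illustrative, not part of any KBGen tooling.

```python
import re
from collections import defaultdict

# Matches one triple of the form (|subject| |relation| |object|).
# Assumption: symbols never contain the '|' character themselves.
TRIPLE_RE = re.compile(r"\(\|([^|]+)\|\s+\|([^|]+)\|\s+\|([^|]+)\|\)")


def parse_triples(text):
    """Return every (subject, relation, object) triple found in text."""
    return [match.groups() for match in TRIPLE_RE.finditer(text)]


def split_triples(triples):
    """Separate relation triples from instance-of (typing) triples."""
    relations = [t for t in triples if t[1] != "instance-of"]
    types = defaultdict(list)
    for subject, relation, obj in triples:
        if relation == "instance-of":
            types[subject].append(obj)
    return relations, dict(types)


# A few lines copied from the example input above.
SAMPLE = """
(|Detoxification19144| |base| |Liver-Cell19145|)
(|Detoxification19144| |rate| |Rate-Value19132|)
(|Detoxification19144| |instance-of| |Detoxification|)
(|Detoxification19144| |instance-of| |Event|)
"""

relations, types = split_triples(parse_triples(SAMPLE))
```

Because the pattern matches each triple on its own, the sketch does not depend on how the enclosing parentheses are nested.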

As mentioned above, we make available a lexicon which maps each concept
and relation present in the input to words.
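
To give a rough idea of how such a lexicon might be used, here is a small Python sketch. The format of the real KBGen lexicon is not described in this call, so the LEXICON dictionary, its entries, and the functions lexicalise and verbalise_triple are all hypothetical and serve only as an illustration.

```python
# Hypothetical lexicon entries: the real KBGen lexicon format is not
# described in this call, so this mapping is an assumption.
LEXICON = {
    "Detoxification": "detoxification",
    "Liver-Cell": "liver cell",
    "Smooth-Endoplasmic-Reticulum": "smooth endoplasmic reticulum",
    "has-part": "has",
}


def lexicalise(symbol, lexicon):
    """Map a concept, instance or relation symbol to a word or phrase."""
    # Strip the trailing instance number, e.g. Liver-Cell19145 -> Liver-Cell.
    concept = symbol.rstrip("0123456789")
    if concept in lexicon:
        return lexicon[concept]
    # Fallback for symbols missing from the lexicon: de-hyphenate, lower-case.
    return concept.replace("-", " ").lower()


def verbalise_triple(triple, lexicon):
    """Naively verbalise one triple as 'subject relation object'."""
    return " ".join(lexicalise(symbol, lexicon) for symbol in triple)


phrase = verbalise_triple(
    ("Liver-Cell19145", "has-part", "Smooth-Endoplasmic-Reticulum19149"),
    LEXICON,
)
```

A word-by-word lookup like this is far from a full generator; producing a single fluent sentence from the whole bundle of triples is the actual task.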

Organising Team

Eva Banik, Computational Linguistics Ltd, UK
Claire Gardent, CNRS/LORIA, Nancy, France
Eric Kow, Computational Linguistics Ltd, UK
Nikhil Dinesh, SRI International, Menlo Park, California, USA

Contact email: info at
