Call: KBGen Challenge

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Sat Dec 8 13:10:59 UTC 2012

Date: 07 Dec 2012 16:38:33 +0100
From: Claire Gardent <claire.gardent at>
Message-Id: <caf27c$5gi44r at>

The KBGen Challenge: Generating from Knowledge Bases

Important Dates

08 December 2012: Registration and Pre-Release Download
01 February 2013: Deadline for feedback and comments on pre-release data
01 March 2013: Release of full KBGen 2013 Task data
01 June 2013: Deadline for System Outputs
August 2013: Reporting and discussing results at ENLG

Call for Pre-Registration and Sample Data Release

We invite teams of researchers to pre-register now for the KBGen 2013
Task by filling in the registration form here:

Once registered, teams will be given access to sample data with which to
familiarise themselves with the input representation formats we have
developed. We welcome comments and feedback until 1 February 2013. The
complete KBGen data will be distributed on March 1st, and the deadline
for submitting system outputs will be in early June (exact date to be
confirmed). Results and participating systems will be presented at ENLG
in August 2013.

Below we provide a brief overview of the KBGen Task. For more
information, please visit the other pages on the KBGen website.

KBGen Task

The task for participating teams is to develop systems that map the
input representations provided by the KBGen organisers to sentences, and
to submit system outputs for the inputs in the test data set.

The KBGen Task data is derived from the AURA Knowledge Base, which was
developed in the context of the HALO Project at SRI International. This
knowledge base encodes knowledge contained in a college-level biology
textbook. We have processed and adapted this data so that each input
provided by the KBGen task can be verbalised in a single, possibly
complex, sentence. To minimise the amount of engineering required to
participate, we also make available a lexicon mapping the concepts and
relations present in the KBGen data to words.

Submitted system outputs will be evaluated using a variety of automatic
metrics as well as human assessment against quality criteria.

Input Representations

The input representations are bundles of triples expressing relations
between entities. For the development phase, the data set will consist
of input and output pairs, where each input is associated with one or
more manually produced sentences verbalising this input.

Here is an example of the input-output pairs that we propose, with the
output sentence followed by its input triples:

"The temperature of a biomembrane is directly proportional to its
fluidity, and the density of a biomembrane is inversely proportional to
its fluidity."

( Biomembrane2717 fluidity Fluidity-Value2723 )
( Biomembrane2717 temperature Temperature-Value2673 )
( Temperature-Value2673 directly-proportional Fluidity-Value2723 )
( Biomembrane2717 density Density-Value2762 )
( Density-Value2762 inversely-proportional Fluidity-Value2723 )
( Biomembrane2717 instance-of Entity )
( Biomembrane2717 instance-of Biomembrane )
( Fluidity-Value2723 instance-of Fluidity-Value )
( Temperature-Value2673 instance-of Temperature-Value )
( Density-Value2762 instance-of Density-Value )
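
As an illustration only, here is a minimal Python sketch of how a participating system might read such a bundle, separating the `instance-of` triples (which give each entity its type) from the remaining relations. The function name `parse_bundle` is hypothetical, and the sketch assumes one whitespace-separated `( subject relation object )` group per triple, with identifiers free of spaces and parentheses as in the sample above; the official data files may use a different layout.

```python
import re
from collections import defaultdict

# Matches one ( subject relation object ) group. Assumption: identifiers
# contain no whitespace or parentheses, as in the sample bundle.
TRIPLE_RE = re.compile(r"\(\s*([^\s()]+)\s+([^\s()]+)\s+([^\s()]+)\s*\)")


def parse_bundle(text):
    """Split a bundle of triples into entity types and other relations.

    Returns (types, relations): types maps each entity to the classes it is
    an instance of, in input order; relations lists the remaining
    (subject, relation, object) triples, also in input order.
    Unmatched stray parentheses are simply ignored.
    """
    types = defaultdict(list)
    relations = []
    for subj, rel, obj in TRIPLE_RE.findall(text):
        if rel == "instance-of":
            types[subj].append(obj)
        else:
            relations.append((subj, rel, obj))
    return dict(types), relations


SAMPLE = """
( Biomembrane2717 fluidity Fluidity-Value2723 )
( Biomembrane2717 temperature Temperature-Value2673 )
( Temperature-Value2673 directly-proportional Fluidity-Value2723 )
( Biomembrane2717 density Density-Value2762 )
( Density-Value2762 inversely-proportional Fluidity-Value2723 )
( Biomembrane2717 instance-of Entity )
( Biomembrane2717 instance-of Biomembrane )
( Fluidity-Value2723 instance-of Fluidity-Value )
( Temperature-Value2673 instance-of Temperature-Value )
( Density-Value2762 instance-of Density-Value )
"""

types, relations = parse_bundle(SAMPLE)
for subj, rel, obj in relations:
    print(subj, rel, obj)
```

Keeping the typing facts apart from the relations makes it easy to look up an entity's class (for example `Temperature-Value` for `Temperature-Value2673`) when choosing words for it.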

As mentioned above, we make available a lexicon which maps each concept
and relation present in the input to words.
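
To give a feel for how such a lexicon might be used, here is a deliberately naive template-based sketch in Python that turns a single relation triple into a sentence. The lexicon entries, the `RELATION_TEMPLATES` table and the helpers `noun_phrase` and `verbalise` are all hypothetical, since the format of the distributed lexicon is not described in this announcement.

```python
# Hypothetical lexicon: concept classes mapped to nouns. The real KBGen
# lexicon format is not described in this announcement.
CONCEPT_WORDS = {
    "Biomembrane": "biomembrane",
    "Fluidity-Value": "fluidity",
    "Temperature-Value": "temperature",
    "Density-Value": "density",
}

# Hypothetical sentence templates for a few relations.
RELATION_TEMPLATES = {
    "directly-proportional": "{s} is directly proportional to {o}",
    "inversely-proportional": "{s} is inversely proportional to {o}",
}


def noun_phrase(entity, types):
    """Describe an entity via the first of its classes found in the lexicon."""
    for cls in types.get(entity, []):
        if cls in CONCEPT_WORDS:
            return "the " + CONCEPT_WORDS[cls]
    return entity  # fall back to the raw identifier


def verbalise(triple, types):
    """Render one (subject, relation, object) triple, or None if no template."""
    subj, rel, obj = triple
    template = RELATION_TEMPLATES.get(rel)
    if template is None:
        return None
    text = template.format(s=noun_phrase(subj, types), o=noun_phrase(obj, types))
    return text[0].upper() + text[1:] + "."


types = {
    "Temperature-Value2673": ["Temperature-Value"],
    "Fluidity-Value2723": ["Fluidity-Value"],
}
print(verbalise(("Temperature-Value2673", "directly-proportional",
                 "Fluidity-Value2723"), types))
# prints: The temperature is directly proportional to the fluidity.
```

One template per triple cannot by itself produce the example output, which combines several relations in one sentence and uses pronouns ("its fluidity"); that kind of aggregation and reference is where a real system would need more than a lexicon lookup.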

Organising Team

Eva Banik, Computational Linguistics Ltd, UK
Claire Gardent, CNRS/LORIA, Nancy, France
Eric Kow, Computational Linguistics Ltd, UK

Contact email: info at

Message distributed by the Langage Naturel mailing list <LN at>

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
