Call: KBGen Challenge, Last Call for Pre-Registration

Thu Mar 14 13:24:48 UTC 2013

The KBGen Challenge: Generating from Knowledge Bases

Organised by:
Eva Banik, Computational Linguistics Ltd, UK
Claire Gardent, CNRS/LORIA, Nancy, France
Eric Kow, Computational Linguistics Ltd, UK
Nikhil Dinesh, SRI International, Menlo Park, California, USA

Endorsed by SIGGEN, the ACL Special Interest Group on Generation.

Important Dates

Please note: It is still possible to download the pre-release dataset
and join the campaign.

08 December 2012: Pre-Release of partial KBGen 2013 Task data
Around 15 March 2013: Release of full KBGen 2013 Task data
Early June 2013: Release of Test Data and Deadline for System Outputs
August 2013: Reporting and discussing results at ENLG

Call for Registration 

We invite teams of researchers to register for the KBGen 2013 Task by
filling in the registration form here:

Once registered, teams will be given access to sample data so that they
can familiarise themselves with the input representation formats we have
developed. The complete KBGen data will be distributed around March
15th, and the deadline for submitting system outputs on unseen test data
will be in early June (exact date to be given later). Results and
participating systems will be presented at ENLG in August 2013.

Below we provide a brief overview of the KBGen Task. For more
information, please visit the other pages on the KBGen site.

KBGen Task

The task for participating teams is to develop systems that map the
input representations provided by the KBGen organisers to sentences, and
to submit system outputs for the inputs in the test data set.


The KBGen Task data is derived from the AURA Knowledge Base which was
developed in the context of the HALO Project at SRI International.  This
knowledge base encodes knowledge contained in a college-level biology
textbook. We have processed and adapted this data so that each input
provided by the KBGen task can be verbalised in a single, possibly
complex, sentence. To minimise the amount of engineering required to
participate, we also make available a lexicon mapping the concepts and
relations present in the KBGen data to words.


Submitted system outputs will be evaluated using a variety of automatic
metrics and human-assessed quality criteria.

Input Representations

The input representations are bundles of triples expressing relations
between entities. For the development phase, the data set will consist
of input and output pairs, where each input is associated with one or
more manually produced sentences verbalising this input.

Here is an example of the input-output pairs that we propose for the
task:

"The rate of detoxification in the liver cell is directly proportional
to the quantity of smooth endoplasmic reticulum in the liver cell."

    :TRIPLES (
            (|Detoxification19144| |base| |Liver-Cell19145|)
            (|Detoxification19144| |rate| |Rate-Value19132|)
            (|Rate-Value19132| |directly-proportional| |Quantity-Value19135|)
            (|Liver-Cell19145| |has-part| |Smooth-Endoplasmic-Reticulum19149|)
            (|Smooth-Endoplasmic-Reticulum19149| |quantity| |Quantity-Value19135|)
            (|Detoxification19144| |instance-of| |Detoxification|)
            (|Rate-Value19132| |instance-of| |Rate-Value|)
            (|Liver-Cell19145| |instance-of| |Liver-Cell|)
            (|Smooth-Endoplasmic-Reticulum19149| |instance-of| |Smooth-Endoplasmic-Reticulum|)
            (|Quantity-Value19135| |instance-of| |Quantity-Value|)
            (|Detoxification19144| |instance-of| |Event|)
            (|Liver-Cell19145| |instance-of| |Entity|)
            (|Rate-Value19132| |instance-of| |Property-Value|)
            (|Quantity-Value19135| |instance-of| |Property-Value|)
            (|Smooth-Endoplasmic-Reticulum19149| |instance-of| |Entity|))
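
For teams who want to load such inputs programmatically, the following is
a minimal Python sketch of one way to read the triples. It is an
illustration only, not part of the official KBGen tooling: the function
names (parse_triples, split_types) are hypothetical, and it assumes that
every triple has the form (|subject| |relation| |object|) as in the
example above.

```python
import re
from collections import defaultdict

# Matches one (|subject| |relation| |object|) triple.
# Assumption: names never contain the "|" character.
TRIPLE_RE = re.compile(r"\(\|([^|]+)\|\s+\|([^|]+)\|\s+\|([^|]+)\|\)")


def parse_triples(text):
    """Return every (subject, relation, object) tuple found in text."""
    return TRIPLE_RE.findall(text)


def split_types(triples):
    """Separate instance-of typing triples from the other relations."""
    types = defaultdict(list)
    relations = []
    for subj, rel, obj in triples:
        if rel == "instance-of":
            types[subj].append(obj)
        else:
            relations.append((subj, rel, obj))
    return relations, dict(types)


# A few triples copied from the example input above.
SAMPLE = """
(|Detoxification19144| |base| |Liver-Cell19145|)
(|Detoxification19144| |rate| |Rate-Value19132|)
(|Detoxification19144| |instance-of| |Detoxification|)
(|Detoxification19144| |instance-of| |Event|)
(|Liver-Cell19145| |instance-of| |Liver-Cell|)
"""

relations, types = split_types(parse_triples(SAMPLE))
for subj, rel, obj in relations:
    print(subj, rel, obj)
```

Keeping the instance-of triples apart from the other relations is a
design choice made here for convenience: the former give the concept of
each instance, while the latter carry the content to be verbalised.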

As mentioned above, we make available a lexicon which maps each concept
and relation present in the input to words.
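
As a rough illustration of how a system might consult such a lexicon,
here is a small Python sketch. The format of the KBGen lexicon is not
described in this call, so the dictionary below is a hypothetical
stand-in, and the entries, the lexicalise function and the fallback rule
are all assumptions made for the example.

```python
# Hypothetical lexicon: concept or relation name -> candidate words.
# The entries are illustrative and are not taken from the KBGen lexicon.
LEXICON = {
    "Detoxification": ["detoxification"],
    "Liver-Cell": ["liver cell"],
    "Smooth-Endoplasmic-Reticulum": ["smooth endoplasmic reticulum"],
    "directly-proportional": ["directly proportional to"],
}


def fallback_word(name):
    """Guess a surface form from the name itself (assumed heuristic)."""
    return name.replace("-", " ").lower()


def lexicalise(name, lexicon=LEXICON):
    """Return the first lexicon entry for name, or a guess if none exists."""
    entries = lexicon.get(name)
    return entries[0] if entries else fallback_word(name)


for name in ["Liver-Cell", "directly-proportional", "Rate-Value"]:
    print(name, "->", lexicalise(name))
```

The fallback simply splits the name on hyphens, which is only a
placeholder for whatever treatment a participating system chooses for
names without a lexicon entry.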

Contact email:		 info at

