seeking a "readable" corpus annotated with RST relations

Sandra Williams swilliam at CSD.ABDN.AC.UK
Thu Jul 1 18:05:21 UTC 2004


Dear RSTList,

I am working on a project involving readability and Natural Language
Generation. Specifically, I am investigating how discourse-level choices
affect readability of the generated output. In previous work, we analysed
the RST Discourse Treebank Corpus (purchased from the LDC) to acquire
knowledge about how human authors make discourse-level choices. The biggest
problem was that the corpus consists of Wall Street Journal articles, which
are generally not very easy to read, so it was not very suitable for our
purposes.

We are now searching for a corpus of English texts that is annotated with
discourse relations, similar to the RST Discourse Treebank Corpus, but
containing texts that are easy to read, e.g. text written for children. The
texts must be annotated with discourse relations, preferably using RST.
Ideally, the corpus should be machine-readable, but hand annotations would
be okay.

If you know of any such corpus, or similar, that is available for research
purposes, please let me know. I will summarise any useful answers I receive
for the benefit of others on the list.

Many thanks,

Sandra Williams
_____________________________________
Sandra Williams
Department of Computing Science,
Meston Building,
University of Aberdeen,
Aberdeen AB24 3UE
UK
Tel: +44 (0)1224 272839
mobile: 0781 6452184
Email: swilliam at csd.abdn.ac.uk
Web: www.csd.abdn.ac.uk/~swilliam


