15.1894, Qs: Basic English Corpus;American First Name Corpus

Tue Jun 22 18:07:27 UTC 2004

LINGUIST List:  Vol-15-1894. Tue Jun 22 2004. ISSN: 1068-4875.

Subject: 15.1894, Qs: Basic English Corpus;American First Name Corpus

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Sheila Collberg, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Naomi Fox <fox at linguistlist.org>
==========================================================================
We'd like to remind readers that the responses to queries are usually
best posted to the individual asking the question. That individual is
then strongly encouraged to post a summary to the list. This policy was
instituted to help control the huge volume of mail on LINGUIST; so we
would appreciate your cooperating with it whenever it seems appropriate.

In addition to posting a summary, we'd like to remind people that it
is usually a good idea to personally thank those individuals who have
taken the trouble to respond to the query.

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

=================================Directory=================================

1)
Date:  Mon, 21 Jun 2004 07:03:12 -0400 (EDT)
From:  Sandra Williams <swilliam at csd.abdn.ac.uk>
Subject:  Seeking "readable" English RST corpus

2)
Date:  Mon, 21 Jun 2004 15:22:20 -0400 (EDT)
From:  Charlotte Russell <charlotte_russell2003 at yahoo.ca>
Subject:  Transcribed corpus of first name

-------------------------------- Message 1 -------------------------------

Date:  Mon, 21 Jun 2004 07:03:12 -0400 (EDT)
From:  Sandra Williams <swilliam at csd.abdn.ac.uk>
Subject:  Seeking "readable" English RST corpus

Dear Linguist List,

I am working on a project involving readability and Natural Language
Generation. Specifically, I am investigating how discourse-level
choices affect reading ease of the generated output. In previous work,
we analysed the RST Discourse Treebank Corpus (purchased from the LDC)
to acquire knowledge about how human authors make discourse-level
choices. The biggest problem was that the corpus contained Wall Street
Journal articles, which are generally not very easy to read, so the
corpus was not very suitable for our purposes.

We are now searching for a corpus that is annotated with discourse
relations, similar to the RST Discourse Treebank Corpus, but
containing texts that are easier to read. The corpus must contain
English texts. The texts in the corpus could be written for children,
or they could be easier texts written for adults. The texts must be
annotated with discourse relations, preferably using RST. Ideally, the
corpus should be machine-readable.

If you know of any such corpus, or similar, that is available for
research purposes, please let me know. I will summarise any useful
answers I receive for the benefit of others on the list.

Many thanks,

Sandra Williams
University of Aberdeen

-------------------------------- Message 2 -------------------------------

Date:  Mon, 21 Jun 2004 15:22:20 -0400 (EDT)
From:  Charlotte Russell <charlotte_russell2003 at yahoo.ca>
Subject:  Transcribed corpus of first name

Looking for a transcribed corpus of American first names for testing
against a commercial application. Can anyone point me in the right
direction?

Charlotte

---------------------------------------------------------------------------
LINGUIST List: Vol-15-1894