[Corpora-List] Corpus Benevolence

Eric Atwell eric at comp.leeds.ac.uk
Thu Feb 8 09:44:45 UTC 2007


Alexander,

"benevolence" is a term I've not heard of before in Corpus Linguistics,
but I think you mean something like "relevance" or "appropriateness" to
your specific research question.

One hint when selecting a corpus is to look for studies similar to
yours and see which corpus they used; if you use the same corpus, your
results can be compared directly (more so than if you experiment with
different corpora). For example, several researchers have measured
prosody predictors using the MARSEC or Aix-MARSEC corpus; several
parser developers have scored their systems on Penn Treebank samples
or the IPSM test corpus; and several PoS-tagger researchers have used
the tagged Brown Corpus as a benchmark.

What is your research topic? You presumably already know about other
related research; this could also guide you to a "benevolent" corpus.
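If you do want to compare candidate corpora on the surface features you
mention (size, number of sentences), a rough profile is easy to compute.
The following is only a minimal sketch, assuming plain-text input,
sentences ending in '.', '!' or '?', and tokens taken as runs of word
characters; the function name corpus_profile is hypothetical, and a real
study would use a proper tokenizer and sentence splitter.

```python
import re


def corpus_profile(text: str) -> dict:
    """Rough surface statistics for a plain-text corpus (hypothetical helper).

    Assumptions: a sentence ends with '.', '!' or '?' followed by whitespace,
    and a token is a run of word characters. Both are crude approximations.
    """
    # Split after sentence-final punctuation; drop empty pieces (e.g. empty input).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lower-case so that "The" and "the" count as the same type.
    tokens = re.findall(r"\w+", text.lower())
    types = set(tokens)
    n_tokens = len(tokens)
    return {
        "tokens": n_tokens,
        "types": len(types),
        "sentences": len(sentences),
        "mean_sentence_length": n_tokens / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(types) / n_tokens if n_tokens else 0.0,
    }


if __name__ == "__main__":
    print(corpus_profile("The cat sat. The dog ran!"))
```

Such figures describe a corpus but say nothing about whether it suits
your research question. Note also that the type-token ratio falls as a
corpus grows, so compare it only across samples of equal size.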

Regards

Eric Atwell, Leeds University

On Thu, 8 Feb 2007, Alexander Osherenko wrote:

> Hello!
>
> Are there any measures that give a general estimate of the benevolence
> of a corpus? The problem is: there are several corpora, domain-specific
> or not, and I want to find a general measure, or general hints, for
> choosing one over another. How can I estimate which corpus to take,
> other than calculating result measures (whatever they are) for every
> corpus chosen more or less at random and comparing them? Something like
> size, number of sentences, genre...
>
> Best,
> Alexander
>
