[Corpora-List] Corpus Benevolence

Adam Kilgarriff adam at lexmasterclass.com
Sat Feb 10 10:56:35 UTC 2007


I'd say that the questions explored here

- how do you describe a corpus?
- how do you compare corpora?
- how do you decide what is suitable for a particular task?

are the meatiest, juiciest, and most important in our field.

	Adam Kilgarriff


-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Alexander Osherenko
Sent: 10 February 2007 10:39
To: Santos Diana; corpora at hd.uib.no
Subject: Re: [Corpora-List] Corpus Benevolence

Hi Diana,

Thank you for your comments. I was beginning to think I was going mad
with my ideas. :)

>- on the contrary, if you want to look for the best corpus to test
>something that you have developed and are not sure holds water in other
>conditions, you'd better choose the most different corpus possible (from
>your initial one)
>
I do know that the results would be suitable if I took a different
corpus for testing, since there are always plenty of ways to argue about
bad results. ;-) Perhaps I can explain my idea as follows: I have a
small corpus, which is why I want to extend it. When do I stop extending
it? When the corpus is big enough, but what does "big enough" mean? In
my case, opinion mining, does "big enough" correspond to the number of
"opinionated" expressions?
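
One way to make "big enough" concrete, offered only as a rough sketch:
extend the corpus in batches and watch how many previously unseen
opinionated expressions each batch contributes. The lexicon, the
single-token matching, the function names, and the stopping rule
(window and threshold) are all assumptions made for illustration;
nothing in this thread prescribes them.

```python
from typing import Iterable, List, Set


def new_expression_counts(batches: Iterable[List[str]],
                          lexicon: Set[str]) -> List[int]:
    """For each batch of tokens, count lexicon entries not seen in earlier batches.

    `lexicon` is a hypothetical set of lowercase opinion words; multiword
    expressions would need a real matcher instead of set intersection.
    """
    seen: Set[str] = set()
    counts: List[int] = []
    for tokens in batches:
        found = {t.lower() for t in tokens} & lexicon
        counts.append(len(found - seen))  # expressions that are new in this batch
        seen |= found
    return counts


def looks_saturated(counts: List[int], window: int = 3,
                    threshold: int = 0) -> bool:
    """True when each of the last `window` batches added at most `threshold`
    new expressions. Both parameters are arbitrary illustrative defaults."""
    if len(counts) < window:
        return False
    return all(c <= threshold for c in counts[-window:])


if __name__ == "__main__":
    toy_lexicon = {"good", "bad", "awful", "great"}
    toy_batches = [["Good", "film"], ["bad", "and", "good"], ["great", "plot"],
                   ["good"], ["bad"], ["great"]]
    growth = new_expression_counts(toy_batches, toy_lexicon)
    print(growth, looks_saturated(growth))
```

A flat tail in the counts only says that this lexicon has stopped finding
new items in this kind of text. It does not say that the corpus is
adequate for the task itself.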

>I think your "general" measure has to be a
>specifically-related-to-opinion-mining measure...
>
That is probably the same thing I meant in my previous comment.

>I have also written something on the subject of validating corpus-based
>results, see
>
>Santos, Diana & Signe Oksefjell. "Using a Parallel Corpus to Validate
>Independent Claims", Languages in contrast, Vol. 2(1), 1999, pp.117-132.
>[tell me if you want me to send it to you]
>
Could you please send it to me?

>Hope this is useful,
>
It was very useful.

Alexander

>---------------
>Diana Santos
>www.linguateca.pt
>Linguateca, SINTEF ICT
>Pb 124 Blindern, N-0314 Oslo, Norway
>
>>-----Original Message-----
>>From: owner-corpora at lists.uib.no 
>>[mailto:owner-corpora at lists.uib.no] On Behalf Of Alexander Osherenko
>>Sent: 8 February 2007 10:00
>>To: corpora at hd.uib.no
>>Subject: [Corpora-List] Corpus Benevolence
>>
>>Hello!
>>
>>Are there any measures that provide a general estimate of the
>>benevolence of a corpus? The problem is this: there are several
>>corpora, domain-specific or not, and I want to find a general
>>measure, or general hints, for choosing one over another. How
>>can I decide which corpus to take, other than by calculating
>>result measures, whatever they are, and comparing them for
>>every corpus previously chosen by chance?
>>Something like size, number of sentences, genre...
>>
>>Best,
>>Alexander


