[Corpora-List] little favour

Sun Sep 9 16:11:43 UTC 2007

John, 
the problem as I see it is that the order is the opposite of the one you
describe (and the one you describe is unobjectionable):

From what I gathered from the previous mails, things developed this way:

First, one has a corpus description of a verb (which in principle has
stimulated and broadened the knowledge of the verb in question). And we
all agree how good this is :-)

Then, afterwards, one uses linguists as testers (?) without even telling
them that they are testing something, because they don't know that this
description already exists (or if they know, they did not see it).

Linguists (or native speakers, this is not really the point) come up
with their own individual descriptions (i.e. sentences). Let us say (for
the sake of the argument), the same number of sentences that were used
to create the original corpus description in the first place.

Do you think that the description of this new "corpus" is a way to test
the original corpus-based description? If yes, and the two descriptions
do not agree (which I bet they won't), do you mean one should try a
merge of the two? 

Or do you mean that this is a new sampling procedure that is legitimate
_after_ one came up with some hypotheses based on the original corpus
description, just a quick way to amass more data to test (but not
invalidate) corpus-based material?

Diana

> -----Original Message-----
> From: John F. Sowa [mailto:sowa at bestweb.net] 
> Sent: 9. september 2007 17:35
> To: Santos Diana
> Cc: W.Louw; corpora at uib.no
> Subject: Re: [Corpora-List] little favour
> 
> Diana,
> 
> Very similar phenomena arise in every field:
> 
>  > what is the relationship between corpus data (whose use
>  > Ramesh so well defended to give an appropriate description
>  > of an English verb) and the (hypothesized poor) correlation
>  > with made-up sentences from corpus linguists, which Geoffrey
>  > Sampson and Ulf Magnusson consider such an interesting question.
> 
> Anyone who designs a system of any kind (anything ranging 
> from a computer program to an automobile to a theory of 
> grammar) has a well-defined view of the system, and that view 
> will guide (usually unconsciously) how the designer chooses 
> test cases.
> 
> For that reason, companies typically make sure that the 
> people who test a program (or automobile) have different 
> skills, habits, and background from those who designed it.
> 
> For similar reasons, it is very difficult for a linguist who 
> has strong views about how language "should" work to imagine 
> all the possible ways that people actually use language.
> 
>  > I am absolutely for using our own language skills and introspection
>  > to interpret corpus data.
> 
> That is a good way of using a corpus -- as a tool for 
> stimulating or broadening one's intuition.
> 
> John

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora