[Corpora-List] Chomsky and computational linguistics

Mike Maxwell maxwell at umiacs.umd.edu
Thu Jul 12 03:01:34 UTC 2007


This is probably my last posting on this (I know, I said that before...).

Oliver Mason wrote:
>> > I guess it all boils down to repeatability.  My main criticism with
>> > the invented examples of rare events is that you cannot challenge
>> > them, because you can't repeat the analysis with your own data.
>>
>> Exactly, except that you _can_ challenge them.  The made-up examples of
>> subjectless for-to sentences are testable by anyone who speaks that
>> dialect (and it is not an idiolect).
> But _how_ can you test them?  It's all subjective.  Maybe the same
> person that yesterday said a sentence was acceptable has changed their
> mind now and today claims it's wrong.  If you've got a corpus, then
> you can at least show that a particular construction has been used.

(BTW, my "exactly" referred to the (need for) repeatability--I think 
we're in agreement on that.)

If someone changes their mind, then you're right that it's not clear 
what to do with that datum, except search for more data like it--maybe 
clearer examples that make the same point, or examples that clarify why 
the example was borderline (like maybe you chose a pragmatically poor 
example, or the speaker was confusing one word with another), or else 
you look at the same example using more speakers.

But the same thing can happen in corpora, in the sense that while a 
particular construction may have been used once in some corpus, you 
don't know whether the author of that construction really intended it.  I 
suspect we've all edited and re-edited papers, and at some point noticed 
that there was an out-and-out error, i.e. something that just wasn't 
"good" English.  Maybe it was the result of a partial correction, or a 
cut-and-paste that went awry, or any number of things.  If we had not 
noticed that error, it would have made it into print and could have 
become part of someone's corpus.  And that datum is no more reliable *as 
a fact about the grammar of a language* than an example which a linguist 
has thought up, but changed his mind about the next day.  (It may be 
useful as something else--an example of a slip of the tongue or pen, or 
part of an argument for a better spell checker, or a data point in a 
corpus of spelling errors, or even as an indicator of how the mind gets 
confused; but it is probably not a *grammar* fact.)
-- 
	Mike Maxwell
	maxwell at umiacs.umd.edu
	"Theorists...have merely to lock themselves in a room
	with a blackboard and coffee maker to conduct their business."
	--Bruce A. Schumm, Deep Down Things


