[Corpora-List] Chomsky and computationnel (sic) linguistics (fwd)

Listserv Administrator listman at listserv.linguistlist.org
Sat Aug 4 18:03:35 UTC 2007


---------- Forwarded message ----------
Date: Thu, 12 Jul 2007 09:21:45 +0100 (BST)
From: Eric Atwell <eric at comp.leeds.ac.uk>
Reply-To: corpora at uib.no
To: Mike Maxwell <maxwell at umiacs.umd.edu>
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Chomsky and computationnel (sic) linguistics

This whole debate assumes that the aim of CORPORA subscribers is to
"describe Language" or catalogue "grammar facts".  I guess many (most?)
subscribers to CORPORA have more tangible aims, e.g. to build systems
or resources with lower error rates, or higher sales figures -
e.g. dictionaries, annotated corpora, PoS-taggers; even language
teachers aim to produce learners with "lower error rates"!

For this aim, intuitive insights about rare "unrepeatable" events are
unimportant, unless they can improve system performance. The best
systems/dictionaries/... are hybrids, using lots of empirical corpus-based
evidence, and also armchair intuitions IF they are practically useful.

Eric Atwell, Leeds University



On Wed, 11 Jul 2007, Mike Maxwell wrote:

> This is probably my last posting on this (I know, I said that before...).
>
> Oliver Mason wrote:
>>>> I guess it all boils down to repeatability.  My main criticism of
>>>> the invented examples of rare events is that you cannot challenge
>>>> them, because you can't repeat the analysis with your own data.
>>>
>>> Exactly, except that you _can_ challenge them.  The made-up examples of
>>> subjectless for-to sentences are testable by anyone who speaks that
>>> dialect (and it is not an idiolect).
>> But _how_ can you test them?  It's all subjective.  Maybe the same
>> person that yesterday said a sentence was acceptable has changed their
>> mind now and today claims it's wrong.  If you've got a corpus, then
>> you can at least show that a particular construction has been used.
>
> (BTW, my "exactly" referred to the (need for) repeatability--I think we're in
> agreement on that.)
>
> If someone changes their mind, then you're right that it's not clear what to
> do with that datum, except search for more data like it--maybe clearer
> examples that make the same point, or examples that clarify why the example
> was borderline (like maybe you chose a pragmatically poor example, or the
> speaker was confusing one word with another), or else you look at the same
> example using more speakers.
>
> But the same thing can happen in corpora, in the sense that while a
> particular construction may have been used once in some corpus, you don't
> know if the author of that construction really intended that.  I suspect
> we've all edited and re-edited papers, and at some point noticed that there
> was an out-and-out error, i.e. something that just wasn't "good" English.
> Maybe it was the result of a partial correction, or a cut-and-paste that went
> awry, or any number of things.  If we had not noticed that error, it would
> have made it into print and could have become part of someone's corpus.  And
> that datum is no more reliable *as a fact about the grammar of a language*
> than an example which a linguist has thought up, but changed his mind about
> the next day.  (It may be useful as something else--an example of a slip of
> the tongue or pen, or part of an argument for a better spell checker, or a
> data point in a corpus of spelling errors, or even as an indicator of how
> the mind gets confused; but it is probably not a *grammar* fact.)
>

--
Eric Atwell,
Senior Lecturer, Language research group leader, School of Computing
Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
TEL: 0113-3435430  FAX: 0113-3435468  WWW/email: google Eric Atwell


_______________________________________________
corpora mailing list
corpora at uib.no
http://mailman.uib.no/listinfo/corpora


