Corpora: Chomsky and corpus linguistics

Mike Maxwell Mike_Maxwell at sil.org
Mon Apr 9 14:29:08 UTC 2001


I'll try to keep this one short :-).

Michael Barlow wrote:
>...we can also reasonably assume that language 
>cognitive structures are related in some 
>reasonably direct way to language performance...

That's probably the null assumption, but I don't think it's necessarily correct.

We're all familiar with the use of compilers.  If I can propose another analogy (I hear those groans), the language competence system might be like the source code, and the performance system might be like the compiled code.  The source and compiled versions are by no means direct mappings of each other.  For example, most higher-level programming languages nowadays eschew the use of 'goto', but compiled code is full of the equivalent 'jump' instructions.  That's just one difference; optimizing compilers make other changes which smudge the map.
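To make the goto/jump point concrete, here is a small sketch in Python.  It assumes the standard CPython interpreter, which compiles source to bytecode rather than machine code, so it is only a stand-in for a real compiler; the function name countdown is made up for the illustration.  The source contains no 'goto', yet the compiled form of the loop contains jump instructions.

```python
import dis

def countdown(n):
    # Structured source: a plain while loop, no 'goto' anywhere.
    while n > 0:
        n -= 1
    return n

# Collect the names of the bytecode instructions the compiler produced.
# The exact opcode names vary between CPython versions, so we only look
# for the substring "JUMP" rather than for one specific instruction.
OPNAMES = [instr.opname for instr in dis.get_instructions(countdown)]
JUMPS = [name for name in OPNAMES if "JUMP" in name]

if __name__ == "__main__":
    print(JUMPS)
```

Running it prints whatever jump opcodes your interpreter version emits for the loop; none of them correspond to anything written in the source.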

I am currently evaluating the use of finite state technology for incremental grammar development.  The particular commercial system I'm testing is optimized for run-time, i.e. performance.  In the source code, you can have ordered phonological rules.  (I know, that's not the current fad in phonology, but...)  The "compiler" essentially pre-applies these rules to the lexicon, to produce a transducer that goes straight from gloss ("semantics") to surface form.  There is no trace left of the original phonological rules, and needless to say no trace of the rule ordering.  It has been suggested that the human phonology system, in its mature state, might be like that.  Learning a new word would presumably require applying the phonological rules, then merging the result with the "compiled" grammar--exactly what I'm trying to find an efficient way to do for incremental grammar development.  That requires leaving the ordered rules around somewhere, but it certainly does not require them to be applied in your everyday word crunching.
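A toy sketch of that idea in Python follows.  It is not the commercial system mentioned above, and it is not a real transducer: a plain dictionary stands in for the compiled network, and the three rules, the underlying forms, and all the names (LEXICON, ORDERED_RULES, compile_lexicon, add_word) are invented for the illustration.  A real finite state compiler would compose rule transducers with the lexicon instead of enumerating entries.

```python
import re

# Hypothetical underlying forms; "+" marks a morpheme boundary.
LEXICON = {"cat-PL": "kat+z", "dog-PL": "dog+z", "bus-PL": "bus+z"}

def epenthesis(form):
    # Insert "e" between a stem-final sibilant and the suffix "z".
    return re.sub(r"([sz])\+z", r"\1+ez", form)

def devoicing(form):
    # Devoice the suffix "z" directly after a voiceless consonant.
    return re.sub(r"([ptkfs])\+z", r"\1+s", form)

def erase_boundary(form):
    # Remove the morpheme boundary symbol.
    return form.replace("+", "")

# The order matters: epenthesis must get its chance before devoicing.
ORDERED_RULES = [epenthesis, devoicing, erase_boundary]

def derive(underlying, rules):
    # Apply each rule once, in the given order.
    for rule in rules:
        underlying = rule(underlying)
    return underlying

def compile_lexicon(lexicon, rules):
    # "Compile": pre-apply the rules to every entry.  The result maps
    # gloss straight to surface form and keeps no record of the rules.
    return {gloss: derive(form, rules) for gloss, form in lexicon.items()}

def add_word(compiled, gloss, underlying, rules):
    # Learning a new word: run the stored rules once, then merge the
    # result into the compiled table.
    compiled[gloss] = derive(underlying, rules)

COMPILED = compile_lexicon(LEXICON, ORDERED_RULES)
```

Lookup in COMPILED never consults the rules, and nothing in it reveals their order.  Swapping epenthesis and devoicing would give "buss" instead of "busez" for "bus-PL", so the ordering has to be stored somewhere for add_word to use, but only there.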

Something similar might happen in syntax.  Chomsky long ago remarked that the human performance system has a hard time with center embedding, and could even be based on a finite state system.  The fact that we can figure out deeply center embedded sentences, given time, indicates that we have some language capabilities which we do not use in the normal course of events, and which are certainly not limited to finite state languages.  Of course, whether you consider those other capabilities part of our core language ability is another question...
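The contrast can be sketched in Python, under heavy simplifying assumptions: a center embedded sentence such as "the rat the cat the dog chased bit died" is reduced to the tag string N N N V V V, and the function name accepts and the max_depth parameter are made up for the illustration.  This is not a model anyone has proposed for human parsing.  With a fixed max_depth the recognizer only ever needs a bounded number of states, so it is equivalent to a finite state machine; with no bound it needs an unbounded counter, which no finite state machine has.

```python
def accepts(tags, max_depth=None):
    """Recognize N^n V^n (n >= 1) over a sequence of "N"/"V" tags.

    max_depth=None : unbounded counter (beyond finite state power).
    max_depth=k    : at most k pending nouns, so only finitely many
                     states are reachable (finite state behavior).
    """
    depth = 0          # nouns still waiting for their verb
    seen_verb = False  # once verbs start, no more nouns are allowed
    for tag in tags:
        if tag == "N":
            if seen_verb:
                return False
            depth += 1
            if max_depth is not None and depth > max_depth:
                return False  # embedding too deep for the bounded system
        elif tag == "V":
            seen_verb = True
            depth -= 1
            if depth < 0:
                return False  # more verbs than nouns
        else:
            return False      # unknown tag
    return seen_verb and depth == 0

if __name__ == "__main__":
    deep = "N N N V V V".split()
    print(accepts(deep, max_depth=2))  # bounded "everyday" system
    print(accepts(deep))               # unbounded system, "given time"
```

The bounded call rejects the triple embedding while the unbounded call accepts it, which is the sense in which the everyday system could be finite state even though the full capability is not.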

All of this to say, there need not be any direct (meaning 'simple', 'straightforward') connection between performance and competence.

>What if Chomsky provided the answer to the miracle 
>of language learning?  Would that give us the theory 
>we needed to understand what we know when we
>know a language? No. We would only know how 
>we start to learn a language.

I think this statement is more applicable to approaches like that of Mike Tomasello (described in Michael Barlow's message), which start with the data that children are actually exposed to.  Chomsky's approach, as Michael described, glosses over that sort of data, in favor of describing what the mature language capability (competence) is, that is, trying to use the outputs that the black box produces to figure out what's inside it.  Based on assumptions about what input children get in learning, Chomsky _then_ asks how the inferred structure of the black box might have gotten there, i.e. how language learning might have worked.  But that requires first figuring out what the structure in the box is, which is exactly "what we know when we know a language."  (BTW, Noam's wife Carol has done research on syntactic acquisition, perhaps more along the lines of what Tomasello and others have done, but at--as I recall--an intermediate level, i.e. in children age 5-10.)

>Chomsky acts more like a philosopher than any regular scientist

Ah, but in Newton's time, there was no such distinction.  (No, that is not intended as a serious answer to Michael Barlow's comment, just a gibe!  It's just that I've already gotten beyond a short email...)

      Mike Maxwell
      Summer Institute of Linguistics
      Mike_Maxwell at sil.org


