[Corpora-List] ANC, FROWN, Fuzzy Logic
Seth Grimes
grimes at altaplana.com
Thu Jul 27 03:17:26 UTC 2006
> And no one ever worried, afaik, about whether the compression had to be
> perfect...
There's a certain irony contained in this sentence, eh?
Seth
On Wed, 26 Jul 2006, Mike Maxwell wrote:
> Rob Freeman wrote:
> > We've been running around for 50 years or more finding incomplete
> > compressions. You would think we'd get the hint.
>
> I don't get the hint, even after you've told me there is a hint :-).
>
> I can certainly believe that human beings internalize a grammar without
> believing that the grammar needs to be "perfect" in any sense.
>
> I can also believe that the grammar does not need to extract every last
> bit of entropy out of the language (and I mean _language_, not corpus,
> see below).
>
> But let's get down to some actual data, and theories. The degree to
> which the compression should proceed was precisely the point behind a
> lot of the arguments--particularly among phonologists, the point is less
> clear in syntax--over abstractness. To take an example, in one of his
> papers Morris Halle argued (or maybe just assumed) that such
> semi-regular verbs in English as 'weep' and 'keep' in fact have a
> rule-governed past tense ('wept' and 'kept', etc.). I, on the other
> hand, think it's completely possible that native speakers of English do
> not extract such a rule (although they do extract the rules for regular
> past tense verbs). (Of course it's possible that some native speakers
> do, and others do not, extract such a rule.)
>
> Another example along the same lines would be the diphthongizing verbs
> in Spanish, like 'venir', whose stem diphthongizes to 'vien' when
> stressed. James Harris has argued for a rule-governed approach, which
> requires a diacritic. Again, it's perfectly possible that native
> speakers of Spanish just memorize the irregular stems, i.e. that their
> internalized grammars don't do perfect compression.
>
> In cases like these, linguists can argue--and have argued--for a greater
> or lesser degree of compression. And no one ever worried, afaik, about
> whether the compression had to be perfect (although admittedly, there
> were some pretty abstract analyses in the bad olde days).
>
> (BTW, it's unclear to me--as I think another poster pointed out--whether
> compression of a corpus by a grammar is at all relevant. What grammars
> do, I would say, is compress the _language_, of which the corpus is but
> a small sample. One can test whether the grammar works by measuring how
> well it compresses a given corpus of the language, but I don't see the
> point to asking whether we perfectly compress some arbitrary corpus.)
--
Seth Grimes Alta Plana Corp, analytical computing & data management
Intelligent Enterprise magazine (CMP), Contributing Editor
grimes at altaplana.com http://altaplana.com 301-270-0795