[Corpora-List] ANC, FROWN, Fuzzy Logic

Seth Grimes grimes at altaplana.com
Thu Jul 27 03:17:26 UTC 2006


> And no one ever worried, afaik, about whether the compression had to be
> perfect...

There's a certain irony contained in this sentence, eh?

					Seth


On Wed, 26 Jul 2006, Mike Maxwell wrote:

> Rob Freeman wrote:
> > We've been running around for 50 years or more finding incomplete
> > compressions. You would think we'd get the hint.
>
> I don't get the hint, even after you've told me there is a hint :-).
>
> I can certainly believe that human beings internalize a grammar without
> believing that the grammar needs to be "perfect" in any sense.
>
> I can also believe that the grammar does not need to extract every last
> bit of entropy out of the language (and I mean _language_, not corpus,
> see below).
>
> But let's get down to some actual data, and theories.  The degree to
> which the compression should proceed was precisely the point behind a
> lot of the arguments--particularly among phonologists; the point is less
> clear in syntax--over abstractness.  To take an example, in one of his
> papers Morris Halle argued (or maybe just assumed) that such
> semi-regular verbs in English as 'weep' and 'keep' in fact have a
> rule-governed past tense ('wept' and 'kept', etc.).  I, on the other
> hand, think it's completely possible that native speakers of English do
> not extract such a rule (although they do extract the rules for regular
> past tense verbs).  (Of course it's possible that some native speakers
> do, and others do not, extract such a rule.)
>
> Another example along the same lines would be the diphthongizing verbs
> in Spanish, like 'venir', whose stem diphthongizes to 'vien' when
> stressed.  James Harris has argued for a rule-governed approach, which
> requires a diacritic.  Again, it's perfectly possible that native
> speakers of Spanish just memorize the irregular stems, i.e. that their
> internalized grammars don't do perfect compression.
>
> In cases like these, linguists can argue--and have argued--for a greater
> or lesser degree of compression.  And no one ever worried, afaik, about
> whether the compression had to be perfect (although admittedly, there
> were some pretty abstract analyses in the bad olde days).
>
> (BTW, it's unclear to me--as I think another poster pointed out--whether
> compression of a corpus by a grammar is at all relevant.  What grammars
> do, I would say, is compress the _language_, of which the corpus is but
> a small sample.  One can test whether the grammar works by measuring how
> well it compresses a given corpus of the language, but I don't see the
> point to asking whether we perfectly compress some arbitrary corpus.)
>
>

--
Seth Grimes   Alta Plana Corp, analytical computing & data management
              Intelligent Enterprise magazine (CMP), Contributing Editor
grimes at altaplana.com       http://altaplana.com    301-270-0795


