[Corpora-List] ANC, FROWN, Fuzzy Logic

Mon Jul 24 15:25:40 UTC 2006

Daoud Clarke wrote:
>
> As far as I understand it, fuzzy logic isn't about uncertainty in
> qualities, it is about degrees of qualities, or vagueness.
>
> <snip>

All this bit about fuzzy sets and Bayesian inference was very well put,
and I find nothing to disagree with.

> I think what the reference to Greg Chaitin's work was getting at was
> perhaps related to the following. In practice we are always
> faced with a finite corpus, whereas the theoretical corpora generated
> by rules are infinite. We can view our finite corpus as a sample from
> some hypothetical infinite corpus. The question is, what theory gives
> us the best estimate of this infinite corpus, given the finite sample?
> Using our finite corpus we can form theories about the infinite corpus,
> which may or may not incorporate our linguistic knowledge of the
> language in question. From an information theoretic perspective, the
> best theory would be the one that enabled us to express the finite
> corpus using the least amount of information -- the one that best
> compressed the information in the corpus.
>
> Of course theories become large and unwieldy, so we may prefer the
> minimum description length principle: the best theory for a sequence of
> data is the one that minimises the size of the theory plus the size of
> the data described using the theory.
>
> Some of this has been put into practice by Bill Teahan, who applies
> text compression techniques to NLP applications. It would be extremely
> interesting however to see whether the use of linguistic theories can
> help provide better text compression. To my awareness this has not been
> looked into.
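
To make the two-part description length idea quoted above concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the "theories" are two character-level models (uniform and unigram), the alphabet is assumed to be shared for free, each unigram parameter is assumed to cost a flat 16 bits, and code lengths are the ideal -log2(p) values rather than the output of a real compressor. All of the function names are hypothetical, and none of this is taken from Teahan's work.

```python
import math
from collections import Counter


def data_bits(text, probs):
    """Ideal code length, in bits, of text under a symbol distribution."""
    return -sum(math.log2(probs[ch]) for ch in text)


def uniform_model(text):
    """Theory 1: every symbol in the alphabet is equally likely."""
    alphabet = set(text)
    probs = {ch: 1.0 / len(alphabet) for ch in alphabet}
    model_bits = 0.0  # assumption: the alphabet is shared and costs nothing
    return model_bits, probs


def unigram_model(text, bits_per_param=16):
    """Theory 2: symbol probabilities estimated from the sample itself."""
    counts = Counter(text)
    probs = {ch: n / len(text) for ch, n in counts.items()}
    # Assumption: each free parameter costs a fixed number of bits to state.
    model_bits = float(bits_per_param * (len(counts) - 1))
    return model_bits, probs


def description_length(text, model):
    """Size of the theory plus size of the data described using the theory."""
    model_bits, probs = model(text)
    return model_bits + data_bits(text, probs)


if __name__ == "__main__":
    for sample in ("ab", "a" * 1000 + "b" * 10):
        for model in (uniform_model, unigram_model):
            bits = description_length(sample, model)
            print(f"{len(sample):5d} symbols  {model.__name__:14s} {bits:8.1f} bits")
```

On a two-symbol sample the richer theory does not pay for itself; on a longer, skewed sample it does. That is the trade-off the principle is meant to capture, and it says nothing yet about whether description length is the right metric for one's purposes.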

I'd just like to point out that theory evaluation metrics based on
description length are only useful for some purposes, and that one need
only use them when one's purposes call for that kind of evaluation.
(There are no "universal" theory evaluation metrics, because the space
of purposes to which a theory can be put is infinite. I see the
assumption that such a metric exists as one of the root Cartesian flaws.)

A model that also predicted neuropsychological phenomena during speech
would be more useful in my book than one that only produced a formal
grammatical abstraction of utterances.

A model that also captured phenomena of language evolution over a social
network would be more useful in my book than one that only feeds a
treebank.

-- Mark

Mark P. Line
Polymathix
San Antonio, TX