[Corpora-List] Is a complete grammar possible (beyond the corpus itself)?

Rob Freeman lists at chaoticlanguage.com
Wed Sep 12 02:42:00 UTC 2007


On 9/12/07, John F. Sowa <sowa at bestweb.net> wrote:
>
>
> I think we agree on the following points:
>
>   1. When a system (human or computer) finds an unrecognizable
>      grammatical pattern, it should make its best effort to find
>      some interpretation for it.


This sounds like ad-hoc generalization, which is what I think we need, yes.
But in my view it is not an error-coping mechanism; it is the mechanism. This
is syntax. Syntax is the process of finding new generalizations to justify
new combinations of words.

>   2. But it should also keep some kind of record of the original
>      pattern and the interpretation made.  That is necessary
>      in order to make a generalization, if the same or similar
>      pattern occurs again.


The combination is stored. But this is the path to lexicon. Everything
stored has the nature of lexicon. There is no "grammar" as such. There is
only the tendency to repetition, which is lexicon, and the ability to make
new (context-specific) generalizations, which is syntax.

>   3. If another unrecognizable pattern comes along, the system
>      should check whether there were earlier patterns like it
>      in whatever storage is used for "nonce grammar" instances.


An individual (context-specific) generalization should be stored in case of
repetition, I grant you, though once again this is best seen as lexicon, not
syntax. Syntax should be seen as the act of new (context-specific)
generalization.

>   4. As more examples of a nonce-grammar pattern accumulate,
>      its status increases from "probable error" to "temporary
>      innovation" to "common in the genre" to "standard".


I don't see the path as "probable error" to "temporary innovation" to
"standard". I see the path as "novel generalization" to "repeated
generalization" to, eventually, ossified generalization in lexicon (which is
often no longer justified by generalizations in the wider language).
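
For what it's worth, here is a minimal sketch of that path in Python. The
repetition counter and the ossification threshold are invented purely for
illustration; nothing in it is claimed as the actual mechanism:

    from collections import Counter

    lexicon = set()          # ossified combinations, retrieved directly
    repetitions = Counter()  # how often each generalization has recurred

    def record(pair, ossify_after=3):
        """Novel generalization -> repeated generalization -> lexicon."""
        if pair in lexicon:
            return "lexicon"
        repetitions[pair] += 1
        if repetitions[pair] >= ossify_after:
            lexicon.add(pair)  # ossified: now stored, no longer re-derived
            return "ossified"
        return "novel" if repetitions[pair] == 1 else "repeated"

    for _ in range(3):
        print(record(("dog", "ran")))  # novel, repeated, ossified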

>   5. The above points imply that some kind of storage is
>      required for every unrecognized pattern -- at least
>      until it has been assimilated into some encoding that
>      is similar in nature to the encoding of whatever is
>      typically called the "standard" grammar.


Every pattern is recognized, more or less. That is what syntax does: it
makes new generalizations. As each new pattern is repeated it becomes
assimilated. That assimilation is what we call lexicon.

But if you are valuing ad-hoc generalization, great. That is what I think we
need. Call it an "error-coping mechanism" if you will.

Basically, to sum up, if we model syntax as ad-hoc generalization over a
corpus of examples, I think we can solve the problem. The point of view I've
been trying to present here is that we have failed to model syntax
effectively because we have assumed grammatical generalizations over corpora
must be complete.

Drop this one assumption, and I think we will start to get good results
immediately.
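
To make "ad-hoc generalization over a corpus" concrete, here is one toy
reading of it in Python. Everything in it (the corpus, the Jaccard
similarity, the threshold) is an illustrative assumption; it is a sketch of
the idea, not a claim about any particular implementation:

    from collections import defaultdict

    corpus = [
        "the cat sat", "the dog sat", "the dog ran",
        "a cat ran", "a bird sang", "the bird sang",
    ]

    # Lexicon: every attested combination, plus each word's observed
    # left and right neighbours (its distributional contexts).
    bigrams = set()
    ctx = defaultdict(set)
    for sentence in corpus:
        words = sentence.split()
        for w1, w2 in zip(words, words[1:]):
            bigrams.add((w1, w2))
            ctx[w1].add(("R", w2))  # w2 occurs to the right of w1
            ctx[w2].add(("L", w1))  # w1 occurs to the left of w2

    def similar(a, b):
        """Jaccard overlap of two words' context sets."""
        union = ctx[a] | ctx[b]
        return len(ctx[a] & ctx[b]) / len(union) if union else 0.0

    def generalize(w1, w2, threshold=0.3):
        """Syntax, on the view argued above: justify an unseen pair
        (w1, w2) by analogy with some attested pair (x, y) where x
        distributes like w1 and y distributes like w2."""
        if (w1, w2) in bigrams:
            return True  # already stored: lexicon, nothing to generalize
        return any(similar(x, w1) >= threshold and
                   similar(y, w2) >= threshold
                   for x, y in bigrams)

    print(generalize("bird", "ran"))  # True: "bird" distributes like
                                      # "cat", and "cat ran" is attested
    print(generalize("sat", "the"))   # False: no attested pair is analogous

On this picture one routine does all the work: what is stored (the bigrams)
is lexicon, and each call to generalize() is a fresh, context-specific act
of syntax.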

I hope people reading this will now have at least an awareness of that idea.

> RF> There is a vague idea we have to merge lexical and syntactic
> > aspects of text, but no one has a clue how to do that.
>
> I would say that there are many clues, many proposals for doing
> different kinds of mergers, but not enough evidence to make a
> good recommendation about which one(s) to choose.
>
> I am encouraged by a steady stream of recent publications
> that indicate the "mainstream" is creeping along in this
> direction.  Following are a few (check Google for full ref's):
>
>   - _Simpler Syntax_ by Culicover & Jackendoff (2005) is a
>     recognition by long-time Chomskyans that a major overhaul
>     is long overdue.  However, they are still trying to preserve
>     a very large part of the results obtained by the Chomskyan
>     linguists in a way that is fairly conservative.
>
>   - _Dynamic Syntax_ by Kempson, Meyer-Viol, & Gabbay (2001)
>     is a more radical approach to syntax, but the semantic
>     theory by Gabbay is a very formal logic-based approach.
>     Gabbay uses "decorated trees" instead of a linear notation
>     for the logic, which I like, since conceptual graphs can
>     be viewed as "decorated trees" glued together in similar ways.
>     But I believe the logic should be as dynamic as the syntax.
>
>   - _Cognitive Linguistics_ by Croft & Cruse (2004) combines
>     radical construction grammar (RCG) with lexical semantics
>     in a way that makes both more dynamic than the above approaches.
>     RCG does allow syntax to evolve from more primitive patterns,
>     but Croft and Cruse don't say how it would be possible for
>     logic to evolve.


Thanks for the references. Any new attempt is to be valued. When the
state of the art is clearly flawed, innovation must be the norm.

> The fact that children by the age of 3 use words for the logical
> operators (e.g., 'and', 'not', 'some', and others) indicates that
> logic somehow evolves out of the infant's early one- and two-word
> phrases.  And the fact that all mathematicians, logicians, and
> computer programmers use NLs to explain the most abstruse theories
> imaginable indicates that there is no limit to how far the
> expressive power can evolve.


This is by way of a new topic. The two are related, and I'm interested in
it, but I think I'll post it as another thread and leave this one for any
remaining quibbles about grammatical incompleteness.

-Rob