[Corpora-List] Is a complete grammar possible (beyond the corpus itself)?

John F. Sowa sowa at bestweb.net
Sat Sep 8 19:04:13 UTC 2007


Rob,

The original definition of "generative grammar", which is used
for formal languages, very explicitly defines "completeness":

    A language L is defined as the set of all and only those
    sentences that can be generated (or parsed) by a grammar G.

This definition has proved to be very useful for artificial
languages, such as programming languages and formal logics.
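
To make the definition concrete, here is a minimal sketch in Python.
The toy grammar, the names GRAMMAR, expand, LANGUAGE, and in_language
are all invented for illustration and are not from any real system.
The grammar is deliberately non-recursive, so the set L can be
enumerated exhaustively, which is not possible in general.

```python
from itertools import product

# Hypothetical toy grammar G (an assumption for illustration only).
# It has no recursive rules, so L(G) is finite and fully enumerable.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def expand(symbol):
    """Return every word sequence (as a tuple) derivable from symbol."""
    if symbol not in GRAMMAR:  # not a nonterminal, so it is a word
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol, in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


# L is the set of all and only the sentences that G generates.
LANGUAGE = {" ".join(words) for words in expand("S")}


def in_language(sentence):
    """Membership test: is the sentence one of those generated by G?"""
    return sentence in LANGUAGE


print(len(LANGUAGE))
print(in_language("the dog sees a cat"))
print(in_language("dog the sees"))
```

Under this definition, "the dog sleeps a cat" also belongs to L,
because G generates it.  Membership is decided by the grammar alone.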

But it quickly became obvious that no grammar and parser could
come anywhere close to generating or parsing all and only the
sentences commonly used in any NL.  Therefore, Chomsky qualified
it by saying that G would only describe the "competence" of an
"ideal" speaker, not the performance of any actual speaker.

But even that definition is woefully inadequate, because there
is no grammar/parser combination in existence today that can
correctly parse more than about 50% of the sentences published
in well-edited texts.  (Many parsers can produce parses for more
than 50% of the sentences, but if you eliminate any parse that
has one or more errors, as judged by a competent linguist, even
the best have difficulty in reaching 50% completely correct.)
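
The difference between those two percentages can be sketched as
follows.  The ParseResult record, both function names, and the four
sample records are assumptions made up for illustration; they are not
measurements from any parser.

```python
from dataclasses import dataclass


@dataclass
class ParseResult:
    produced_parse: bool  # did the parser return any parse at all?
    error_count: int      # errors a linguist found in that parse


def coverage(results):
    """Fraction of sentences for which some parse was produced."""
    return sum(r.produced_parse for r in results) / len(results)


def fully_correct_rate(results):
    """Fraction of sentences whose parse has no errors at all."""
    correct = sum(r.produced_parse and r.error_count == 0 for r in results)
    return correct / len(results)


# Invented sample judgments, for illustration only.
sample = [
    ParseResult(True, 0),
    ParseResult(True, 2),
    ParseResult(True, 1),
    ParseResult(False, 0),
]

print(coverage(sample))
print(fully_correct_rate(sample))
```

A parse with even one error counts against the second figure, so it
can never exceed the first.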

 > Take the opposite point of view. Assume only that language is
 > generally computable. Then it may be undecidable.

I don't know what you mean by "computable".  But the question
of undecidability is trivial to settle for any NL grammar in
existence today.  Just pick up any well-edited book, magazine,
or newspaper you can find around the house.  Then run the sentences
from the first page through the parser.  That will demonstrate
that at least 99% of the grammars fail on a small finite set.
In the unlikely event that one of the parsers actually produces
correct parses for all the sentences, just try it on the next
book, magazine, or newspaper.

By the way, you can get higher percentages of correct parses *if*
you supplement the grammar with semantic and pragmatic tests.
But that is harder to implement, and it violates Chomsky's
assumption of the autonomy of syntax.

John

