[Corpora-List] Is a complete grammar possible (beyond the corpus itself)?

Rob Freeman lists at chaoticlanguage.com
Sun Sep 9 05:52:38 UTC 2007


John,

You'll confuse the issue with so many words.

For "completeness" I am happy to agree with Yorick Wilks and equate it with
"decidability". I'm indebted to Yorick for pointing out this was how the
problem was seen by generativists.

What it means to be "computable" was first defined by Alan Turing (and
Alonzo Church?). I do not intend my sense to differ in any way.

The question of decidability is a technical one within this framework.
According to Turing's theory there are computable problems which are not
decidable. It is not a question of adding more information, "semantic" or
otherwise, to make them decidable. They are not decidable because they have
too much power, not too little.
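
One way to read the distinction above is as the gap between a procedure
that can confirm membership and one that can also always reject. The sketch
below is only an illustration under assumptions that are not in the original
post: the function name `semi_decide`, the toy rule set `RULES`, and the
`max_steps` budget are all hypothetical. It searches the derivations of a
string-rewriting grammar breadth-first. It answers True once the target is
derived, but when the target is not derivable it can only keep searching or
give up with "unknown". (For this toy grammar membership is in fact
decidable; it merely shows the shape of a semi-decision procedure.)

```python
from collections import deque

# Hypothetical toy rewriting rules (left-hand side, right-hand side).
# They generate a^n b^n; chosen only to keep the example small.
RULES = [("S", "aSb"), ("S", "ab")]


def semi_decide(target, rules, start="S", max_steps=None):
    """Breadth-first search over the derivations of a rewriting system.

    Returns True as soon as `target` is derived.
    Returns None when the step budget runs out ("unknown", not "no").
    Returns False only if every reachable form has been explored, which
    requires the set of reachable forms to be finite.
    With max_steps=None the search may never terminate.
    """
    seen = {start}
    queue = deque([start])
    steps = 0
    while queue:
        if max_steps is not None and steps >= max_steps:
            return None  # budget exhausted: no verdict either way
        current = queue.popleft()
        steps += 1
        if current == target:
            return True
        # Apply every rule at every position where its left side occurs.
        for lhs, rhs in rules:
            idx = current.find(lhs)
            while idx != -1:
                nxt = current[:idx] + rhs + current[idx + len(lhs):]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                idx = current.find(lhs, idx + 1)
    return False  # finite search space fully explored


print(semi_decide("aabb", RULES, max_steps=50))  # True
print(semi_decide("ba", RULES, max_steps=50))    # None
print(semi_decide("ba", [("S", "ab")]))          # False
```

The step budget is what keeps the second call from running forever: the
procedure has no general way to turn "not found yet" into "not in the
language".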

I am suggesting natural language might be such a system.

That would not be a bad thing by the way. Decidability acts as a kind of
straitjacket on computability. It is a limitation on its power. A generally
computable model of natural language would be more powerful than a decidable
model. It could be powerful enough to account for the detail of collocation
and phraseology, for instance.

To get that power we would only need to lose the ability to _label_ language
definitively. That is the content of decidability: the ability to fit
language to a grammar, nothing more. I personally would not be bothered if
it turned out that tags and tree-banks were officially meaningless, and
corpora the most complete description of a language possible, especially if
that meant we could recognize speech accurately, and index information
effectively.

Anyway, I think the possibility is worth considering.

-Rob

 On 9/9/07, John F. Sowa <sowa at bestweb.net> wrote:
>
> Rob,
>
> The original definition of "generative grammar", which is used
> for formal languages, very explicitly defines "completeness":
>
>     A language L is defined as the set of all and only those
>     sentences that can be generated (or parsed) by a grammar G.
>
> This definition has proved to be very useful for artificial
> languages, such as programming languages and formal logics.
>
> But it quickly became obvious that no grammar and parser could
> come anywhere close to generating or parsing all and only the
> sentences commonly used in any NL.  Therefore, Chomsky qualified
> it by saying that G would only describe the "competence" of an
> "ideal" speaker, not the performance of any actual speaker.
>
> But even that definition is woefully inadequate, because there
> is no grammar/parser combination in existence today that can
> correctly parse more than about 50% of the sentences published
> in well-edited texts.  (Many parsers can produce parses for more
> than 50% of the sentences, but if you eliminate any parse that
> has one or more errors, as judged by a competent linguist, even
> the best have difficulty in reaching 50% completely correct.)
>
> > Take the opposite point of view. Assume only that language is
> > generally computable. Then it may be undecidable.
>
> I don't know what you mean by "computable".  But the question
> of undecidability is trivial to show for any NL grammar in
> existence today.  Just pick up any well-edited book, magazine,
> or newspaper you can find around the house.  Then run the sentences
> from the first page through the parser.  That will demonstrate
> that at least 99% of the grammars fail on a small finite set.
> In the unlikely event that one of the parsers actually produces
> correct parses for all the sentences, just try it on the next
> book, magazine, or newspaper.
>
> By the way, you can get higher percentages of correct parses *if*
> you supplement the grammar with semantic and pragmatic tests.
> But that is harder to implement, and it violates Chomsky's
> assumption of the autonomy of syntax.
>
> John
>
>