[Corpora-List] Is a complete grammar possible (beyond the corpus itself)?

Rob Freeman lists at chaoticlanguage.com
Mon Sep 10 23:52:59 UTC 2007


Chris,

On 9/10/07, chris brew <cbrew at acm.org> wrote:
>
>
> > What does it mean when we label a tree-bank, or tag a corpus? What theory
> > is behind the idea of "parts-of-speech"?
> >
> ...
> This whole enterprise (not just Hockenmaier and Steedman, but ling-banking
> in general) strikes me as exactly "doing syntax", with rigour, on corpora.
>

Yes, work on tree-banks or tagging is purely generative in concept, that was
my point.

Now I feel that the more someone identifies as a "corpus linguist", the
less rigorously they are likely to apply generative theory, e.g. John
Sowa with his insistence that we reject all formal theory.

But my point was that when corpus linguists do syntax what comes out is
mostly generativism. Your description of the tree-bank status quo bears this
out.

The effect of "corpus linguistics" on the way we do syntax has been nil (for
lexicon it has been a revolution, but for syntax, nothing.)

There is a vague idea we have to merge "lexical" and "syntactic" aspects of
text, but no-one has a clue how to do that.

> > What has changed is that we have stopped doing syntax. Sure, we've gained a
> > lot of insight about the importance of lexicon and phraseology. That is not
> > to be sniffed at. But when we try to do syntax what comes out is still
> > mostly generativism, without the rigour.
> >
>
> There may be a disconnect between the live issues in current formal syntax
> research and the concerns that are foregrounded in recent ACL papers, and
> there may be scope for deeper thinking about what it is that the learning
> systems are trying to learn, but I see plenty
> of rigour and care in the machine learning work, and some deep thinking on
> the bigger issues. I don't think things are that bad.
>

I see no new thinking. Yorick Wilks summed it up for corpus linguistics:
there are symbolic approaches, there are statistical approaches, and there
are those who say "trust the text" and leave it at that. This has not
changed for... 20 years(?) Statistical approaches are broadly generative
anyway (with the innateness hypothesis ignored), so it is really just
"generative" and those who say "trust the text."

Occasionally you see a presentation which admits the need for something new.
Viv Yngve impressed me with his courage to say we need to go back and
re-examine all our assumptions
(http://www.dcs.shef.ac.uk/~yorick/YngveInterview.html).
It was great to hear him say he abandoned the "depth hypothesis" for which
he is famous, because he was forced to conclude "there are different ways of
drawing tree structures". If only those managing tree-bank projects had
similar courage.

Not only is there nothing new, there is no willingness to contemplate
anything new.

In this thread no-one has challenged my theoretical claims. There has been
plenty of misinterpretation and arguing about definitions, but no-one has
said "Ah, you claim grammar may be necessarily incomplete, but this is
incorrect because..."

Which is a pity, because I have just realized the possibility is buried in
formal grammar theory, so it should be more accessible. The idea that
grammars may need to be incomplete is part of the theory. But where he could
see complexity, Mike Maxwell persists in seeing only "errors", and you would
prefer to ignore the possibility because you "don't think things are that
bad."

So _still_ no-one has considered the possibility.

No, things are not too bad. It is just that we don't know how language works.
While we are happy with that it is unlikely to change.

-Rob