Chris,<br><br>On 9/10/07, <b class="gmail_sendername">chris brew</b> <<a href="mailto:cbrew@acm.org">cbrew@acm.org</a>> wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br><div><span class="q"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div>What does it mean when we label a tree-bank, or tag a corpus? What theory is behind the idea of "parts-of-speech"?

</div></div></blockquote></span><div>...<br>This whole enterprise (not just Hockenmaier and Steedman, but ling-banking in general) strikes me as exactly "doing syntax", with rigour, on corpora.

</div></div></blockquote><div><br>Yes, work on tree-banks or tagging is purely generative in concept; that was my point.<br><br>Now, I feel that the more someone identifies as a "corpus linguist", the less rigorously they are likely to apply generative theory (e.g. John Sowa, with his insistence that we reject all formal theory).<br><br>But my point was that when corpus linguists do syntax, what comes out is mostly generativism. Your description of the tree-bank status quo bears this out.

<br><br>The effect of "corpus linguistics" on the way we do syntax has been nil (for the lexicon it has been a revolution, but for syntax, nothing).<br><br>There is a vague idea that we have to merge "lexical" and "syntactic" aspects of text, but no-one has a clue how to do that.

<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><span class="q"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div><div>What has changed is that we have stopped doing syntax. Sure, we've gained a lot of insight about the importance of lexicon and phraseology. That is not to be sniffed at. But when we try to do syntax what comes out is still mostly generativism, without the rigour.

</div></div></blockquote></span><div><br>There may be a disconnect between the live issues in current formal syntax research and the concerns that are foregrounded in recent ACL papers, and there may be scope for deeper thinking about what it is that the learning systems are trying to learn, but I see plenty of rigour and care in the machine learning work, and some deep thinking on the bigger issues. I don't think things are that bad.</div></div></blockquote><div><br>I see no new thinking. Yorick Wilks summed it up for corpus linguistics: there are symbolic approaches, there are statistical approaches, and there are those who say "trust the text" and leave it at that. This has not changed in, what, 20 years? Statistical approaches are broadly generative anyway (with the innateness hypothesis ignored), so really there is just "generative" and those who say "trust the text."

<br><br>Occasionally you see a presentation that admits the need for something new. Viv Yngve impressed me with the courage to say that we need to go back and re-examine all our assumptions: <a onclick="return top.js.OpenExtLink(window,event,this)" href="http://www.dcs.shef.ac.uk/%7Eyorick/YngveInterview.html" target="_blank">http://www.dcs.shef.ac.uk/~yorick/YngveInterview.html</a>. It was great to hear him say that he had abandoned the "depth hypothesis" for which he is famous, because he was forced to conclude that "there are different ways of drawing tree structures". If only those managing tree-bank projects had similar courage.

<br><br>Not only is there nothing new, there is no willingness to contemplate anything new.<br><br>In this thread no-one has challenged my theoretical claims. There has been plenty of misinterpretation and arguing about definitions, but no-one has said "Ah, you claim grammar may be necessarily incomplete, but this is incorrect because..."

<br><br>Which is a pity, because I have just realized that the possibility is buried in formal grammar theory, so it should be more accessible. The idea that grammars may need to be incomplete is part of the theory. But Mike Maxwell persists in seeing only "errors" where he could see complexity, and you would prefer to ignore the possibility because you "don't think things are that bad."

<br><br>So _still_ no-one has considered the possibility.<br><br>No, things are not too bad. It is just that we don't know how language works. While we are happy with that, it is unlikely to change.<br><br>-Rob<br></div></div>