Hi Paula,<br><br>Sorry to come down hard on your earlier post. It was a cumulative reaction to a number of messages which seemed to question not only my concrete suggestions, but any desire to move from the status quo at all.

<br><br><div><span class="gmail_quote">On 9/14/07, <b class="gmail_sendername">Paula Newman</b> &lt;<a href="mailto:paulan@earthlink.net">paulan@earthlink.net</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div>

<div>

<div>Rob,</div>

<div> </div>

<div>Re:</div>

<div>RF> does the study of language have to be divided up in the ways you describe?</div>

<div><br>Of  course not.  I was providing a framework in which to ask a question, namely, what is the purpose of your  proposal? </div>

<div> Is it to further the study of language?  To develop methods of implementing NL processors?  </div>

<div>To form the basis for new formalisms useful in both contexts? To develop new types of corpus annotation? Or?</div>

<div> </div>

<div>And that was just to get at (i.e. pin down) what you are actually suggesting.  </div>

<div> </div>

<div>Perhaps another way of getting there is via another question: </div>

<div>given that you have an idea in mind that you seem to think is new, how would you pursue it? </div>

<div> </div>

<div>That words, meanings, and the contexts in which they occur are interdependent is well known.  </div>

<div>What new approach are you proposing to deal with that fact?  People have been struggling over it for years, on both theoretical and practical levels.</div></div></div></blockquote><div><br>No one has suggested a treatment of syntax which makes generalizations about word associations, ad-hoc, in context.

<br><br>This is of immediate relevance for machine learning. In machine learning work it is the goal of finding a complete grammar which needs to change, nothing else.<br><br>But the question "how would you pursue it" is as broad as the subject. As I say, the implications for machine learning are that we should stop looking for complete grammatical descriptions of corpora (and focus instead on software for generating very precise incomplete generalizations, at will).

<br><br>That is just the beginning. Grammatical incompleteness doesn't just suggest we should stop trying to label texts automatically, it suggests we should stop trying to label texts at all. What is the purpose of labeling your text if someone else can label it another way, and be right too?

<br><br>If we must label then we need to focus on talking about justifications for labels, not the labels themselves. Labels only give a point of view. (In principle corpus linguists already reject labels. In practice many use them, and their provisional status is not always clear. Formal incompleteness gives those corpus linguists who reject grammatical summarizations of corpora a first-principles explanation of _why_ corpora can't be summarized.)

<br><br>It suggests changes in the way we should teach language. If the corpus is the most complete description of a language, then we should teach examples, not grammar. If grammar can only be understood in terms of ad-hoc generalizations over examples, then grammatical explanations of language will be meaningless in the absence of sufficient exposure to examples.

<br><br>There are implications for search engines. I'm suggesting language works much like an indexed search engine (ad-hoc search).<br><br>As fields of technology, natural language and indexed search are currently disconnected. They should be the same.

<br><br>Arguably indexed search is already the most successful "natural language" technology of all time (check my definition: it does stuff with text, and it makes money...) But while search engine results can be seen as ad-hoc categories of "meaning", these categories are currently found by search engines solely with reference to lexicon. (The information currently given to search engines is a bit like Mike Maxwell's "syntaxless" example: "garden-the-to accompanied tomato-plant-his Tom".) If we now have a theory for the way syntax selects meaningful categories, ad-hoc, from text, in principle we could have a Web search that reflects the syntax of your query, not only the words. (And do this properly, mark you! Attempts to apply natural language to search have failed up to now because our model of natural language, and what we find works for search, have been different. Search indexes and clusters ad-hoc, NLP tries to find global classes. I would base them both on the same ad-hoc search model. The current model of search, which finds ad-hoc patterns among documents by indexing them on words, would be integrated with a model of meaning which uses ad-hoc syntax to make subtle distinctions between different uses of the same word--distinguishing different uses of "the man with a stick" for instance.)

<br><br>While we are indexing information more effectively, why stop at text? A model of language based on ad-hoc classes suggests why speech recognition does not work well. As I pointed out, this problem of incompleteness was first observed in phonemic categories. But what did linguistics do? Our reaction was to drop phonemics as a study and let the engineers get along as best they could! Now we can help them. The answer is that the categories of speech need to be treated on an ad-hoc basis, not learned globally in hidden Markov models.

<br><br>Taking this to its logical extreme, it says things about the way we need to model knowledge. There is currently a vast disconnect between the way computers work and the way we think (exemplified by the way language works.) This is evident in the gap people see between "formal models" and natural language. If we can bridge that gap the possibilities are breathtaking.

<br><br>How much more do you want me to write?<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div>RB> I think the idea of "informal grammar" is a muddle too. I don't think grammar is "informal", I think it is "necessarily incomplete".

<div> </div>

<div>OK, I thought it was your term.  But, and as you have been advised many times, everyone knows the latter.  The observation that  "Any grammar leaks" is a very old one.  I used to think it was by Jane Robinson, but I've recently seen an attribution to Sapir.

</div></div></blockquote><div><br>Absolutely. However while we all seem to know this "informally", formally it has been ignored. This knowledge has not changed what we do one bit. "Gee, all our grammars leak. Oh well, better just look harder."

<br><br>If all grammars leak, why are we still looking for grammars? How about turning that around? Maybe the "leaks" are the system we've been looking for.<br><br>The history of science is full of such reversals. Maybe one is needed here.

<br><br>-Rob</div></div>