My purpose in recent posts has been to give a new point of view on some old arguments, and to propose some concrete solutions which follow from that new point of view.

I want to summarize some of the more practical aspects of those solutions. I have not simply been arguing to win points. I think these ideas have practical applications, and that they can clarify the way we think about language.
The Basic Idea

The basic idea relates to an observation first made over 50 years ago, but one which I claim has been misinterpreted for those 50 years.

The observation was that we cannot make global generalizations about natural language structure.
What that means in practice is that there will always be generalizations which can be said to be true _and_ not true of natural language structure. This suggests we must make generalizations about language only with reference to a context.
By analogy with formal grammar theory, I summarize this by saying that natural language grammar appears to be "necessarily incomplete".

True and Not True

I have illustrated my point with small examples of generalizations that can be said to be both true and not true of words in corpora.
For instance:

To Specify Syntax

Peter Howarth gives examples of slightly disfluent constructions produced by ESL students (Peter Howarth, Phraseology and Second Language Acquisition, 1998):

e.g. "*_attempts_ and researches have been _done_ by psychologist..."

That we understand this, and yet that it seems odd, is explained by the observation that "done" and "made" can be considered to have the same syntax in some contexts (e.g. "do/make a study"), but in the context of "attempt" they do not have the same syntax (for most people?). So it is true that "done" is in a class with "made" in some contexts (e.g. "a study"), but it is not true in other contexts (e.g. "attempts").

By this principle, the more contexts two words share, the more similar we might expect their syntax to be, while always remaining aware that in detail they may have different behaviour in a given context (e.g. "attempts" clearly selects "make" and not "do").

This might be useful for explaining the seemingly random vagaries of syntax to students in a language learning environment.

Or it might be used to improve predictions about what word sequences are possible in speech recognition systems (the same could be said of phonemes).
To Select Meaning

The generalizations given above are useful for predicting syntax. But such ad-hoc generalizations can be used in another way. They do not only restrict syntax in context-specific ways: we can reverse our perspective and consider syntax to select classes of ad-hoc generalizations appropriate to a token, and associate these classes with meaning.
E.g. for the two sentences:

    I supported the man with a stick.
    I accompanied the man with a stick.

The words "supported" and "accompanied" can not only be thought of as being selected by syntactic generalizations about the phrase "the man with a stick"; they can also be thought of as _selecting_ classes of syntactic generalizations about "the man with a stick", which specify one or the other meaning for that phrase.
As I wrote earlier:

'For instance, if the word used selects a set of contexts which includes the context "tomato plant" we will see one meaning ("supported" will do this), but if it selects a class which does not include "tomato plant", we will see another ("accompanied" will do this).

Note: you need an ad-hoc treatment of syntax for this to work. Otherwise the classes ("the man with a stick" = "tomato plant" or "the man with a stick" != "tomato plant") will be conflated, and "the man with a stick" will always be the same.'

This could be useful, for instance, in search engines. Currently search engines index only words and phrases. They do not distinguish the meaning of a word or phrase in the context in which it is used. According to the method outlined here, we might use the context around a word or phrase to select, ad hoc, a class of words or phrases which are similar to that word or phrase (in that context). These could then be considered to specify a meaning for that word or phrase. We could search not only for the word or phrase itself, but for that word or phrase used in the same sense.
Conclusion

While the arguments in the preceding threads have often been very theoretical and abstract, in practical terms what I am saying is not difficult. It just requires a slightly different way of thinking about problems. In particular, it asks us to consider that there will be things which can be said to be true _and_ not true of word associations in corpora, depending on context, and suggests that we can use these true/not true distinctions to select both syntax and meaning, specific to context, in ways that have not been possible up to now.
Being able to have words and word groups which act both the same _and_ not the same, in terms of the ways they associate with other words in a given corpus, means we cannot generalize a complete grammar for them. We will never have a complete grammar for any natural language (beyond the corpus). We've really known this for a while. As Paula Newman noted, it goes back at least to Sapir: "All grammars leak." What we now realize is that these "leaks" are not a bug, but a feature, as the programmers say. Paradoxically, it is this same seeming limitation which enables us to pack more information into language than we would normally be able to, viz. the detail of collocational restriction. Most importantly, recognizing that such contrasts exist enables us to unpack that information.
-Rob