[Corpora-List] corpus syntax (and how we can use it to code meaning)

Tue Sep 18 08:27:41 UTC 2007

On Tue, 18 Sep 2007, Rob Freeman wrote:

> I want to summarize some of the more practical aspects of those solutions.

Thanks for this summary; now I won't have to re-read the whole thread to
remind myself what's been discussed :-)

> ..., we might use the context about a word or phrase to
> select, ad-hoc, a class of words or phrases which are similar to that word or
> phrase (in that context.) ...  we can use these true/not
> true distinctions to select both syntax, and meaning, specific to context,
> in ways we have not been able up to now.

This suggests that corpus linguists should be interested in clustering
or unsupervised machine learning of words into classes according to
shared contexts. In fact they have been investigating this for some time;
see e.g. papers in the Proceedings of ICAME'86 and EACL'87.
The main difference between then and now is computing power: we can now
use more sophisticated clustering algorithms, and cluster according to
more complex context patterns; see e.g. Roberts et al. (2006), Corpora,
vol. 1, pp. 39-57.
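
To make "clustering words into classes according to shared contexts"
concrete, here is a minimal Python sketch. It is only an illustration
under my own assumptions, not the method of any of the papers cited
above: the context of a word is taken to be just its immediate left and
right neighbours, similarity is cosine similarity of neighbour counts,
and the grouping is a naive greedy single-link pass. The names
context_vectors, cosine, cluster, the 0.5 threshold and the toy corpus
are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left/right neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Pad with boundary markers so every real token has two neighbours.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy single-link grouping (an assumed, deliberately simple scheme).

    Words are visited in alphabetical order; each word joins the first
    existing class that has a member at least `threshold` similar to it,
    otherwise it starts a new class.
    """
    classes = []
    for word in sorted(vectors):
        for cls in classes:
            if any(cosine(vectors[word], vectors[m]) >= threshold for m in cls):
                cls.append(word)
                break
        else:
            classes.append([word])
    return classes


# Invented toy corpus, purely for illustration.
toy_corpus = ["the cat sat", "the dog sat", "a cat ran", "a dog ran"]
classes = cluster(context_vectors(toy_corpus))
print(classes)
```

On this toy corpus the words fall into three classes that look like
determiners (a, the), nouns (cat, dog) and verbs (ran, sat), although
the program is never told any part-of-speech labels. With real corpus
data one would want richer context patterns and a proper clustering
algorithm, which is where the extra computing power comes in.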

But my impression is that most corpus linguists are not really that
interested in unsupervised machine learning, i.e. letting the computer
work out the grammar/semantics "from scratch"; they prefer to examine and
analyse the corpus data "by hand" to select examples to back up their
own theories...

Eric Atwell, Leeds University       WWW/email: google Eric Atwell
