[Corpora-List] corpus ------>>>>> thesaurus
Dominic Widdows
widdows at maya.com
Tue Nov 9 14:42:07 UTC 2004
> Hi Vladimir,
>
> You can find a good introduction to lexical acquisition methods based on
> co-occurrence statistics in Manning and Schuetze's "Foundations of
> Statistical Natural Language Processing".
Hi Vladimir,
Just to add to Viktor's suggestion: we have a few demos of thesaurus
generation / lexical acquisition, some of which are based directly on
Schuetze's work, at
http://infomap.stanford.edu/webdemo
There are a couple of fairly domain-specific models built from the
Ohsumed medical corpus and the Wall Street Journal (though the latter
has a lot of general topics as well).
You can find links to papers (including work on mapping words and
senses from corpus derived models into hand-built lexical resources)
and some software for processing corpora into vector word-association
models (using a form of latent semantic analysis) from the main site at
http://infomap.stanford.edu/
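As a rough illustration of the general idea (not the Infomap software itself, whose exact weighting and parameters are not described here), a corpus can be turned into word vectors by counting co-occurrences within a small window, reducing the matrix with a truncated SVD in the spirit of latent semantic analysis, and ranking neighbours by cosine similarity. The window size, the log weighting, the number of dimensions `k`, the toy corpus, and all function names below are assumptions chosen for the sketch.

```python
import numpy as np

def cooccurrence_matrix(tokens, window=2):
    """Count how often each word appears within `window` tokens of each other word."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[word], index[tokens[j]]] += 1
    return vocab, index, counts

def reduce_dimensions(matrix, k=3):
    """Truncated SVD, as in latent semantic analysis: keep the top k dimensions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def nearest_neighbours(word, vocab, index, vectors, n=3):
    """Rank the other words by cosine similarity to `word` in the reduced space."""
    target = vectors[index[word]]
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(target)
    sims = (vectors @ target) / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    ranked = np.argsort(-sims)
    return [(vocab[i], float(sims[i])) for i in ranked if vocab[i] != word][:n]

# Hypothetical toy corpus; a real model would be built from a large corpus.
tokens = ("the doctor treated the patient . the nurse treated the patient . "
          "the bank raised the rate . the bank cut the rate").split()
vocab, index, counts = cooccurrence_matrix(tokens, window=2)
vectors = reduce_dimensions(np.log1p(counts), k=3)  # log weighting damps very frequent pairs
neighbours = nearest_neighbours("doctor", vocab, index, vectors)
print(neighbours)
```

With only a few dozen tokens the neighbour lists are unlikely to be meaningful; the technique needs a large corpus before the nearest neighbours start to resemble thesaurus entries.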
Best wishes,
Dominic