[Corpora-List] corpus ------>>>>> thesaurus

Dominic Widdows widdows at maya.com
Tue Nov 9 14:42:07 UTC 2004


> Hi Vladimir,
>
> You can find a good introduction to lexical acquisition methods based
> on co-occurrence statistics in Manning and Schuetze's "Foundations of
> Statistical Natural Language Processing".

Hi Vladimir,

Just to add to Viktor's suggestion: we have a few demos of thesaurus
generation / lexical acquisition, some of which are based directly on
Schuetze's work, at
http://infomap.stanford.edu/webdemo

There are a couple of fairly domain-specific models built from the
Ohsumed medical corpus and the Wall Street Journal (though the latter
has a lot of general topics as well).

You can find links to papers (including work on mapping words and
senses from corpus-derived models into hand-built lexical resources)
and some software for processing corpora into vector word-association
models (using a form of latent semantic analysis) from the main site at
http://infomap.stanford.edu/
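
For a rough idea of what "processing a corpus into a vector
word-association model" can look like, here is a minimal sketch in
Python. It is not the code from the site above: the toy sentences, the
window size, the number of dimensions and all function names are
assumptions made up for illustration, and it uses a plain SVD of raw
co-occurrence counts as a stand-in for latent semantic analysis.

```python
import numpy as np

# Toy corpus (hypothetical); a real model would be built from a large corpus.
SENTENCES = [
    ["doctor", "treats", "patient", "in", "hospital"],
    ["nurse", "treats", "patient", "in", "clinic"],
    ["bank", "raises", "interest", "rates"],
    ["bank", "cuts", "interest", "rates"],
]

def cooccurrence_matrix(sentences, window=2):
    """Count how often each word pair occurs within `window` positions."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts

def reduce_svd(counts, k):
    """Keep the top-k singular directions as k-dimensional word vectors."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def neighbours(word, vocab, vectors, n=3):
    """Return the n words with the highest cosine similarity to `word`."""
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = vectors / norms[:, None]
    sims = unit @ unit[vocab.index(word)]
    order = np.argsort(-sims)
    return [(vocab[i], float(sims[i])) for i in order if vocab[i] != word][:n]

vocab, counts = cooccurrence_matrix(SENTENCES)
vectors = reduce_svd(counts, k=2)
print(neighbours("doctor", vocab, vectors))
```

The list of nearest neighbours for each word is, in effect, a crude
automatically generated thesaurus entry. On a corpus this small the
neighbours mean very little; the point is only the shape of the
pipeline (counts, dimension reduction, similarity ranking).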

Best wishes,
Dominic


