Corpora: Summary: automatic thesaurus generation

Adam Przepiórkowski adamp at
Sun Jan 27 09:55:43 UTC 2002

This is a brief summary of responses to my query regarding automatic
thesaurus generation from large corpora.  I am very grateful to Bob
Krovetz, Johan Hagman, Sara Rydin and Bill Mann for helpful

The following people worked or are working on automatic generation of
meaningful hierarchical thesauri:

Sharon Caraballo (esp. her recent Ph.D. dissertation available from
  her home page);
Marti Hearst (a 1992 paper available from Marti Hearst's home page);
Gregory Grefenstette (I found it more difficult to locate relevant
Johan Hagman (results will be presented at JADT;
Sara Rydin (started work on this for her Ph.D. thesis).

Virtually all of the work I located concentrates on automatic
detection of hyponymy/hypernymy relations on the basis of textual
clues such as "X, including x, y and z" (this normally implies that x,
y and z are kinds of X).
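
To make the idea concrete, here is a minimal sketch of matching that one
textual clue with a regular expression. It is only an illustration, not
the method of any of the authors listed above: the names PATTERN and
extract_hyponyms are hypothetical, and the sketch assumes single-word
terms, whereas real systems would work on noun phrases.

```python
import re

# Hypothetical pattern for the single clue "X, including x, y and z".
# Assumption: every term is one word; real noun phrases need a chunker.
PATTERN = re.compile(
    r"(?P<hyper>\w+),? including "
    r"(?P<hypos>\w+(?:, (?!and\b)\w+)*(?:,? and \w+)?)"
)


def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found via the 'including' clue."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group("hyper")
        # Split the enumeration on commas and on the final "and".
        items = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group("hypos"))
        pairs.extend((item, hypernym) for item in items if item)
    return pairs


if __name__ == "__main__":
    sentence = "He studied mammals, including cats, dogs and horses."
    for hyponym, hypernym in extract_hyponyms(sentence):
        print(hyponym, "is a kind of", hypernym)
```

A single pattern like this will both miss many relations and produce
false hits, so it should be read as a starting point only.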

Bill Mann also mentions the Oingo search engine which, it is
claimed, actually takes advantage of such techniques.

	Adam P.
