Corpora: Summary: automatic thesaurus generation

Adam Przepiórkowski adamp at ipipan.waw.pl
Sun Jan 27 09:55:43 UTC 2002


This is a brief summary of responses to my query regarding automatic
thesaurus generation from large corpora.  I am very grateful to Bob
Krovetz, Johan Hagman, Sara Rydin and Bill Mann for helpful
suggestions.

The following people worked or are working on automatic generation of
meaningful hierarchical thesauri:

Sharon Caraballo (esp. her recent Ph.D. dissertation available from
  her home page);
Marti Hearst (a 1992 paper available from her home page);
Gregory Grefenstette (I found it harder to locate his relevant
  papers);
Johan Hagman (results will be presented at JADT
  http://www.irisa.fr/manifestations/2002/JADT/programme.htm#programme);
Sara Rydin (started work on this for her Ph.D. thesis).

Virtually all of the work I located concentrates on automatic
detection of hyponymy/hypernymy relations on the basis of textual
clues such as "X, including x, y and z" (this normally implies that x,
y and z are kinds of X).
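A toy illustration of this kind of textual-clue matching follows. It is a minimal sketch, not the method of any of the researchers listed above. It assumes (hypothetically) that the hypernym X is the single word right before the comma and that x, y and z are single words separated by commas and a final "and" or "or". Real systems work with full noun phrases, which this regex does not attempt. The names INCLUDING and extract_hyponyms are made up for this example.

```python
import re

# Hypothetical pattern for "X, including x, y and z": a single-word hypernym X,
# an optional comma, the cue word "including", then a list of single-word
# hyponyms joined by commas and an optional final "and"/"or" (Oxford comma
# allowed). The negative lookahead stops "and"/"or" being read as a list item.
INCLUDING = re.compile(
    r"\b(\w+),?\s+including\s+"
    r"(\w+(?:\s*,\s*(?!(?:and|or)\b)\w+)*(?:\s*,?\s*(?:and|or)\s+\w+)?)"
)

# Separators inside the enumeration: ", " / ", and " / " and " (likewise "or").
SEPARATOR = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs suggested by the 'including' clue."""
    pairs = []
    for match in INCLUDING.finditer(text):
        hypernym = match.group(1).lower()
        for item in SEPARATOR.split(match.group(2)):
            if item:  # skip any empty fragments left by the split
                pairs.append((item.lower(), hypernym))
    return pairs


if __name__ == "__main__":
    print(extract_hyponyms("He studied languages, including Polish, Swedish and English."))
```

On the sample sentence this yields the pairs ("polish", "languages"), ("swedish", "languages") and ("english", "languages"). Because it works on bare words, it also produces noise such as ("the", "team") from "the team, including the coach". Avoiding this is one reason a realistic system would first identify noun phrases before applying such patterns.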

Bill Mann also mentions the Oingo search engine, which, it is
claimed, actually takes advantage of such techniques.

Best,
--
	Adam P.
