Resources: Roget's Thesaurus as an Electronic Lexical Knowledge Base

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Tue Jun 13 15:25:09 UTC 2006

Date: Mon, 12 Jun 2006 16:54:14 -0400 (EDT)
From: Stan Szpakowicz <szpak at>
Message-Id: <200606122054.QAA27837 at kamla.csi.UOttawa.CA>

     Roget's Thesaurus as an Electronic Lexical Knowledge Base


Roget's Thesaurus in Java, designed for Natural Language Processing,
is now available for downloading. We distribute it under the GNU
General Public License. The system is the graduate work of Mario
Jarmasz, who implemented it with the proprietary lexical data in the
1987 Penguin Roget's. Olena Medelyan has wonderfully reengineered
Mario's system with the public-domain 1911

The Roget's ELKB package includes four examples of NLP applications:
detecting lexical chains in text, determining semantic distance
between words and phrases, clustering words based on their meaning and
solving a word quiz.

If you decide to use the ELKB, please put a link to the download site
on your Web page. (See my home page for a nifty logo.)

[The system is perfectly functional, but the 1911 data are antiquated.
We are in discussion with Pearson Education, the owner of the 1987
Penguin Roget's, about the fee structure and distribution mode that
would enable the NLP community to acquire the much more attractive

Stan Szpakowicz, PhD, Professor  613-562-5800/6687  /~\ The ASCII Ribbon
SITE, Computer Science           szpak at           \ / Campaign Against
University of Ottawa                                 X  HTML Email

Message distributed by the Langage Naturel list <LN at>
Information, subscription:
English version       : 
Archives                 :

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership:
