[Corpora-List] Roget's Thesaurus as an Electronic Lexical Knowledge Base

Stan Szpakowicz szpak at site.uottawa.ca
Mon Jun 12 20:53:11 UTC 2006


     Roget's Thesaurus as an Electronic Lexical Knowledge Base

                      http://www.nzdl.org/ELKB/

Roget's Thesaurus in Java, designed for Natural Language Processing, is
now available for downloading. We distribute it under the GNU General
Public License. The system is the graduate work of Mario Jarmasz
<http://www.site.uottawa.ca/~mjarmasz/thesis/>, who implemented it with
the proprietary lexical data in the 1987 Penguin Roget's. Olena Medelyan
<http://www.cs.waikato.ac.nz/~olena/> has wonderfully reengineered
Mario's system with the public-domain 1911 Roget's.

The Roget's ELKB package includes four examples of NLP applications:
detecting lexical chains in text, determining semantic distance between
words and phrases, clustering words based on their meaning and solving a
word quiz.

If you decide to use the ELKB, please put a link to the download site on
your Web page. (See my home page for a nifty logo.)

[The system is perfectly functional, but the 1911 data are antiquated.
We are in discussion with Pearson Education, the owner of the 1987
Penguin Roget's, about the fee structure and distribution mode that
would enable the NLP community to acquire the much more up-to-date
1987 data.]

--
Stan Szpakowicz, PhD, Professor  613-562-5800/6687 /~\ The ASCII Ribbon
SITE, Computer Science       szpak at site.uottawa.ca \ / Campaign Against
University of Ottawa    www.site.uottawa.ca/~szpak  X     HTML Email


