[Corpora-List] [software announcement] DISCO: database of distributionally similar words

Peter Kolb pekoli at gmail.com
Wed Aug 18 07:40:14 UTC 2010


DISCO: A Multilingual Database of Distributionally Similar Words
version 1.1 is now freely available at

   http://www.linguatools.de/disco/disco_en.html

DISCO is a Java package that lets you retrieve the semantic
similarity between two words, as well as the most similar words for
a given word. DISCO queries a database containing a pre-computed
word space. Pre-computed word spaces are available for a number of
languages, including English, German, Spanish, French, Italian, Dutch,
and Czech.
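The following minimal sketch illustrates what "semantic similarity in
a word space" means. It assumes each word is a sparse vector of
co-occurrence weights and that similarity is the cosine of the angle
between two such vectors. This is not DISCO's code or API. The class
name, the toy vectors, and the choice of cosine are illustrative
assumptions, and DISCO's own similarity measure may differ.

```java
import java.util.Locale;
import java.util.Map;

public class CosineSketch {
    // Cosine similarity between two sparse vectors keyed by context word.
    // Assumption for illustration only; not necessarily DISCO's measure.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w; // only shared contexts contribute
            }
        }
        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Hypothetical toy vectors: context word -> co-occurrence weight.
        Map<String, Double> car = Map.of("drive", 3.0, "road", 2.0, "engine", 1.0);
        Map<String, Double> truck = Map.of("drive", 2.0, "road", 2.0, "cargo", 1.0);
        // Prints: sim(car, truck) = 0.891
        System.out.println(String.format(Locale.ROOT,
                "sim(car, truck) = %.3f", cosine(car, truck)));
    }
}
```

Words that share many weighted contexts ("drive", "road") score close
to 1, while words with no shared contexts score 0.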
There is also a plug-in for Protege 3.4 (http://protege.stanford.edu)
that lets you query DISCO from within the Protege application:

   http://www.linguatools.de/disco/disco4protege.html

DISCO can also be queried online at

   http://www.linguatools.de/disco/disco-gui_en.html


FUNCTIONALITY

DISCO allows you to
* compute the semantic similarity between two words,
* retrieve the n most similar words for a given word,
* retrieve the n most significant co-occurrences (collocations) for a word,
* retrieve the common context (co-occurrences) of two words,
* retrieve the frequency of a word in the corpus.
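Two of the queries listed above, the n most similar words and the
common context of two words, can be sketched over a small in-memory
word space. This is a hypothetical illustration, not DISCO's API: the
class and method names (`WordSpaceSketch`, `mostSimilar`,
`commonContext`), the toy data, and the use of cosine similarity are
assumptions. DISCO itself queries a pre-computed database rather than
ranking every word on the fly as this brute-force sketch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class WordSpaceSketch {
    // Hypothetical toy word space: word -> (context word -> co-occurrence weight).
    private final Map<String, Map<String, Double>> vectors;

    WordSpaceSketch(Map<String, Map<String, Double>> vectors) {
        this.vectors = vectors;
    }

    // Cosine similarity of two words' context vectors (assumed measure).
    double similarity(String w1, String w2) {
        Map<String, Double> a = vectors.getOrDefault(w1, Map.of());
        Map<String, Double> b = vectors.getOrDefault(w2, Map.of());
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // The n words most similar to `word`, best first; ties broken alphabetically.
    List<String> mostSimilar(String word, int n) {
        List<String> candidates = new ArrayList<>();
        for (String other : vectors.keySet()) {
            if (!other.equals(word)) {
                candidates.add(other);
            }
        }
        candidates.sort((x, y) -> {
            int bySim = Double.compare(similarity(word, y), similarity(word, x)); // descending
            return bySim != 0 ? bySim : x.compareTo(y);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    // Context words that co-occur with both w1 and w2, in alphabetical order.
    Set<String> commonContext(String w1, String w2) {
        Set<String> common = new TreeSet<>(vectors.getOrDefault(w1, Map.of()).keySet());
        common.retainAll(vectors.getOrDefault(w2, Map.of()).keySet());
        return common;
    }

    public static void main(String[] args) {
        WordSpaceSketch space = new WordSpaceSketch(Map.of(
                "car", Map.of("drive", 3.0, "road", 2.0, "engine", 1.0),
                "truck", Map.of("drive", 2.0, "road", 2.0, "cargo", 1.0),
                "bus", Map.of("drive", 1.0, "road", 1.0, "passenger", 2.0),
                "apple", Map.of("eat", 3.0, "tree", 1.0)));
        System.out.println(space.mostSimilar("car", 2));            // [truck, bus]
        System.out.println(space.commonContext("car", "truck"));    // [drive, road]
    }
}
```

Ranking every word at query time is linear in the vocabulary size,
which is one reason a system like DISCO would store pre-computed
results in its database instead.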

On a standard workstation, DISCO computes the semantic similarity of
about 50 word pairs per second.

DISCO can be easily integrated into other applications via the DISCO
Java API (see API documentation at
http://www.linguatools.de/disco/disco-api/).

DISCO 1.1 can only query an existing word space; it cannot create a
word space from a corpus.


DOWNLOAD

DISCO is freely available, open source, and now licensed under the
Apache License (http://www.apache.org/licenses/LICENSE-2.0.html).

The Java package and the word spaces can be downloaded from

   http://www.linguatools.de/disco/disco-download_en.html


REFERENCE

Peter Kolb. DISCO: A Multilingual Database of Distributionally Similar
Words. In A. Storrer et al. (Eds.), KONVENS 2008 - Ergänzungsband:
Textressourcen und lexikalisches Wissen, Berlin 2008.


Peter Kolb
peter.kolb at linguatools.org
http://www.ling.uni-potsdam.de/~kolb/

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
