[Corpora-List] [software announcement] DISCO: database of distributionally similar words
Peter Kolb
pekoli at gmail.com
Wed Aug 18 07:40:14 UTC 2010
DISCO: A Multilingual Database of Distributionally Similar Words
version 1.1 is now freely available at
http://www.linguatools.de/disco/disco_en.html
DISCO is a Java package for computing the semantic similarity between
two words and for retrieving the most similar words for a given word.
DISCO queries a database containing a pre-computed word space.
Pre-computed word spaces are available for a number of languages,
including English, German, Spanish, French, Italian, Dutch, and Czech.
Also, there is a plug-in for Protege 3.4 (http://protege.stanford.edu)
with which DISCO can be queried from within the Protege application:
http://www.linguatools.de/disco/disco4protege.html
DISCO can also be queried online at
http://www.linguatools.de/disco/disco-gui_en.html
FUNCTIONALITY
DISCO allows you to
* compute the semantic similarity between two words,
* retrieve the n most similar words for a given word,
* retrieve the n most significant co-occurrences (collocations) for a word,
* retrieve the common context (co-occurrences) of two words,
* retrieve the frequency of a word in the corpus.
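To make these operations concrete, here is a minimal sketch of a toy
in-memory word space. It is not the DISCO API: the class ToyWordSpace
and all of its method names are hypothetical, and cosine similarity is
used only as a stand-in, since this announcement does not say which
similarity measure DISCO uses. The co-occurrence weights are invented
for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy word space for illustration only; hypothetical names, NOT the DISCO API.
class ToyWordSpace {
    // word -> (context word -> co-occurrence weight)
    private final Map<String, Map<String, Double>> vectors = new HashMap<>();

    void add(String word, String context, double weight) {
        vectors.computeIfAbsent(word, k -> new HashMap<>())
               .merge(context, weight, Double::sum);
    }

    // Cosine similarity of two co-occurrence vectors (an assumed measure).
    double similarity(String w1, String w2) {
        Map<String, Double> a = vectors.get(w1);
        Map<String, Double> b = vectors.get(w2);
        if (a == null || b == null) return 0.0;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        double na = norm(a), nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) sum += x * x;
        return Math.sqrt(sum);
    }

    // The n words most similar to the given word, best first.
    List<String> mostSimilar(String word, int n) {
        List<String> candidates = new ArrayList<>(vectors.keySet());
        candidates.remove(word);
        candidates.sort(Comparator
            .comparingDouble((String w) -> similarity(word, w)).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    // The n highest-weighted co-occurrences of a word, best first.
    List<String> topCooccurrences(String word, int n) {
        List<Map.Entry<String, Double>> entries =
            new ArrayList<>(vectors.getOrDefault(word, Map.of()).entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    // Context words that both words co-occur with.
    Set<String> commonContext(String w1, String w2) {
        Set<String> shared = new HashSet<>(vectors.getOrDefault(w1, Map.of()).keySet());
        shared.retainAll(vectors.getOrDefault(w2, Map.of()).keySet());
        return shared;
    }

    public static void main(String[] args) {
        ToyWordSpace ws = new ToyWordSpace();
        ws.add("beer", "drink", 3); ws.add("beer", "glass", 2);
        ws.add("wine", "drink", 3); ws.add("wine", "glass", 1); ws.add("wine", "grape", 2);
        ws.add("car", "drive", 4);  ws.add("car", "glass", 1);
        System.out.println(ws.similarity("beer", "wine"));
        System.out.println(ws.mostSimilar("beer", 2));
        System.out.println(ws.commonContext("beer", "wine"));
    }
}
```

In this toy data "beer" comes out closer to "wine" than to "car"
because the two share the heavily weighted context "drink". A real
word space is pre-computed from a large corpus and stored on disk,
which is what DISCO queries.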
On a standard workstation, DISCO can compute the semantic similarity
of about 50 word pairs per second.
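As a rough planning aid, the quoted rate can be turned into a time
estimate for all-pairs comparisons over a word list. The sketch below
assumes the rate of 50 pairs per second holds throughout, which will
vary with hardware and word space; the class and method names are
hypothetical.

```java
// Back-of-the-envelope estimate; hypothetical helper, not part of DISCO.
class ThroughputEstimate {
    // Rate quoted in the announcement for a standard workstation.
    static final double PAIRS_PER_SECOND = 50.0;

    // Number of unordered pairs among n distinct words: n * (n - 1) / 2.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Estimated wall-clock seconds to compare the given number of pairs.
    static double estimatedSeconds(long pairs) {
        return pairs / PAIRS_PER_SECOND;
    }

    public static void main(String[] args) {
        long pairs = pairCount(1000);
        double hours = estimatedSeconds(pairs) / 3600.0;
        System.out.println(pairs + " pairs, about " + hours + " hours");
    }
}
```

For a list of 1,000 words this gives 499,500 pairs, or roughly 2.8
hours at the quoted rate, so large all-pairs jobs are worth batching.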
DISCO can be easily integrated into other applications via the DISCO
Java API (see API documentation at
http://www.linguatools.de/disco/disco-api/).
DISCO 1.1 can only be used to query an existing word space; it cannot
be used to create a word space from a corpus.
DOWNLOAD
DISCO is freely available, open source and now licensed under the
Apache License (http://www.apache.org/licenses/LICENSE-2.0.html).
The Java package and the word spaces can be downloaded from
http://www.linguatools.de/disco/disco-download_en.html
REFERENCE
Peter Kolb. DISCO: A Multilingual Database of Distributionally Similar
Words. In A. Storrer et al. (Eds.), KONVENS 2008 - Ergänzungsband:
Textressourcen und lexikalisches Wissen, Berlin 2008.
Peter Kolb
peter.kolb at linguatools.org
http://www.ling.uni-potsdam.de/~kolb/