Corpora: wsd software available
ted pedersen
tpederse at d.umn.edu
Tue Feb 5 19:06:12 UTC 2002
We are happy to announce the availability of the complete source code
distribution for the Duluth systems that participated in the Senseval-2
comparative exercise among word sense disambiguation systems. This is
free software, distributed under the GNU CopyLeft.
The distribution includes the following components:
SenseTools (v0.1), a suite of Perl programs that convert sense-tagged
text into a feature vector representation suitable for use with the Weka
machine learning system. Users may specify features to be identified in
the text using regular expressions, or features may be automatically
identified using the Bigram Statistics Package (v0.4 or later), which
is also available.
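To make the idea of a regular-expression-driven feature vector concrete, here is a minimal sketch in Python. It is not the SenseTools interface, and SenseTools itself is written in Perl. The instances, feature names, patterns, and function names below are all hypothetical, chosen only to illustrate how sense-tagged text could be turned into binary features and written out in ARFF, the input format that Weka reads.

```python
import re

# Hypothetical sense-tagged instances: (sense label, context text).
INSTANCES = [
    ("financial", "She deposited the check at the bank on Monday."),
    ("river", "They fished from the grassy bank of the river."),
]

# Hypothetical user-specified features: feature name -> regular expression.
FEATURES = {
    "has_money_word": r"\b(deposit\w*|check|loan)\b",
    "has_water_word": r"\b(river|fish\w*|shore)\b",
}


def to_feature_vector(text, features):
    """Return one binary value per feature: 1 if its pattern occurs in text."""
    return [
        1 if re.search(pattern, text, re.IGNORECASE) else 0
        for pattern in features.values()
    ]


def to_arff(relation, instances, features):
    """Render the instances as an ARFF document with a nominal sense class."""
    senses = sorted({sense for sense, _ in instances})
    lines = [f"@relation {relation}", ""]
    for name in features:
        lines.append(f"@attribute {name} {{0,1}}")
    lines.append(f"@attribute sense {{{','.join(senses)}}}")
    lines += ["", "@data"]
    for sense, text in instances:
        vector = to_feature_vector(text, features)
        lines.append(",".join(str(v) for v in vector) + "," + sense)
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_arff("bank", INSTANCES, FEATURES))
```

Each data row holds the binary feature values followed by the sense tag, so the last attribute can serve as the class when the file is loaded into Weka.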
Duluth-Shell, a set of C-shell scripts that tie together the Bigram
Statistics Package, SenseTools, and Weka. These scripts should allow a
user to replicate the Duluth systems from Senseval-2 easily, and they
provide a convenient starting point for further experimentation with
corpus-based, machine-learning-oriented methods.
You can find SenseTools, Duluth-Shell, the Bigram Statistics Package, and
a pointer to Weka (which was developed at the University of Waikato) at
http://www.d.umn.edu/~tpederse/senseval2.html
Please let us know if you have any questions.
Enjoy!
Ted
--
# Ted Pedersen http://www.d.umn.edu/~tpederse #
# Department of Computer Science tpederse at d.umn.edu #
# University of Minnesota, Duluth #
# Duluth, MN 55812 (218) 726-8770 #