Corpora: Survey of the answers about IR info and tools
Patrick Ruch
ruch at dim.hcuge.ch
Fri Oct 27 15:34:42 UTC 2000
Some days ago,
I asked (on several lists) about tools and information on vector distances
and indexing strategies. My question was very general, but the main target
was IR applications. I was expecting answers about packages for computing
any kind of feature distance (vector, Boolean, Euclidean, Levenshtein...).
I should have said that our system implements its own indexing strategy.
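
To make the request concrete, here is a minimal sketch of the kinds of
distances listed above. It is not taken from any of the packages in this
survey. The function names are hypothetical, and the choice of Hamming
distance for the Boolean case and cosine distance for the vector case is
an assumption on my part.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def hamming(a, b):
    """Number of positions where two equal-length Boolean vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if bool(x) != bool(y))


def levenshtein(s, t):
    """Edit distance (insert, delete, substitute, each at cost 1)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

The first three functions compare document or query vectors that an
indexer has already produced; the last one compares raw strings.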
I would like to thank:
Romaric Besancon, Eric Gaussier, Paul Holmes-Higgin,
Andrew MacFarlane, Ian Soboroff, Richard Boulton,
Jian-Yun Nie, and Christian Boitet.
Here is a survey of the available tools:
Andrew McCallum's Bag Of Words library:
Open source, seems complete.
http://www.cs.cmu.edu/~mccallum/bow
SMART: a very complete IR system (indexing, retrieval,
stop words for English and Spanish...),
totally open source.
ftp.cs.cornell.edu/pub/smart/
Muscat:
http://open.muscat.com/
The indexing portion of Muscat is still closed-source.
I have started to install SMART.
Thanks again,
Patrick
__________________________________
Patrick Ruch
HUG - Medical Informatics Division
CH-1211 Geneva 14
tel.: (+41 22) 372 61 64
fax: (+41 22) 372 48 55
email: Patrick.Ruch at dim.hcuge.ch