Corpora: Survey of the answers about IR info and tools

Patrick Ruch ruch at dim.hcuge.ch
Fri Oct 27 15:34:42 UTC 2000


Some days ago,

I asked (on several lists) about tools and information on vector distances
and indexing strategies. My question was very general, but the main target
was IR applications. I was expecting answers about packages for computing
any kind of feature distance (vector, Boolean, Euclidean, Levenshtein...).
I should have said that our system implements its own indexing strategy.
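
To make the kinds of distances meant here concrete, below is a minimal
Python sketch. It is only an illustration: the function names are
hypothetical and are not taken from any of the packages listed in this
survey, and reading "Boolean" as a Jaccard distance over term sets is an
assumption on my part.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors.

    Returns 0.0 when either vector is all zeros (a convention chosen
    here to avoid dividing by zero).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_distance(terms_a, terms_b):
    """Distance between two Boolean (set-of-terms) document representations.

    Assumption: "Boolean" distance is read here as 1 - |A & B| / |A | B|.
    Two empty sets are treated as identical (distance 0.0).
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def levenshtein_distance(s, t):
    """Edit distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

For example, euclidean_distance([0, 0], [3, 4]) gives 5.0 and
levenshtein_distance("kitten", "sitting") gives 3. In an IR setting the
vectors would typically be term-weight vectors produced by the indexing
step.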

I would like to thank:
Romaric Besancon, Eric Gaussier, Paul Holmes-Higgin,
Andrew MacFarlane, Ian Soboroff, Richard Boulton,
Jian-Yun Nie, and Christian Boitet. 

Here is a survey of the available tools:

Andrew McCallum's Bag Of Words library:
Open source, seems complete.
http://www.cs.cmu.edu/~mccallum/bow 

SMART: a very complete IR system (indexing, retrieval,
stop words for English and Spanish...),
fully open source.
(ftp.cs.cornell.edu/pub/smart/).

Muscat:
http://open.muscat.com/           
The indexing portion of Muscat is still closed-source.

I have started to install SMART.
Thanks again,
Patrick

__________________________________
Patrick Ruch
HUG - Medical Informatics Division
CH-1211 Geneva 14
tel.: (+41 22) 372 61 64
fax: (+41 22) 372 48 55
email: Patrick.Ruch at dim.hcuge.ch