Corpora: Survey of the answers about IR info and tools

Patrick Ruch ruch at
Fri Oct 27 15:34:42 UTC 2000

Some days ago,

I asked (on several lists) about tools and information on vector distances
and indexing strategies. My question was very general, but the main target
was IR applications. I was expecting answers about packages
for computing any kind of feature distance (vector, Boolean, Euclidean,
Levenshtein...). I should have said that our system implements its own
indexing strategy.
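
To make the request concrete, here is a minimal Python sketch of the kinds of
distance I had in mind. It is only an illustration: the function names are
hypothetical and do not come from any of the packages listed below, and
reading "Boolean" as a Hamming distance over Boolean feature vectors is an
assumption.

```python
import math


def _check_same_length(a, b):
    # All vector measures below assume equal-length feature vectors.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two numeric vectors."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two numeric vectors (0.0 if either is all zeros)."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hamming_distance(a, b):
    """Number of positions at which two Boolean vectors differ."""
    _check_same_length(a, b)
    return sum(1 for x, y in zip(a, b) if bool(x) != bool(y))


def levenshtein_distance(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string s into string t."""
    # previous[j] holds the distance between the prefix of s processed so far
    # and the first j characters of t.
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

For example, euclidean_distance([0, 0], [3, 4]) gives 5.0 and
levenshtein_distance("kitten", "sitting") gives 3.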

I would like to thank:
Romaric Besancon, Eric Gaussier, Paul Holmes-Higgin,
Andrew MacFarlane, Ian Soboroff, Richard Boulton,
Jian-Yun Nie, and Christian Boitet. 

Here is a survey of the available tools:

Andrew McCallum's Bag Of Words library:
Open source, seems complete. 

SMART: a very complete IR system (indexing, retrieval,
stop word lists for English and Spanish...),
fully open source.

The indexing portion of Muscat is still closed-source.

I have started to install SMART.
Thanks again,

Patrick Ruch
HUG - Medical Informatics Division
CH-1211 Geneva 14
tel.: (+41 22) 372 61 64
fax: (+41 22) 372 48 55
email: Patrick.Ruch at