full text retrieval system

Yoshimasa Tsuji yamato at yt.cache.waseda.ac.jp
Mon Mar 15 03:31:04 UTC 1999


Hello,
  I have been using Unix's "grep" command to search for particular
words in my text archive, which is not terribly large. But they
say that if my archive grows a little larger (say, to several gigabytes)
I won't be able to cope.
  It has been suggested that I should use some kind of full-text
retrieval system, of the sort Internet users are familiar with whenever
they call up a net search service. The thing is that the systems I know
use a keyword index created by scanning the whole of the text, so that
word searches do not require scanning the text thereafter; but the
"keywords" are usually distinctive words like proper names and not, for
example, the definite article or similar words that appear very often.
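To make the keyword-index idea concrete, here is a minimal sketch of such an inverted index that deliberately skips very frequent words; the document names and the stopword list are illustrative assumptions, not taken from any particular system:

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}  # assumed list

def build_index(documents):
    """documents: dict mapping a name to its full text.
    Returns a mapping from each indexed word to the set of
    documents containing it; stopwords are never indexed."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:      # frequent words are skipped
                index[word].add(name)
    return index

docs = {
    "letter1": "The steppe lay quiet in the evening.",
    "letter2": "An evening walk along the river.",
}
index = build_index(docs)
print(sorted(index["evening"]))  # both documents contain "evening"
print(index.get("the"))          # None: stopword, not indexed
```

Once the index is built, answering "which texts contain word X" is a dictionary lookup, with no rescanning; the price is that stopwords cannot be searched at all.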
  The question I would like to put to you is whether or not there is
a need for full-text retrieval in order to obtain word usages (which
is my sole objective). And is there a need to search for derivatives
as well?
  I am asking because I am thinking of buying a product by bitsoft
that does that kind of job (they say it will find "finds", "found",
"finding", etc. upon input of "find"). They sell a product that indexes
<idesh'> under <idti>, which is interesting.
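A toy sketch of the derivative-matching idea, assuming a small irregular-form table and a handful of suffix rules (these are illustrative assumptions, not how the bitsoft product actually works):

```python
import re

IRREGULAR = {"find": ["found"]}          # assumed irregular-form table
SUFFIXES = ["s", "es", "ing", "ed"]      # assumed inflectional endings

def derivative_pattern(lemma):
    """Build a regex matching the lemma and simple derived forms."""
    forms = [lemma] + IRREGULAR.get(lemma, [])
    alts = [re.escape(f) + "(?:" + "|".join(SUFFIXES) + ")?" for f in forms]
    return re.compile(r"\b(?:" + "|".join(alts) + r")\b", re.IGNORECASE)

pat = derivative_pattern("find")
text = "He finds what was found while finding nothing new."
print(pat.findall(text))  # ['finds', 'found', 'finding']
```

For a heavily inflected language like Russian, a suffix list of this sort is far too crude; filing <idesh'> under <idti> requires a real lemma table, which is presumably what the product supplies.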

  If you have experience with this matter, please let me know.

Cheers,
Tsuji

-------
The "grep" command, if slightly hacked, seems to be sufficient
if the volume of text is less than 600 MB (Chekhov's full
30 volumes come to much, much less). The matter will be completely
different if my machine is to serve anonymous enquirers all over
the world, of course.
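A sketch of the kind of slightly hacked sequential scan meant here: it reports every line on which a word occurs, whole-word and case-insensitive, which is enough for collecting word usages. The file name in the commented-out usage is an illustrative assumption.

```python
import re

def usages(path, word):
    """Yield (line_number, line) pairs where the word occurs
    as a whole word, ignoring case."""
    pat = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    with open(path, encoding="utf-8") as f:
        for num, line in enumerate(f, start=1):
            if pat.search(line):
                yield num, line.rstrip()

# for num, line in usages("chekhov_vol1.txt", "steppe"):
#     print(f"{num}: {line}")
```

Like grep itself, this rereads the whole archive on every query, which is the part that stops scaling once the archive reaches many gigabytes or must serve many simultaneous enquirers.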



More information about the SEELANG mailing list