[Corpora-List] Inverted index implementation: Best practices

Andy Roberts andyr at comp.leeds.ac.uk
Sat Oct 15 17:09:33 UTC 2005


The best practice is not to spend your valuable time and resources
re-implementing indexing and search software. Many mature implementations
already exist and have undergone years of testing and improvement.

For this task, I tend to go for Lucene, which is a Java library for
indexing and searching. It's fast and is designed to cope with
gigabytes of data.

http://lucene.apache.org

Because Lucene is an Apache project, it's well supported and receives a
lot of coverage. Many sub-projects have been formed to port Lucene into
other languages like C, Perl, Python and C#, which is very handy for
those for whom Java is not the language of choice.
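
To give a feel for what such a library maintains for you, the
term-to-documents table in the question below can be sketched in a few
lines of Python. This is only an illustrative sketch and not how Lucene
is implemented: the names build_index and search_all are made up for
this example, tokenisation is a plain whitespace split, and everything
is held in memory, so it would not cope with a large index.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of the ids of documents containing it.

    `docs` is assumed to be a dict of {document id: text}.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenisation (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, terms):
    """Return the sorted ids of documents that contain every query term."""
    sets = [set(index.get(term.lower(), ())) for term in terms]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Toy collection chosen to reproduce the table in the question.
docs = {
    1: "term1 term2",
    2: "term2",
    3: "term1 term3",
    4: "term2 term3",
    5: "term1",
}
index = build_index(docs)
```

With this toy data, index["term1"] is the list of documents 1, 3 and 5,
and search_all(index, ["term2", "term3"]) intersects two postings lists
to find the documents containing both terms. The hard parts (compact
on-disk storage, incremental updates, ranking) are exactly what the
existing libraries have already solved.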

Andy

On Sat, 15 Oct 2005, Helge Thomas Karset Hellerud wrote:

> Hello,
>
> Does anyone have some good links where I can find best practices for
> implementing an inverted index (inverted file index)? The index only
> needs to store terms and the documents in which they occur:
>
> term  document
> --------------
> term1 1;3;5
> term2 1;2;4
> term3 3;4
> ...
>
> The goal of the implementation is to be able to do a fast search even
> when the index becomes large.
>
> Thanks in advance.
>
> Helge


