[Corpora-List] Corpora and SQL

John D. Burger john at mitre.org
Tue May 22 18:45:55 UTC 2007


> Is this not a performance nightmare?  A table with 200 million  
> entries?

Many databases are routinely used for far larger datasets.

With respect to the original query, the industrial-strength DB  
Postgres has a well-developed extension for text search called tsearch2:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

One virtue of using real databases rather than text retrieval engines
is the ability to query, in a single statement, both document content
and whatever metadata one might have associated with the text.  For
example: "Find me blog entries with these words posted on Saturday
evenings by authors whose profile says they were born before 1964 and
are interested in sushi."
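
As a rough sketch of that kind of combined query, here is a small
Python example.  It uses SQLite from the standard library purely as a
stand-in so that it runs anywhere; it does not use Postgres or
tsearch2, and a plain word table takes the place of a real text index.
The schema (authors, posts, post_words), the sample rows, and the
function names are all made up for illustration, and "evening" is
assumed to mean 18:00 or later.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical blog schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (
            id INTEGER PRIMARY KEY, birth_year INTEGER, interests TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES authors(id),
            posted_at TEXT, body TEXT);
        -- Naive inverted index: one row per distinct word per post.
        CREATE TABLE post_words (
            post_id INTEGER REFERENCES posts(id), word TEXT);
        CREATE INDEX post_words_word ON post_words(word);
    """)
    authors = [
        (1, 1958, "sushi, cycling"),
        (2, 1975, "sushi"),
        (3, 1950, "chess"),
    ]
    posts = [  # 2007-05-19 was a Saturday, 2007-05-21 a Monday
        (1, 1, "2007-05-19 20:30:00", "corpus search with sql"),
        (2, 1, "2007-05-21 20:30:00", "corpus search with sql"),
        (3, 2, "2007-05-19 21:00:00", "corpus search again"),
        (4, 3, "2007-05-19 19:00:00", "corpus search notes"),
        (5, 1, "2007-05-19 09:00:00", "corpus search in the morning"),
        (6, 1, "2007-05-19 22:00:00", "corpus only"),
    ]
    conn.executemany("INSERT INTO authors VALUES (?, ?, ?)", authors)
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", posts)
    for post_id, _, _, body in posts:
        conn.executemany(
            "INSERT INTO post_words VALUES (?, ?)",
            [(post_id, word) for word in set(body.lower().split())],
        )
    return conn


def find_posts(conn, words):
    """Ids of posts containing all `words`, posted on a Saturday evening
    by an author born before 1964 whose interests mention sushi."""
    words = sorted({w.lower() for w in words})
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    sql = f"""
        SELECT p.id
        FROM posts p
        JOIN authors a ON a.id = p.author_id
        JOIN post_words w ON w.post_id = p.id
        WHERE w.word IN ({placeholders})                 -- content
          AND strftime('%w', p.posted_at) = '6'          -- 6 = Saturday
          AND CAST(strftime('%H', p.posted_at) AS INTEGER) >= 18
          AND a.birth_year < 1964                        -- metadata
          AND a.interests LIKE '%sushi%'
        GROUP BY p.id
        HAVING COUNT(DISTINCT w.word) = ?                -- all words present
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(sql, [*words, len(words)])]


conn = build_demo_db()
print(find_posts(conn, ["corpus", "search"]))  # prints [1]
```

With tsearch2 the word-table join would presumably give way to a match
against an indexed text-search column, while the metadata conditions
would remain ordinary SQL.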

- John D. Burger
   MITRE


