[Corpora-List] Corpora and SQL
John D. Burger
john at mitre.org
Tue May 22 18:45:55 UTC 2007
> Is this not a performance nightmare? A table with 200 million
> entries?
Many databases are routinely used for far larger datasets.
With respect to the original query, the industrial-strength DB
Postgres has a well-developed extension for text search called tsearch2:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
One virtue of using real databases rather than text retrieval engines
is the ability to query both document content and whatever metadata
one might have associated with the text. "Find me blog entries with
these words posted on Saturday evenings by authors whose profile says
they were born before 1964 and are interested in sushi."
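
To make the kind of query quoted above concrete, here is a minimal sketch, not a tsearch2 example. It assumes a hypothetical two-table schema (posts and authors, with invented column names and sample rows) and uses Python's built-in sqlite3 module with plain LIKE matching as a stand-in for Postgres full-text search. In Postgres with tsearch2, the LIKE clauses would instead be a full-text match against an indexed tsvector column, while the metadata conditions would stay ordinary SQL.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical blog schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (
            author_id  INTEGER PRIMARY KEY,
            name       TEXT,
            birth_year INTEGER,
            interests  TEXT
        );
        CREATE TABLE posts (
            post_id   INTEGER PRIMARY KEY,
            author_id INTEGER REFERENCES authors(author_id),
            posted_at TEXT,
            body      TEXT
        );
    """)
    # Invented sample data, for illustration only.
    conn.executemany("INSERT INTO authors VALUES (?, ?, ?, ?)", [
        (1, "alice", 1958, "sushi, hiking"),
        (2, "bob", 1975, "sushi"),
        (3, "carol", 1960, "chess"),
    ])
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", [
        (10, 1, "2007-05-19 20:30:00", "notes on corpus annotation"),
        (11, 1, "2007-05-21 20:30:00", "corpus annotation again"),
        (12, 2, "2007-05-19 21:00:00", "corpus annotation tools"),
        (13, 3, "2007-05-19 19:00:00", "corpus annotation guidelines"),
        (14, 1, "2007-05-19 09:00:00", "corpus annotation at breakfast"),
    ])
    return conn


def find_posts(conn, words):
    """Return ids of posts containing all words, posted on a Saturday
    evening by authors born before 1964 who list sushi as an interest."""
    clauses = [
        "strftime('%w', p.posted_at) = '6'",                     # Saturday
        "CAST(strftime('%H', p.posted_at) AS INTEGER) >= 18",    # 18:00 or later
        "a.birth_year < 1964",
        "a.interests LIKE '%sushi%'",
    ]
    # One substring condition per search word (stand-in for full-text search).
    clauses += ["p.body LIKE '%' || ? || '%'"] * len(words)
    sql = (
        "SELECT p.post_id FROM posts p "
        "JOIN authors a ON a.author_id = p.author_id "
        "WHERE " + " AND ".join(clauses) + " ORDER BY p.post_id"
    )
    return [row[0] for row in conn.execute(sql, list(words))]


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_posts(conn, ["corpus", "annotation"]))
```

The point of the sketch is only that the content conditions and the metadata conditions sit in one WHERE clause over a join, which is awkward to express in a standalone text retrieval engine. Substring matching with LIKE will not scale to a large corpus; that is the job of a real full-text index.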
- John D. Burger
MITRE