[Corpora-List] Corpora and SQL

Antonio Saz antonio.saz at otsistemas.com
Wed May 23 07:50:38 UTC 2007


I think it is a corpus with 200 million words, but not 200 million different words.

We built a 250-million-word corpus in Spanish, implemented with MS SQL
Server 2000. We use the text retrieval engine from Microsoft (called "Text
Services" in SQL Server), which is very fast. The full database (data and
indexes) occupies 2 GB.
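
To illustrate the difference between running words and different words,
here is a toy sketch in Python. It is not how the SQL Server engine is
implemented; the function name build_index and the sample sentences are
made up for the example. The index holds one entry per different word, and
each entry lists the places where that word occurs.

```python
from collections import defaultdict

def build_index(docs):
    """Map each different word to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Naive tokenization: lowercase and split on whitespace.
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

# Two made-up sentences standing in for a corpus.
docs = ["el perro come", "el gato come pescado"]
index = build_index(docs)

tokens = sum(len(postings) for postings in index.values())  # running words
types = len(index)                                          # different words
print(tokens, types)
print(index["come"])
```

In a real corpus the number of index entries grows much more slowly than
the number of running words, so a lookup touches one entry per query word
rather than scanning every occurrence.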

Antonio

> Is this not a performance nightmare?  A table with 200 million entries?
>
> I would guess something specifically designed for textual data would
> be better (eg the system described in 'Managing Gigabytes' by
> Moffat/Witten/Bell).
>
> Oliver
>
>


