[Corpora-List] Corpora and SQL
Lars Nygaard
lars.nygaard at iln.uio.no
Tue May 22 18:42:37 UTC 2007
Oliver Mason wrote:
> Is this not a performance nightmare? A table with 200 million entries?
A challenge, but not necessarily a nightmare. MySQL has no problem
handling 200 million rows, and tables can be compressed and stored in
memory for increased performance. Collocate searching would have to be
heavily optimised, though.
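To make the collocate point concrete, here is a minimal sketch of the one-row-per-token layout. It assumes SQLite (Python's standard library) as a stand-in for MySQL, and the table and column names are hypothetical. Collocates come from a self-join over a position window around each occurrence of the node word. That join is the part that needs heavy optimisation at 200 million rows.

```python
import sqlite3

# Hypothetical schema: one row per running token, keyed by corpus position.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (pos INTEGER PRIMARY KEY, word TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_word ON tokens(word)")  # fast lookup of node words

text = "the cat sat on the mat and the cat slept".split()
conn.executemany("INSERT INTO tokens (pos, word) VALUES (?, ?)", enumerate(text))

def collocates(conn, node, span=2):
    """Count words within +/- span positions of each occurrence of node."""
    rows = conn.execute(
        """
        SELECT c.word, COUNT(*) AS freq
        FROM tokens AS n
        JOIN tokens AS c
          ON c.pos BETWEEN n.pos - ? AND n.pos + ?
         AND c.pos <> n.pos
        WHERE n.word = ?
        GROUP BY c.word
        ORDER BY freq DESC, c.word
        """,
        (span, span, node),
    )
    return dict(rows.fetchall())

print(collocates(conn, "cat"))
```

The index on `word` finds the node occurrences quickly. The window join still touches every context row, and in a real corpus, frequent node words make that join large.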
> I would guess something specifically designed for textual data would
> be better (eg the system described in 'Managing Gigabytes' by
> Moffat/Witten/Bell).
Well, the Moffat/Witten/Bell system is not very well suited for
linguistics, but CWB (which was originally written based on the
Gigabytes book) is, and would in most cases have better performance than
SQL.
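Text-oriented systems in the Managing Gigabytes tradition are built around inverted indexes that record the corpus positions of each word. The toy sketch below illustrates that idea only. It is not CWB's actual implementation or file format, and the function names are hypothetical. The index jumps straight to each occurrence of the node word, and the context is read by position from the token sequence, so no join is needed.

```python
from collections import defaultdict

def build_index(tokens):
    """Map each word type to the sorted list of positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word].append(pos)  # appended in increasing order, so lists stay sorted
    return index

def collocates(index, tokens, node, span=2):
    """Count words within +/- span positions of each occurrence of node."""
    counts = defaultdict(int)
    for pos in index.get(node, []):  # .get avoids creating an empty entry
        lo, hi = max(0, pos - span), min(len(tokens), pos + span + 1)
        for i in range(lo, hi):
            if i != pos:
                counts[tokens[i]] += 1
    return dict(counts)

tokens = "the cat sat on the mat and the cat slept".split()
index = build_index(tokens)
print(collocates(index, tokens, "cat"))
```

The work is proportional to the number of node occurrences times the window size, which is one reason such designs tend to beat a general-purpose SQL join for this kind of query.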
As always, it depends, but I would agree that CWB (or similar tools like
Manatee) is in general the best solution for corpus linguistics.
cheers,
lars