[Corpora-List] Corpora and SQL
Lars Nygaard
lars.nygaard at iln.uio.no
Wed May 23 13:29:32 UTC 2007
John D. Burger wrote:
>> Another possibility is to store metadata in a SQL database, and
>> export, on the fly, a subcorpus definition (start and stop positions)
>> for CWB. The best of both worlds, so to speak. This works very well
>> for the Glossa corpus query system (which has a combination of CWB
>> and MySQL as a backend).
>
>
> The disadvantage of this is that a single engine cannot reason about
> the best way to run your query. Like other databases, Postgresql keeps
> various summary statistics about the distribution of values in each
> indexed column, and uses these to construct a (hopefully) optimal query
> plan.
Yes, indeed. For single-word queries you could, with a bit of effort,
probably outperform CWB with an SQL-based system. It gets quite
unpredictable for more complex queries, however, and I suspect the
advantages of a single engine can easily be drowned (cf. examples here:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/html/node13.html).
It would certainly be interesting, though, if anyone were up to the
challenge of implementing the full range of features found in IMS CQP
with an SQL backend.
best,
lars nygaard
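
For readers who want to picture the hybrid setup discussed above (metadata in an SQL database, start and stop positions exported on the fly as a subcorpus definition), here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the table layout, column names, sample rows and function names are hypothetical, SQLite stands in for MySQL so the example is self-contained, and the tab-separated output is only one plausible way to hand ranges to a corpus engine. It is not a description of how Glossa or CWB actually do this.

```python
import sqlite3


def build_metadata_db():
    """Create an in-memory metadata table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE texts ("
        " text_id TEXT PRIMARY KEY,"
        " genre TEXT,"
        " year INTEGER,"
        " start_pos INTEGER,"  # first corpus position of the text
        " stop_pos INTEGER)"   # last corpus position of the text
    )
    conn.executemany(
        "INSERT INTO texts VALUES (?, ?, ?, ?, ?)",
        [
            ("t1", "news", 1999, 0, 999),
            ("t2", "fiction", 2003, 1000, 2499),
            ("t3", "news", 2005, 2500, 3999),
        ],
    )
    return conn


def subcorpus_ranges(conn, genre, min_year):
    """Return (start, stop) pairs for texts matching the metadata filter."""
    rows = conn.execute(
        "SELECT start_pos, stop_pos FROM texts"
        " WHERE genre = ? AND year >= ?"
        " ORDER BY start_pos",
        (genre, min_year),
    )
    return [(start, stop) for start, stop in rows]


def format_ranges(ranges):
    """Serialise ranges as one tab-separated start/stop pair per line."""
    return "".join(f"{start}\t{stop}\n" for start, stop in ranges)


if __name__ == "__main__":
    conn = build_metadata_db()
    print(format_ranges(subcorpus_ranges(conn, "news", 2000)), end="")
```

The SQL side only ever answers the metadata question (which texts qualify), while the corpus engine would restrict its token-level search to the exported position ranges. That split is also why, as noted in the quoted message, neither engine can plan the whole query on its own.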