[Corpora-List] Corpora and SQL

Wed May 23 13:29:32 UTC 2007

John D. Burger wrote:

>> Another possibility is to store metadata in a SQL database, and
>> export, on the fly, a subcorpus definition (start and stop positions)
>> for CWB. The best of both worlds, so to speak. This works very well
>> for the Glossa corpus query system (which has a combination of CWB
>> and MySQL as a backend).
> 
> 
> The disadvantage of this is that a single engine cannot reason about
> the best way to run your query. Like other databases, Postgresql keeps
> various summary statistics about the distribution of values in each
> indexed column, and uses these to construct a (hopefully) optimal query
> plan.

Yes, indeed. For single-word queries you could, with a bit of effort, 
probably outperform CWB with an SQL-based system. It gets quite 
unpredictable for more complex queries, however, and I suspect the 
advantages of a single engine can easily be drowned out (cf. examples here: 
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/html/node13.html).
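
To make the contrast concrete, here is a minimal sketch in Python, using
SQLite from the standard library as a stand-in for MySQL or PostgreSQL.
The token table, its columns, and the two helper functions are
hypothetical, made up for illustration; they are not taken from Glossa
or CWB.

```python
import sqlite3

# Hypothetical schema: one row per corpus token, keyed by corpus position.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (pos INTEGER PRIMARY KEY, word TEXT NOT NULL)")
conn.execute("CREATE INDEX token_word ON token (word)")

# Toy corpus; positions are simply the 0-based token offsets.
text = "the cat sat on the mat and the dog sat on the cat".split()
conn.executemany("INSERT INTO token (pos, word) VALUES (?, ?)", enumerate(text))


def find_word(word):
    """Single-word query: one lookup in the index on token.word."""
    rows = conn.execute(
        "SELECT pos FROM token WHERE word = ? ORDER BY pos", (word,)
    )
    return [pos for (pos,) in rows]


def find_bigram(first, second):
    """Two-word sequence: already needs a self-join on adjacent positions."""
    rows = conn.execute(
        "SELECT a.pos FROM token AS a "
        "JOIN token AS b ON b.pos = a.pos + 1 "
        "WHERE a.word = ? AND b.word = ? ORDER BY a.pos",
        (first, second),
    )
    return [pos for (pos,) in rows]
```

With the toy corpus above, find_word("sat") returns [2, 9] and
find_bigram("the", "cat") returns [0, 11]. Each further token in a
sequence adds another join, which is where the planner's choices start
to matter.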

It would certainly be interesting, though, if anyone were up to the 
challenge of implementing the full range of features found in IMS CQP 
with an SQL backend.

best,
lars nygaard