[Corpora-List] Corpora and SQL

Lars Nygaard lars.nygaard at iln.uio.no
Wed May 23 13:29:32 UTC 2007


John D. Burger wrote:

>> Another possibility is to store metadata in a SQL database, and
>> export, on the fly, a subcorpus definition (start and stop positions)
>> for CWB. The best of both worlds, so to speak. This works very well
>> for the Glossa corpus query system (which has a combination of CWB
>> and MySQL as a backend).
> 
> 
> The disadvantage of this is that a single engine cannot reason about
> the best way to run your query. Like other databases, PostgreSQL keeps
> various summary statistics about the distribution of values in each
> indexed column, and uses these to construct a (hopefully) optimal query
> plan.
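To make the metadata-plus-positions setup concrete, here is a minimal Python sketch. It uses the standard-library `sqlite3` module instead of MySQL or PostgreSQL. The `texts` table, its columns and the helper names are hypothetical. The output format (one tab-separated start/end corpus-position pair per line) is also an assumption, so check what your CWB/CQP version actually expects before loading such a file as a subcorpus. The `ANALYZE` call is SQLite's way of collecting the kind of index statistics the planner uses.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy in-memory metadata table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE texts ("
        " text_id TEXT PRIMARY KEY,"
        " genre TEXT,"
        " year INTEGER,"
        " start_cpos INTEGER,"  # first corpus position of the text
        " end_cpos INTEGER)"    # last corpus position of the text (inclusive)
    )
    conn.executemany(
        "INSERT INTO texts VALUES (?, ?, ?, ?, ?)",
        [
            ("t1", "news", 1998, 0, 4999),
            ("t2", "fiction", 2001, 5000, 11999),
            ("t3", "news", 2003, 12000, 15999),
        ],
    )
    conn.execute("CREATE INDEX idx_texts_genre_year ON texts (genre, year)")
    # Gather index statistics for SQLite's query planner.
    conn.execute("ANALYZE")
    return conn


def subcorpus_ranges(conn, genre, min_year):
    """Return (start, end) corpus-position pairs for texts matching the filter."""
    rows = conn.execute(
        "SELECT start_cpos, end_cpos FROM texts"
        " WHERE genre = ? AND year >= ? ORDER BY start_cpos",
        (genre, min_year),
    )
    return [(start, end) for start, end in rows]


def write_ranges(ranges, path):
    """Write one tab-separated start/end pair per line (assumed import format)."""
    with open(path, "w", encoding="utf-8") as f:
        for start, end in ranges:
            f.write(f"{start}\t{end}\n")


if __name__ == "__main__":
    conn = build_demo_db()
    print(subcorpus_ranges(conn, "news", 2000))  # [(12000, 15999)]
```

The metadata filter runs entirely in SQL. The corpus engine only sees a list of position intervals, which is why neither side can plan the combined query as a whole.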

Yes, indeed. For single-word queries you could, with a bit of effort,
probably outperform CWB with an SQL-based system. It gets quite
unpredictable for more complex queries, however, and I suspect the
advantages of a single engine can easily be drowned out (cf. the examples here:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/html/node13.html).
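For comparison, a naive SQL layout might store one row per corpus position and index the word column. The Python sketch below uses `sqlite3`, and its table and function names are hypothetical. It shows that a single-word lookup is a plain indexed query. A contiguous multi-word pattern, by contrast, already needs one self-join per extra token.

```python
import sqlite3


def build_token_table(tokens):
    """Load a tokenised corpus as one row per corpus position (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tok (cpos INTEGER PRIMARY KEY, word TEXT NOT NULL)")
    conn.executemany("INSERT INTO tok (cpos, word) VALUES (?, ?)", enumerate(tokens))
    # A B-tree index on word makes single-word lookups cheap.
    conn.execute("CREATE INDEX idx_tok_word ON tok (word)")
    return conn


def find_word(conn, word):
    """Single-word query: all corpus positions where `word` occurs."""
    rows = conn.execute("SELECT cpos FROM tok WHERE word = ? ORDER BY cpos", (word,))
    return [cpos for (cpos,) in rows]


def find_sequence(conn, words):
    """Contiguous multi-word query: one self-join per extra token.

    Roughly the SQL counterpart of a fixed-length CQP pattern such as
    [word="the"] [word="cat"]; each extra token is another join to order.
    """
    if not words:
        return []
    joins = []
    conds = ["t0.word = ?"]
    for i in range(1, len(words)):
        joins.append(f"JOIN tok t{i} ON t{i}.cpos = t0.cpos + {i}")
        conds.append(f"t{i}.word = ?")
    sql = (
        "SELECT t0.cpos FROM tok t0 "
        + " ".join(joins)
        + " WHERE " + " AND ".join(conds)
        + " ORDER BY t0.cpos"
    )
    return [cpos for (cpos,) in conn.execute(sql, list(words))]


if __name__ == "__main__":
    conn = build_token_table("the cat sat on the mat and the cat ran".split())
    print(find_word(conn, "the"))                # [0, 4, 7]
    print(find_sequence(conn, ["the", "cat"]))   # [0, 7]
```

Patterns with optional or repeated tokens, which CQP handles directly, do not map onto a fixed number of joins like this. That is roughly where the SQL side starts to get awkward.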


It would certainly be interesting, though, if anyone were up to the
challenge of implementing the full range of features found in IMS CQP
with an SQL backend.

best,
lars nygaard


