[Corpora-List] Corpora and SQL
Adam Kilgarriff
adam at lexmasterclass.com
Thu May 24 15:46:19 UTC 2007
Of course a dedicated corpus query tool does everything well without extra
engineering. When I see a discussion like this with lots of comments like
"with a bit of effort", I think "how many person-hours do they mean? (And,
how good a solution will it be?) Unless person-hours are very cheap, it will
cost less to buy a service that already does what is wanted." But, seeing
as I have such a service to sell, I'd better stop there or I shall be thrown
off the list for being commercial.
Adam
http://www.kilgarriff.co.uk
-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Lars Nygaard
Sent: 23 May 2007 14:30
To: corpora at uib.no
Subject: Re: [Corpora-List] Corpora and SQL
John D. Burger wrote:
>> Another possibility is to store metadata in a SQL database, and
>> export, on the fly, a subcorpus definition (start and stop positions)
>> for CWB. The best of both worlds, so to speak. This works very well
>> for the Glossa corpus query system (which has a combination of CWB
>> and MySQL as a backend).
>
>
> The disadvantage of this is that a single engine cannot reason about
> the best way to run your query. Like other databases, Postgresql keeps
> various summary statistics about the distribution of values in each
> indexed column, and uses these to construct a (hopefully) optimal query
> plan.
Yes, indeed. For single-word queries you could, with a bit of effort,
probably outperform CWB with an SQL-based system. It gets quite
unpredictable for more complex queries, however, and I suspect the
advantages of a single engine can easily be drowned (cf. examples here:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/html/node13.html).
It would certainly be interesting, though, if anyone were up to the
challenge of implementing the full range of features found in IMS CQP
with an SQL backend.
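
For readers of the archive, the hybrid approach quoted at the top of this
message (metadata in SQL, subcorpus start and stop positions handed to CWB)
could be sketched roughly as follows. This is only an illustration under
stated assumptions, not how Glossa does it: the `texts` table, its columns,
the sample rows, and the function names are all hypothetical, SQLite stands
in for MySQL, and the step of loading the resulting regions into CWB is left
out.

```python
import sqlite3


def build_metadata_db():
    """Create an in-memory metadata table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE texts ("
        " text_id TEXT PRIMARY KEY,"
        " genre TEXT,"
        " year INTEGER,"
        " start_pos INTEGER,"  # first corpus position of the text
        " end_pos INTEGER)"    # last corpus position of the text
    )
    conn.executemany(
        "INSERT INTO texts VALUES (?, ?, ?, ?, ?)",
        [
            ("t1", "news", 2001, 0, 999),
            ("t2", "fiction", 2003, 1000, 2499),
            ("t3", "news", 2005, 2500, 3999),
            ("t4", "news", 1998, 4000, 4799),
        ],
    )
    return conn


def subcorpus_regions(conn, genre, min_year):
    """Return (start, stop) pairs for texts matching the metadata filter."""
    rows = conn.execute(
        "SELECT start_pos, end_pos FROM texts"
        " WHERE genre = ? AND year >= ?"
        " ORDER BY start_pos",
        (genre, min_year),
    ).fetchall()
    return [(start, stop) for start, stop in rows]


def format_regions(regions):
    """Render the regions as tab-separated start/stop lines."""
    return "\n".join(f"{start}\t{stop}" for start, stop in regions)


conn = build_metadata_db()
regions = subcorpus_regions(conn, "news", 2000)
print(format_regions(regions))
```

The SQL engine only answers the metadata question (which texts qualify);
the token-level query would then run in the corpus engine, restricted to
those regions.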
best,
lars nygaard