[Corpora-List] Corpora and SQL

Lars Nygaard lars.nygaard at iln.uio.no
Thu May 24 12:23:42 UTC 2007


Mark Davies wrote:
>> Yes, indeed. For single-word queries you could, with a bit of effort,
>> probably outperform CWB with an SQL-based system. It gets quite
>> unpredictable for more complex queries, however, and I suspect the
>> advantages of a single engine can easily be drowned.
> 
> I use a purely SQL approach with (for example) the VIEW interface to the
> BNC (http://view.byu.edu) or the 100 million word TIME corpus
> (http://view.byu.edu/timemag), and it seems to handle "complex" queries
> quite well -- less than two or three seconds for a query like "white
> [nn*] that". I've used CWB, but it doesn't seem to be any faster for a
> query like this on a large (e.g. 100+ million word) corpus -- should it
> be?

My point was that it is not obvious how to handle complex queries 
efficiently in the SQL approach, and it would be great to get some more 
technical details, and perhaps source code, on your obviously very
well-engineered solution.
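
To make the question concrete, here is a minimal sketch of one common way a sequence query such as "white [nn*] that" can be expressed in SQL: one row per token, with self-joins on adjacent positions. This is only an illustration, not a description of how the VIEW interface is implemented. The table name `token`, its columns, and both function names are assumptions made up for this example, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_corpus(tokens):
    """Load (word, tag) pairs into an in-memory token-per-row table.

    The schema (token: pos_id, word, tag) is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE token (pos_id INTEGER PRIMARY KEY, word TEXT, tag TEXT)"
    )
    conn.executemany(
        "INSERT INTO token (pos_id, word, tag) VALUES (?, ?, ?)",
        [(i, w.lower(), t.lower()) for i, (w, t) in enumerate(tokens)],
    )
    # Index the column used to pick the starting token of a match.
    conn.execute("CREATE INDEX idx_token_word ON token (word)")
    return conn


def find_white_noun_that(conn):
    """Match 'white', then any token whose tag starts with 'nn', then 'that'."""
    sql = """
        SELECT t1.pos_id, t1.word, t2.word, t3.word
        FROM token AS t1
        JOIN token AS t2 ON t2.pos_id = t1.pos_id + 1
        JOIN token AS t3 ON t3.pos_id = t1.pos_id + 2
        WHERE t1.word = 'white'
          AND t2.tag LIKE 'nn%'
          AND t3.word = 'that'
        ORDER BY t1.pos_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    sample = [
        ("the", "at0"), ("white", "aj0"), ("house", "nn1"), ("that", "cjt"),
        ("a", "at0"), ("white", "aj0"), ("big", "aj0"), ("that", "cjt"),
    ]
    print(find_white_noun_that(build_corpus(sample)))
```

Each extra slot in the pattern adds another self-join, which is where the efficiency question for longer or more flexible queries comes from.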

> In addition, a true SQL approach allows nice functionality in terms
> of limiting and comparing by sub-corpora (directly, as part of the
> query; see help files and examples at these two websites).

But that can also be done by combining CWB and SQL, making that
combination (at least so far!) the most feature-complete solution that
is generally available to the community.
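
As a rough illustration of the combined approach, the sketch below assumes that hits have already been exported from the corpus engine as (text_id, position) pairs and that text-level metadata lives in an SQL table. No particular CWB interface is implied. The table names `text_meta` and `hit`, their columns, and the function name are all hypothetical.

```python
import sqlite3


def count_hits_by_genre(metadata, hits):
    """Count engine hits per genre by joining them with SQL metadata.

    metadata: iterable of (text_id, genre) rows (hypothetical schema).
    hits:     iterable of (text_id, position) pairs, assumed to come
              from whatever corpus engine ran the actual query.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE text_meta (text_id TEXT PRIMARY KEY, genre TEXT)")
    conn.executemany("INSERT INTO text_meta VALUES (?, ?)", metadata)
    conn.execute("CREATE TABLE hit (text_id TEXT, position INTEGER)")
    conn.executemany("INSERT INTO hit VALUES (?, ?)", hits)
    rows = conn.execute(
        """
        SELECT m.genre, COUNT(*)
        FROM hit AS h
        JOIN text_meta AS m ON m.text_id = h.text_id
        GROUP BY m.genre
        ORDER BY m.genre
        """
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    metadata = [("A01", "news"), ("B02", "fiction"), ("C03", "news")]
    hits = [("A01", 10), ("A01", 55), ("B02", 7), ("C03", 3)]
    print(count_hits_by_genre(metadata, hits))
```

Restricting to a sub-corpus then amounts to adding a WHERE clause on the metadata columns, while the pattern matching itself stays in the corpus engine.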


best,
lars nygaard


