[Corpora-List] Corpora and SQL
Mark Davies
Mark_Davies at byu.edu
Thu May 24 12:00:21 UTC 2007
>> Yes, indeed. For single-word queries you could, with a bit of effort,
probably outperform CWB with an SQL-based system. It gets quite
unpredictable for more complex queries, however, and I suspect the
advantages of a single engine can easily be drowned.
I use a purely SQL approach with (for example) the VIEW interface to the
BNC (http://view.byu.edu) or the 100 million word TIME corpus
(http://view.byu.edu/timemag), and it seems to handle "complex" queries
quite well -- less than two or three seconds for a query like "white
[nn*] that". I've used CWB, but it doesn't seem to be any faster for a
query like this on a large (e.g. 100+ million word) corpus -- should it
be? In addition, a true SQL approach allows nice functionality in terms
of limiting and comparing by sub-corpora (directly, as part of the
query; see the help files and examples at these two websites).
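To make the idea concrete, here is a minimal sketch of how a query like
"white [nn*] that" might be expressed in SQL, with an optional sub-corpus
restriction as part of the same query. Everything in it is an assumption
for illustration only: the one-row-per-token table, the column names
(pos_id, word, tag, genre), the sample data, and the function
white_noun_that are hypothetical and are not the actual design behind the
VIEW interface. It uses Python's built-in sqlite3 module so it can be run
as-is.

```python
import sqlite3

# Hypothetical schema: one row per token, keyed by running corpus position.
# This is an illustrative layout, not the actual VIEW/BYU database design.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tokens ("
    " pos_id INTEGER PRIMARY KEY,"
    " word TEXT, tag TEXT, genre TEXT)"
)

# Tiny made-up sample; tags loosely follow the BNC (CLAWS5) style.
sample = [
    (1, "the", "at0", "fiction"),
    (2, "white", "aj0", "fiction"),
    (3, "house", "nn1", "fiction"),
    (4, "that", "cjt", "fiction"),
    (5, "a", "at0", "news"),
    (6, "white", "aj0", "news"),
    (7, "paper", "nn1", "news"),
    (8, "that", "cjt", "news"),
    (9, "white", "aj0", "news"),
    (10, "and", "cjc", "news"),
]
conn.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?)", sample)
conn.execute("CREATE INDEX idx_word ON tokens(word)")


def white_noun_that(conn, genre=None):
    """Count nouns in the middle slot of 'white [nn*] that'.

    The three-word sequence is a self-join on adjacent positions;
    the optional genre filter restricts the search to one sub-corpus.
    """
    sql = (
        "SELECT t2.word, COUNT(*) AS freq "
        "FROM tokens t1 "
        "JOIN tokens t2 ON t2.pos_id = t1.pos_id + 1 "
        "JOIN tokens t3 ON t3.pos_id = t1.pos_id + 2 "
        "WHERE t1.word = 'white' AND t2.tag LIKE 'nn%' AND t3.word = 'that' "
    )
    params = []
    if genre is not None:
        sql += "AND t1.genre = ? "
        params.append(genre)
    sql += "GROUP BY t2.word ORDER BY freq DESC, t2.word"
    return conn.execute(sql, params).fetchall()


all_hits = white_noun_that(conn)
news_hits = white_noun_that(conn, genre="news")
print(all_hits)
print(news_hits)
```

In this sketch, comparing sub-corpora amounts to running the same
statement with a different genre value (or grouping by genre), which is
the kind of "directly, as part of the query" restriction mentioned above.
How well such self-joins scale to 100+ million tokens depends on the
indexing and table design, which this toy example does not address.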
>> It would certainly be interesting, though, if anyone were up to the
challenge of implementing the full range of features found in IMS CQP
with an SQL backend.
Each approach has its advantages and disadvantages. Just as a purely SQL
approach may not do everything that CWB can, I'm sure the converse is
also true.
Mark Davies
============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================