[Corpora-List] Corpus Development

Mark Davies Mark_Davies at byu.edu
Tue Apr 29 14:57:46 UTC 2008


Marco,

>> with a CQP query like
>> VERB ADV? DET? ADJ* NOUN
>> I would only find collocates that are plausible direct objects of the
>> verb (since you can only have adverbs, articles, adjectives occurring
>> in between, and not, e.g., prepositions or other verbs). I suspect
>> that something like this would be extremely tricky to implement in a
>> relational db while preserving efficiency,

For a query like that, I'd just search for:

WORD(S): [break].[v*] [d*]
CONTEXT: [nn*]

The [d*] in the "node word" field should limit things pretty well.
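
For comparison, the quoted pattern (VERB ADV? DET? ADJ* NOUN) can be sketched outside any particular corpus tool. This is only an illustration: the coarse tagset, the single-letter codes, and the function name verb_object_pairs are assumptions made for the example, not part of CQP or of any existing interface.

```python
import re

# Hypothetical coarse tagset, mapped to single letters so that a
# regular expression can scan the tag sequence of a sentence.
TAG_CODES = {"VERB": "V", "ADV": "R", "DET": "D", "ADJ": "J", "NOUN": "N"}

# VERB ADV? DET? ADJ* NOUN, written over the single-letter codes.
PATTERN = re.compile(r"VR?D?J*N")


def verb_object_pairs(tagged):
    """Return (verb, noun) pairs for each match of the pattern.

    `tagged` is a list of (word, tag) tuples. Tags outside TAG_CODES
    (e.g. prepositions) become "x" and therefore block a match.
    """
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    pairs = []
    for match in PATTERN.finditer(codes):
        verb = tagged[match.start()][0]      # first token of the match
        noun = tagged[match.end() - 1][0]    # last token of the match
        pairs.append((verb, noun))
    return pairs


direct_object = [("they", "PRON"), ("broke", "VERB"), ("the", "DET"),
                 ("old", "ADJ"), ("window", "NOUN"), ("with", "PREP"),
                 ("a", "DET"), ("hammer", "NOUN")]
prepositional = [("they", "PRON"), ("broke", "VERB"), ("into", "PREP"),
                 ("the", "DET"), ("house", "NOUN")]

pairs_direct = verb_object_pairs(direct_object)
pairs_prep = verb_object_pairs(prepositional)
```

The intervening preposition in the second sentence prevents a match, which is the filtering effect described in the quoted message.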

In general, though, there are undoubtedly things that a CQP/CQS can do that a relational database can't, and vice versa.

With a relational database, for example, it's quite easy to do complex comparisons across different sections of a corpus (e.g. comparing collocates in two different genres or historical periods, which can be defined and searched "on the fly"), or to integrate other resources (like a thesaurus, or long user-defined wordlists, or CELEX, or WordNet). Perhaps there are CQP/CQS implementations that allow such queries, however. Actually, I'm working on a paper right now that deals with this issue, and so I'd appreciate any pointers to publicly-available CQP/CQS implementations that allow for queries like this.
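
As a rough illustration of the cross-section comparison mentioned above, here is a minimal sketch using SQLite from Python. The table name collocates, its columns, the genre labels, and the toy rows are all invented for the example; a real corpus database would be laid out differently and would be far larger.

```python
import sqlite3

# In-memory database holding one row per (node word, collocate) occurrence,
# labelled with the genre of the text it came from. Schema and data are
# hypothetical and exist only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collocates (node TEXT, collocate TEXT, genre TEXT)")
rows = [
    ("break", "law", "NEWS"),
    ("break", "law", "NEWS"),
    ("break", "record", "NEWS"),
    ("break", "heart", "FICTION"),
    ("break", "heart", "FICTION"),
    ("break", "law", "FICTION"),
]
conn.executemany("INSERT INTO collocates VALUES (?, ?, ?)", rows)


def compare_genres(conn, node, genre_a, genre_b):
    """Return (collocate, freq_in_a, freq_in_b) for one node word.

    The two genres are passed as parameters, so the sections being
    compared are chosen at query time rather than fixed in advance.
    """
    query = """
        SELECT collocate,
               SUM(CASE WHEN genre = ? THEN 1 ELSE 0 END) AS freq_a,
               SUM(CASE WHEN genre = ? THEN 1 ELSE 0 END) AS freq_b
        FROM collocates
        WHERE node = ? AND genre IN (?, ?)
        GROUP BY collocate
        ORDER BY collocate
    """
    params = (genre_a, genre_b, node, genre_a, genre_b)
    return conn.execute(query, params).fetchall()


result = compare_genres(conn, "break", "NEWS", "FICTION")
for collocate, freq_news, freq_fiction in result:
    print(collocate, freq_news, freq_fiction)
```

Joining such a table against another resource (a wordlist or thesaurus table, say) would follow the same approach, with an extra JOIN in the query.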

Best,

Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================



