[Corpora-List] Corpus Development

Marco Baroni marco.baroni at unitn.it
Tue Apr 29 14:24:10 UTC 2008


Dear Mark,

Thanks for your informative reply.

I think the query you propose is probably a good surrogate, but not 
quite what I meant.

If I understand your syntax correctly, this...

> WORD(S): [break].[v*]  (i.e. all forms of 'break' as a verb)
> CONTEXT: [nn*]  (and also select [0]  [5] )

would also find, e.g., "break with a hammer", "break and repair cars", 
etc., whereas with a CQP query like

VERB ADV? DET? ADJ* NOUN

I would only find collocates that are plausible direct objects of the 
verb (since only adverbs, determiners, and adjectives can occur in 
between, and not, e.g., prepositions or other verbs). I suspect that 
a pattern like this would be extremely tricky to implement in a 
relational database while preserving efficiency, but I would be happy 
to be proven wrong.
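
To make the difference concrete, here is a minimal Python sketch, not 
either system's actual implementation. It assumes that the [0] [5] 
setting means "no words to the left, up to five to the right", and it 
uses a made-up one-letter tagset and hand-tagged toy phrases that are 
purely illustrative. The first function mimics the window search, the 
second the VERB ADV? DET? ADJ* NOUN sequence.

```python
import re

# Hypothetical one-letter tags, chosen only for this sketch:
# V verb, R adverb, D determiner, J adjective, N noun,
# P preposition, C conjunction.
OBJECT_PATTERN = re.compile(r"VR?D?J*N")  # VERB ADV? DET? ADJ* NOUN


def window_nouns(tagged, verb_lemma, right=5):
    """Nouns within `right` tokens after the verb (window search)."""
    hits = []
    for i, (_, lemma, tag) in enumerate(tagged):
        if tag == "V" and lemma == verb_lemma:
            window = tagged[i + 1 : i + 1 + right]
            hits.extend(word for word, _, t in window if t == "N")
    return hits


def pattern_nouns(tagged, verb_lemma):
    """Nouns reached from the verb through the sequence pattern only."""
    # One character per token, so string offsets equal token offsets.
    tags = "".join(tag for _, _, tag in tagged)
    hits = []
    for m in OBJECT_PATTERN.finditer(tags):
        if tagged[m.start()][1] == verb_lemma:
            hits.append(tagged[m.end() - 1][0])  # the closing NOUN
    return hits


# Hand-tagged toy phrases as (word, lemma, tag) triples.
with_hammer = [("break", "break", "V"), ("with", "with", "P"),
               ("a", "a", "D"), ("hammer", "hammer", "N")]
repair_cars = [("break", "break", "V"), ("and", "and", "C"),
               ("repair", "repair", "V"), ("cars", "car", "N")]
old_window = [("broke", "break", "V"), ("the", "the", "D"),
              ("old", "old", "J"), ("window", "window", "N")]

for sentence in (with_hammer, repair_cars, old_window):
    print(window_nouns(sentence, "break"), pattern_nouns(sentence, "break"))
```

On these toy phrases the window search returns "hammer" and "cars" 
for 'break', while the sequence pattern returns a noun only for 
"broke the old window". Note that the pattern stops at the first noun, 
so in a noun-noun compound it would pick up the modifier and not the 
head.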

Regards,

Marco



_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


