[Corpora-List] distribution of types of lexical collocations

Stefan Evert evert at IMS.Uni-Stuttgart.DE
Tue Jan 4 18:52:09 UTC 2005


Dear Philippa,

Sorry for the late reply. I'm very interested in your research on
adjective-noun collocations, not least since we've been doing some
research on the extraction of German adjective-noun collocations
(mostly for lexicographic applications) here at the IMS.

The answer to your question (and, in fact, the difficulty of giving an
answer) depends very much on what exactly you mean by "collocations".

If you understand a collocation as a lexicalised word combination (or
some subtype thereof), then you will need manually compiled lists of
true collocations in order to answer this question. Since I am not
aware of any systematic and comprehensive collections of such a
nature, I believe that it is nearly impossible to give a reliable
answer.

If you understand a collocation in a strict Neo-Firthian sense as a
recurrent word combination, then it's merely a matter of counting the
types of word combinations you're interested in in a given corpus
(such as the BNC for English). The quality of the answer you get
depends on the accuracy of the linguistic pre-processing and the
methods you use to extract the word combinations. While adjective-noun
combinations can easily be identified with high accuracy, other types
of word combinations will be much more difficult, especially noun-verb
combinations in German.
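
To make the counting step concrete, here is a minimal sketch in
Python. It assumes the corpus has already been POS-tagged and is
available as a list of sentences, each a list of (word, tag) pairs.
The tag names "ADJ" and "NOUN" and the function name count_adj_noun
are placeholders of mine, not the BNC tagset or any particular tool,
and matching only directly adjacent adjective+noun pairs is a
deliberately crude extraction method.

```python
from collections import Counter

def count_adj_noun(tagged_sentences, adj_tag="ADJ", noun_tag="NOUN"):
    """Count adjective+noun pairs where the noun directly follows the adjective.

    tagged_sentences: iterable of sentences, each a list of (word, tag) pairs.
    The tag names are illustrative assumptions; substitute those of your tagger.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        # Walk over all adjacent token pairs within one sentence.
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if t1 == adj_tag and t2 == noun_tag:
                counts[(w1.lower(), w2.lower())] += 1
    return counts

# Toy input, invented for illustration only.
sentences = [
    [("Strong", "ADJ"), ("tea", "NOUN"), ("is", "VERB"), ("nice", "ADJ")],
    [("She", "PRON"), ("drinks", "VERB"), ("strong", "ADJ"), ("tea", "NOUN")],
    [("a", "DET"), ("heavy", "ADJ"), ("smoker", "NOUN")],
]
pair_counts = count_adj_noun(sentences)
```

A real extraction would normally work on lemmas rather than word
forms, and for noun-verb pairs adjacency is not sufficient; some form
of parsing or chunking would be needed.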

Another necessary clarification concerns what you mean by the
"distribution" of collocations. Are you referring to the number of
types (how many adj-n collocations are there?), the number of tokens
(how often do adj-n collocations occur in the corpus?), or the
distribution of type frequencies/probabilities in the corpus (how many
low-frequency collocations are there, and does their distribution
follow Zipf's law or is it more balanced?)?
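
The three readings can be told apart with a few lines of Python. The
sketch below assumes you already have a frequency count for each
adj-n pair; the function name summarise and the toy counts are
invented for illustration and are not real corpus figures.

```python
from collections import Counter

def summarise(pair_counts):
    """Return (number of types, number of tokens, frequency spectrum).

    The frequency spectrum maps a frequency m to the number of
    distinct pairs (types) that occur exactly m times.
    """
    n_types = len(pair_counts)
    n_tokens = sum(pair_counts.values())
    spectrum = Counter(pair_counts.values())
    return n_types, n_tokens, dict(sorted(spectrum.items()))

# Invented toy counts, for illustration only.
toy = Counter({
    ("strong", "tea"): 5,
    ("heavy", "smoker"): 2,
    ("red", "car"): 1,
    ("old", "house"): 1,
})
n_types, n_tokens, spectrum = summarise(toy)
# spectrum[1] is the number of pairs occurring exactly once.
```

Plotting the spectrum (or rank against frequency) on log-log axes is
the usual quick check for a Zipf-like shape.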

If you don't mind my asking, I'm curious what kind of research
questions you want to address with this frequency data.

Kind regards, and a Happy New Year,
Stefan

> Dear members of the Corpora List,
>
> I am doing research on adjective-noun collocations and I wonder if there are
> any reliable corpus data and numbers as to the distribution of this type of
> (English and/or German) collocations in contrast to (noun-noun /)
> adjective-adverb / verb-adverb / noun-verb collocations.
>
> Thanks very much in advance.
>
> Philippa

--
______________________________________________________________________
Stefan Evert                                     purl.org/stefan.evert
http://www.collocations.de/                             schtepf at gmx.de


