[Corpora-List] Co-occurrence stats from BNC
Afsaneh Fazly
afsaneh at cs.toronto.edu
Fri Mar 17 14:29:42 UTC 2006
You should be able to do this easily and quickly, using the
Ngram Statistics Package (by Ted Pedersen), which can be
found here:
http://ngram.sourceforge.net/
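To show what the two requested statistics involve, here is a minimal Python sketch. It is not NSP's own implementation. It assumes the BNC has already been split into tokenized sentences (reading the BNC markup is not shown), and the function names are hypothetical. Trigrams are counted within sentences only. Each unordered pair of distinct words is counted at most once per sentence, so these are sentence co-occurrence counts and not bigram counts.

```python
from collections import Counter
from itertools import combinations

def trigram_counts(sentences):
    """Count every contiguous word trigram; trigrams never cross sentence boundaries."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts

def trigram_probabilities(counts):
    """Relative frequency of each trigram among all trigram tokens (MLE estimate)."""
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()}

def sentence_cooccurrence(sentences):
    """Count unordered pairs of distinct word types that share a sentence.

    A pair is counted at most once per sentence, whatever the distance or
    order of the two words, so this is not a bigram count.
    """
    counts = Counter()
    for tokens in sentences:
        types = sorted(set(tokens))  # sorting gives each pair one canonical order
        for pair in combinations(types, 2):
            counts[pair] += 1
    return counts

if __name__ == "__main__":
    # Toy input standing in for tokenized, lowercased BNC sentences (assumption).
    sentences = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat"],
    ]
    tri = trigram_counts(sentences)
    print(tri[("the", "cat", "sat")])  # 1
    co = sentence_cooccurrence(sentences)
    print(co[("sat", "the")])          # 2: "sat" and "the" share both sentences
```

The number of pairs per sentence grows roughly with the square of the sentence length. On the full BNC the pair table is therefore the expensive part, and in practice one would probably prune rare words or write partial counts to disk.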
Regards,
Afsaneh
On Fri, 17 Mar 2006, MCUSSHS wrote:
> Sorry if this is a dumb question: for a student project, we would like
> to get the following stats based on the BNC:
> (1) frequency (or probability) of all trigrams
> (2) co-occurrence stats for all word pairs (NOT bigrams, note) based on
> co-occurrence within the same sentence
>
> I assume that this is easy to compute, though time-consuming; and of
> course I understand that the data will be relatively sparse.
>
> So my question is, is this data available somewhere, e.g. someone has
> already done it; OR: what is the easiest way to do it?
>
> Harold Somers