[Corpora-List] Co-occurrence stats from BNC

Afsaneh Fazly afsaneh at cs.toronto.edu
Fri Mar 17 14:29:42 UTC 2006


You should be able to do this easily and quickly, using the
Ngram Statistics Package (by Ted Pedersen), which can be
found here:

http://ngram.sourceforge.net/
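
If it helps to see what the two counts in the question amount to, here is a
minimal sketch in plain Python (not NSP, and not its interface). It assumes
the corpus has already been split into sentences and tokens; the toy
sentences and the function names below are made up for illustration, and
reading the BNC files themselves is left out.

```python
from collections import Counter
from itertools import combinations


def trigram_counts(sentences):
    """Count contiguous word trigrams, never crossing a sentence boundary."""
    counts = Counter()
    for tokens in sentences:
        # zip stops at the shortest slice, so sentences with fewer
        # than three tokens contribute nothing.
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts


def sentence_cooccurrence(sentences):
    """Count, for each unordered word pair, the sentences containing both."""
    counts = Counter()
    for tokens in sentences:
        # Use distinct word types, sorted so each pair has one canonical order.
        types = sorted(set(tokens))
        counts.update(combinations(types, 2))
    return counts


# Toy input standing in for a tokenised corpus (an assumption, not BNC data).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat"],
]

tri = trigram_counts(sentences)
pairs = sentence_cooccurrence(sentences)

# Relative frequency of one trigram among all trigram tokens.
total_trigrams = sum(tri.values())
p_the_cat_sat = tri[("the", "cat", "sat")] / total_trigrams
```

Dividing each count by the total gives the relative frequencies. Note that
the pair table grows roughly with the square of the number of distinct words
per sentence, so on a corpus the size of the BNC it may be worth applying a
frequency cut-off or writing partial counts to disk.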

Regards,
Afsaneh

On Fri, 17 Mar 2006, MCUSSHS wrote:

> Sorry if this is a dumb question: for a student project, we would like
> to get the following stats based on the BNC:
> (1) frequency (or probability) of all trigrams
> (2) co-occurrence stats for all word pairs (NOT bigrams, note) based on
> co-occurrence within the same sentence
>
> I assume that this is easy to compute, though time-consuming; and of
> course I understand that the data will be relatively sparse.
>
> So my question is, is this data available somewhere, e.g. someone has
> already done it; OR: what is the easiest way to do it?
>
> Harold Somers
>
>
>


