[Corpora-List] calculating word density

Thu Jul 23 18:21:42 UTC 2009

Hello Naouel,

The standard approach would be to turn the (total) count into a relative frequency, i.e. an estimate of the word's probability, by dividing it by the number of tokens in the corpus. Relative frequencies can be compared directly, even between corpora of different sizes. You might also find it useful to take the logarithm if you're comparing or plotting relative densities. But perhaps there is something subtler you want to do?
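
To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above, together with the per-1000-words normalisation, which is the same quantity rescaled. The function names and the example figures (250 occurrences in a corpus of 1,000,000 tokens) are hypothetical, chosen only for illustration. Base-10 logarithms are an assumption; any base works as long as it is used consistently.

```python
import math


def relative_frequency(count, corpus_size):
    """Occurrences of a word divided by the total number of tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size


def per_n_words(count, corpus_size, n=1000):
    """Relative frequency rescaled to occurrences per n words."""
    return relative_frequency(count, corpus_size) * n


def log_relative_frequency(count, corpus_size):
    """Base-10 log of the relative frequency (undefined for a zero count)."""
    if count <= 0:
        raise ValueError("the logarithm is undefined for a count of zero")
    return math.log10(relative_frequency(count, corpus_size))


# Hypothetical example: a word seen 250 times in a 1,000,000-token corpus.
count, corpus_size = 250, 1_000_000
print(relative_frequency(count, corpus_size))
print(per_n_words(count, corpus_size, n=1000))
print(log_relative_frequency(count, corpus_size))
```

Note that a word that never occurs has no logarithm, so zero counts need separate handling (for instance by smoothing) before taking logs.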

Justin Washtell
School of Computing
University of Leeds

________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of n.toumi at reading.ac.uk [n.toumi at reading.ac.uk]
Sent: 23 July 2009 18:27
To: Corpora at uib.no
Subject: [Corpora-List] calculating word density

Dear readers,

I'm looking for suggestions on how to calculate word density in a corpus,
knowing the number of occurrences of that word and the number of words in
that corpus.

One method I thought of is to count the number of occurrences per 1000 or
10000 words.

Can anyone suggest other ways?

Thank you for your advice.

Regards,

Naouel

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
