[Corpora-List] calculating word density
Justin Washtell
lec3jrw at leeds.ac.uk
Thu Jul 23 18:21:42 UTC 2009
Hello Naouel,
The standard approach would be to turn the (total) count into a probability by dividing it by the number of tokens in the corpus. Probabilities obtained this way can be compared directly. You might also find it useful to take the logarithm if you're comparing or plotting relative densities. But perhaps there is something subtler you want to do?
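As a rough illustration of the calculation described above, here is a minimal Python sketch. The function names and the example figures (250 occurrences in a corpus of 1,000,000 tokens) are hypothetical, chosen only to show the arithmetic; the base-10 logarithm is likewise just one possible choice.

```python
import math


def relative_frequency(count, corpus_tokens):
    """Occurrences divided by total tokens: an estimate of the word's probability."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return count / corpus_tokens


def per_n_words(count, corpus_tokens, n=1000):
    """The same quantity rescaled to occurrences per n words (e.g. 1000 or 10000)."""
    return relative_frequency(count, corpus_tokens) * n


def log_relative_frequency(count, corpus_tokens):
    """Base-10 log of the relative frequency; undefined for a zero count."""
    if count <= 0:
        raise ValueError("count must be positive to take a logarithm")
    return math.log10(relative_frequency(count, corpus_tokens))


# Hypothetical figures, for illustration only.
count, corpus_tokens = 250, 1_000_000
print(relative_frequency(count, corpus_tokens))      # probability estimate
print(per_n_words(count, corpus_tokens, n=1000))     # occurrences per 1000 words
print(log_relative_frequency(count, corpus_tokens))  # log10 of the probability
```

The per-1000 and per-10000 figures are the same probability multiplied by a constant, so they rank words identically; the logarithm mainly helps when the values being compared or plotted span several orders of magnitude.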
Justin Washtell
School of Computing
University of Leeds
________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of n.toumi at reading.ac.uk [n.toumi at reading.ac.uk]
Sent: 23 July 2009 18:27
To: Corpora at uib.no
Subject: [Corpora-List] calculating word density
Dear readers,
I'm looking for suggestions on how to calculate the density of a word in a
corpus, knowing the number of occurrences of that word and the number of
words in that corpus.
One method I thought of is to count the number of occurrences per 1000 or
10000 words.
Can anyone suggest other ways?
Thank you for your advice.
Regards,
Naouel
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora