[Corpora-List] Frequency of the pronoun I
Alberto Barron Cedeño
lbarron at dsic.upv.es
Tue Sep 13 15:07:55 UTC 2011
Hi,
I'm not claiming it is the "quality corpus" you are asking for, but what
about considering the Web 1T n-grams collection?
When looking at 1-grams, we have the following:
the 19401194714
The 3513278932
I 2744649681
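
To make the comparison explicit, here is a minimal sketch that folds the two case variants of "the" listed above and compares the total with "I". It uses only the three counts quoted in this message. The variable names are illustrative, and other variants (for example "THE") are not included because they are not given here.

```python
# Unigram counts copied from the three lines above (Web 1T 1-grams).
counts = {"the": 19401194714, "The": 3513278932, "I": 2744649681}

# Fold the two listed case variants of "the" together. Other variants
# such as "THE" are not quoted in the message, so they are left out.
the_total = counts["the"] + counts["The"]

# How many times more frequent is "the"/"The" than "I"?
ratio = the_total / counts["I"]

print(f"the + The = {the_total}")
print(f"(the + The) / I = {ratio:.2f}")
```

On these figures, "the" plus "The" comes out at roughly eight times the count of "I".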
I hope this helps,
Alberto
--
Alberto Barrón-Cedeño
Department of Information Systems and Computation (Ph.D. student)
Universidad Politécnica de Valencia
http://www.dsic.upv.es/~lbarron
On Tue, 2011-09-13 at 15:33 +0100, Mike Scott wrote:
> On page 45 of the 3 September issue of New Scientist, there is a table
> giving frequencies of "the 20 most frequently used words in the English
> language, across both spoken and written texts". The first is I, then
> THE, AND, TO, A, OF, THAT... ME, ON, BUT.
> I wrote to the author, James Pennebaker of the U of Texas, about this,
> expressing my surprise at the pronoun I having greater frequency than
> THE, as even in the spoken-only section of the BNC (10m words) we find I
> occurring only just over half as often as THE. His data contains a mix
> of spoken and written with a large amount of blog data. He reports that
> with all his studies in the USA and Mexico, "people always use more I
> more than THE. It's never close."
> Can anyone help here, clearing up the position? Someone with access to a
> really top quality corpus, more up to date and representative than the BNC?
>
> Mike
>