[Corpora-List] Frequency of the pronoun I

Alberto Barron Cedeño lbarron at dsic.upv.es
Tue Sep 13 15:07:55 UTC 2011


Hi,

I'm not claiming it is the "quality corpus" you are asking for, but what
about considering the Web 1T n-grams collection?

When looking at 1-grams, we have the following:

the     19401194714
The      3513278932
I        2744649681
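
To make the comparison concrete, here is a minimal sketch that folds the two listed forms of "the" together and computes the ratio of I to THE. It uses only the three counts quoted above; other case variants (e.g. "THE" or "i") are not included because I have not listed them here, and the helper name case_folded_total is just something made up for this illustration.

```python
# Counts copied from the three 1-gram figures quoted above (Web 1T).
# Assumption: only these forms are considered; other case variants
# such as "THE" or "i" are left out because they are not listed here.
counts = {"the": 19401194714, "The": 3513278932, "I": 2744649681}


def case_folded_total(word, table):
    """Sum the counts of every listed form that equals `word` ignoring case."""
    return sum(n for form, n in table.items() if form.lower() == word.lower())


the_total = case_folded_total("the", counts)
i_total = case_folded_total("I", counts)
ratio = i_total / the_total

print(f"the (listed forms): {the_total}")
print(f"I:                  {i_total}")
print(f"I / the:            {ratio:.3f}")
```

On these figures, I occurs about 0.12 times as often as THE, so in this web-derived collection THE is clearly the more frequent word.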

I hope this helps,
Alberto

-- 
Alberto Barrón-Cedeño 
Department of Information Systems and Computation (Ph.D. student)
Universidad Politécnica de Valencia
http://www.dsic.upv.es/~lbarron


On Tue, 2011-09-13 at 15:33 +0100, Mike Scott wrote:
> On page 45 of the 3 September issue of New Scientist, there is a table 
> giving frequencies of "the 20 most frequently used words in the English 
> language, across both spoken and written texts". The first is I, then 
> THE, AND, TO, A, OF, THAT... ME, ON, BUT.
> I wrote to the author, James Pennebaker of the U of Texas, about this, 
> expressing my surprise at the pronoun I having greater frequency than 
> THE, as even in the spoken-only section of the BNC (10m words) we find I 
> occurring only just over half as often as THE. His data contains a mix 
> of spoken and written with a large amount of blog data. He reports that 
> with all his studies in the USA and Mexico, "people always use more I 
> more than THE.  It's never close."
> Can anyone help here, clearing up the position? Someone with access to a 
> really top quality corpus, more up to date and representative than the BNC?
> 
> Mike
> 




