[Corpora-List] Frequency of the pronoun I

WHITELOCK, Pete pete.whitelock at oup.com
Tue Sep 13 15:04:59 UTC 2011


The American blogs sub-corpus of the Oxford English Corpus contains around 89m words of post-2000 text. The frequencies are as follows:

the	4.2m
I	1.3m
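
For comparison with other corpora, the figures above can be normalised to a per-million rate and expressed as a ratio. The following is only a rough sketch of that arithmetic: it treats the rounded figures quoted here (89m, 4.2m, 1.3m) as if they were exact counts, which they are not, and the names `CORPUS_SIZE`, `counts` and `per_million` are illustrative rather than part of any corpus tool.

```python
# Rounded figures from the message above, treated as exact counts
# purely for illustration (assumption: the real counts differ slightly).
CORPUS_SIZE = 89_000_000
counts = {"the": 4_200_000, "I": 1_300_000}


def per_million(count, corpus_size):
    """Return the frequency of a word normalised to one million words."""
    return count / corpus_size * 1_000_000


for word, count in counts.items():
    rate = per_million(count, CORPUS_SIZE)
    print(f"{word}\t{rate:,.0f} per million words")

# How often I occurs relative to THE in this sub-corpus.
ratio = counts["I"] / counts["the"]
print(f"I/the ratio: {ratio:.2f}")
```

On these rounded figures, I occurs a little under a third as often as THE, so even in recent American blog text it falls well short of THE.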



Pete Whitelock
Head of Language Engineering, Dictionaries
Reference Department
Academic Division
Oxford University Press


-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Mike Scott
Sent: 13 September 2011 15:33
To: corpora at uib.no
Cc: jwpennebaker at gmail.com
Subject: [Corpora-List] Frequency of the pronoun I

On page 45 of the 3 September issue of New Scientist, there is a table giving frequencies of "the 20 most frequently used words in the English language, across both spoken and written texts". The first is I, then THE, AND, TO, A, OF, THAT... ME, ON, BUT.
I wrote to the author, James Pennebaker of the U of Texas, about this, expressing my surprise at the pronoun I having greater frequency than THE, as even in the spoken-only section of the BNC (10m words) we find I occurring only just over half as often as THE. His data contains a mix of spoken and written text, with a large amount of blog data. He reports that with all his studies in the USA and Mexico, "people always use more I more than THE.  It's never close."
Can anyone help clear up the position here? Someone with access to a really top-quality corpus, more up to date and representative than the BNC?

Mike

--
Mike Scott

***
If you publish research which uses WordSmith, do let me know so I can include it at http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
***
University of Aston and Lexical Analysis Software Ltd.
mike.scott at aston.ac.uk
www.lexically.net


_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora



