[Corpora-List] Frequency of the pronoun I

Adam Kilgarriff adam at lexmasterclass.com
Tue Sep 13 14:51:06 UTC 2011


Everything depends on text type.

BNC-spoken overall has more 'the' than 'I' but that's because half of it is
meetings/lectures/sermons.  If you look only at the conversational part
(obscurely called "demographic") 'I' is more common, in keeping with the
kinds of language that James Pennebaker works with (from my recollection of
a fascinating talk of his I went to).
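
To make the text-type point concrete, here is a minimal sketch of the comparison: the rate of 'I' and 'the' per million tokens, computed separately for each text type. The two sample strings are invented for illustration and are not BNC data, and the function name per_million and the crude letters-only tokenizer are assumptions of this sketch. A real check would run the same count over the demographic and context-governed parts of BNC-spoken.

```python
import re
from collections import Counter


def per_million(text, words):
    """Return occurrences per million tokens for each word in `words`.

    Tokenization is deliberately crude: lowercase runs of letters only,
    so "I'm" splits into "i" and "m" and counts as one occurrence of "i".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("text contains no tokens")
    counts = Counter(tokens)
    return {w: counts[w.lower()] * 1_000_000 / len(tokens) for w in words}


# Invented toy samples standing in for two text types (not corpus data).
samples = {
    "conversation": "I think I saw him but I'm not sure, you know, "
                    "I did ask the man",
    "lecture": "The aim of the lecture is to outline the history of "
               "the parish and the church I mentioned",
}

rates = {name: per_million(text, ["I", "the"]) for name, text in samples.items()}

for name, r in rates.items():
    leader = "I" if r["I"] > r["the"] else "the"
    print(f"{name}: I={r['I']:.0f} the={r['the']:.0f} per million, more '{leader}'")
```

Normalizing per million tokens is what makes sections of different sizes comparable; raw counts would mostly reflect how large each section is.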

Asking for a more representative corpus won't help, because we all have
different ideas about what it should be representative of.

Adam

On 13 September 2011 15:33, Mike Scott <mike at lexically.net> wrote:

> On page 45 of the 3 September issue of New Scientist, there is a table
> giving frequencies of "the 20 most frequently used words in the English
> language, across both spoken and written texts". The first is I, then THE,
> AND, TO, A, OF, THAT... ME,ON,BUT.
> I wrote to the author, James Pennebaker of the U of Texas, about this,
> expressing my surprise at the pronoun I having greater frequency than THE,
> as even in the spoken-only section of the BNC (10m words) we find I
> occurring only just over half as often as THE. His data contains a mix of
> spoken and written with a large amount of blog data. He reports that with
> all his studies in the USA and Mexico, "people always use more I more than
> THE.  It's never close."
> Can anyone help here, clearing up the position? Someone with access to a
> really top quality corpus, more up to date and representative than the BNC?
>
> Mike
>
> --
> Mike Scott
>
> If you publish research which uses WordSmith, do let me know so I can
> include it at
> http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
> University of Aston and Lexical Analysis Software Ltd.
> mike.scott at aston.ac.uk
> www.lexically.net
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================