[Corpora-List] Frequency of the pronoun I
Ken Litkowski
ken at clres.com
Wed Sep 14 16:00:47 UTC 2011
This discussion has focused on only one aspect of James Pennebaker's
work, the 'I' frequency, and perhaps not as much on his many
contributions to content analysis, which may have even more relevance to
discussions on this list.
Kyle Dent of Xerox has recently performed an analysis
<http://www.parc.com/content/attachments/through-twitter-glass.pdf> of
2400 tweets, with the aim of classifying them into "Questions" and "Not
Questions". He developed an elaborate NLP system to deal with these
tweets. He kindly provided me with these data, so that I could examine
them with my content analysis program to see how well they could be
analyzed without all the NLP superstructure. I happened to run a first
analysis at the time of this thread. It simply compares the two sets as
a whole.
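A whole-set comparison of this kind can be sketched in a few lines of Python. This is only an illustration, not the content analysis program mentioned above: the sample tweets are invented, the function name is hypothetical, and the tokenizer (lowercase, letters and apostrophes only) is an assumption.

```python
import re
from collections import Counter

def count_words(tweets):
    """Count lowercase word tokens across a whole set of tweets."""
    counts = Counter()
    for tweet in tweets:
        # Assumed tokenizer: runs of letters and apostrophes, case-folded.
        counts.update(re.findall(r"[a-z']+", tweet.lower()))
    return counts

# Invented sample data, NOT taken from the 2400-tweet collection.
questions = [
    "Does anyone know where I can find the schedule?",
    "What is the best way to learn Python?",
]
not_questions = [
    "I think I left the keys in the car",
    "I love the new album",
]

for label, tweets in (("Questions", questions), ("Not Questions", not_questions)):
    counts = count_words(tweets)
    print(label, "the:", counts["the"], "i:", counts["i"])
```

Counting over each set as a whole, rather than per tweet, is what keeps this free of any sentence-level NLP.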
The corpus size is 31,000 words (hardly the stature of BNC, COCA, or
OEC). But, curiously, "I" and "the" hold the top two frequency
positions in both sets:
Set              "the"    "I"
Questions          400    327
Not Questions      437    575
Wow! Could this be a classification signature? Although this is not
likely, other statistics generated by the program, taken in various
combinations, may very well be. So, here we have a micro-genre analysis
that confirms the other comments on this thread, much like the Known
Similarity Corpora of Adam Kilgarriff (15 years ago!).
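As a rough check on whether the balance of "the" and "I" really differs between the two sets, one could run a Pearson chi-square test on the four counts in the table. The sketch below uses only those counts. The function name is hypothetical, and the test assumes every token is an independent observation, which tweets do not really satisfy, so the result is indicative at best.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Counts copied from the table: rows are the sets, columns are "the" and "I".
questions = (400, 327)
not_questions = (437, 575)

chi2 = chi_square_2x2(*questions, *not_questions)
# 3.84 is the usual 5% critical value for one degree of freedom.
print(f"chi-square = {chi2:.2f}, exceeds 3.84: {chi2 > 3.84}")
```

A value above the critical threshold would say only that the two sets differ in their overall mix of "the" and "I", not that the pair could classify an individual tweet.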
Sentiment analysis is an emerging field, but is currently dominated by
heavy NLP techniques. I would suggest that techniques from content
analysis might provide a nice complement.
Ken
--
Ken Litkowski TEL.: 301-482-0237
CL Research EMAIL: ken at clres.com
9208 Gue Road Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA Blog: http://www.clres.com/blog