<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<tt>This discussion has focused on only one aspect of James
Pennebaker's work, the 'I' frequency, and perhaps not as much on
his many contributions to content analysis, which may have even
more relevance to discussions on this list.<br>
<br>
Kyle Dent of Xerox has recently <a
href="http://www.parc.com/content/attachments/through-twitter-glass.pdf">performed
an analysis</a> of 2400 tweets, with the aim of classifying them
into "Questions" and "Not Questions". He developed an elaborate
NLP system to deal with these tweets. He kindly provided me with
these data, so that I could examine them with my content analysis
program to see how well they could be analyzed without all the NLP
superstructure. I happened to run a first analysis at the time of
this thread. It simply compares the two sets as a whole.<br>
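<br>
For readers who want to try this kind of whole-set comparison themselves, here is a minimal sketch. It is not my content analysis program; the function name top_words, the crude tokenizer, and the four sample tweets are all hypothetical stand-ins rather than Kyle's data.<br>
</tt>
<pre>
```python
import re
from collections import Counter


def top_words(texts, n=2):
    """Return the n most frequent lowercased word tokens across texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer: runs of letters and apostrophes. Real tweets
        # would also need handling for hashtags, mentions, and URLs.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)


# Hypothetical stand-ins for the two sets; NOT the actual tweets.
questions = [
    "Does anyone know where the station is?",
    "Should I take the bus?",
]
not_questions = [
    "I think I missed the bus.",
    "I am at the station.",
]

print("Questions:", top_words(questions))
print("Not Questions:", top_words(not_questions))
```
</pre>
<tt>Each set is treated as one bag of words, so nothing here depends on parsing or tagging the individual tweets.<br>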
<br>
The corpus size is 31,000 words (hardly the stature of BNC, COCA,
or OEC). But, curiously, "the" and "I" hold the top two
frequency positions in both sets:<br>
<br>
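Set&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"the"&nbsp;&nbsp;"I"<br>
Questions&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;400&nbsp;&nbsp;&nbsp;&nbsp;327<br>
Not&nbsp;Questions&nbsp;&nbsp;437&nbsp;&nbsp;&nbsp;&nbsp;575<br>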
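<br>
As a rough check on how different the two rows are, the sketch below computes the share of "I" within each set and a Pearson chi-square statistic for the 2x2 table. The counts are the ones in the table above; the function names and the choice of test are my own assumptions for illustration, and the test treats tokens as independent, which tweets certainly are not.<br>
</tt>
<pre>
```python
# Counts copied from the table above.
counts = {
    "Questions": {"the": 400, "I": 327},
    "Not Questions": {"the": 437, "I": 575},
}


def share_of_i(row):
    """Fraction of the combined "the" + "I" tokens that are "I"."""
    return row["I"] / (row["the"] + row["I"])


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, no continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


for name, row in counts.items():
    print(f"{name}: share of I = {share_of_i(row):.3f}")

q, nq = counts["Questions"], counts["Not Questions"]
chi2 = chi_square_2x2(q["the"], q["I"], nq["the"], nq["I"])
print(f"chi-square (1 df) = {chi2:.2f}")
```
</pre>
<tt>A large statistic would only say that the two sets differ as wholes; it says nothing about whether an individual tweet can be classified from these two words.<br>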
<br>
Wow! "The" outranks "I" in Questions, while the order is reversed
in Not Questions. Could this be a classification signature? That is
unlikely on its own, but other statistics generated in the program,
in various combinations, may very well be. So, here we have a
micro-genre analysis that confirms the other comments on this
thread, much like the Known Similarity Corpora of Adam Kilgarriff
(15 years ago!).<br>
<br>
Sentiment analysis is an emerging field, but it is currently
dominated by heavy NLP techniques. I would suggest that techniques
from content analysis might provide a useful complement.<br>
<br>
Ken <br>
</tt>-- <br>
<tt>Ken Litkowski TEL.: 301-482-0237
<br>
CL Research EMAIL: <a class="moz-txt-link-abbreviated" href="mailto:ken@clres.com">ken@clres.com</a>
<br>
9208 Gue Road Home Page: <a class="moz-txt-link-freetext" href="http://www.clres.com">http://www.clres.com</a>
<br>
Damascus, MD 20872-1025 USA Blog: <a class="moz-txt-link-freetext" href="http://www.clres.com/blog">http://www.clres.com/blog</a></tt>
</body>
</html>