[Corpora-List] Laypersons' applied corpus linguistics

Ken Litkowski ken at clres.com
Mon Jan 26 16:15:26 UTC 2009


Thanks to Adam for remembering my interest in content analysis (CA).

More than anything else, CA is intended to provide a comparative 
analysis of the content of several texts, ranging from two- or three-word 
answers to open-ended questionnaire items up to full texts. Quantitative CA 
received its most prominent recognition when it was used to identify the 
authors of the US Federalist Papers, where distinctions among the texts 
could be identified from the relative frequencies of function words, 
including 'the'. CA is not used to "understand" texts; for this we use 
the full range of techniques in NLP.
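As a rough illustration of the function-word idea, here is a minimal Python sketch. It assumes a tiny, hypothetical list of function words and a simple L1 distance between relative-frequency profiles. It is not the statistical procedure used in the Federalist Papers work, and the names `FUNCTION_WORDS`, `function_word_profile` and `closest_author` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = ["the", "of", "to", "by", "on", "upon", "while", "whilst"]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def function_word_profile(text: str) -> dict[str, float]:
    """Relative frequency (per 1,000 tokens) of each function word."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: 1000 * counts[w] / total for w in FUNCTION_WORDS}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """L1 (Manhattan) distance between two profiles; smaller is more similar."""
    return sum(abs(p[w] - q[w]) for w in FUNCTION_WORDS)


def closest_author(disputed: str, samples: dict[str, str]) -> str:
    """Return the candidate whose known writing has the nearest profile."""
    target = function_word_profile(disputed)
    return min(
        samples,
        key=lambda author: profile_distance(target, function_word_profile(samples[author])),
    )
```

For example, `closest_author("upon the matter upon which", {"A": "upon the hill upon", "B": "on the hill while on"})` picks "A", because the disputed text shares A's heavy use of "upon". Real attribution work needs far longer texts and a principled statistical model.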

CA has an extensive suite of methods, all of which depend on some 
categorization of words. With a good underlying dictionary allowing 
"polysemous" entries, a CA system can perform excellent 
disambiguation. Quantitative profiles of texts essentially characterize 
the "domain" of the text; this is a reflection of Yarowsky's principle 
of "one sense per discourse." (Note that such profiles can actually be 
used as a retrieval mechanism to identify similar texts.)
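A minimal sketch of dictionary-based profiling and profile-based retrieval follows. It assumes a toy, hypothetical category dictionary (`CATEGORY_DICT`) and uses cosine similarity as the comparison measure. In this sketch a polysemous word simply counts toward every category it carries; no disambiguation is attempted.

```python
import math
import re
from collections import Counter

# Hypothetical toy dictionary. A polysemous word may carry several
# categories; this sketch simply counts it toward each one (no WSD).
CATEGORY_DICT = {
    "happy": ["POSITIVE", "EMOTION"],
    "love": ["POSITIVE", "EMOTION"],
    "sad": ["NEGATIVE", "EMOTION"],
    "angry": ["NEGATIVE", "EMOTION"],
    "money": ["ECONOMIC"],
    "market": ["ECONOMIC"],
}
CATEGORIES = sorted({c for cats in CATEGORY_DICT.values() for c in cats})


def category_profile(text: str) -> list[float]:
    """Share of tokens falling into each category, in CATEGORIES order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(c for t in tokens for c in CATEGORY_DICT.get(t, []))
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[c] / total for c in CATEGORIES]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: str, corpus: dict[str, str]) -> list[str]:
    """Names of corpus texts, most similar category profile first."""
    q = category_profile(query)
    return sorted(corpus, key=lambda name: cosine(q, category_profile(corpus[name])), reverse=True)
```

With `rank_by_similarity("I love being happy", {"econ": "the market and money", "feel": "so happy and in love"})`, the emotional text "feel" ranks first. A real CA dictionary would be much larger and would resolve polysemous entries in context.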

Interrater reliability (particularly as initiated by Krippendorff) is 
intended to ensure that results are reproducible across subjective raters 
(less of a problem when quantitative methods are used). Over 15 years 
ago, I mentioned Krippendorff's alpha to Becky Passonneau, and she became 
an ardent advocate of its use in CL studies. It has now received full 
endorsement in our community through the recent CL paper.
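For readers who have not met Krippendorff's alpha, here is a minimal from-scratch sketch for nominal labels only. It assumes each unit is represented as a list of the labels its coders assigned, with missing ratings simply left out. It computes alpha = 1 - D_o / D_e from the coincidence matrix; the function name is hypothetical, and an established, tested implementation is preferable for real studies.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units: list[list[str]]) -> float:
    """Krippendorff's alpha for nominal labels.

    Each element of `units` holds the labels the coders gave one unit;
    missing ratings are simply left out of that list.
    """
    # Coincidence matrix: every ordered pair of values from different coders
    # within a unit adds 1 / (m_u - 1), where m_u is the unit's label count.
    coincidences: Counter = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two labels cannot be paired
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1 / (m - 1)

    # Marginal totals n_c and grand total n of pairable values.
    totals: Counter = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    # Nominal distance is 1 for any mismatch and 0 for a match.
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        return float("nan")  # no variation in the data: alpha is undefined
    # alpha = 1 - D_o / D_e = 1 - (n - 1) * observed / expected
    return 1 - (n - 1) * observed / expected
```

For two coders labelling three units as `[["a", "b"], ["a", "a"], ["b", "b"]]`, this gives 4/9 (about 0.444). Perfect agreement gives 1, and systematic disagreement drives alpha below 0.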

    Ken

Hongyin Tao wrote:
>
> Thanks to Adam and everyone for the useful references. Perhaps I 
> should clarify my subject line a bit. When I read passages like the 
> following as reported by the journalist (of course without checking 
> the actual study), it made me think that a corpus linguist would do 
> more than just look at individual words alone:
>
> "The researchers read through the conversations, noting the context of 
> the IM threads. Then, they used a linguistic word count program to 
> analyze the conversations' pronouns and words with emotional content.
>
> Among pronouns in IMs, couples used "I" nearly 20 times more 
> frequently than "we." And of the emotion words, all couples were most 
> likely to use positive words.
>
> "We found that the extent to which people used positive emotion words 
> like 'great,' 'happy,' 'love,' tended to be happier in their 
> relationships and to stay in their relationships for a longer period 
> of time," Slatcher said."
>
> While individual words are useful to look into, 
> combinations/collocations would be equally, if not more, important in 
> understanding texts. This is of course not exactly an earth-shattering 
> discovery to folks on this list.
>
> Hongyin
>
> On Sat, Jan 24, 2009 at 10:55 PM, Adam Kilgarriff 
> <adam at lexmasterclass.com> wrote:
>
>     Dear Hongyin Tao
>      
>     this isn't layperson's corpus linguistics, it's another discipline
>     called Content Analysis, which has been around for longer than
>     corpus linguistics but has remarkably little crossover of
>     references and interest despite the similarity of methods; the only
>     person I know of who has explicitly linked the two approaches is
>     Ken Litkowski.
>      
>     From the little I know, CA blossomed as a method of propaganda
>     analysis in the US in the 60s, and now lives on particularly in
>     psychotherapy and related areas, as in the news clip you show. 
>     One big, famous system was called the General Inquirer. They developed
>     very large lexicons with words marked up for whether they were
>     positive or negative, etc., and also did lots of work on WSD, as
>     polysemy was a problem for their method.
>      
>     Refs
>      
>     Harvard IV Psycho-Sociological Dictionary (Kelly & Stone, 1975).
>      
>     http://en.wikipedia.org/wiki/Content_analysis
>     or for the abstract of a psychological piece using it, with a
>     decent intro, see
>     http://www.informaworld.com/smpp/content~content=a785037098~db=all
>     Or here:
>      
>     The assessment of psychological states through content analysis of
>     verbal communications.
>     Viney, Linda L.
>     Psychological Bulletin. Vol 94(3), Nov 1983, 542-563.
>
>
>     Abstract
>
>     Presents a history of the use of content analysis in psychology
>     and describes the development of CA scales, including an example
>     of a scale in construction. The variety of verbal communications
>     to which CA is applicable is also considered. Issues of
>     reliability and validity were considered in a survey of the
>     literature on a sample of 10 relatively well-developed CA scales.
>     Some of the theoretical and practical advantages of the technique
>     over other methods of assessing psychological states are also
>     examined, as well as some of its problems and limitations.
>     Information about available CA scales is included. Applications
>     of CA in personality, developmental, and social psychology are
>     considered, together with others in clinical, community, and
>     health psychology. The scoring of CA scales by computer is also
>     discussed, as is their contribution to an ethical relationship
>     between researcher and research participant. The viability of CA
>     as an aid in psychological research is evaluated. (158 ref)
>     (PsycINFO Database Record (c) 2008 APA, all rights reserved)
>
>     (which scarcely looks layperson-like to me!)
>      
>     Regards,
>      
>     Adam Kilgarriff
>      
>     2009/1/24 Hongyin Tao <bbs.lists at gmail.com>
>
>          A recent example that just came up...
>
>         http://www.livescience.com/culture/090123-instant-message-couples.html
>
>         _______________________________________________
>         Corpora mailing list
>         Corpora at uib.no
>         http://mailman.uib.no/listinfo/corpora
>
>
>
>
>     -- 
>     ================================================
>     Adam Kilgarriff                                    
>      http://www.kilgarriff.co.uk              
>     Lexical Computing Ltd                   http://www.sketchengine.co.uk
>     Lexicography MasterClass Ltd      http://www.lexmasterclass.com
>     Universities of Leeds and Sussex       adam at lexmasterclass.com
>     ================================================
>
>

-- 
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road
Damascus, MD 20872-1025 USA       Home Page: http://www.clres.com
