Corpora: Keywords in Literary Texts Summary

T Murphy tmorpheme at hotmail.com
Wed Jun 6 00:42:39 UTC 2001


  
Dear Corpora Listers:
Here is a summary of my inquiry concerning the corpus analysis of keywords in literary texts.
1. Mick Short noted that there is some discussion of keywords in the play Romeo and Juliet in chapter 4 of Jonathan Culpeper, Language and Characterisation (Longman 2001). Mick also suggested that it might be worth having a look at David Hoover's Language and Style in The Inheritors (University Press of America 1998), which compares Golding's book with various corpora.
 2. Christopher Tribble noted that M. Stubbs, Text and Corpus Analysis (1996) specifically mentions Raymond Williams’ notion of keywords. Christopher also commented that Mike Scott has been doing work on cultural keywords using Guardian newspaper data.
3. Adam Kilgarriff reminded me that Mike Scott's Wordsmith supports this sort of analysis, and that Tony Berber Sardinha knows a lot about the area, but from an EFL rather than a literary perspective.
The following two leads were very useful:
4. Ramesh Krishnamurthy has written “Ethnic, Racial and Tribal: The Language of Racism?” in Texts and Practices, eds. Caldas-Coulthard & Coulthard, Routledge, London, 1996. In this article, Ramesh looked at three keywords in the Bank of English corpus (then 121 million words, now 418 million words) and made specific references to Raymond Williams' Keywords.

5. Andrius Utka, a master's student at Vytautas Magnus University, Faculty of Humanities, has done an analysis of George Orwell’s 1984 using the statistical methods of corpus linguistics. It is available for viewing at http://donelaitis.vdu.lt, by following the link from "publications" to "sankirta".
Among other things, this paper suggests a useful method for discovering what the keywords of a given literary text actually are:


“The following procedure is used for finding key words in 1984:  
The frequency list of all word forms is produced by the computer program Wordsmith Tools.  
Only 100 most frequent nouns are left and all the other words are removed from the list.  
The nouns are lemmatized.  
The frequency list of these 100 nouns is produced from the large corpus of the Bank of English.  
The occurrences of words in both frequency lists are compared using chi-squared statistical test”.  
The key words are then sorted according to their chi-square value.
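The comparison and sorting steps of this procedure can be sketched roughly as follows. This is only an illustration, not the code used in the paper: it assumes the simple Pearson chi-squared statistic on a 2x2 table (the word versus all other words, the text versus the reference corpus), and the function names and the toy counts are hypothetical. The noun filtering and lemmatization steps are assumed to have been done already.

```python
def chi_squared(freq_text, size_text, freq_ref, size_ref):
    """Pearson chi-squared for a 2x2 contingency table.

    Rows: the word vs. all other words.
    Columns: the literary text vs. the reference corpus.
    """
    a = freq_text                 # word in the text
    b = freq_ref                  # word in the reference corpus
    c = size_text - freq_text     # other words in the text
    d = size_ref - freq_ref       # other words in the reference corpus
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                # degenerate table, no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def rank_keywords(text_counts, size_text, ref_counts, size_ref):
    """Return (word, chi-squared) pairs, highest value first."""
    scores = {
        word: chi_squared(freq, size_text, ref_counts.get(word, 0), size_ref)
        for word, freq in text_counts.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up counts for illustration only, not real corpus figures.
    text_counts = {"party": 50, "time": 10}
    ref_counts = {"party": 10, "time": 10}
    for word, score in rank_keywords(text_counts, 100, ref_counts, 100):
        print(word, round(score, 2))
```

Note that the chi-squared value alone does not say whether a word is over- or under-represented in the text; comparing the relative frequencies of the word in the two lists settles the direction.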
Finally, there were two respondents working on texts other than literary ones:
6. Wendy J. Anderson, a PhD student in the Department of French at the University of St Andrews, is carrying out keyword analysis on administrative texts in French.
7. Geoffrey Williams has done work on extracting keywords in scientific corpora. Geoffrey also notes that Berry-Rogghe worked in a similar way on literary texts in the 1970s.
 The references that Geoffrey provided are:
Berry-Rogghe, G.L.M. 1973. "The computation of collocations and their relevance in lexical studies". In Aitken A.J., Bailey R., Hamilton-Smith N. (eds), The Computer and Literary Studies. Edinburgh: Edinburgh University Press.
Williams, G. 1998. "Collocational Networks: Interlocking Patterns of Lexis in a Corpus of Plant Biology". International Journal of Corpus Linguistics 3(1): 151-171.
Williams, G. 1999. Les réseaux collocationnels dans la construction et l'exploitation d'un corpus dans le cadre d'une communauté de discours scientifique. Thèse en anglais linguistique de corpus. Université de Nantes. http://perso.wanadoo/geoffrey.williams
It seems clear that the field is still in its very early stages of development. I suspect, however, that it may experience some growth over the next few years, although perhaps the non-literary areas may grow more quickly than the literary one that concerns me.
Thanks very much to all who responded.
 Dr. Terry Murphy
Yonsei University
College of Liberal Arts
Dept. of English Language and Literature
Seoul 120--749
Korea