[Corpora-List] Keyness across Texts
    Ute Römer 
    ute.roemer at engsem.uni-hannover.de
       
    Mon Jul  9 12:49:47 UTC 2007
    
    
  
Dear Duncan, 
 
You may want to check Mike Scott's and Christopher Tribble's book Textual
Patterns (Benjamins, 2006, browsable at
http://site.ebrary.com/pub/benjamins/Doc?isbn=9789027222930) which contains
some very useful chapters on keyness and aboutness (chs. 4 and 5 if I
remember correctly) and discusses different ways of identifying keywords in
texts and corpora, and of interpreting the search output. 
 
Best wishes... Ute
 
 
************************************************************
 
Dr. Ute Römer
English Department
Leibniz University of Hanover
Königsworther Platz 1
30167 Hannover
Germany
 
Phone: +49 (0)511 762 2997
Fax: +49 (0)511 762 2996
Please note NEW e-mail address: ute.roemer at engsem.uni-hannover.de
http://www.uteroemer.com
http://www.engsem.uni-hannover.de/angli/
  _____  
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Hunter, Duncan
Sent: Monday, July 09, 2007 2:30 PM
To: corpora at uib.no
Subject: [Corpora-List] Keyness across Texts
Hello Colleagues! 
 
A question about key-ness, and key words, in a group of texts
 
I've been mulling over some key-ness statistics for a selection of texts
I've been studying, and a rather odd question has occurred to me.
 
I've been attempting to discover something of the thematic content or
'about-ness' of a group of texts by using a keywords analysis, comparing the
word frequency profile of the selection of texts with a comparison group to
derive key-ness (via log-likelihood) stats for each word. 
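
For concreteness, the log-likelihood comparison described above can be
sketched as follows. This is a minimal sketch of one common two-term
formulation of the statistic, not necessarily the exact calculation performed
by any particular keywords tool; the function and parameter names are
illustrative assumptions.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) for one word in a study corpus vs a reference corpus.

    freq_*: occurrences of the word; size_*: total tokens in each corpus.
    All names here are hypothetical, chosen for illustration.
    """
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2
```

A word with the same relative frequency in both corpora scores 0; the score
grows as the two relative frequencies diverge, in either direction.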
 
The key-ness value returned by such a procedure can be misleading because of
the problem of dispersal: is the word 'key' because it occurs in a lot of
text samples in the corpus, or because of very high usage in only a single
text or small group of texts?
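
The dispersal question can be made visible with two simple figures per word:
the number of texts it occurs in, and the share of its occurrences that come
from the single heaviest text. The following is only a rough sketch; the
function name and the choice of these two figures are assumptions made for
illustration.

```python
def dispersion_summary(counts_per_text):
    """Return (range, top_share) for one word.

    counts_per_text: the word's frequency in each text of the study corpus.
    range: number of texts containing the word at least once.
    top_share: fraction of all occurrences found in the single heaviest text.
    """
    total = sum(counts_per_text)
    text_range = sum(1 for count in counts_per_text if count > 0)
    top_share = max(counts_per_text) / total if total else 0.0
    return text_range, top_share
```

A word occurring 20 times in one text and nowhere else gives a range of 1 and
a top share of 1.0, whereas the same 20 occurrences spread evenly over four
texts give a range of 4 and a top share of 0.25.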
 
It occurs to me: would it be possible to formulate some kind of measure of a
word's overall 'key-ness' in the set of texts we are studying? By
multiplying the word's 'key' score by the number of texts in which it
is key, for example. Of course the resulting figure in this case would be
totally arbitrary in a sense; even in the non-parametric realm of corpus
comparison measurement it would not really mean anything beyond its own
description...
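
The multiplication suggested above could be sketched like this. The cutoff of
3.84 (the chi-square critical value for p < 0.05 with one degree of freedom)
is an assumption about what counts as 'key' in an individual text, and the
function name is hypothetical; the per-text scores are assumed to have been
computed beforehand, one log-likelihood value per text.

```python
def combined_keyness(overall_score, per_text_scores, threshold=3.84):
    """Overall key score multiplied by the number of texts in which the word is key.

    overall_score: the word's key-ness for the whole set of texts.
    per_text_scores: the word's key-ness computed separately for each text.
    threshold: assumed cutoff above which a word counts as key in a text.
    """
    texts_where_key = sum(1 for score in per_text_scores if score >= threshold)
    return overall_score * texts_where_key
```

With an overall score of 100, a word that is key in only one of three texts
keeps a score of 100, while a word that is key in all four of four texts
scores 400. The result only ranks words against each other; it is not a
significance statistic.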
 
However, it seems to me useful to have some kind of quantitative means of
describing a word's significance across a range of texts in some way. Any
ideas? I am a relative 'newbie' in this field; surely this issue has been
tackled by somebody else somewhere?!
 
All the best,
 
Duncan Hunter
 <http://valibel.fltr.ucl.ac.be/> 