[Corpora-List] Keyness across Texts

Hunter, Duncan D.I.Hunter at warwick.ac.uk
Mon Jul 9 13:50:57 UTC 2007


Yes, thanks for this...
 
I've seen the discussion in Scott and Tribble, including (I think this is what you're referring to) key key words and related measures. The authors obviously recognise the 'problem', and key key word lists, which indicate how many texts a term is key in, are certainly helpful in identifying problems of dispersion 'by eye'.
 
I am now really looking for the 'next step': a measurement which treats the number of texts as an important value in its own right. From a common-sense perspective, the number of texts a word is key in seems a more powerful predictor of its overall significance in a collection of texts than a 'raw' keyness statistic (log-likelihood, chi-square, whatever) that doesn't take it into account. What do others think?
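
To make the idea concrete, here is a rough sketch of the kind of measure I have in mind. It assumes each word already has a whole-corpus keyness score and one keyness score per text; the function name, the cut-off of 3.84 (the chi-square critical value for p < 0.05 at one degree of freedom) and the example figures are all made up for illustration and are not taken from Scott and Tribble or from any tool.

```python
def overall_keyness(corpus_score, per_text_scores, threshold=3.84):
    """Combine a whole-corpus keyness score with the number of texts a word is key in.

    corpus_score    -- keyness of the word for the collection as a whole
    per_text_scores -- one keyness score per text, each against the same reference
    threshold       -- assumed cut-off above which a word counts as 'key' in a text
    """
    # Count the texts in which the word reaches the cut-off.
    texts_key = sum(1 for score in per_text_scores if score >= threshold)
    proportion = texts_key / len(per_text_scores) if per_text_scores else 0.0
    return {
        "texts_key": texts_key,
        "proportion_key": proportion,
        # The simple product from my original message: arbitrary, but rankable.
        "weighted_score": corpus_score * texts_key,
    }


# Hypothetical figures: word A is driven by one text, word B is spread evenly.
word_a = overall_keyness(120.0, [130.0, 0.5, 1.0, 0.2])
word_b = overall_keyness(60.0, [12.0, 9.0, 15.0, 7.0])
print(word_a["weighted_score"], word_b["weighted_score"])
```

With these invented numbers word B, which is key in all four texts, ends up ranked above word A despite having half the raw score, which is the behaviour I am after. The product itself remains an arbitrary figure, so it is only useful for ranking words within one study.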
 
Thanks though; the Scott and Tribble book is a goody...
 
 
 

________________________________

From: Ute Römer [mailto:ute.roemer at engsem.uni-hannover.de]
Sent: Mon 09/07/2007 13:49
To: Hunter, Duncan; corpora at uib.no
Subject: RE: [Corpora-List] Keyness across Texts


Dear Duncan, 
 
You may want to check Mike Scott's and Christopher Tribble's book Textual Patterns (Benjamins, 2006, browsable at http://site.ebrary.com/pub/benjamins/Doc?isbn=9789027222930) which contains some very useful chapters on keyness and aboutness (chs. 4 and 5 if I remember correctly) and discusses different ways of identifying keywords in texts and corpora, and of interpreting the search output. 
 
Best wishes... Ute
 
 
************************************************************
 
Dr. Ute Römer
English Department
Leibniz University of Hanover
Königsworther Platz 1
30167 Hannover
Germany
 
Phone: +49 (0)511 762 2997
Fax: +49 (0)511 762 2996
Please note NEW e-mail address: ute.roemer at engsem.uni-hannover.de
http://www.uteroemer.com
http://www.engsem.uni-hannover.de/angli/



________________________________

	From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Hunter, Duncan
	Sent: Monday, July 09, 2007 2:30 PM
	To: corpora at uib.no
	Subject: [Corpora-List] Keyness across Texts
	
	
	Hello Colleagues! 

	 

	A question about 'key-ness', and key words, in a group of texts...

	 

	I've been mulling over some 'key-ness' statistics for a selection of texts I've been studying and a rather odd question has occurred to me....

	 

	I've been attempting to discover something of the thematic content or 'about-ness' of a group of texts by using a keywords analysis, comparing the word frequency profile of the selection of texts with a comparative group to derive 'key-ness' (via log-likelihood) stats for each word. 
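
For anyone unfamiliar with the statistic, the sketch below shows one common way of computing a log-likelihood keyness score for a single word from two frequency counts and two corpus sizes. It is the two-term form often used in corpus comparison; the function and parameter names are illustrative, and it is an assumption that this matches what any particular keywords tool computes.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word: study corpus versus reference corpus."""
    total_size = size_study + size_ref
    combined_freq = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * combined_freq / total_size
    expected_ref = size_ref * combined_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts: 20 hits in 1,000 words against 5 hits in 1,000 words.
score = log_likelihood(20, 1000, 5, 1000)
print(round(score, 2))
```

Note that the score only measures how far the two frequencies diverge, not where in the study corpus the occurrences fall, which is exactly the limitation raised in the next paragraph.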

	 

	The key-ness value returned by such a procedure can be misleading because of the problem of dispersion: is the word 'key' because it occurs in a lot of the text samples in the corpus, or because of very high usage in only a single text or a small group of texts?

	 

	It occurs to me: would it be possible to formulate some kind of measure of a word's 'overall key-ness' in the set of texts we are studying, for example by multiplying the word's key score by the number of texts in which it is key? Of course the resulting figure would be totally arbitrary in a sense; even in the non-parametric realm of corpus comparison measurement it would not really 'mean' anything beyond its own description...

	 

	However, it seems to me useful to have some kind of quantitative means of describing a word's significance across a range of texts... Any ideas? I am a relative 'newbie' in this field; surely this issue has been tackled by somebody else somewhere?!

	 

	All the best,

	 

	Duncan Hunter

