[Corpora-List] Distance & word context

Roger Mitton roger at dcs.bbk.ac.uk
Fri May 2 13:32:05 UTC 2008


You might have a look at:

Dave Hardcastle
An examination of word association scoring using distributional analysis in the 
BNC: what is an interesting score and what is a useful system?
Proc Corp Ling 2005, Univ Birmingham, July 14-17 2005
www.corpus.bham.ac.uk/PCLC/ (under "Language Processing and Corpus Tool")

Roger Mitton

Date: Thu, 01 May 2008 20:24:27 +0100
From: J Washtell <lec3jrw_AT_leeds.ac.uk>
Subject: Distance & word context.
To: corpora_AT_uib.no


Thank you very much for your feedback.

I was referring to what one might call the linear "physical distance"
or "narrative distance" in corpora (which would correlate with the time
elapsed between terms as a reader reads, if you like). Hence my
citing "distance-weighted context windows" as an example of one way in
which this is considered (also referred to as "ramped" windows etc).
As there doesn't seem to be a consistent established terminology - at
least none that I'm familiar with - searching Google alone is
unsatisfactory. I'm definitely not asking about semantic distance.
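For readers unfamiliar with the term, here is a rough illustration of a distance-weighted ("ramped") context window. It is a minimal sketch, not any particular published method. It assumes a linear ramp in which a neighbour at distance d within a window of size w receives weight (w - d + 1) / w. The function name `ramped_cooccurrence` is hypothetical.

```python
from collections import defaultdict

def ramped_cooccurrence(tokens, window=5):
    """Hypothetical sketch: count co-occurrences within +/- `window` tokens,
    weighting each pair linearly by proximity (assumed ramp):
    weight = (window - d + 1) / window for token distance d."""
    counts = defaultdict(float)
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            weight = (window - d + 1) / window
            # Record the pair in both directions so the counts are symmetric.
            counts[(target, tokens[j])] += weight
            counts[(tokens[j], target)] += weight
    return dict(counts)

counts = ramped_cooccurrence("the cat sat on the mat".split(), window=2)
print(counts[("the", "sat")])  # 0.5 from each of the two "the" tokens
```

Adjacent words count fully, and words further away count proportionally less. This is the intuition behind "ramped" windows, as opposed to flat windows, where every position inside the window counts equally.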

What I certainly didn't make clear is that I'm particularly interested
in approaches that have been used to *mine* these relationships from  
corpora, as well as general linguistic discussions concerning their  
existence, rather than formal ways in which they can be expressed (as  
per context-free grammar) - although I am certainly interested in the  
models that have made such mining possible.
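As a crude example of mining linear distance relationships directly from a corpus, consider the following. For each occurrence of one term, one could record the token distance to the nearest occurrence of another term. The resulting distribution (or its mean) then acts as a rough "narrative distance" statistic for the pair. This is only an illustrative sketch under that assumption. The function `nearest_distances` is hypothetical and assumes the two terms differ.

```python
import bisect

def nearest_distances(tokens, a, b):
    """Hypothetical sketch: for each occurrence of term `a`, return the token
    distance to the nearest occurrence of term `b` (in either direction).
    Assumes a != b; returns an empty list if `b` never occurs."""
    b_positions = [i for i, t in enumerate(tokens) if t == b]
    if not b_positions:
        return []
    distances = []
    for i, t in enumerate(tokens):
        if t != a:
            continue
        # Find where position i would fall among the sorted b positions.
        k = bisect.bisect_left(b_positions, i)
        candidates = []
        if k < len(b_positions):
            candidates.append(b_positions[k] - i)      # nearest b to the right
        if k > 0:
            candidates.append(i - b_positions[k - 1])  # nearest b to the left
        distances.append(min(candidates))
    return distances

print(nearest_distances("a x b x x a x x x b".split(), "a", "b"))
```

Because the positions of `b` are sorted, a binary search finds the nearest neighbours on each side without scanning the whole text for every occurrence of `a`.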

Do you have any more insights which I might find useful in light of  
this? Perhaps something that you might expect to have fewer hits  
(owing to our now hopefully increased precision)?

Many thanks!

Justin Washtell
University of Leeds

Corpora mailing list
Corpora at uib.no
