[Corpora-List] Distance & word context.

Thu May 1 12:45:48 UTC 2008

On 1 May 2008, at 11:31, Krishnamurthy, Ramesh wrote:

> I don't have a reference, but I think I remember Ken Church (?)  
> mentioning 'long-range collocations, up to c. 10k words apart' in c.  
> 1992?

I know of this paper:

@inproceedings{church2000noriegas,
	Address = {Saarbr{\"u}cken, Germany},
	Author = {Kenneth W. Church},
	Booktitle = {Proceedings of the 18th conference on Computational linguistics (COLING)},
	Pages = {180--186},
	Title = {Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to $p/2$ than $p^2$},
	Year = {2000}}

J Washtell wrote:

> Can anybody point me towards works (however old or new) that exploit
> the distance between terms in a corpus (such as, but not restricted
> to, the use of "distance-weighted" context windows). The specific
> applications are not important; I am interested in any works that deal
> with the concept of distance as opposed to (or in addition to) say
> frequency counts or roles/positions within grammatical constructs.

We've been exploiting the distance between repeated syntactic  
constructions (not terms) in our work on structural (syntactic)  
priming.  For a more applied paper, see

@inproceedings{reitter2007predicting,
	Address = {Prague, Czech Republic},
	Author = {David Reitter and Johanna D. Moore},
	Booktitle = {Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL)},
	Pages = {808--815},
	Title = {Predicting Success in Dialogue},
	Year = {2007}}

What you call "distance-weighted context window" is roughly what we've  
been using to estimate short-term priming levels in dialogue corpora;  
however, we've been looking at the decay in repetition probability  
with increasing distance.  That's not quite the same.  Long-term  
adaptation is measured slightly differently, but it translates to the  
distance paradigm if you compare mean distances between different  
terms (or structures).  (See the CogSci and EMNLP papers on various  
aspects.)  In the above paper, it was long-range repetition that  
predicted task success, not the short-term distance effects.
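
To make the "decay in repetition probability with increasing distance" idea concrete, here is a minimal sketch in Python. It is not the method from the papers above, only a simplified illustration: the function name, the toy sequence of category labels, and the choice to measure distance in items are all assumptions made for the example.

```python
def repetition_by_distance(items, max_distance):
    """Return {d: share of positions i where items[i] == items[i - d]}.

    `items` is any sequence of labels (words, syntactic rules, ...).
    Distance is counted in items here, which is an assumption of this
    sketch; utterances or seconds would work the same way.
    """
    rates = {}
    for d in range(1, max_distance + 1):
        pairs = len(items) - d  # number of (prime, target) pairs at distance d
        if pairs <= 0:
            break
        hits = sum(1 for i in range(d, len(items)) if items[i] == items[i - d])
        rates[d] = hits / pairs
    return rates


# Hypothetical toy data: a short sequence of phrase labels.
labels = ["NP", "NP", "VP", "NP", "NP", "VP"]
rates = repetition_by_distance(labels, 3)
```

On a real corpus, short-term priming would show up as rates that fall off as d grows; the toy sequence is too small and too regular to show that and is only there to make the function runnable.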

There is further work on structural repetition effects, where distance  
is observed as one of many factors (not sure if this is the right paper;  
there is a book as well):

@article{Szmrecsanyi2005habit,
	Author = {Benedikt Szmrecsanyi},
	Journal = {Corpus Linguistics and Linguistic Theory},
	Number = {1},
	Pages = {113-149},
	Title = {Creatures of Habit: A corpus-linguistic Analysis of Persistence in Spoken English},
	Volume = {1},
	Year = {2005}}

If you're primarily interested in the lexical level, Ward and Litman  
have recently been using this kind of measure, e.g.,

http://www.cs.pitt.edu/~litman/cpSlate.pdf
http://www.cs.pitt.edu/~litman/primeLearn.pdf

Hope this helps!

--
David Reitter
ICCS/HCRC, Informatics, University of Edinburgh
http://www.david-reitter.com
