[Corpora-List] Distance & word context.
David Reitter
dreitter at inf.ed.ac.uk
Thu May 1 12:45:48 UTC 2008
On 1 May 2008, at 11:31, Krishnamurthy, Ramesh wrote:
> I don't have a reference, but I think I remember Ken Church (?)
> mentioning 'long-range collocations, up to c. 10k words apart' in c.
> 1992?
I know of this paper:
@inproceedings{church2000noriegas,
Address = {Saarbr{\"u}cken, Germany},
Author = {Kenneth W. Church},
Booktitle = {Proceedings of the 18th conference on Computational
linguistics (COLING)},
Pages = {180-186},
Title = {Empirical Estimates of Adaptation: The chance of Two Noriegas
is closer to $p/2$ than $p^2$},
Year = {2000}}
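
For readers who want the gist of the adaptation idea in that title, here is a rough sketch, not code from the paper. It assumes each document is split into two halves and compares the chance of seeing a word in the second half overall (the prior, p) with the chance of seeing it there given that it already occurred in the first half. The function name `adaptation` and the toy documents are made up for illustration.

```python
def adaptation(docs, word):
    """Return (prior, adapted) for `word` over a list of tokenised documents.

    prior   = share of documents with `word` in their second half
    adapted = share of documents with `word` in their second half,
              among documents that already have it in their first half
    Assumption: documents are split at the midpoint; other splits are possible.
    """
    in_first = in_second = in_both = 0
    for tokens in docs:
        mid = len(tokens) // 2
        first = word in tokens[:mid]
        second = word in tokens[mid:]
        in_first += first
        in_second += second
        in_both += first and second
    prior = in_second / len(docs)
    # Undefined if the word never occurs in any first half.
    adapted = in_both / in_first if in_first else None
    return prior, adapted


# Toy data, invented purely for illustration.
docs = [
    ["noriega", "a", "noriega", "b"],
    ["a", "b", "c", "d"],
    ["noriega", "x", "y", "z"],
    ["a", "b", "c", "d"],
]
prior, adapted = adaptation(docs, "noriega")
```

On real text one would expect `adapted` to be much larger than `prior` for content words, which is the effect the title refers to.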
J Washtell wrote:
> Can anybody point me towards works (however old or new) that exploit
> the distance between terms in a corpus (such as, but not restricted
> to, the use of "distance-weighted" context windows). The specific
> applications are not important; I am interested in any works that deal
> with the concept of distance as opposed to (or in addition to) say
> frequency counts or roles/positions within grammatical constructs.
We've been exploiting the distance between repeated syntactic
constructions (not terms) in our work on structural (syntactic)
priming. For a more applied paper, see
@inproceedings{reitter2007predicting,
Address = {Prague, Czech Republic},
Author = {David Reitter and Johanna D. Moore},
Booktitle = {Proceedings of the 45th Annual Meeting of the
Association of Computational Linguistics (ACL)},
Pages = {808-815},
Title = {Predicting Success in Dialogue},
Year = {2007}}
What you call "distance-weighted context window" is roughly what we've
been using to estimate short-term priming levels in dialogue corpora;
however, we've been looking at the decay in repetition probability
with increasing distance. That's not quite the same. Long-term
adaptation is measured slightly differently, but it translates to the
distance paradigm if you compare mean distances between different
terms (or structures). (See the CogSci and EMNLP papers on various
aspects.) In the above paper, it was long-range repetition that
predicted task success, not the short-term distance effects.
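
To make the "decay in repetition probability with increasing distance" concrete, here is a minimal sketch. It is not the exact procedure from the papers, which fit a regression model; it only tabulates, for each distance d, how often the item at position i recurs at position i + d. The function name `repetition_by_distance` and the toy rule sequence are assumptions for illustration.

```python
def repetition_by_distance(items, max_distance):
    """Map each distance d in 1..max_distance to the fraction of
    positions i for which items[i] == items[i + d]."""
    rates = {}
    for d in range(1, max_distance + 1):
        pairs = len(items) - d
        if pairs <= 0:
            break  # sequence too short for this distance
        hits = sum(1 for i in range(pairs) if items[i] == items[i + d])
        rates[d] = hits / pairs
    return rates


# Toy sequence of syntactic rule labels, invented for illustration.
rules = ["NP", "VP", "NP", "PP", "NP", "VP", "PP", "NP"]
rates = repetition_by_distance(rules, 3)
```

On a real corpus, a short-term priming effect would show up as repetition rates that fall off as d grows; a sequence this short shows no such trend and only demonstrates the bookkeeping.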
There is further work on structural repetition effects, where distance
is observed as one of many factors (I'm not sure this is the right
paper; there is a book as well):
@article{Szmrecsanyi2005habit,
Author = {Benedikt Szmrecsanyi},
Journal = {Corpus Linguistics and Linguistic Theory},
Number = {1},
Pages = {113-149},
Title = {Creatures of Habit: A corpus-linguistic Analysis of
Persistence in Spoken English},
Volume = {1},
Year = {2005}}
If you're primarily interested in the lexical level, Ward and Litman
have been using the metric recently, e.g.,
http://www.cs.pitt.edu/~litman/cpSlate.pdf
http://www.cs.pitt.edu/~litman/primeLearn.pdf
Hope this helps!
--
David Reitter
ICCS/HCRC, Informatics, University of Edinburgh
http://www.david-reitter.com
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora