[Corpora-List] Measuring relative collocational strength

David Beavan d.beavan at englang.arts.gla.ac.uk
Mon Oct 18 11:05:51 UTC 2010


Hi all

I've got a pretty little toy which lets you visualise the collocates of
two words by comparing their MI. You might be interested:

http://bit.ly/d3TR8E

It should give you the classic semantic prosody example of utterly
(negative) vs. absolutely (positive). Collocates of both words are shown,
together with your search words. Collocates near either extremity are
strongly associated with that search word; collocates in the middle are
used about equally with both your words. It is a work in progress, BTW.

Dave

-- 
David Beavan
English Language Computing Manager
University of Glasgow
+44 (0)141 330 2382
http://www.scottishcorpus.ac.uk/
The University of Glasgow, charity number SC004401

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf
Of Justin Washtell
Sent: 13 October 2010 15:37
To: Alon Lischinsky; Corpora Mailing List
Subject: Re: [Corpora-List] Measuring relative collocational strength

Hi Alon,

MI should work fine in that setting, providing the frequencies of the
terms or their collocates aren't so low as to make the results
undependable. However, if "spud" and "potato" are being studied in the
same corpus, and therefore the marginal probabilities of the collocate
terms do not vary between the two terms, then you do not need to use MI:
conditional probability is probably adequate.
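
A minimal sketch of why this holds, assuming MI here means pointwise MI, log2(P(node, c) / (P(node) P(c))). Within one corpus P(c) is the same for both node words, so it cancels when two MI scores are subtracted, and the difference equals the log ratio of the conditional probabilities P(c | node). All counts and names below are invented for illustration, and estimating P(node, c) as co-occurrences divided by corpus size is a simplification.

```python
import math

def pmi(cooc, f_node, f_coll, n):
    """Pointwise MI (base 2) from raw counts: log2(P(node, c) / (P(node) * P(c)))."""
    return math.log2((cooc / n) / ((f_node / n) * (f_coll / n)))

def cond_prob(cooc, f_node):
    """P(c | node), estimated as co-occurrences per occurrence of the node word."""
    return cooc / f_node

# Hypothetical counts, invented purely for illustration.
N = 1_000_000                  # corpus size in tokens
F_POTATO, F_SPUD = 5000, 200   # frequencies of the two node words
F_COLL = 800                   # frequency of one collocate in the corpus
CO_POTATO, CO_SPUD = 120, 2    # co-occurrences of that collocate with each node

# Difference between the two MI scores for the same collocate.
mi_diff = pmi(CO_POTATO, F_POTATO, F_COLL, N) - pmi(CO_SPUD, F_SPUD, F_COLL, N)

# Log ratio of the conditional probabilities; F_COLL and N play no part here.
log_ratio = math.log2(cond_prob(CO_POTATO, F_POTATO) / cond_prob(CO_SPUD, F_SPUD))

print(round(mi_diff, 4), round(log_ratio, 4))  # the two values agree
```

The collocate's own frequency and the corpus size drop out of the comparison, which is why conditional probability alone can rank the differences. It says nothing yet about whether a difference is reliable.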

You still have the issue of how to compare these values. I would expect
that the best choice would be to compute, for each collocate term, a
log-likelihood statistic comparing its conditional probability given one
term of interest with its conditional probability given the other. That
will give you a measure of significance which will take the marginal
frequencies of the collocate terms into account, and will therefore
identify any "suitably surprising" differences, in either direction (if
you supply a threshold).
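
One possible reading of this suggestion, sketched under assumptions: for each collocate, build a 2x2 table (term 1 vs. term 2, collocate present vs. absent) and compute the log-likelihood ratio statistic G2. The function names and the way the table is laid out are choices made for this sketch, not something specified in the message, and the counts in the usage line are invented.

```python
import math

def log_likelihood(co1, f1, co2, f2):
    """G2 for a 2x2 table: collocate present/absent beside term 1 vs. term 2.

    co1, co2: co-occurrences of the collocate with each term
    f1, f2:   total occurrences of each term (assumed non-zero)
    """
    observed = [[co1, f1 - co1], [co2, f2 - co2]]
    total = f1 + f2
    row_totals = [f1, f2]
    col_totals = [co1 + co2, total - (co1 + co2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / total  # expected count under independence
            if o > 0:  # a zero cell contributes nothing (0 * ln 0 is taken as 0)
                g2 += o * math.log(o / e)
    return 2 * g2

def direction(co1, f1, co2, f2):
    """+1 if the collocate is relatively more frequent with term 1, -1 with term 2, 0 if equal."""
    r1, r2 = co1 / f1, co2 / f2
    return (r1 > r2) - (r1 < r2)

# Hypothetical counts: the collocate occurs 120 times near 5000 "potato"s
# and 2 times near 200 "spud"s.
print(log_likelihood(120, 5000, 2, 200), direction(120, 5000, 2, 200))
```

G2 is conventionally compared against a chi-squared critical value with one degree of freedom (3.84 for p < 0.05), which would serve as the threshold; the sign from `direction` says which of the two terms the collocate favours. With very small counts the approximation becomes less trustworthy, in line with the caveat about low frequencies above.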

I'm by no means an expert on these measures, so I would get a second
opinion first, but this seems sensible to me. Unfortunately I cannot
recommend the best software to use for this. I expect there are quite a
few options.

Justin Washtell
University of Leeds

________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Alon
Lischinsky [alon.lischinsky at kultmed.umu.se]
Sent: 13 October 2010 15:01
To: Corpora Mailing List
Subject: [Corpora-List] Measuring relative collocational strength

Hi.

I am looking for help with a kind of statistical measure that has
probably been described in the literature, but which I don't know what to
call. I should point out that I'm relatively new to corpus studies,
having a background in qualitative discourse studies, and am still
coming to terms with some of the technical lexis.

Simply put, I want to find out, given two terms that are seemingly
synonymous but different in absolute frequency (say, "potato" and
"spud"), which (lexical) terms have statistically significant
differences in their collocation with either. I suppose I could simply
look at the full list of collocates for each term ordered by t-score or
MI and spot differences, but since one of the terms is much rarer and MI
scores are affected by absolute frequency, I guess this might lead to
quite a few artifacts.

I don't know of any piece of software that can do that, so I would
appreciate any pointers, or even suggestions as to how to go about doing
it in R or any other statistical software (my programming skills aren't
great, but I trust I could manage with a little guidance).

Best,

Alon Lischinsky

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
