[Corpora-List] Metrics used for word clusters analysis ...

Adam Kilgarriff adam at lexmasterclass.com
Tue Jul 24 19:35:20 UTC 2012


Albretch

Three papers I've been impressed by are

- James Curran's thesis work, which takes 'evaluation against manual
  thesauruses' about as far as it can go
- Julie Weeds and David Weir (2005) Co-occurrence Retrieval: a General
  Framework for Lexical Distributional Similarity. *Computational
  Linguistics* 31(4) 439-476.
  [pdf <http://aclweb.org/anthology-new/J/J05/J05-4002.pdf>]
  [bib <http://www.sussex.ac.uk/Users/davidw/resources/bibtex/cl05.bib>]
  (interprets different formulae for distributional similarity in terms of
  precision and recall)
- Marco Baroni and Alessandro Lenci
  <http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/l/Lenci:Alessandro.html>
  (2010) Distributional Memory: A General Framework for Corpus-Based
  Semantics. *Computational Linguistics* 36(4) 673-721.
  <http://www.informatik.uni-trier.de/~ley/db/journals/coling/coling36.html#BaroniL10>
  (with a whole suite of 'extrinsic evaluation' tasks for comparing their
  model with others)
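
To make the 'precision and recall' reading of distributional similarity a
little more concrete, here is a minimal sketch in Python. It is an
illustration only, not the exact formulation from the Weeds and Weir paper:
the function names, the toy words and the weights are all assumptions, and
each word is represented as a plain dict from co-occurrence feature to
weight. Precision asks how much of the neighbour's co-occurrence weight
falls on features the target also has; recall asks how much of the target's
weight is covered by the neighbour.

```python
def precision_recall(target, neighbour):
    """Weighted feature overlap of two sparse co-occurrence vectors.

    Both arguments are dicts mapping a feature to a non-negative weight.
    Illustrative simplification, not the paper's exact definition.
    """
    shared = set(target) & set(neighbour)
    neighbour_total = sum(neighbour.values())
    target_total = sum(target.values())
    # Guard against empty vectors to avoid division by zero.
    precision = (sum(neighbour[f] for f in shared) / neighbour_total
                 if neighbour_total else 0.0)
    recall = (sum(target[f] for f in shared) / target_total
              if target_total else 0.0)
    return precision, recall


def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy co-occurrence vectors (hypothetical weights).
dog = {"bark": 3.0, "tail": 2.0, "walk": 1.0}
animal = {"tail": 1.0, "walk": 1.0}

p, r = precision_recall(dog, animal)
print(p, r, f_score(p, r))  # p is 1.0, r is 0.5
```

Note that the two scores are not symmetric: swapping the arguments swaps
precision and recall, which is one reason this view is useful for comparing
a more general word with a more specific one.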

Best

Adam

On 24 July 2012 12:25, Albretch Mueller <lbrtchx at gmail.com> wrote:

> ~
>  What are the kinds of metrics used for word clusters analysis and
> synonymy?
> ~
>  In Speech and Language Processing by Jurafsky & Martin (2004):
> chapter 17; and Foundations of Statistical Natural Language
> Processing, Manning & Schuetze (1999): chapter 8; you find some
> introductory treatment of the topic, but what I am looking for is a
> corpora-based thorough discussion of the pros and cons of the various
> similarity models.
> ~
>  I imagine there is a lot of research on that topic, since IR
> depends very much on it and, to me, the metrics behind similarity
> models should be language-independent.
> ~
>  A simple search on "word clusters" would overwhelm you with hits and
> an attempt to narrow down a search to:
> ~
>  "word clusters" corpus linguistics metrics n-grams cosine similarity
> synonym
> ~
>  gives you few documents
> ~
>  Any good/current papers on that topic?
> ~
>  lbrtchx
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================