[Corpora-List] Corpora Digest, Vol 56, Issue 3

Noam Ordan noam.ordan at gmail.com
Fri Feb 3 16:31:24 UTC 2012


Dear Karen,

A simple thing you can do is use MultiSemCor, a corpus annotated for
word senses in the WordNet paradigm:
http://multisemcor.fbk.eu/semcor.php
Each token is annotated for sense, so you can easily make a frequency
list of the types in the corpus. Since each token is mapped onto a
WordNet sense (a particular synset), you can tell how ambiguous each
word is by counting the number of distinct senses attested for that
word type. The corpus, as you'll see at the link, also exists for
Italian.
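
To make the counting concrete, here is a minimal Python sketch. It
assumes you have already read the corpus into (word, sense) pairs; the
function name ambiguity_stats, the toy tokens and the sense labels are
all made up for illustration and do not reflect the actual MultiSemCor
file format. The same counting should work if you feed it (word, POS
tag) pairs instead.

```python
from collections import Counter, defaultdict


def ambiguity_stats(annotated_tokens):
    """Count tokens and distinct labels per word type.

    annotated_tokens: iterable of (word, label) pairs, where label is a
    sense identifier (or a POS tag). Returns (freq, labels, share):
      freq   - Counter mapping each lowercased type to its token count
      labels - dict mapping each type to the set of labels seen with it
      share  - fraction of tokens whose type occurs with more than one label
    """
    freq = Counter()
    labels = defaultdict(set)
    for word, label in annotated_tokens:
        key = word.lower()  # assumption: case-insensitive types
        freq[key] += 1
        labels[key].add(label)

    total = sum(freq.values())
    ambiguous = sum(n for w, n in freq.items() if len(labels[w]) > 1)
    share = ambiguous / total if total else 0.0
    return freq, labels, share


# Toy input with hypothetical sense labels, NOT real corpus data.
sample = [
    ("bank", "bank.n.01"),
    ("bank", "bank.n.02"),
    ("river", "river.n.01"),
    ("Bank", "bank.n.01"),
    ("flows", "flow.v.01"),
]

freq, senses, share = ambiguity_stats(sample)
for word, count in freq.most_common():
    print(word, count, len(senses[word]))
print("share of tokens with an ambiguous type:", share)
```

Note that this only counts senses that actually occur in the corpus, so
a word with many WordNet senses but one attested sense will look
unambiguous.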

Best,
Noam
------------------

Noam Ordan
Department of Computer Science
University of Haifa

> Hi all,
>
> I did not find the time to make my question precise, and I have since
> received a lot of very interesting answers and references.
> Thank you all for this!
>
> In fact, I should have said that I'm looking for the number of ambiguous
> word tokens in terms of POS in an English corpus, for example from the
> Penn TreeBank. One solution would be to compute this myself from the
> Brown corpus, but I was curious whether there was a reference on this.
>
> I found this reference for French, which says that 60% of the French
> tokens in their corpus were unambiguous in terms of POS:
> Tzoukermann, E., Radev, D. R. & Gale, W. A. (1999). Tagging French
> without lexical probabilities -- combining linguistic knowledge and
> statistical learning. In K. Church, S. Armstrong, P. Isabelle,
> E. Tzoukermann & D. Yarowsky (eds.), Natural Language Processing Using
> Very Large Corpora. Kluwer Academic.
>
> Of course, it all depends on the number of tags, their granularity, and
> so on. It only gives a very rough idea and should be taken in context,
> obviously. But that's all I need.
>
> Best,
>
> Karen