<div dir="ltr"><span>Dear Karen,</span><br><br><span>A simple thing you can do is use MultiSemCor, a corpus annotated for</span><br><span>word senses cast in the WordNet paradigm -</span><br>

<a href="http://multisemcor.fbk.eu/semcor.php" target="_blank">http://multisemcor.fbk.eu/semcor.php</a><br><span>Each token is annotated for sense and you can easily make a frequency</span><br><span>list of types in the corpus; since each token is mapped onto a WordNet</span><br>


<span>sense of a particular synset, you will know how ambiguous each word is</span><br><span>by counting the number of distinct senses attested for that word. The</span><br><span>corpus, as you'll see in the link, also exists for Italian.</span><br>
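<br><span>The counting described above could be sketched roughly as follows. This is only</span><br><span>a minimal sketch: the (word, sense) pairs are hypothetical toy data, not the actual</span><br><span>MultiSemCor format, and the function names are my own. You would have to write a</span><br><span>small reader that turns the corpus files into such pairs.</span><br>

```python
from collections import defaultdict

# Hypothetical toy data: (word form, sense label) pairs standing in for
# sense-annotated tokens. The labels are illustrative, not real corpus output.
annotated_tokens = [
    ("bank", "bank.n.01"),
    ("bank", "bank.n.02"),
    ("river", "river.n.01"),
    ("bank", "bank.n.01"),
    ("run", "run.v.01"),
    ("run", "run.n.01"),
]


def senses_per_type(pairs):
    """Map each word type to the number of distinct labels it occurs with."""
    labels_by_word = defaultdict(set)
    for word, label in pairs:
        labels_by_word[word.lower()].add(label)
    return {word: len(labels) for word, labels in labels_by_word.items()}


def ambiguous_token_share(pairs):
    """Share of tokens whose word type occurs with two or more labels."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    counts = senses_per_type(pairs)
    ambiguous = sum(1 for word, _ in pairs if counts[word.lower()] != 1)
    return ambiguous / len(pairs)


print(senses_per_type(annotated_tokens))
print(ambiguous_token_share(annotated_tokens))
```

<span>The same two functions should work unchanged if the labels are POS tags instead of</span><br><span>senses, which gives the share of POS-ambiguous tokens relative to the tags attested</span><br><span>in the corpus itself.</span><br>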


<br><span>Best,</span><br><span>Noam</span><br><span>------------------</span><div><br></div><span><font>Noam Ordan</font></span><div><span><font>Department of Computer Science<br>

University of Haifa</font></span>

</div><div><span><font><br></font></span></div><div><span style>> Hi all,</span><br style><span style>></span><br style><span style>> I could not find the time to make my question precise and then received a lot of</span><br style>

<span style>> very interesting answers and references.</span><br style><span style>> Thank you all for this!</span><br style><span style>></span><br style><span style>> In fact, I should have said that I'm looking for the number of ambiguous</span><br style>

<span style>> word tokens in terms of POS in an English corpus, for example from the Penn</span><br style><span style>> TreeBank. One solution would be to compute this myself from the Brown</span><br style><span style>> corpus, but I was curious if there was a ref. on this.</span><br style>

<span style>></span><br style><span style>> I found this ref for French that says 60% of the French tokens in their</span><br style><span style>> corpus were unambiguous in terms of POS:</span><br style><span style>> Tzoukermann, E.; Radev, D. R. & Gale, W. A. Ken Church, Susan Armstrong, P.</span><br style>

<span style>> I. E. T. & Yarowsky, D. (ed.) Natural Language Processing Using Very Large</span><br style><span style>> Corpora Tagging French without lexical probabilities -- combining linguistic</span><br style>

<span style>> knowledge and statistical learning Kluwer Academic, 1999</span><br style><span style>></span><br style><span style>> Of course, it all depends on the number of tags, their refinement and so on.</span><br style>

<span style>> It only gives a very rough idea and should be taken in its context,</span><br style><span style>> obviously. But that's all I need.</span><br style><span style>></span><br style><span style>> Best,</span><br style>

<span style>></span><br style><span style>> Karen</span>

</div></div>