[Corpora-List] Ambiguous words in English and their frequency

Karen Fort karen.fort at inist.fr
Thu Feb 2 10:25:18 UTC 2012


Hi all,

I did not find the time to clarify my question before receiving a lot 
of very interesting answers and references.
Thank you all for this!

In fact, I should have said that I'm looking for the number of ambiguous 
word tokens in terms of POS in an English corpus, for example from the 
Penn TreeBank. One solution would be to compute this myself from the 
Brown corpus, but I was curious if there was a ref. on this.
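
A minimal sketch of that do-it-yourself computation, assuming the corpus is available as (word, tag) pairs and that a token counts as ambiguous when its word form occurs with more than one tag in the same corpus. The function name, the lowercasing choice and the toy data are illustrative assumptions. The resulting figure reflects ambiguity observed in the corpus, not ambiguity with respect to an external lexicon.

```python
from collections import defaultdict


def ambiguous_token_ratio(tagged_tokens, lowercase=True):
    """Share of tokens whose word form is seen with more than one POS tag."""
    tagged_tokens = list(tagged_tokens)
    if not tagged_tokens:
        return 0.0

    def normalise(word):
        # Assumption: case is folded so that "The" and "the" are one word form.
        return word.lower() if lowercase else word

    # First pass: collect the set of tags observed for each word form.
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[normalise(word)].add(tag)

    # Second pass: count the tokens whose word form has several tags.
    ambiguous = sum(
        1 for word, _ in tagged_tokens if len(tags_by_word[normalise(word)]) > 1
    )
    return ambiguous / len(tagged_tokens)


# Toy data (hypothetical): "run" is seen as both NN and VBP.
toy_corpus = [
    ("the", "DT"), ("run", "NN"), ("was", "VBD"), ("long", "JJ"),
    ("they", "PRP"), ("run", "VBP"), ("the", "DT"), ("race", "NN"),
]
ratio = ambiguous_token_ratio(toy_corpus)
print(f"{ratio:.1%} of tokens are POS-ambiguous")
```

With NLTK installed and the Brown corpus downloaded, `nltk.corpus.brown.tagged_words()` yields pairs in this shape and could be passed in directly. The result will vary with the tagset and with the decision to fold case.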

I found this reference for French, which says that 60% of the French tokens 
in their corpus were unambiguous in terms of POS:
Tzoukermann, E.; Radev, D. R. & Gale, W. A. Tagging French without 
lexical probabilities -- combining linguistic knowledge and statistical 
learning. In: Ken Church, Susan Armstrong, P. I. E. T. & Yarowsky, D. (ed.) 
Natural Language Processing Using Very Large Corpora. Kluwer Academic, 1999

Of course, it all depends on the number of tags, how fine-grained they 
are, and so on. It only gives a very rough idea and should be taken in 
its context, obviously. But that's all I need.

Best,

Karen


On 26/01/2012 10:39, Eckhard Bick wrote:
> Hello again,
>
> I forgot to add that the ambiguous word tokens in my English test run
> amounted to 49.8%.
>
> Best,
> Eckhard
>
> On 2012-01-25 20:33, FORT, Karen wrote:
>> Hi all,
>>
>> I need to find this information (the proportion of ambiguous words in English and their frequency).
>> For example, we know that in French 8% of the words represent 30% of the ambiguity.
>> Of course, it's very rough, but it's only to have a rough idea.
>>
>> Can somebody help me with this (of course, I searched for a ref but could not find anything precise)?
>>
>> Thank you in advance,
>>
>> Regards,
>>
>>
>> Karën FORT
>> Ingénieure/Engineer et/and doctorante/PhD student
>> INIST-CNRS / LIPN
>> 2, allée de Brabois
>> 54500 Vandoeuvre-lès-Nancy
>> France
>> Bureau/Office: H112
>> +33 (0)3 83 50 46 36
>>
>> http://www-lipn.univ-paris13.fr/~fort/
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>
>
