[Corpora-List] Ambiguous words in English and their frequency
Karen Fort
karen.fort at inist.fr
Thu Feb 2 10:25:18 UTC 2012
Hi all,
I did not find the time to clarify my question before receiving a lot
of very interesting answers and references.
Thank you all for this!
In fact, I should have said that I'm looking for the number of word
tokens that are ambiguous in terms of POS in an English corpus, for
example the Penn Treebank. One solution would be to compute this myself
from the Brown corpus, but I was curious whether there was a reference on this.
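For anyone who does compute this from a tagged corpus, a minimal Python sketch could look like the following. It assumes that a token counts as ambiguous when its word form occurs with more than one POS tag somewhere in the same corpus, and that word forms are lowercased before comparison. Both choices are assumptions: a dictionary-based definition (all tags a lexicon allows for the word) may give a different figure. The function name `ambiguity_stats` is hypothetical.

```python
from collections import defaultdict


def ambiguity_stats(tagged_tokens):
    """Count tokens whose word form is seen with more than one POS tag.

    Assumption: ambiguity is measured inside the corpus itself, and word
    forms are lowercased so that "The" and "the" are the same type.
    Returns (ambiguous_tokens, total_tokens, ratio).
    """
    tokens = list(tagged_tokens)  # materialise once; we iterate twice
    tags_per_type = defaultdict(set)
    for word, tag in tokens:
        tags_per_type[word.lower()].add(tag)

    # A token is ambiguous if its (lowercased) type carries 2+ distinct tags.
    ambiguous = sum(1 for word, _ in tokens if len(tags_per_type[word.lower()]) > 1)
    total = len(tokens)
    return ambiguous, total, (ambiguous / total if total else 0.0)


if __name__ == "__main__":
    # Toy data with Penn-Treebank-style tags: "can" and "rust" are ambiguous.
    toy = [("the", "DT"), ("can", "MD"), ("can", "NN"), ("rust", "VB"),
           ("the", "DT"), ("rust", "NN"), ("old", "JJ")]
    a, t, r = ambiguity_stats(toy)
    print(f"{a}/{t} = {r:.1%}")  # prints "4/7 = 57.1%"
```

With NLTK installed and the Brown corpus downloaded, `nltk.corpus.brown.tagged_words()` yields such (word, tag) pairs; passing `tagset="universal"` maps them to a much coarser tagset, which shows how strongly the resulting percentage depends on tag granularity.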
I found this reference for French, which says that 60% of the tokens in
their corpus were unambiguous in terms of POS:
Tzoukermann, E., Radev, D. R. & Gale, W. A. (1999). Tagging French
without lexical probabilities: combining linguistic knowledge and
statistical learning. In K. Church, S. Armstrong, P. Isabelle,
E. Tzoukermann & D. Yarowsky (eds.), Natural Language Processing Using
Very Large Corpora. Kluwer Academic.
Of course, it all depends on the number of tags, their granularity and so
on. It only gives a very rough idea and should obviously be taken in
context, but that's all I need.
Best,
Karen
On 26/01/2012 10:39, Eckhard Bick wrote:
> Hello again,
>
> I forgot to add that the ambiguous word tokens in my English test run
> amounted to 49.8%.
>
> Best,
> Eckhard
>
> On 2012-01-25 20:33, FORT, Karen wrote:
>> Hi all,
>>
>> I am looking for the proportion of ambiguous words in English and their frequency.
>> For example, we know that in French 8% of the words represent 30% of the ambiguity.
>> Of course, this is very rough, but I only need a rough idea.
>>
>> Can somebody help me with this? (I did search for a reference but could not find anything precise.)
>>
>> Thank you in advance,
>>
>> Regards,
>>
>>
>> Karën FORT
>> Ingénieure/Engineer et/and doctorante/PhD student
>> INIST-CNRS / LIPN
>> 2, allée de Brabois
>> 54500 Vandoeuvre-lès-Nancy
>> France
>> Bureau/Office: H112
>> +33 (0)3 83 50 46 36
>>
>> http://www-lipn.univ-paris13.fr/~fort/
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>
>