[Corpora-List] Ambiguous words in English and their frequency

Karen Fort karen.fort at inist.fr
Thu Feb 2 13:24:47 UTC 2012


Definitely!

Thanks Kevin.

On 02/02/2012 13:47, Kevin B. Cohen wrote:
> Last semester, one of my students did a project that showed the rate
> of growth in ambiguous word/POS pairs as you read in increasing
> amounts of text in, if I remember correctly, the WSJ and one
> biomedical corpus.  Would his results be helpful?
>
> Kev
>
> On Thu, Feb 2, 2012 at 10:25 AM, Karen Fort <karen.fort at inist.fr> wrote:
>> Hi all,
>>
>> I did not find the time to clarify my question, and in the meantime I
>> received a lot of very interesting answers and references.
>> Thank you all for this!
>>
>> In fact, I should have said that I'm looking for the number of ambiguous
>> word tokens in terms of POS in an English corpus, for example from the Penn
>> TreeBank. One solution would be to compute this myself from the Brown
>> corpus, but I was curious if there was a ref. on this.
>>
>> I found this reference for French, which says that 60% of the French
>> tokens in their corpus were unambiguous in terms of POS:
>> Tzoukermann, E.; Radev, D. R. & Gale, W. A. Tagging French without
>> lexical probabilities -- combining linguistic knowledge and statistical
>> learning. In Ken Church, Susan Armstrong, P. I., E. T. & Yarowsky, D.
>> (ed.), Natural Language Processing Using Very Large Corpora. Kluwer
>> Academic, 1999
>>
>> Of course, it all depends on the number of tags, their granularity, and so on.
>> It only gives a very rough idea and should be taken in its context,
>> obviously. But that's all I need.
>>
>> Best,
>>
>> Karen
>>
>>
>> On 26/01/2012 10:39, Eckhard Bick wrote:
>>>
>>> Hello again,
>>>
>>> I forgot to add that the ambiguous word tokens in my English test run
>>> amounted to 49.8%.
>>>
>>> Best,
>>> Eckhard
>>>
>>> On 2012-01-25 20:33, FORT, Karen wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I need to find this information (the proportion of ambiguous words in
>>>> English and their frequency).
>>>> For example, we know that in French 8% of the words represent 30% of the
>>>> ambiguity.
>>>> Of course, that figure is very rough, but a rough idea is all I need.
>>>>
>>>> Can somebody help me with this (of course, I searched for a ref but could
>>>> not find anything precise)?
>>>>
>>>> Thank you in advance,
>>>>
>>>> Regards,
>>>>
>>>>
>>>> Karën FORT
>>>> Ingénieure/Engineer et/and doctorante/PhD student
>>>> INIST-CNRS / LIPN
>>>> 2, allée de Brabois
>>>> 54500 Vandoeuvre-lès-Nancy
>>>> France
>>>> Bureau/Office: H112
>>>> +33 (0)3 83 50 46 36
>>>>
>>>> http://www-lipn.univ-paris13.fr/~fort/
>>>> _______________________________________________
>>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>>> Corpora mailing list
>>>> Corpora at uib.no
>>>> http://mailman.uib.no/listinfo/corpora
>>>>
>>>
>>>
>>
>
>
>
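
For reference, the count discussed in the thread above (the share of POS-ambiguous word tokens in a tagged corpus) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by anyone in the thread: a word counts as ambiguous if it occurs with more than one tag in the corpus itself (a lexicon-based count would give different numbers), words are lowercased, and the function names `ambiguity_stats` and `ambiguity_growth` and the toy data are made up for the example. The second function gives a rough version of the growth curve Kevin mentions, counting ambiguous word types as more text is read.

```python
from collections import defaultdict


def ambiguity_stats(tagged_tokens):
    """Return (ambiguous type share, ambiguous token share).

    tagged_tokens is a list of (word, tag) pairs. A word is treated as
    ambiguous if it is observed with more than one tag (an assumption;
    a lexicon-based definition would differ).
    """
    tokens = [(word.lower(), tag) for word, tag in tagged_tokens]
    if not tokens:
        return 0.0, 0.0
    tags_by_word = defaultdict(set)
    for word, tag in tokens:
        tags_by_word[word].add(tag)
    ambiguous = {w for w, tags in tags_by_word.items() if len(tags) > 1}
    type_share = len(ambiguous) / len(tags_by_word)
    token_share = sum(1 for w, _ in tokens if w in ambiguous) / len(tokens)
    return type_share, token_share


def ambiguity_growth(tagged_tokens, step):
    """Return [(tokens read, ambiguous types so far)] sampled every `step` tokens."""
    tags_by_word = defaultdict(set)
    n_ambiguous = 0
    curve = []
    for i, (word, tag) in enumerate(tagged_tokens, start=1):
        tags = tags_by_word[word.lower()]
        if tag not in tags:
            tags.add(tag)
            if len(tags) == 2:  # the word has just become ambiguous
                n_ambiguous += 1
        if i % step == 0:
            curve.append((i, n_ambiguous))
    return curve


# Toy data, invented for illustration only.
toy = [
    ("the", "DET"), ("can", "NOUN"), ("can", "VERB"), ("rust", "VERB"),
    ("the", "DET"), ("rust", "NOUN"), ("spreads", "VERB"), ("slowly", "ADV"),
]

type_share, token_share = ambiguity_stats(toy)
print(f"ambiguous types: {type_share:.1%}, ambiguous tokens: {token_share:.1%}")
print(ambiguity_growth(toy, step=2))
```

With NLTK installed and the Brown corpus downloaded, the same functions should accept `list(nltk.corpus.brown.tagged_words())` as input. As noted in the thread, the resulting percentages depend heavily on the tagset.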



