[Corpora-List] Ambiguous words in English and their frequency

Assaf Urieli admin at joli-ciel.com
Thu Feb 2 13:26:41 UTC 2012


For what it's worth, the following link gives what I imagine to be an 
upper bound on a non-native speaker's oral ambiguity measure in just 
about any language :)

A story in which each syllable is pronounced /shi/

http://books.google.fr/books?id=vu3lRLVUta8C&pg=PA30&lpg=PA30&dq=a+story+in+which+each+syllable+is+pronounced+shi&source=bl&ots=htFQwMsQuR&sig=_NQkI_j64m9QAPJsuD-8rEmsx7o&hl=en&sa=X&ei=sG8qT-GxIs6h-QbHvcz4DQ&sqi=2&redir_esc=y#v=onepage&q=a%20story%20in%20which%20each%20syllable%20is%20pronounced%20shi&f=false

Best regards,
Assaf

On 02/02/2012 11:25, Karen Fort wrote:
> Hi all,
>
> I could not find the time to clarify my question before receiving a 
> lot of very interesting answers and references.
> Thank you all for this!
>
> In fact, I should have said that I'm looking for the number of 
> POS-ambiguous word tokens in an English corpus, for example the 
> Penn Treebank. One solution would be to compute this myself from the 
> Brown corpus, but I was curious whether there was a published 
> reference on this.
>
> I found this reference for French, which reports that 60% of the 
> French tokens in their corpus were unambiguous in terms of POS:
> Tzoukermann, E.; Radev, D. R. & Gale, W. A. Tagging French without 
> lexical probabilities: combining linguistic knowledge and statistical 
> learning. In Armstrong, S.; Church, K.; Isabelle, P.; Tzoukermann, E. 
> & Yarowsky, D. (eds.) Natural Language Processing Using Very Large 
> Corpora. Kluwer Academic, 1999.
>
> Of course, it all depends on the number of tags, their granularity 
> and so on. It only gives a very rough idea and should obviously be 
> taken in context. But that's all I need.
>
> Best,
>
> Karen
>
>
> On 26/01/2012 10:39, Eckhard Bick wrote:
>> Hello again,
>>
>> I forgot to add that the ambiguous word tokens in my English test run
>> amounted to 49.8%.
>>
>> Best,
>> Eckhard
>>
>> On 2012-01-25 20:33, FORT, Karen wrote:
>>> Hi all,
>>>
>>> I need to find the proportion of ambiguous words in English and 
>>> their frequency.
>>> For example, we know that in French 8% of the words account for 30% 
>>> of the ambiguity.
>>> Of course, this is very rough, but I only need a rough idea.
>>>
>>> Can somebody help me with this? (Of course, I searched for a 
>>> reference but could not find anything precise.)
>>>
>>> Thank you in advance,
>>>
>>> Regards,
>>>
>>>
>>> Karën FORT
>>> Ingénieure/Engineer et/and doctorante/PhD student
>>> INIST-CNRS / LIPN
>>> 2, allée de Brabois
>>> 54500 Vandoeuvre-lès-Nancy
>>> France
>>> Bureau/Office: H112
>>> +33 (0)3 83 50 46 36
>>>
>>> http://www-lipn.univ-paris13.fr/~fort/
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
>>>
>>
>>
>
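The computation Karen describes (counting POS-ambiguous tokens in a tagged English corpus such as the Brown corpus or the Penn Treebank) can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the method behind any of the figures quoted in this thread. It assumes the corpus is available as (word, tag) pairs (NLTK's `brown.tagged_words()` and `treebank.tagged_words()` return such pairs once the corpus data is downloaded), case-folds word forms, and treats a form as ambiguous only if it is seen with more than one tag in that same corpus. The helper names `ambiguity_profile`, `ambiguous_token_ratio` and `top_types_share` are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag)


def ambiguity_profile(tagged: Iterable[TaggedToken]):
    """Return (token count per word form, set of POS-ambiguous forms).

    Assumption: a form is ambiguous if it receives more than one distinct
    tag somewhere in *this* corpus; a tagger's lexicon would usually list
    more possible tags than any one corpus happens to attest.
    """
    tags_seen = defaultdict(set)
    counts = Counter()
    for word, tag in tagged:
        form = word.lower()  # assumption: "Open" and "open" are one form
        tags_seen[form].add(tag)
        counts[form] += 1
    ambiguous = {form for form, tags in tags_seen.items() if len(tags) > 1}
    return counts, ambiguous


def ambiguous_token_ratio(counts, ambiguous):
    """Share of running tokens whose word form is POS-ambiguous."""
    total = sum(counts.values())
    return sum(counts[f] for f in ambiguous) / total if total else 0.0


def top_types_share(counts, ambiguous, n):
    """(share of vocabulary, share of ambiguous tokens) covered by the n
    most frequent ambiguous forms: one possible reading of
    "8% of the words account for 30% of the ambiguity"."""
    ranked = sorted(ambiguous, key=lambda f: counts[f], reverse=True)[:n]
    ambiguous_total = sum(counts[f] for f in ambiguous)
    covered = sum(counts[f] for f in ranked)
    vocab_share = len(ranked) / len(counts) if counts else 0.0
    token_share = covered / ambiguous_total if ambiguous_total else 0.0
    return vocab_share, token_share


if __name__ == "__main__":
    # Toy input only; meaningful figures need a full tagged corpus.
    corpus = [("I", "PRP"), ("can", "MD"), ("open", "VB"), ("the", "DT"),
              ("can", "NN"), ("Open", "JJ"), ("doors", "NNS"), ("are", "VBP"),
              ("open", "JJ")]
    counts, ambiguous = ambiguity_profile(corpus)
    print(sorted(ambiguous))                                   # ['can', 'open']
    print(round(ambiguous_token_ratio(counts, ambiguous), 3))  # 0.556
    vocab, tokens = top_types_share(counts, ambiguous, 1)
    print(round(vocab, 3), tokens)                             # 0.167 0.6
```

Because ambiguity is read off the corpus itself, the ratio depends on corpus size and, as Karen notes, on the tagset: a finer tagset splits more forms across several tags. A lexicon-based count (every tag a dictionary allows for a form) would generally give a higher figure than this corpus-attested one.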