[Corpora-List] homographs in semitic languages

Eric Atwell E.S.Atwell at leeds.ac.uk
Thu Jun 28 20:40:12 UTC 2012


Dear Isabella,

I don't have the quantitative data you ask for - but if you DO find some,
I'd be very interested to share!

I assume by "homography rate" you mean the percentage of words which are
ambiguous, i.e. whose written form has more than one meaning. This clearly
depends on the writing system as well as the language. Arabic is (usually)
written without vowels, whereas Maltese (which Habash's textbook on Arabic
NLP states is a dialect of Arabic, albeit written in a Roman alphabet)
does include vowels; so you would expect unvoweled Arabic to be
significantly more ambiguous than voweled Maltese. Other Semitic
languages use other scripts again (Hebrew, Amharic), so it may not
make sense to look for generalisations about "percentage of homography
of texts in semitic languages".
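To make the unvoweled-vs-voweled point concrete, here is a minimal Python
sketch, an illustration rather than an established measure. It assumes you
have fully voweled Arabic tokens, strips the diacritics, and reports the
share of written types and of running tokens whose unvoweled form stands for
more than one voweled form. The function names and sample words are
hypothetical, and the sketch only captures ambiguity introduced by dropping
vowels; homography in the full sense (several meanings for one voweled form)
would need a lexicon or morphological analyser.

```python
import unicodedata
from collections import defaultdict


def strip_vowels(word: str) -> str:
    # Drop Unicode nonspacing combining marks (category "Mn"), which covers
    # the Arabic short-vowel diacritics (harakat), shadda and sukun.
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def homography_rates(voweled_tokens):
    """Return (type_rate, token_rate) for the unvoweled rendering of a text.

    Hypothetical proxy: a written (unvoweled) form counts as a homograph if
    it corresponds to more than one distinct voweled form in this sample.
    """
    if not voweled_tokens:
        return 0.0, 0.0
    readings = defaultdict(set)  # unvoweled form -> set of voweled forms
    for tok in voweled_tokens:
        readings[strip_vowels(tok)].add(tok)
    ambiguous = {form for form, vs in readings.items() if len(vs) > 1}
    written = [strip_vowels(t) for t in voweled_tokens]
    type_rate = len(ambiguous) / len(readings)
    token_rate = sum(1 for w in written if w in ambiguous) / len(written)
    return type_rate, token_rate


# Illustrative tokens: kataba "he wrote", kutub "books", kitab "book", kataba.
tokens = ["كَتَبَ", "كُتُب", "كِتَاب", "كَتَبَ"]
print(homography_rates(tokens))  # (0.5, 0.75)
```

Here كتب is ambiguous once the vowels are removed, while كتاب is not, so
half the written types and three of the four tokens are homographs. The same
count on voweled text would report no ambiguity, which is the contrast
between unvoweled Arabic and voweled Maltese in miniature.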

Let me know if you get any quantitative answers please!


Eric Atwell, Leeds University



On Thu, 28 Jun 2012, Isabella Chiari wrote:

> Dear Corpora list members,
> Can anyone point me to papers that refer to estimates of the rate
> (percentage) of homography of texts in semitic languages like Arabic.
> I am interested in quantitative data on word tokens and types and in
> lexicographic entries also, if available.
> Thanks for your help!
> Isabella
> 
> -- 
> 
> Isabella Chiari
> 
> Dipartimento di Scienze documentarie, linguistico-filologiche e geografiche
> 
> Università di Roma “La Sapienza”
> 
> pl.le Aldo Moro, 5, III Piano, Edificio ex Facoltà di Lettere e Filosofia,
> 00185 Roma, tel. +30 06 4991 3575
> 
> E.mail: isabella.chiari at uniroma1.it
> 
> Website: www.alphabit.net
> 
> 
>

-- 
Eric Atwell, Associate Professor, Language research group,
  I-AIBS Institute for Artificial Intelligence and Biological Systems
  School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
  Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
  WWW: http://www.comp.leeds.ac.uk/eric
       http://www.comp.leeds.ac.uk/nlp
       http://www.comp.leeds.ac.uk/arabic
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
