Corpora: Arabic vs Spanish diacritics

Tim Buckwalter TimBuckwalter at aol.com
Mon Apr 23 17:41:34 UTC 2001


Arabic short vowels and diacritics are zero-width optional elements that
are used occasionally to disambiguate homographs when there is
insufficient context for the reader to do so. A good writer anticipates
these potential ambiguities and inserts short vowels and diacritics as
needed, such as to disambiguate the Arabic for "Amman" and "Oman" or to
indicate the passive voice. Occasionally one hears professional news
announcers pause and backtrack to re-read a passage with a different
"vocalization" of a word. Short vowels and diacritics are useful for
learners, but once you know the language they are more of a hindrance.
Arabs restrict their use mainly to poetry and religious texts.

The big difference between Arabic and accented languages such as Spanish
in this regard is that accent-less Spanish is probably sub-standard or
at least informal orthography. Whereas it is the norm for an entire
formal Arabic newspaper to have only a dozen or so thoughtfully placed
short vowels and diacritics, a Spanish newspaper with no accents at all
would be hard to imagine (I've never seen one, at least), as would one
with accents placed only where there is not enough context to know what
is intended.
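
For anyone who wants to see the effect on actual text, here is a minimal
Python sketch, not a recommended normalization for corpus work. It
assumes that "diacritic" can be approximated by the Unicode general
category Mn (nonspacing mark) after canonical decomposition; the
function name strip_marks and the sample strings are illustrative only.
Under that assumption, the vocalized Arabic spellings of "Amman" and
"Oman" collapse to the same letter sequence, and the Spanish pairs
quoted below collapse as well.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop nonspacing marks (category Mn) after canonical decomposition.

    Assumption: category Mn is used as a rough stand-in for "short vowels
    and diacritics". It also removes marks one might want to keep, e.g.
    the madda of U+0622 (alif with madda above).
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


# Vocalized spellings, written as escapes so the marks are visible:
# ain + fatha, mim + shadda + fatha, alif, nun  ("Amman")
AMMAN = "\u0639\u064E\u0645\u0651\u064E\u0627\u0646"
# ain + damma, mim + fatha, alif, nun  ("Oman")
OMAN = "\u0639\u064F\u0645\u064E\u0627\u0646"

# Both reduce to the same four letters: ain, mim, alif, nun.
print(strip_marks(AMMAN) == strip_marks(OMAN))  # True

# The Spanish examples from the quoted message lose their distinction too.
print(strip_marks("Ya terminó."))  # Ya termino.
print(strip_marks("año"))          # ano
```

The Arabic marks are separate combining characters in the text stream,
whereas the Spanish accented letters are usually stored precomposed,
which is why the sketch decomposes (NFD) before filtering.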

My impression is that the Arabs will make less and less use of these
short vowels and diacritics in the future, possibly even dropping them
entirely (as the Israelis have done with modern Hebrew). In our
discussions with cell phone manufacturers I have noted the general
expectation that text input on mobile devices will neither display nor
provide an input method for Arabic short vowels and diacritics.

Tim Buckwalter
Senior Language Engineer
AOL Mobile (formerly Tegic)
1000 Dexter Ave N, Suite 300
Seattle, WA 98109-3574
206.268.7552 phone
206.343.7004 fax
206.343.7001 front desk
TimBuckwalter at aol.com
www.tegic.com

Steven Krauwer wrote:
>
> Rene.Valdes at lhsl.com wrote:
> >
> > In support of Monika's argument, I'll offer the following two sentences:
> >
> >      Ya termino.        (I'm finishing soon.)
> >      Ya terminó.        (It's already finished.)
> >
> > Without the diacritic, you would not be able to tell which one of these two
> > meanings to assign to this sentence.  I use diacritics whenever possible,
> > even at the risk of having my text become garbage when it travels through
> > cyberspace.
> >
> > Another interesting case is the very important distinction between año and
> > ano, two nouns with quite different meanings.
>
> You're the experts, so I won't even dream of challenging what
> you  are saying, but I am really curious to hear the opinion of
> colleagues from the Arabic speaking world, as they seem to be
> able to live happily with unvocalized written texts.
>
> Should I infer that Spanish is more ambiguous in this respect,
> or that Arabic speakers (or rather: readers) are more tolerant,
> or that Spanish diacritics and Arabic vowels are different
> animals?
>
> Steven


