Corpora: Re: Arabic vs Spanish diacritics

Steven Krauwer Steven.Krauwer at let.uu.nl
Mon Apr 23 22:23:46 UTC 2001


Tim Buckwalter wrote:

> The big difference between Arabic and accented languages such as Spanish
> in this regard is that accent-less Spanish is probably sub-standard or
> at least informal orthography. Whereas it is the norm for an entire
> formal Arabic newspaper to have only a dozen or so thoughtfully-placed
> short vowels & diacritics, an unaccented Spanish newspaper would be hard
> to imagine (I've never seen one, at least), or one with accents placed
> only where there is not enough context to know what is intended.

So, the picture is (in a very black and white version): the
Spanish have fewer diacritics (both types and tokens) but use
them virtually all the time, and the Arabs have a lot more of
them, but they hardly ever use them.

I have three questions:
- Does this difference have any measurable effect on the
  learning process (for native speakers who learn to read
  and write)?
- Does it have any measurable effect on parsing and
  processing by humans?
- Does it have any measurable effect on NLP?

Any pointers to any empirical data?

I realize that we are now really moving away from this list's
core business, so I'll be happy to continue this discussion
somewhere else if people prefer that.

[ One place to go could be the email list
  elsnet-arabic at elsnet.org
  that we have just set up for discussing Arabic NLP and Speech
  processing issues, but that hasn't been officially launched
  yet. Subscription is already open at
  http://utrecht.elsnet.org/subscriptions.html ]

Steven


