[Corpora-List] Arabic transliteration

Nizar Habash habash at cs.columbia.edu
Sun Oct 21 21:45:05 UTC 2007


Hi, there are many different types of transliterations for Arabic
that can serve different purposes. Often a phonetic "transcription"
is good enough for linguistic papers; however, discussions of Arabic
orthographic peculiarities are much harder without a one-to-one
"transliteration" that romanizes the Arabic alphabet. A chapter in
a book published this year proposes a transliteration for Arabic
that addresses most of the issues people have with Buckwalter's
transliteration. This scheme was used in all the papers in that
book:

Habash, Nizar, Abdelhadi Soudi and Tim Buckwalter. On Arabic
Transliteration. In Arabic Computational Morphology:
Knowledge-based and Empirical Methods. Soudi, Abdelhadi; van den
Bosch, Antal; Neumann, Günter (Eds.), 2007. ISBN: 978-1-4020-6045-8

Online:
http://www.nizarhabash.com/publications/chapter2BisHabash_et_al-2007-web.doc
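
To make the "one-to-one" property concrete, here is a minimal
sketch in Python. It uses only a small subset of the Buckwalter
scheme, not the scheme proposed in the chapter, and the names
BUCKWALTER_SUBSET, to_latin and to_arabic are made up for this
illustration. Because each Arabic letter maps to exactly one ASCII
character, the mapping can be inverted without loss:

```python
# Partial subset of the Buckwalter scheme, for illustration only.
# Keys are Arabic letters written as Unicode escapes.
BUCKWALTER_SUBSET = {
    "\u0627": "A",  # alif
    "\u0628": "b",  # ba
    "\u062A": "t",  # ta
    "\u062B": "v",  # tha
    "\u0630": "*",  # dhal (a non-letter ASCII symbol)
    "\u0633": "s",  # sin
    "\u0634": "$",  # shin (a non-letter ASCII symbol)
    "\u0639": "E",  # ayn
    "\u0643": "k",  # kaf
    "\u0645": "m",  # mim
}

# Inverting the table is only safe if no two letters share a symbol.
REVERSE = {latin: arabic for arabic, latin in BUCKWALTER_SUBSET.items()}
assert len(REVERSE) == len(BUCKWALTER_SUBSET)


def to_latin(text):
    """Replace each known Arabic letter; pass other characters through."""
    return "".join(BUCKWALTER_SUBSET.get(ch, ch) for ch in text)


def to_arabic(text):
    """Invert to_latin for the letters covered by the subset."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


word = "\u0634\u0645\u0633"  # shin, mim, sin
print(to_latin(word))  # $ms
assert to_arabic(to_latin(word)) == word  # lossless round trip
```

The "$" in the output is the kind of symbol that makes such a
scheme easy to process but hard for people to read. A phonetic
transcription would read more naturally, but it would not in
general map back to a unique Arabic spelling.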


-- 
Nizar



Quoting Ron Artstein <artstein at essex.ac.uk>:

> On Sat, 20 Oct 2007, Eric Atwell wrote:
>
> >>> Why do you want to transliterate your Arabic text to the
> >>> Latin alphabet?
> >>
> >> Generally when we submit a research paper (written in
> >> English/French) on Arabic NLP we are asked to present the
> >> Latin-character transliteration for the Arabic sentences!
> >
> > For this, I suggest the Buckwalter transliteration is not the
> > best solution, as it maps some Arabic letters to ASCII
> > characters which are not Roman alphabet letters, making the
> > transcription simple to process but hard for humans to read.
>
> I wholeheartedly agree with Eric: Buckwalter transliteration is
> usually not appropriate for a published paper because only a
> handful of people working in Arabic NLP can read it fluently. The
> appropriate transliteration or transcription depends on the
> expected audience -- Arabists/Orientalists would probably be more
> comfortable reading a ZDMG-style transcription, whereas general
> linguists would probably prefer one based on IPA. The following
> page has a table comparing the two:
>
> http://en.wikipedia.org/wiki/DIN-31635
>
> -Ron.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
