Arabic-L:GEN:Arabic from PDF response
Dilworth Parkinson
dilworth_parkinson at BYU.EDU
Tue Jun 6 21:21:00 UTC 2006
------------------------------------------------------------------------
Arabic-L: Tue 06 Jun 2006
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject:Arabic from PDF response
-------------------------Messages-----------------------------------
1)
Date: 06 Jun 2006
From:medawar at panix.com
Subject:Arabic from PDF response
Dil,
It is possible to reverse engineer the non-standard encoding, letter
by letter. Each letter can be copied and pasted and, since the glyph
it displays in the PDF is known, mapped to its standard encoding. The
resulting mapping table can then be used in a simple program to
recode the text in a standard encoding.
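As a rough illustration of the "simple program" mentioned above, here
is a minimal Python sketch. The legacy code points in the table are
invented for the example (they are not taken from any real PDF or
font), and the names LEGACY_TO_UNICODE and recode are hypothetical; a
real table would be filled in by hand, one glyph at a time.

```python
# Hypothetical mapping table: the legacy characters on the left are invented
# for illustration; a real table is built by inspecting the PDF glyph by glyph.
LEGACY_TO_UNICODE = {
    "\u00e1": "\u0641",  # assumed legacy code for FEH  -> ARABIC LETTER FEH
    "\u00ed": "\u064a",  # assumed legacy code for YEH  -> ARABIC LETTER YEH
    "\u00c7": "\u0627",  # assumed legacy code for ALEF -> ARABIC LETTER ALEF
}


def recode(text, table=LEGACY_TO_UNICODE):
    """Replace each mapped legacy character; leave unmapped characters as-is."""
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    pasted = "\u00e1\u00ed"  # text as copied out of the PDF (assumed)
    print(recode(pasted))    # the same text in standard Unicode
```

Characters that are missing from the table pass through unchanged, so
gaps in the mapping stay visible in the output.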
The procedure above can be complicated if a single non-standard
encoding character is used to represent two or more Arabic
letters. I saw an example today in a PDF where the Arabic word
"Fi" (meaning "in") was written with a single non-standard
character. The non-standard character consisted of the letter FEH at
the top left, above the YEH. The YEH extended further to the right,
so it appears to come first when reading right to left.
What makes this character combination highly non-standard is that it
uses a single non-standard character to encode two standard
characters. I attached a bitmap of the word. The bitmap (550 bytes)
may be stripped by the list software.
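One possible way to handle such a case, sketched in Python below, is
to let a table entry expand to more than one standard letter. The
private-use code points are invented for the example and the names
ONE_TO_MANY and recode_one_to_many are hypothetical. Note that the
replacement is written in logical (reading) order, FEH then YEH,
whatever the visual placement of the two letters in the glyph.

```python
# Hypothetical table: the private-use code points on the left are invented
# for illustration and do not come from any real font or PDF.
ONE_TO_MANY = {
    "\uf0e0": "\u0641\u064a",  # assumed combined "Fi" glyph -> FEH + YEH
    "\uf0e1": "\u0627",        # assumed ALEF glyph -> ARABIC LETTER ALEF
}


def recode_one_to_many(text, table=ONE_TO_MANY):
    """Expand each legacy character and collect any non-ASCII leftovers."""
    out = []
    unmapped = set()
    for ch in text:
        if ch in table:
            out.append(table[ch])  # may be one letter or several
        else:
            out.append(ch)
            if ord(ch) > 127:
                unmapped.add(ch)   # probably a glyph still to be mapped
    return "".join(out), unmapped


if __name__ == "__main__":
    recoded, unmapped = recode_one_to_many("\uf0e0 \uf0e2")
    print(recoded)
    print(sorted(hex(ord(c)) for c in unmapped))
```

The set of leftover characters gives a list of glyphs that still need
to be looked up in the PDF and added to the table.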
bassem
[moderator note: the graphic showed a faa' on top of a yaa' that
came underneath it and to the right of it]
------------------------------------------------------------------------
End of Arabic-L: 06 Jun 2006