Arabic-L:GEN:Arabic from PDF response

Dilworth Parkinson dilworth_parkinson at BYU.EDU
Tue Jun 6 21:21:00 UTC 2006


------------------------------------------------------------------------
Arabic-L: Tue 06 Jun 2006
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject:Arabic from PDF response

-------------------------Messages-----------------------------------
1)
Date: 06 Jun 2006
From:medawar at panix.com
Subject:Arabic from PDF response

Dil,

It is possible to reverse engineer the non-standard encoding, letter
by letter.  Copy and paste each letter and, knowing which letter its
glyph represents in the PDF, map it to a standard encoding.  The
resulting mapping table can then be used in a simple program to
recode the text in a standard encoding.
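
A minimal sketch of such a recoding program, in Python, might look like
the following.  The code points on the left-hand side of the table are
invented for illustration (the actual values depend on the font embedded
in the PDF and have to be found by copying and pasting each letter), and
the name recode is likewise hypothetical.

```python
# Hypothetical mapping table: keys are the characters as they come out
# of copy-and-paste from the PDF, values are the standard Unicode
# letters identified by looking at the glyph.  These three entries are
# made up; a real table must be built letter by letter.
MAPPING = {
    "\uf041": "\u0627",  # ARABIC LETTER ALEF
    "\uf042": "\u0628",  # ARABIC LETTER BEH
    "\uf043": "\u062a",  # ARABIC LETTER TEH
}


def recode(text, mapping=MAPPING):
    """Replace each non-standard character with its standard equivalent.

    Characters missing from the table are passed through unchanged, so
    gaps in the mapping are easy to spot in the output.
    """
    return "".join(mapping.get(ch, ch) for ch in text)
```

Passing unmapped characters through unchanged, rather than dropping
them, makes it easier to see which letters still need a table entry.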

The procedure above becomes more complicated if a single non-standard
character is used to represent two or more Arabic letters.  I saw an
example today in a PDF where the Arabic word "Fi" (meaning "in") was
written with a single non-standard character.  That character
consisted of the letter FEH placed above and to the left of the YEH.
The YEH extended further to the right, so visually it comes first when
reading right to left.
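
One way to handle such a case, sketched here in Python, is to let a
table entry expand to more than one standard letter.  The code point
used for the combined character is an assumption made up for this
example, as is the function name expand_ligatures.  Note that standard
Unicode text stores letters in logical (reading) order, so the entry
emits FEH before YEH regardless of where the YEH sits visually.

```python
FEH = "\u0641"  # ARABIC LETTER FEH
YEH = "\u064a"  # ARABIC LETTER YEH

# Hypothetical code point for the single non-standard character that
# draws the whole word "Fi"; the real value depends on the PDF's font.
LIGATURE_FI = "\uf0e0"

# str.maketrans accepts a dict whose values may be strings of any
# length, so one input character can expand to two output letters.
TABLE = str.maketrans({LIGATURE_FI: FEH + YEH})


def expand_ligatures(text):
    """Expand combined non-standard characters into standard letters."""
    return text.translate(TABLE)
```

Entries like this can live in the same mapping table as the
one-to-one entries; only the length of the replacement differs.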

What makes this character combination highly non-standard is that it
uses a single non-standard character to encode two standard
characters.  I attached a bitmap of the word.  The bitmap (550 bytes)
may be stripped by the list software.

bassem

[moderator note:  the graphic showed a faa' on top of a yaa' that  
came underneath it and to the right of it]

------------------------------------------------------------------------
End of Arabic-L:  06 Jun 2006


