Arabic-L:LING:Needs Arabic/Farsi/Dari/Pashto help and refs

Dilworth Parkinson dil at BYU.EDU
Fri Apr 10 16:57:46 UTC 2009


------------------------------------------------------------------------
Arabic-L: Fri 10 Apr 2009
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject:Needs Arabic/Farsi/Dari/Pashto help and refs

-------------------------Messages-----------------------------------
1)
Date: 10 Apr 2009
From:Joel Shapiro <jrs_14618 at yahoo.com>
Subject:Needs Arabic/Farsi/Dari/Pashto help and refs

Assalam u Alaikum / Senga Yai (Peace be upon you / How are you?),

My name is Joel Shapiro. I consider myself a journeyman, jack-of-all-trades
linguist (Arabic and Hebrew) and application (Python) and web page
programmer (JavaScript and PHP).

I have recently realized what I consider one of my life's ambitions:
creating a functioning English to Arabic Transcription Veracity
Verification (MULTVV) application/web page at the following URL:

http://enartrans.com/arabictranstest.php

I have been a subscriber to this Arabic-L Digest group for a couple of
years, since March 3, 2007 to be exact, when my English to Arabic
transcription veracity verifier was an extremely awkward, esoteric,
barely functioning app compared to what my web page is today.
I look forward to it being a constant work in progress for the rest of
my life: improving and tweaking its features, ultimately its GUIs,
transcription processing, incorporating new languages, etc.

The crux of my post here is to ask Arabic-L members about two things:

1.) Do you know of, and would you be willing to refer me to, any
"already established" electronic English <----> Arabic/Farsi/Dari/Urdu/
Pashto lists where the transcriptions appear in the given target
language's script (i.e. not phonetically in English)? You would think
such lists abound ... Au contraire! Here are two examples of already
established English to Arabic transcription references I've found:

http://www.behindthename.com/nmc/ara.php
http://www.un.org/sc/committees/1267/consolidatedlist.htm#talibanind

(If the Arabic transcription lists I seek are hard to find ... Ha!
Try finding Pashto or Urdu!)

2.) Do you have references to any studies or work that identify the
origin of names, proper nouns or terms written in English from their
attributes or aspects? As I will explain a little more in depth
shortly, to realize the Arabic transcriptions of English terms that are
Arabic in origin, I have to implement some very specialized and
intricate logic in what I've termed my "filters". The idea is not to
apply this logic unnecessarily, i.e. to every English term I process or
disposition for Arabic, as I've found it can be counterproductive. One
very simple indicator I've come up with on my own is whether the English
term begins with "AL" or "EL". I will leave it at this unless any of you
request further clarification in this regard. With respect to the un.org
list, I've been confronted with some new challenges I will describe
shortly.
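The "AL"/"EL" indicator above can be sketched as a cheap gate in front
of the Arabic-origin filters. This is a minimal Python sketch, not the
code behind enartrans.com: the function names are hypothetical, and the
stricter variant (article followed by a hyphen or space) is an added
assumption aimed at false positives such as "Alan" or "Ellen".

```python
import re

# Hypothetical helpers sketching the "AL"/"EL" prefix indicator;
# not the actual enartrans.com filter code.

# Assumed stricter form: "Al"/"El" followed by a hyphen or space and then
# another character, as in "Al-Zawahiri" or "El Baradei".
SEPARATED_ARTICLE = re.compile(r"^(?:al|el)[-\s]\S", re.IGNORECASE)


def starts_with_al_or_el(term: str) -> bool:
    """Naive indicator: True if the English term begins with 'AL' or 'EL'.

    Cheap, but it also fires on non-Arabic names such as 'Alan' or 'Ellen'.
    """
    return term.strip().lower().startswith(("al", "el"))


def has_separated_article(term: str) -> bool:
    """Stricter indicator: 'Al'/'El' written as a separate article."""
    return SEPARATED_ARTICLE.match(term.strip()) is not None
```

The naive check keeps recall high; the stricter one misses run-together
spellings such as "Elbaradei" but no longer fires on "Alan".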

Following is verbiage I posted to others before, to give you a sense of
context (so I repeat a few things I've already mentioned up to this
juncture).

A hearty Shukran Jiddan, Mam'noon, Me Herabani, Tashakkur (many thanks)
for any help from any of you.

Joel S.

P.S. It is my ultimate goal to give users the ability to input a term in
their own native or preferred language, as represented on the Internet,
and in one stroke search the Internet for the vast majority of web pages
that contain the term, again in all languages represented on the
Internet. Obviously my grandiose objective is lifetimes of work, but it
is something I'm interested in spending the rest of my life pursuing ...
because it -IS- realistic. I could not have picked a more difficult
language than Arabic to start my transcription veracity verification
work, albeit a truly beautiful language indeed!


==============================================================


Transcription is the formal terminology for spelling, especially in a
phonetic respect, from one language to another.

In a nutshell, or in other words (no pun intended), what I do is take
the user's (your) English input name, proper noun or term and first put
it through several English to Arabic "transcription engines", each of
which outputs one transcription (variation), or through one
transcription engine that outputs a few possible or probable valid
alternative transcription variations. I then run each Arabic variation
through a search engine.

From the URL count returned, or "number of hits", I categorize and
quantify which of the several transcription variations are most
recognized or accepted by the Arabic-speaking world community, and to
what extent (order of magnitude).
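Ranking the variations by hit count and order of magnitude, as described
above, might look like the following sketch. It assumes the hit counts
have already been fetched from a search engine (that step is not shown),
and `rank_variants` and `order_of_magnitude` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def order_of_magnitude(hits: int) -> Optional[int]:
    """Digits minus one (e.g. 120000 -> 5); None when there are no hits."""
    return len(str(hits)) - 1 if hits > 0 else None


def rank_variants(
    hit_counts: Dict[str, int],
) -> List[Tuple[str, int, Optional[int]]]:
    """Rank transcription variations by search-engine hit count.

    `hit_counts` maps each candidate Arabic spelling to the number of hits
    reported for it; fetching those counts is outside this sketch.
    Returns (variant, hits, order_of_magnitude) tuples, most hits first.
    """
    ranked = [(v, h, order_of_magnitude(h)) for v, h in hit_counts.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

With made-up counts of 120000 for كراتشي and 300 for كاراتشاي
(illustrative numbers, not real search results), كراتشي ranks first with
an order of magnitude of 5, against 2 for the other variation.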

From the transcription categorization, coupled with an inline dictionary
of regular Arabic words, a very powerful native-language search engine
utility can be realized. The greater the accuracy of the transcription
classification and of the regular word dictionary, the greater the
effectiveness of the combination as a multilingual search engine utility
like no other.
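One way to picture the combination of transcriptions and a regular word
dictionary is a word-by-word query builder. A hypothetical sketch:
`dictionary` and `transcriptions` are assumed English-to-Arabic lookup
tables, and giving dictionary words precedence over name transcriptions
is a design choice made here, not something stated above.

```python
from typing import Dict


def build_arabic_query(english_query: str,
                       dictionary: Dict[str, str],
                       transcriptions: Dict[str, str]) -> str:
    """Turn an English query into an Arabic one, word by word.

    Regular words are looked up in `dictionary` (English -> translation);
    names and proper nouns fall back to `transcriptions` (English -> best
    ranked Arabic transcription). Words found in neither pass through.
    """
    arabic_words = []
    for word in english_query.split():
        key = word.lower()
        if key in dictionary:
            arabic_words.append(dictionary[key])
        elif key in transcriptions:
            arabic_words.append(transcriptions[key])
        else:
            arabic_words.append(word)
    return " ".join(arabic_words)
```

For example, with {"map": "خريطة"} as the dictionary and
{"karachi": "كراتشي"} as the transcriptions, the query "Karachi map"
becomes "كراتشي خريطة". Word order is not adjusted, which is usually
harmless for keyword search.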

While my language app/web page is an entity in its own right, it is also
an inherently powerful adjunct to the relatively new Google Cross Language
Information Retrieval Language tool, or "CLIR", where Google specifically
prompts the user for alternative transcriptions or regular word
translations if the one it (Google) has come up with does not suit
his/her needs.

Currently my page is "tooled" only for Arabic. My objective is
ultimately to do the same for all languages represented by web pages on
the Internet.

Building on my established "base" Arabic transcription programming
infrastructure, I want to expand my application into Farsi/Dari, Urdu
and Pashto, and I need some help with very fine clarifications and
information to get a better transcription result for you. Shukran
Jiddan/Tashakkur for any help you can provide with references or web
pages containing already established English to native Farsi/Dari, Urdu
and/or Pashto transcriptions. I would welcome and appreciate more Arabic
examples as well. The more examples the better! This is the crux of my
post here.

Without getting into too many specifics and technicalities at this
juncture, my first inclination is to try to find existing transcription
engines for a given target language, filtering out transcriptions which
blatantly have no phonetic correspondence to the English input term
through a phonetic transcription "post-processor filter" of my own
(design) ... no sense in reinventing the wheel.
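A phonetic "post-processor filter" of this kind could compare consonant
skeletons of the English input and each Arabic candidate. The sketch
below swaps in a Soundex-style consonant grouping plus Python's
`difflib.SequenceMatcher`; the letter groups, the 0.5 threshold and the
function names are all assumptions for illustration, not the author's
design.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Soundex-style consonant groups (a swapped-in technique, not the author's
# filter): 1 labials, 2 sibilants/velars, 3 dentals, 4 l, 5 nasals, 6 r.
# Any letter not listed (vowels, h, w, y, alif, ain, ...) is ignored.
ENGLISH_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
                  "l": "4", "mn": "5", "r": "6"}
# Arabic letters plus the Persian/Urdu letters پ چ ژ گ ک (assumed grouping).
ARABIC_GROUPS = {"بفپ": "1", "جخزسشصقكغظچژگک": "2", "تثدذضط": "3",
                 "ل": "4", "من": "5", "ر": "6"}


def build_table(groups: Dict[str, str]) -> Dict[str, str]:
    """Expand {"letters": code} into a per-letter lookup table."""
    return {letter: code for letters, code in groups.items() for letter in letters}


ENGLISH_TABLE = build_table(ENGLISH_GROUPS)
ARABIC_TABLE = build_table(ARABIC_GROUPS)


def skeleton(text: str, table: Dict[str, str]) -> str:
    """Map letters to group codes, dropping unlisted letters and repeats."""
    codes: List[str] = []
    for ch in text.lower():
        code = table.get(ch)
        if code and (not codes or codes[-1] != code):
            codes.append(code)
    return "".join(codes)


def passes_phonetic_filter(english: str, arabic: str,
                           threshold: float = 0.5) -> bool:
    """Keep a candidate only if its skeleton resembles the English one."""
    ratio = SequenceMatcher(None, skeleton(english, ENGLISH_TABLE),
                            skeleton(arabic, ARABIC_TABLE)).ratio()
    return ratio >= threshold


def filter_candidates(english: str, candidates: List[str],
                      threshold: float = 0.5) -> List[str]:
    """Drop transcriptions with no plausible phonetic correspondence."""
    return [c for c in candidates if passes_phonetic_filter(english, c, threshold)]
```

For "Karachi" the English skeleton is "262" and كراتشي gives "2632", a
similarity of about 0.86, so it is kept; لندن (London) gives "4535",
shares nothing, and is dropped. Because Che is in the table, the Urdu
كراچى scores a perfect match.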

Arabic, more than other languages, seems to have far more advanced or
developed transcription engines than the others. Thus, my plan is to
tweak the output of the Arabic transcription engine(s) for phonetically
similar languages such as Dari/Farsi, Urdu, Pashto etc., and otherwise to
build my own transcription engines from scratch, which I foresee doing
once I address, say, Hindi (way down the road).

For instance, when I "Google map" Karachi (Pakistan) at the following
URL, Google displays the Urdu representation, which contains the
character "Che", which is not present in the Arabic character set.

http://www.nationsonline.org/oneworld/map/google_map_Karachi.htm

كراچى

Notice the contrast with the two corresponding Arabic transcription
variations of Karachi from my web page:

كراتشي

كاراتشاي

(Hopefully the preceding Urdu and Arabic are coming to you in
human-readable Arabic characters, i.e. HTML entities, and are not being
converted to some kind of encoding representation. Perhaps some of you
are familiar with some of the encoding issues I've encountered in your
own Semitic/Indo-European language work.)
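The encoding question above can be checked with Python's standard
library alone. This sketch (the helper names are hypothetical) turns
Arabic-script text into pure-ASCII numeric HTML entities, which survive
mail gateways that mangle raw UTF-8, and lists code points so that
look-alike letters (Arabic kaf U+0643 vs. Persian keheh U+06A9, yeh
U+064A vs. alef maksura U+0649) can be told apart.

```python
from typing import List


def to_html_entities(text: str) -> str:
    """Replace each non-ASCII character with a numeric entity such as &#1670;."""
    # "xmlcharrefreplace" is a standard codec error handler: characters
    # that ASCII cannot encode become &#NNNN; references.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def code_points(text: str) -> List[str]:
    """List each character as U+XXXX, handy for spotting look-alike letters."""
    return [f"U+{ord(ch):04X}" for ch in text]
```

Che comes out as &#1670; (U+0686), and `html.unescape` turns the
entities back into the original letters. In raw UTF-8, each of the five
letters of كراچى takes two bytes, ten bytes in all.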

From the "basic Arabic infrastructure", I envision "mapping" or
"translating" the corresponding Arabic characters that would equate to
the Urdu Che into that letter, using specialized logic (i.e. another
filter). Likewise, I trust you can see what I'm trying to accomplish
here as well.
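The Che filter envisioned above could start as a plain substring
rewrite. Minimal sketch, assuming that Arabic transcriptions render the
"ch" sound as teh + sheen (تش), as in both Karachi variations above; the
function name and the optional final-letter rewrite (to match the ى in
the Google Maps form) are hypothetical.

```python
# Code points spelled out so look-alike letters cannot be confused.
TEH_SHEEN = "\u062a\u0634"   # تش  teh + sheen, the Arabic spelling of "ch"
TCHEH = "\u0686"             # چ   Che (Urdu/Persian)
YEH = "\u064a"               # ي   Arabic yeh
ALEF_MAKSURA = "\u0649"      # ى   alef maksura, as in the Google Maps label


def arabic_to_urdu_che(arabic: str, final_yeh_as_maksura: bool = False) -> str:
    """Hypothetical filter: rewrite every teh + sheen pair as Che.

    With final_yeh_as_maksura=True, a word-final yeh is also rewritten as
    alef maksura. Works on a single word; no context is consulted.
    """
    result = arabic.replace(TEH_SHEEN, TCHEH)
    if final_yeh_as_maksura and result.endswith(YEH):
        result = result[:-1] + ALEF_MAKSURA
    return result
```

With both options, كراتشي becomes كراچى. The naive rewrite also fires
where teh and sheen are genuinely separate sounds, as in the month name
تشرين (Tishrin), so a real filter would need to consult the English
input (e.g. a "ch" at that position) before substituting.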

===================================================

Here are the basic instructions:

Bring up:

http://enartrans.com/arabictranstest.php

(On some versions and/or settings of IE the result matrix may disappear.
If so, try using IE version 7.0 or above, or change the screen
resolution.

There is never any problem in this regard using Firefox or Opera.
However, Firefox and Opera do not permit the functionality of clicking
the button and having the Arabic transcription go right to your
memory/clipboard. Only IE permits this.)

To enter a new name or proper noun on my web page that is not in the
database (i.e. not in the drop-down autocomplete selection):

Start typing the name, proper noun or term in English in the name field
on the left side of the web page.

If you see it already in the database, i.e. in the autocomplete
selection, select it with the mouse.

If you have a term that is not in the database, just press [Enter]
when your desired word is spelled to your satisfaction.
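The drop-down autocomplete described above behaves like a prefix search
over the stored names. The page's actual PHP/JavaScript implementation
is not shown here; this Python sketch, with a hypothetical `NameIndex`
class, assumes a case-insensitive prefix match and uses binary search
over a sorted, lower-cased list.

```python
import bisect
from typing import List


class NameIndex:
    """Hypothetical prefix index over the names stored in the database."""

    def __init__(self, names: List[str]) -> None:
        # Sort by lower-cased form so the search is case-insensitive.
        self._entries = sorted((name.lower(), name) for name in names)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` stored names starting with `prefix`."""
        needle = prefix.lower()
        # First position whose key is >= the prefix; all matches follow it.
        i = bisect.bisect_left(self._keys, needle)
        matches: List[str] = []
        while i < len(self._keys) and len(matches) < limit:
            key, name = self._entries[i]
            if not key.startswith(needle):
                break
            matches.append(name)
            i += 1
        return matches
```

Binary search keeps each keystroke's lookup at O(log n) plus the number
of suggestions returned, which matters once the name list grows large.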

===================================================

Joel Shapiro
Rochester, New York 14618
(585) 255-0997 (Cell - Call anytime - best to reach me)
(585) 473-7013 (Home - 9:30 to 22:00 EDT/EST)


jrs_14618 at yahoo.com

-or-

cshapiro at rochester.rr.com


--------------------------------------------------------------------------
End of Arabic-L:  10 Apr 2009


