Arabic-L:LING:Needs Arabic/Farsi/Dari/Pashto help and refs
Dilworth Parkinson
dil at BYU.EDU
Fri Apr 10 16:57:46 UTC 2009
------------------------------------------------------------------------
Arabic-L: Fri 10 Apr 2009
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
unsubscribe arabic-l ]
-------------------------Directory------------------------------------
1) Subject:Needs Arabic/Farsi/Dari/Pashto help and refs
-------------------------Messages-----------------------------------
1)
Date: 10 Apr 2009
From:Joel Shapiro <jrs_14618 at yahoo.com>
Subject:Needs Arabic/Farsi/Dari/Pashto help and refs
Assalam u Alaikum / Senga Yai,
My name is Joel Shapiro. I consider myself a journeyman
jack-of-all-trades linguist (Arabic and Hebrew), application programmer
(Python) and web page programmer (JavaScript and PHP).
I have recently realized what I consider one of my life's ambitions:
creating a functioning English to Arabic Transcription Veracity
Verification (MULTVV) application/web page at the following URL:
http://enartrans.com/arabictranstest.php
I have been a subscriber of this Arabic-L Digest group for a couple of
years, since March 3, 2007 to be exact, when my English to Arabic
transcription veracity verifier was an extremely awkward, esoteric,
barely functioning app compared to what my web page is today.
I look forward to it being a constant work in progress for the rest of
my life: improving and tweaking its features, ultimately its GUIs and
transcription processing, incorporating new languages, etc.
The crux of my post here is to ask Arabic-L members two things:
1.) Do you know of, and would you be willing to refer me to, any
"already established" electronic English <----> Arabic/Farsi/Dari/Urdu/
Pashto lists where the transcriptions appear in the given target
language (i.e. not phonetically in English)? You would think such lists
abound ... Au contraire! Here are two examples of already established
English to Arabic transcription references I've found:
http://www.behindthename.com/nmc/ara.php
http://www.un.org/sc/committees/1267/consolidatedlist.htm#talibanind
(If such Arabic transcription lists I seek are hard to find ... Ha!
Try finding Pashto or Urdu!)
2.) Do you have references to any studies or work that identify the
origin of names, proper nouns or terms written in English from their
attributes or aspects? As I will explain in a little more depth
shortly, to realize the Arabic transcriptions of English terms that are
Arabic in origin I have to implement some very specialized and
intricate logic in what I've termed my "filters". The idea is not to
apply this logic unnecessarily, i.e. to every English term I process or
disposition for Arabic, as I've found it can be counterproductive. One
very simple indicator I've come up with on my own is whether the
English term begins with "AL" or "EL". I will leave it at this unless
any of you request further clarification in this regard. With respect
to the un.org list, I've been confronted with some new challenges I
will describe shortly.
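As a rough illustration of the "AL"/"EL" indicator, here is a minimal
Python sketch. The function names are hypothetical, and the check does
nothing beyond the prefix test described above, so it will also fire on
non-Arabic names such as "Alexander".

```python
# Hypothetical sketch: the only indicator taken from the post is the
# "AL"/"EL" prefix; everything else here is an illustrative assumption.
ARABIC_ORIGIN_PREFIXES = ("AL", "EL")

def looks_arabic_in_origin(term: str) -> bool:
    """Return True if the English term begins with "AL" or "EL"."""
    return term.strip().upper().startswith(ARABIC_ORIGIN_PREFIXES)

def should_apply_arabic_filters(term: str) -> bool:
    """Gate the specialized filters so they do not run on every term."""
    return looks_arabic_in_origin(term)

if __name__ == "__main__":
    for sample in ("Al-Qaeda", "El Alamein", "Karachi", "Alexander"):
        print(sample, should_apply_arabic_filters(sample))
```

A prefix test this crude is only a first gate; a second indicator would
be needed to separate "Al-..." names from names like "Alexander".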
Following is text I posted to others before, to give you a sense of
context (I repeat a few things I've already mentioned up to this
juncture).
A hearty Shukran Jiddan, Mam'noon, Me Herabani, Tashakkur for any
help from
any of you.
Joel S.
P.S. It is my ultimate goal to create the ability for the user to input
their term in their own native or preferred language that is
represented on the Internet and, in one stroke, be able to search the
Internet for the vast majority of web pages that contain the term,
again in all languages represented on the Internet. Obviously, my
grandiose objective is lifetimes of work, but it is something I'm
interested in spending the rest of my life pursuing ... because it -IS-
realistic. I could not have picked a more difficult language to start
my transcription veracity verification work than Arabic, albeit a truly
beautiful language indeed!
==============================================================
Transcription is the formal term for spelling a word from one language
in another, especially phonetically.
In a nutshell, or in other words (no pun intended), what I do is take
the user's (your) English input name, proper noun or term and first put
it through several English to Arabic "transcription engines". Each
engine outputs either one transcription (variation) or a few possible
or probable valid alternative transcription variations. I then run each
Arabic variation in turn through a search engine.
From the URL count returned, or "number of hits", I categorize and
quantify which of the several transcription variations is the most
recognized or accepted by the world's Arabic-speaking community, and to
what extent (order of magnitude).
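The ranking step can be sketched in Python roughly as follows. This is
only an illustration: rank_variants and the hit_count callback are
hypothetical names, and the hit counts below are made-up placeholders
standing in for real search engine results.

```python
from typing import Callable, List, Tuple

def rank_variants(
    variants: List[str], hit_count: Callable[[str], int]
) -> List[Tuple[str, int, int]]:
    """Return (variant, hits, order_of_magnitude), most hits first."""
    ranked = []
    for variant in variants:
        hits = hit_count(variant)
        # Order of magnitude = number of decimal digits minus one;
        # -1 marks a variant with no hits at all.
        magnitude = len(str(hits)) - 1 if hits > 0 else -1
        ranked.append((variant, hits, magnitude))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked

# Placeholder counts, NOT real search results; a real hit_count would
# query a search engine for each Arabic variation.
placeholder_hits = {"كراتشي": 1_200_000, "كاراتشاي": 340}
ranked = rank_variants(
    list(placeholder_hits), lambda v: placeholder_hits.get(v, 0)
)

if __name__ == "__main__":
    for variant, hits, magnitude in ranked:
        print(variant, hits, f"10^{magnitude}")
```

Passing the search call in as a function keeps the ranking logic
independent of whichever search engine is used.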
From the transcription categorization, coupled with an inline regular
Arabic word dictionary, a very powerful native language search engine
utility can be realized. The greater the accuracy of the transcription
classification and the regular word dictionary, the greater the
effectiveness of the combination as a multilingual search engine
utility like no other.
While my language app/web page is an entity in its own right, it is
also an inherently powerful adjunct to the relatively new Google Cross
Language Information Retrieval Language tool or "CLIR", where Google
specifically prompts the user for alternative transcriptions or regular
word translations if the one it (Google) has come up with does not suit
his/her needs.
My page is currently "tooled" only for Arabic. My objective is
ultimately to do the same for all languages represented by web pages on
the Internet.
Building on my established "base" Arabic transcription programming
infrastructure, I want to expand my application into Farsi/Dari, Urdu
and Pashto, and I need some help re: very fine clarifications and
information for a better transcription result for you. Shukran
Jiddan/Tashakkur for any help you can provide with references or web
pages containing already established English to native Farsi/Dari, Urdu
and/or Pashto transcriptions. I would welcome and appreciate more
Arabic examples as well. The more examples the better! This is the crux
of my post here.
Without getting into too many specifics and technicalities at this
juncture, my first inclination is to try to find existing transcription
engines for a given target language, filtering out transcriptions which
blatantly have no phonetic correspondence to the English input term
through a phonetic transcription "post-processor filter" of my own
(design) ... no sense in reinventing the wheel.
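One possible shape for such a post-processor filter is sketched below in
Python. It is not the filter used on the web page: the consonant table,
the 0.5 threshold and all function names are assumptions, and the
technique (comparing consonant skeletons with difflib) is swapped in
purely for illustration.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical, deliberately small Arabic-to-Latin consonant table.
ARABIC_CONSONANTS = {
    "\u0628": "b", "\u062a": "t", "\u062c": "j", "\u062f": "d",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "sh",
    "\u0641": "f", "\u0642": "k", "\u0643": "k", "\u0644": "l",
    "\u0645": "m", "\u0646": "n", "\u0647": "h",
}

def english_skeleton(term: str) -> str:
    """Lowercase the term and drop vowels and non-letters."""
    return "".join(
        c for c in term.lower() if c.isalpha() and c not in "aeiou"
    )

def arabic_skeleton(text: str) -> str:
    """Romanize known consonants; skip everything else (e.g. alef, yeh)."""
    return "".join(ARABIC_CONSONANTS.get(c, "") for c in text)

def phonetic_similarity(english: str, arabic: str) -> float:
    """Similarity in [0, 1] between the two consonant skeletons."""
    return SequenceMatcher(
        None, english_skeleton(english), arabic_skeleton(arabic)
    ).ratio()

def filter_candidates(
    english: str, candidates: List[str], threshold: float = 0.5
) -> List[str]:
    """Keep candidates whose skeleton resembles the English input."""
    return [
        c for c in candidates
        if phonetic_similarity(english, c) >= threshold
    ]
```

With this table, the Arabic spelling of Karachi shown later in this
post reduces to "krtsh" and the English input to "krch", which is close
enough to pass, while an unrelated word with no shared consonants is
dropped.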
Arabic seems to have far more advanced or developed transcription
engines than other languages. Thus, my plan is to tweak the output of
the Arabic transcription engine(s) for phonetically similar languages
such as Dari/Farsi, Urdu, Pashto etc.; otherwise I will build my own
transcription engines from scratch, which I foresee doing once I
address, say, Hindi (way down the road).
For instance, when I "Google map" Karachi (Pakistan) at the following
URL, Google displays the Urdu representation, which contains the
character "Che", which is not present in the Arabic character set.
http://www.nationsonline.org/oneworld/map/google_map_Karachi.htm
كراچى
Notice the contrast with the two corresponding Arabic transcription
variations of Karachi from my web page:
كراتشي
كاراتشاي
(Hopefully the preceding Urdu and Arabic are coming to you in human
readable Arabic characters, i.e. HTML entities, and not being converted
to some kind of encoding representation. Perhaps some of you are
familiar with some of the encoding issues I've encountered from your
own Semitic/Indo-European language work.)
Starting from the "basic Arabic infrastructure", I envision "mapping"
or "translating" to the Urdu Che using specialized logic (i.e. another
filter) from the corresponding Arabic characters that would equate to
it. I trust you can see what I'm trying to accomplish here.
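A minimal Python sketch of such a filter, assuming the Arabic output
spells the "ch" sound as Teh followed by Sheen, as in the first Karachi
variation above. The table and function name are hypothetical, and
other Urdu letter differences (for example Kaf and Yeh forms) are left
out.

```python
# Hypothetical mapping table: Arabic Teh + Sheen -> Urdu Che (U+0686).
ARABIC_TO_URDU_SEQUENCES = {
    "\u062a\u0634": "\u0686",
}

def arabic_to_urdu(arabic: str) -> str:
    """Replace Arabic letter sequences with their Urdu equivalents."""
    result = arabic
    for source, target in ARABIC_TO_URDU_SEQUENCES.items():
        result = result.replace(source, target)
    return result

if __name__ == "__main__":
    # Kaf, Reh, Alef, Teh, Sheen, Yeh: the Arabic spelling of Karachi.
    print(arabic_to_urdu("\u0643\u0631\u0627\u062a\u0634\u064a"))
```

A blind replacement like this will also rewrite a Teh that merely
happens to precede a Sheen, so a real filter would need some check on
the English input (for example, the presence of "ch") before applying
it.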
===================================================
Here are the basic instructions:
Bring up:
http://enartrans.com/arabictranstest.php
(On some versions and/or settings of IE the result matrix may disappear.
If so, try using only IE Version 7.0 and above, or change the screen
resolution. There is never any problem in this regard using Firefox or
Opera. However, Firefox and Opera do not permit the functionality of
clicking the button and having the Arabic transcription go right to
your memory/clipboard. Only IE permits this.)
To enter a new name or proper noun on my web page that
is not in the database (i.e. not in the drop-down autocomplete
selection):
Start typing/inputting the name, proper noun or term in
English in the name field on the left side of the web page.
If you see it already in the database, i.e. in the autocomplete
selection, select it with the mouse.
If you have a term that is not in the database, just press [Enter]
when you have your desired word spelled to your satisfaction.
===================================================
Joel Shapiro
Rochester, New York 14618
(585) 255-0997 (Cell - Call anytime - best to reach me)
(585) 473-7013 (Home - 9:30 to 22:00 EDT/EST)
jrs_14618 at yahoo.com
-or-
cshapiro at rochester.rr.com
--------------------------------------------------------------------------
End of Arabic-L: 10 Apr 2009