Arabic-L:LING:Needs verified English to Arabic transcription database

Dilworth Parkinson dilworth_parkinson at BYU.EDU
Sun Mar 4 00:44:53 UTC 2007


------------------------------------------------------------------------
Arabic-L: Sat 07 Mar 2007
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message from same address you subscribed from to
listserv at byu.edu with first line reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory------------------------------------

1) Subject:Needs verified English to Arabic transcription database

-------------------------Messages-----------------------------------
1)
Date: 07 Mar 2007
From:Joel Shapiro <jrs_14618 at YAHOO.COM>
Subject:Needs verified English to Arabic transcription database

Hello All,

I am a Python programmer who writes automated
robot scripts, with a long-time interest in and
fascination with foreign languages and cultures,
especially those of the Middle East.

I have programmed the "foundation" of what I term
a multilingual search engine utility (MULSEU) for
searching web pages in a given "target" language's
native text or character set (encoding).  The user
need not have -any- familiarity with the target
language to realize -very- effective searches in
its native text.

In a nutshell, my application takes an English
word list, or a combination of words in English
and the target language, and derives the best
dictionary translation or transcription, as the
case may be, for each word as it is processed.

My current focus is on Arabic, the most developed
and studied language of the Middle East.  When I
refer to "target language" henceforth in this
document, for all intents and purposes Arabic is
implied.

The efficacy of my MULSEU is directly proportional
to the level of sophistication and development of
the English - target language dictionary and
"verified" English to target language transcription
database.

What I mean by a "verified" English to target
language transcription database is that each
entry in the transcription database (generally
proper names/nouns, geographic places, company
names etc.) has been "run" through a popular
search engine as a search term, and the number of
URLs returned is an indication of the relative
"veracity" of the given transcription.

In other words (no pun intended) the number of
URLs, if any, indicates the veracity of the given
transcription both in its own right and,
importantly, relative to other valid
transcriptions.  These differ from one another
only subtly, but the subtle differences generally
have a more profound effect than comparable
subtle differences in English.
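
As a rough illustration of comparing variants by
URL count, the sketch below turns raw counts into
shares of the total.  The function name and the
counts are hypothetical placeholders, not measured
data, and nothing here is taken from the MULSEU
code itself.

```python
def relative_veracity(hit_counts):
    """Convert raw URL counts per variant into shares of the total.

    hit_counts: dict mapping a transcription variant to the number of
    URLs a search engine returned for it.
    Returns (variant, share) tuples, most frequent first.
    """
    total = sum(hit_counts.values())
    if total == 0:
        # No variant was found anywhere, so nothing can be ranked.
        return []
    ranked = sorted(hit_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(variant, count / total) for variant, count in ranked]


# Placeholder counts for illustration only, not measured data.
example = {"variant_a": 600, "variant_b": 300, "variant_c": 100}
for variant, share in relative_veracity(example):
    print(f"{variant}: {share:.0%}")
```

A share close to zero would suggest a variant that
is valid on paper but rarely used on the web.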

While there will be readily recognizable and
distinct search results for English Mohammad,
Muhammad, Mohamad etc., taking each respectively
as an individual search engine search term,
there will be a lot of overlap as well, especially
for what is the most or second most common word in
the Arabic and Islamic world, "Mohammad" (one of
several Arabic to English transcription variant
renditions).

As Mohammad is of course Arabic in origin, it
has really only one generally recognized or
accepted spelling in Arabic.  The variation comes
on the English side, with several valid
transcription alternatives, of which I have just
named a few.

The converse is generally the case for names and
terms that are English in origin.

For instance, the name "Clinton" has really just
one generally recognized spelling in English but
has no fewer than three valid, popular and
accepted transcription variants in Arabic.

The very subtle Arabic transcription differences
with respect to their use as search engine
search terms in general are much more profound
than comparable English variations.

To realize a verified database for a given
target language I have programmed an "offshoot"
of my MULSEU infrastructure that pipes each
English term through an English to target
language "transcription engine" or
"transcriptor" and then through a search engine
or metasearch engine (Google, AltaVista and
Yahoo, to name a few) to get what I term the
"empirical value" of the transcription.
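
The flow just described (term, then transcriptor,
then search engine, then empirical value) might be
sketched as follows.  This is only a minimal
outline under stated assumptions: the function
names are hypothetical, the transcriptors are
stand-in stubs, and count_urls is a placeholder
for whatever search service is actually queried.
The Arabic spellings and counts in the demo are
illustrative only.

```python
def empirical_values(english_term, transcriptors, count_urls):
    """Return {transcription: url_count} for one English term.

    transcriptors: iterable of callables, each mapping an English term
                   to one candidate transcription (hypothetical stubs).
    count_urls:    callable mapping a search term to the number of URLs
                   a search engine reports for it (an assumption; no
                   real search API is called here).
    """
    candidates = {t(english_term) for t in transcriptors}
    candidates.discard("")  # a transcriptor may produce nothing
    return {c: count_urls(c) for c in candidates}


def best_transcription(values):
    """Pick the candidate with the highest URL count, or None if empty."""
    if not values:
        return None
    return max(values, key=values.get)


# Demo with stand-in transcriptors and made-up counts.
fake_counts = {"كلينتون": 900, "كلنتون": 100}
stubs = [lambda term: "كلينتون", lambda term: "كلنتون"]
values = empirical_values("Clinton", stubs, lambda s: fake_counts.get(s, 0))
print(best_transcription(values))
```

Passing the search step in as a callable keeps the
sketch independent of any one search engine.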

Every language and language dialect would need
to have its own transcriptor, and the more
transcriptors the better, as there are subtle
differences between the transcriptors
themselves.  For instance, one Arabic transcriptor
may produce a transcription starting with just an
Aleph and another with Aleph Hamza, for
essentially the same transcription in terms of
"primary" letters or characters.
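
One way to recognize that two outputs share the
same "primary" letters is to fold the Hamza and
Madda forms of Aleph onto bare Aleph before
comparing.  The sketch below assumes that only
these three Unicode code points matter; the
function names are hypothetical, and a real
transcriptor comparison may need further rules.

```python
from collections import defaultdict

# Aleph carrying hamza or madda, mapped to bare Aleph (U+0627).
ALEPH_VARIANTS = str.maketrans({
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE
})


def primary_letters(word):
    """Return the word with Aleph variants folded onto bare Aleph."""
    return word.translate(ALEPH_VARIANTS)


def group_by_primary_letters(transcriptions):
    """Group spellings that differ only in their Aleph form."""
    groups = defaultdict(list)
    for word in transcriptions:
        groups[primary_letters(word)].append(word)
    return dict(groups)


# The same name written with Aleph Hamza and with bare Aleph.
print(group_by_primary_letters(["أحمد", "احمد"]))
```

The grouping only identifies related spellings.
Each original spelling would still be searched
separately, since the subtle differences affect
the results.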

If the term is already in the target language,
the transcriptor can be bypassed and the term
sent directly to the search engine to realize its
empirical value.
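
The bypass could be decided with a simple script
check, sketched below.  It assumes that "already
in the target language" can be approximated by
every letter falling in the basic Unicode Arabic
block (U+0600 to U+06FF); the function names are
hypothetical and the transcriptors are again
stand-in callables.

```python
def is_arabic_script(term):
    """True if the term has letters and all lie in U+0600..U+06FF."""
    letters = [ch for ch in term if ch.isalpha()]
    return bool(letters) and all("\u0600" <= ch <= "\u06FF" for ch in letters)


def to_search_terms(term, transcriptors):
    """Return the strings to send to the search engine for one term."""
    if is_arabic_script(term):
        # Already in the target script: bypass the transcriptors.
        return [term]
    return [t(term) for t in transcriptors]


print(to_search_terms("محمد", []))
```

Digits, spaces and punctuation are ignored by the
check, so a mixed term such as an Arabic name
followed by a year still bypasses the transcriptor.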

In this respect, Arabic to English "transcription
pairs" already in place from the field would be a
tremendous cumulative boon, as there would be no
need to initiate an English to Arabic
transcription process.

What I'm doing here is inquiring whether anyone
in this group has a database of "already
transcripted" pairs for any language of the Middle
East; again, especially Arabic.

For your efforts I will return your contribution
with valuable information on the relative value of
each entry as a search engine term.  Or, as the
databases become more established, for every new
valid transcription pair not yet present in the
database I would return pairs at 10:1, either in
the same genre of terms in the same language or
one term from ten different languages.

I have not yet worked out the details in this
respect.  Currently, because the verified
databases are so undeveloped, all I have in
place is a database of a thousand or so English -
Arabic transcription pairs of proper names and a
few thousand English - Arabic transcription pairs
of geographic place names that have yet to be put
through a search engine.  I will return all that
are submitted.

One great English Arabic transcription source
would be an English - Arabic phone directory on
CD say from a modern Arabic place such as Dubai.
In all my Arabic travels on the Internet I've
never seen any reference to this.

I vigorously contend that the establishment of a
verified English to Arabic transcription
database could be a valuable tool in its own
right for Arabic interpreters and translators
who, while deriving valid Arabic transcriptions
on their own, may miss other equally valid
transcriptions, not from any ignorance or lack
of skill on their part, but rather because this
is an inherent circumstance or phenomenon.

Judging from all my earlier inquiries to Arabists,
you too, dear reader, who have a knowledge and
command of Arabic far greater than my own and
than I ever will achieve in my lifetime, will
concur: [T]here are no English to Arabic rules
or methodology that will return the one most
popular, recognized and accepted Arabic
transcription.

Thank you for your interest and consideration.
I sincerely thank anyone for their English
- Middle Eastern language transcription pair
contribution or reference.

I welcome corresponding with anyone who is
so inclined, as far as I can and have time.
Anyone needing further clarification or an FAQ,
please don't hesitate to contact me:

Joel Shapiro
(585) 255-0997 (Cell)
jrs_14618 at yahoo.com

--------------------------------------------------------------------------
End of Arabic-L:  07 Mar 2007


