[Corpora-List] Fwd: Fw: Pashto Transliteration Scheme
Thierry Fontenelle
thierryf at microsoft.com
Fri Sep 19 23:05:14 UTC 2008
Dear Fatima,
You might want to experiment with the Transliteration Utility that is available here: http://www.microsoft.com/globaldev/tools/translit.mspx
It allows you to convert one natural language script to another (like Serbian Latin to Serbian Cyrillic or Latin characters to Inuktitut). The tool uses a simple but powerful rule language to create, edit, debug, and test your own natural language transliteration modules to convert one script to another. The nine modules it includes do not cover Pashto, but there is built-in documentation which explains how to create additional modules. See also this post for some more details:
http://blogs.msdn.com/correcteurorthographiqueoffice/archive/2006/02/07/transliteration-utility-freely-downloadable.aspx
I hope it helps,
Best wishes,
Thierry
Thierry Fontenelle
Microsoft Natural Language Group
thierryf at microsoft.com
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of fatima zuhra
Sent: Thursday, September 18, 2008 9:53 PM
To: wheston at sas.upenn.edu
Cc: Corpora at uib.no; sir abid
Subject: Re: [Corpora-List] Fwd: Fw: Pashto Transliteration Scheme
Dear Wilma,
Thanks for your helpful e-mail. I am actually looking for a transliteration scheme for Pashto, not a phonetic transcription scheme, because written Pashto does not include symbols for short vowels. I have been developing an alphabet transducer (a computer program) for Pashto that maps romanized Pashto to Arabic-script Pashto and vice versa, and I want to use a standard transliteration scheme in it. One of the Corpora-List members has sent me a transliteration scheme for Pashto, which is available at the following URL:
http://www.loc.gov/catdir/cpso/roman.html
With this scheme, I may run into the following problem:
This scheme uses 'k' for the Pashto letter 'ک', 'h' for 'ه' and 'kh' for 'خ'. If my alphabet transducer encounters 'kh' in its input, it cannot tell whether to map it to 'خ' or to 'که'.
This is why I asked whether I may make changes of my own choosing to an already existing transliteration scheme.
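To make the ambiguity concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the table holds just the three mappings mentioned above, the function names are hypothetical, and the apostrophe used as a separator is an invented convention, not something taken from the scheme at the URL above.

```python
# Toy fragment of a romanization table: only the three mappings from the message.
ROMAN_TO_PASHTO = {"k": "ک", "h": "ه", "kh": "خ"}
SEPARATOR = "'"  # hypothetical convention, NOT part of the cited scheme


def all_readings(roman, table=ROMAN_TO_PASHTO):
    """Return every Pashto string that the romanized input could stand for."""
    if not roman:
        return [""]
    readings = []
    for key, letter in table.items():
        if roman.startswith(key):
            for rest in all_readings(roman[len(key):], table):
                readings.append(letter + rest)
    return readings


def to_pashto(roman, table=ROMAN_TO_PASHTO, separator=SEPARATOR):
    """Greedy longest-match conversion; the separator blocks digraph matching."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(roman):
        if roman[i] == separator:
            i += 1  # the separator itself produces no output letter
            continue
        for size in range(longest, 0, -1):
            chunk = roman[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no mapping for {roman[i]!r} at position {i}")
    return "".join(out)


print(all_readings("kh"))  # two candidate readings for the same input
print(to_pashto("kh"))     # longest match picks the digraph
print(to_pashto("k'h"))    # the separator forces two separate letters
```

With longest-match alone, 'kh' always becomes the single letter, so the two-letter sequence can only be reached if the input marks the boundary in some way, which is exactly the kind of change to the scheme I am asking about.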
I'll be very thankful for any kind suggestions in this regard.
Thanks.
Fatima Zuhra
University of Peshawar, Pakistan
--- On Tue, 9/16/08, wheston at sas.upenn.edu <wheston at sas.upenn.edu> wrote:
From: wheston at sas.upenn.edu <wheston at sas.upenn.edu>
Subject: Fwd: Fw: [Corpora-List] Pashto Transliteration Scheme
To: fateeshah at yahoo.com
Cc: ebashir at yahoo.com
Date: Tuesday, September 16, 2008, 6:42 AM
This is in response to an email forwarded to me by Dr. Elena Bashir.
--- On Sat, 9/13/08, fatima zuhra <fateeshah at yahoo.com> wrote:
> From: fatima zuhra <fateeshah at yahoo.com>
> Subject: [Corpora-List] Pashto Transliteration Scheme
> To: Corpora at uib.no
> Date: Saturday, September 13, 2008, 12:57 AM
> Dear group members,
>
> I am a Pashto language researcher in the field of Natural
> Language Processing. I am looking for a standard
> transliteration scheme for Pashto. In my research, I am
> currently using a transliteration scheme that is similar to
> that of Herbert Penzl (1955).
Are you looking for a transliteration scheme (one roman letter for each written
Pashto letter) or a transcription scheme (phonetic representations which
include short vowels)? Penzl (1955) gives transcriptions, not
transliterations.
> By similar I mean that
> I have made some changes to that scheme in order to easily
> input data using a keyboard. I'll be thankful if someone
> could kindly answer these questions:
>
> Q.1 Is there some standard transliteration scheme available
> for Pashto that is used in computer applications for Pashto?
I don’t think so. For typing convenience, some scholars of Indo-Iranian use
capital letters for both retroflexion and long vowels. When Penzl was writing
in 1955, no one was doing computer work with languages; even in the 1970s (when
I did my dissertation using computerized texts for 4 Iranian languages), the use
of alphabetic inputs was generally restricted to lower case ASCII. There are
many options now available when computerizing language materials.
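As a rough illustration of the capital-letter convention, the Python sketch below expands an ASCII-typed string into a romanization with diacritics. The particular letters in the table and the function name are assumptions made for the example; they are not taken from any published scheme.

```python
# Hypothetical ASCII input convention: a capital letter stands for the
# retroflex consonant or long vowel that print romanizations mark with
# a diacritic. The letter inventory below is an assumption for illustration.
CAPITAL_TO_DIACRITIC = str.maketrans({
    "T": "\u1e6d",  # t with dot below
    "D": "\u1e0d",  # d with dot below
    "N": "\u1e47",  # n with dot below
    "R": "\u1e5b",  # r with dot below
    "A": "\u0101",  # a with macron
    "I": "\u012b",  # i with macron
    "U": "\u016b",  # u with macron
})


def expand_capitals(ascii_text):
    """Replace each convention capital with its diacritic form."""
    return ascii_text.translate(CAPITAL_TO_DIACRITIC)


# Arbitrary test string, not a real word.
print(expand_capitals("TaDA"))
```

Because each capital maps to exactly one character, the conversion is reversible, so text can be typed in plain ASCII and displayed with diacritics later.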
> Q.2 If there are some symbols in a transliteration scheme,
> which are hard to input using a keyboard, then can I make
> changes in such a scheme (to make it easy for my
> application)?
I think that it’s a matter of your convenience, so long as you explain what
you’ve done. If your choices are very unconventional, it places a burden on
readers of your work. You might want to explore the Doulos Unicode fonts
(available on the SIL website) for phonetic transcriptions; they include a dot
beneath retroflex consonants (e.g., U+1E6D for retroflex _t_) and the digraphs
_dz_ and _ts_ (U+01F3 and U+02A6, respectively).
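The three code points can be checked against the Unicode character database with Python's standard `unicodedata` module. This is only a quick verification sketch; the helper name `describe` is made up for the example. Note that the database lists U+01F3 as LATIN SMALL LETTER DZ.

```python
import unicodedata

# Code points mentioned above, written as hexadecimal integers.
CODE_POINTS = [0x1E6D, 0x01F3, 0x02A6]


def describe(code_point):
    """Return the code point, the character itself, and its Unicode name."""
    ch = chr(code_point)
    return f"U+{code_point:04X} {ch} {unicodedata.name(ch)}"


for cp in CODE_POINTS:
    print(describe(cp))
```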
Wilma Heston