Arabic-L:GEN:Sakhr Text to Speech

Dilworth Parkinson Dilworth_Parkinson at byu.edu
Wed Oct 23 14:13:13 UTC 2002


----------------------------------------------------------------------
Arabic-L: Tue 22 Oct 2002
Moderator: Dilworth Parkinson <dilworth_parkinson at byu.edu>
[To post messages to the list, send them to arabic-l at byu.edu]
[To unsubscribe, send message to listserv at byu.edu with first line
reading:
            unsubscribe arabic-l                                      ]

-------------------------Directory-------------------------------------

1) Subject:Sakhr Text to Speech

-------------------------Messages--------------------------------------
1)
Date:  22 Oct  2002
From: GnhBos at aol.com
Subject:Sakhr Text to Speech

OVERVIEW

Text-To-Speech (TTS) converts any machine-readable electronic text
into human-sounding synthetic speech. Arabic is usually written
without diacritics, so the text must be vowelized before it can be
pronounced correctly.

The diacritized output is generated by the Sakhr diacritizer, which
resolves the ambiguity of each word and thereby selects the proper
pronunciation of the undiacritized input text.
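A short Python sketch can make the ambiguity concrete. It is not Sakhr's diacritizer. It only performs the opposite, trivial direction: it removes the Arabic short-vowel and related marks (the Unicode range U+064B to U+0652, covering tanwin, fatha, damma, kasra, shadda and sukun), so that three different vowelized words visibly collapse into the same undiacritized spelling. A diacritizer has to undo this loss by picking one reading from context.

```python
# Arabic short-vowel and related marks (tanwin, fatha, damma, kasra,
# shadda, sukun) occupy the contiguous range U+064B..U+0652.
ARABIC_DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def strip_diacritics(text: str) -> str:
    """Drop the marks above, leaving the undiacritized spelling."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


# Three different vowelized words, written with escapes for clarity:
kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, "he wrote"
kutiba = "\u0643\u064f\u062a\u0650\u0628\u064e"  # kutiba, "it was written"
kutub = "\u0643\u064f\u062a\u064f\u0628"          # kutub, "books"

for word in (kataba, kutiba, kutub):
    # All three reduce to the same bare form, kaf-teh-beh.
    print(word, "->", strip_diacritics(word))
```

The TTS engine faces the reverse problem: given only the bare form, it must decide which of these readings the writer meant before any phonemes can be produced.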

THE ENGINE

The Sakhr Text-To-Speech (TTS) engine is composed of three basic
parts: the Linguistic Module, which converts the input text into a
phonetic transcription; the Phonetic Module, which calculates
speech parameters; and the Acoustic Module, which uses these
parameters to generate synthetic speech signals.

The Linguistic Module

This module is composed of four parts: Text Normalization,
Grapheme-To-Phoneme (G2P) conversion, Lexical Analysis and
Syntactic Analysis. Text Normalization handles language-dependent
abbreviations, dates, currencies, time indications, phone numbers
and other special symbols. After grapheme-to-phoneme conversion,
the system resolves pronunciation ambiguities through lexical and
syntactic analysis, and identifies the proper prosodic phrases for
each sentence. The output is a phonetic representation of the
input text.
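Text normalization is the easiest of these steps to illustrate. The sketch below handles only one case from the list above, phone numbers, under the assumption (ours, not Sakhr's) that they are read digit by digit. DIGIT_WORDS, PHONE_RE and both functions are hypothetical names, and the pattern for what counts as a phone number is deliberately crude.

```python
import re

# Hypothetical reading table: digits 0-9 as Modern Standard Arabic words.
DIGIT_WORDS = ["صفر", "واحد", "اثنان", "ثلاثة", "أربعة",
               "خمسة", "ستة", "سبعة", "ثمانية", "تسعة"]

# Crude phone-number pattern: at least five characters of digits and hyphens,
# starting and ending with a digit. In Python 3, \d also matches Arabic-Indic
# digits such as U+0663.
PHONE_RE = re.compile(r"\d[\d-]{3,}\d")


def spell_digits(token: str) -> str:
    """Read each decimal digit (Western or Arabic-Indic) as a separate word."""
    return " ".join(DIGIT_WORDS[int(ch)] for ch in token if ch.isdecimal())


def normalize_phone_numbers(text: str) -> str:
    """Replace phone-number-like tokens with their digit-by-digit reading."""
    return PHONE_RE.sub(lambda m: spell_digits(m.group()), text)


print(normalize_phone_numbers("T 825-3044"))
```

A real normalizer would also need context to tell a phone number from a year, an amount or a time, which is one reason the module works alongside lexical and syntactic analysis.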

The Phonetic Module

To create synthetic speech, the Sakhr TTS engine can flexibly use
speech segments of different sizes, such as diphones, triphones,
tetraphones, or longer units. These segments, which are taken
from human speech, preserve phoneme transitions as well as
co-articulation effects. High-quality synthetic speech is obtained
by concatenating these segments while producing good intonation
contours and assigning the correct duration to each phoneme.
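The joining step can be sketched with a plain linear cross-fade between consecutive segments. This is an illustrative stand-in, not a description of Sakhr's method. It assumes each segment is already a list of float samples at a common rate, and it leaves out the pitch and duration modification (for example PSOLA-style processing) that a real engine applies to impose the intonation contour and phoneme durations. The function name and the overlap parameter are hypothetical.

```python
def crossfade_concat(segments, overlap):
    """Join waveform segments (lists of floats), blending `overlap` samples
    at each joint with a linear cross-fade."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(overlap, len(out), len(seg))   # cannot overlap more than we have
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)              # weight rises from ~0 to ~1
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        out.extend(seg[n:])                    # append the non-overlapping rest
    return out


# Hypothetical usage: two toy "segments" joined with a 3-sample overlap.
print(crossfade_concat([[1.0] * 6, [0.0] * 6], overlap=3))
```

Blending across the joint avoids the click that an abrupt cut between two recordings would produce. Because the segments already contain the phoneme transitions, only a short overlap is needed at each boundary.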

The Acoustic Module

The Acoustic Module converts the speech data produced by the
previous modules into speech signals. Sakhr's concatenation of
speech segments and its synthesis of prosody are based on the
latest synthesis techniques. The output is an array of wave
samples at sampling rates from 8 kHz to 44 kHz, covering a broad
range of quality levels and applications, from telephony to
CD-quality audio.
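The relationship between sampling rate and output size is simple arithmetic. A signal of d seconds at r samples per second holds r × d samples, so half a second is 4,000 samples at 8 kHz but 22,050 samples at the 44.1 kHz CD rate. The sketch below uses only the Python standard library. It generates a sine tone as a stand-in for synthesized speech and packages it as a 16-bit mono WAV file. The function names and the choice of 16-bit mono are assumptions for illustration.

```python
import io
import math
import sys
import wave
from array import array


def render_tone(rate_hz: int, seconds: float, freq_hz: float = 440.0) -> array:
    """Return 16-bit signed mono samples of a sine tone (stand-in for speech)."""
    n = round(rate_hz * seconds)          # number of samples = rate x duration
    amp = 0.3 * 32767
    return array("h", (int(amp * math.sin(2 * math.pi * freq_hz * i / rate_hz))
                       for i in range(n)))


def to_wav_bytes(samples: array, rate_hz: int) -> bytes:
    """Package the sample array as a 16-bit mono WAV file in memory."""
    data = array("h", samples)            # copy so the caller's array is untouched
    if sys.byteorder == "big":
        data.byteswap()                   # WAV stores samples little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                 # 2 bytes = 16 bits per sample
        w.setframerate(rate_hz)
        w.writeframes(data.tobytes())
    return buf.getvalue()


for rate in (8000, 16000, 44100):
    tone = render_tone(rate, 0.5)
    print(rate, "Hz:", len(tone), "samples,", len(to_wav_bytes(tone, rate)), "bytes")
```

The same utterance therefore costs roughly five times more data at CD quality than at telephone quality. This is why a range of output rates lets one engine serve both IVR lines and desktop audio.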

KEY FEATURES

- The TTS engine is fully compatible with the Microsoft Speech
  API version 5.0 (SAPI 5.0).
- Converts any machine-readable undiacritized text into
  natural-sounding speech; phonetic input is also supported.
- Controls speaker volume, speech rate, and speech pitch.
- Produces natural-sounding speech in both male and female
  voices.
- Integrates easily with speech applications through its
  Software Development Kit (SDK).
- Supports Arabic by default and automatically treats Latin
  characters as English text. The built-in English TTS can be
  replaced by any SAPI-compliant TTS SDK for any language.
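The Latin-text handling described in the last item implies that mixed input is split into script runs before an engine is chosen. This minimal sketch assumes that Unicode block ranges decide the script (the main Arabic blocks versus Basic Latin through Latin Extended-B). It also assumes that Western digits, spaces and punctuation stay with the surrounding run. `script_of` and `split_runs` are hypothetical names, not part of the Sakhr SDK.

```python
# Unicode blocks treated as Arabic: Arabic, Arabic Supplement,
# and the two Arabic Presentation Forms blocks.
ARABIC_RANGES = [(0x0600, 0x06FF), (0x0750, 0x077F),
                 (0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]


def script_of(ch: str):
    """Return "ar", "en", or None for script-neutral characters."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ARABIC_RANGES):
        return "ar"
    if ch.isalpha() and cp < 0x0250:      # Basic Latin .. Latin Extended-B
        return "en"
    return None                           # spaces, Western digits, punctuation


def split_runs(text: str, default: str = "ar"):
    """Split text into (script, substring) runs; neutral characters stay
    with the run they follow (leading ones join the first run)."""
    runs, buf, current = [], [], None
    for ch in text:
        script = script_of(ch)
        if script is not None and current is not None and script != current:
            runs.append((current, "".join(buf)))
            buf = []
        if script is not None:
            current = script
        buf.append(ch)
    if buf:
        runs.append((current or default, "".join(buf)))
    return runs


for script, chunk in split_runs("مرحبا Windows 2000"):
    print(script, repr(chunk))
```

Each ("en", ...) run could then be sent to the English voice (or to another SAPI-compliant engine), and each ("ar", ...) run could be sent to the Arabic one.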

MARKET & APPLICATIONS

The convergence of the telephony and computer industries has
created major new business opportunities. Applications such as
Internet voice portals, voice messaging and interactive voice
response (IVR) systems are rapidly becoming essential tools for a
successful business. By adding Text-To-Speech capabilities to
phone and voice processing applications, businesses can provide
automated access to information, reduce the number of phone
attendants, and increase efficiency and customer satisfaction.
Text-To-Speech can be used in applications such as home banking,
remote e-mail and unified messaging, database-driven inquiry
systems, and solutions for blind users, to name a few.

SYSTEM REQUIREMENTS

- Pentium III 500 MHz or higher, with 256 MB of RAM or higher
- Windows 95, 98 (enabled), 2000, or Windows NT
- Sound card or telephony card: Sound Blaster full-duplex or
  Dialogic card

Best Regards,

George N. Hallak
AramediA Group
T 617 825-3044 F 617 265-9648
http://www.arabicsoftware.net
http://www.aramedia.com

------------------------------------------------------------------------

--
End of Arabic-L:  22 Oct  2002


