Job: Internship, Phonetization of written utterances for speech synthesis, IRISA

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Sat Dec 22 14:00:02 UTC 2012


Date: Wed, 19 Dec 2012 18:04:39 +0100
From: Gwénolé Lecorvé <gwenole.lecorve at irisa.fr>
Message-ID: <50D1F3A7.7020509 at irisa.fr>


[English version below]

Hello everyone,

The Cordial team at IRISA (Lannion) is offering an internship on the
phonetization of written utterances for speech synthesis
(grapheme-to-phoneme conversion) and, more precisely, on adapting a
phonetization system to pronunciation specifics such as those produced
by regional or foreign accents, or by certain particular emotional
states. The topic is described in more detail at the very end of this
email.

The internship is aimed at master's students (or equivalent, five years
of higher education) and would last about 5 months. It would take place
at IRISA's Lannion site.

Please forward it to any students who might be interested.

Best regards,
Gwénolé Lecorvé.

--

Dear colleagues,

The Cordial group at IRISA (Lannion) is offering an internship on
written-text phonetization (grapheme-to-phoneme conversion) for speech
synthesis. More precisely, this internship would focus on adapting a
phonetization tool to match the specifics of various regional or
foreign accents within the same language (French), or of some
particular emotional states. The proposal is detailed at the bottom of
this email.

This internship is aimed at master's students (or equivalent) and would
last about 5 months.

Thank you for forwarding this email to students who may be interested.

Best regards,
Gwénolé Lecorvé.

--

*Title: Grapheme-to-phoneme conversion adaptation using conditional 
random fields*

*Description:*
Grapheme-to-phoneme conversion consists of generating possible
pronunciations for an isolated word or for a sequence of words. More
formally, this conversion is a transliteration of a sequence of
graphemes, i.e., letters, into a sequence of phonemes, i.e., symbolic
units that represent the elementary sounds of a language.
Grapheme-to-phoneme converters are used in speech processing

- either to help automatic speech recognition systems to decode words
  from a speech signal

- or as a means of telling speech synthesizers how a written input
  should be produced acoustically.
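
As a small illustration of the definition above, the following Python
sketch stores a pronunciation as a one-to-one alignment between
graphemes and phoneme labels, which is the kind of input a sequence
labeller could be trained on. The example word, its hand-made alignment,
the "_" empty label and the helper name are assumptions made for
illustration only; the phoneme symbols are informal SAMPA-like notation
and are not taken from the team's system.

```python
# Hypothetical hand-made alignment for the French word "chat" (cat).
# Each grapheme carries exactly one label; "_" marks a grapheme that
# contributes no phoneme of its own (assumed convention).
ALIGNED_CHAT = [("c", "S"), ("h", "_"), ("a", "a"), ("t", "_")]


def phonemes_from_alignment(aligned, empty="_"):
    """Collapse a per-grapheme labelling into a plain phoneme sequence."""
    return [label for _, label in aligned if label != empty]


# Rebuild the written form from the grapheme side of the alignment.
word = "".join(grapheme for grapheme, _ in ALIGNED_CHAT)
print(word, "->", " ".join(phonemes_from_alignment(ALIGNED_CHAT)))
```

With one label per letter, predicting a pronunciation becomes a
labelling task over the grapheme sequence.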

A problem with such tools is that they are trained on large and varied
collections of aligned grapheme and phoneme sequences, which leads to
generic ways of pronouncing words in a given language. As a
consequence, they are not adequate as soon as one wants to recognize or
synthesize specific voices, for instance, accented speech, stressed
speech, dictating voices versus chatting voices, etc. [1].

While multiple methods have been proposed for grapheme-to-phoneme
conversion [2, 3], the primary goal of this internship is to propose a
method for building grapheme-to-phoneme models that can easily be
adapted to conditions specified by the user. More precisely, the use of
conditional random fields (CRFs) will be studied to model the generic
French pronunciation and variants of it [4]. CRFs are state-of-the-art
statistical tools widely used for labelling problems in natural language
processing [5]. A further important goal is to be able to automatically
characterize the distinctive pronunciation features of a given specific
voice as compared to a generic voice. This means highlighting and
generalizing the differences that can be observed between two sequences
of phonemes derived from the same sequence of graphemes.
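
To make the last point concrete, the sketch below lists the edit
operations that separate two phoneme sequences obtained for the same
word. It relies on plain sequence diffing with Python's standard
difflib module, which is only a stand-in and not the method the
internship is meant to develop. The function name and the two
pronunciations (a generic form and a variant with an added final schwa,
in informal SAMPA-like notation) are illustrative assumptions, not
real data.

```python
from difflib import SequenceMatcher


def pronunciation_differences(generic, specific):
    """Return the non-matching edit operations from generic to specific.

    Each item is (operation, generic_span, specific_span), where
    operation is "replace", "delete" or "insert".
    """
    matcher = SequenceMatcher(a=generic, b=specific, autojunk=False)
    return [
        (tag, generic[i1:i2], specific[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical pronunciations of the same written word.
generic = ["p", "@", "t", "i", "t"]
specific = ["p", "@", "t", "i", "t", "@"]

for operation in pronunciation_differences(generic, specific):
    print(operation)
```

Collecting such operations over many word pairs, together with their
grapheme context, would be one possible starting point for generalizing
the differences into features of a specific voice.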

Results of this internship would be integrated into the speech synthesis
platform of the team in order to easily and automatically simulate and
imitate specific voices.

*Technical skills:* C/C++ and a scripting language (e.g., Perl or
 Python)

*Keywords:* Natural language processing, speech processing, machine
 learning, statistical learning

*Contact:* Gwénolé Lecorvé (gwenole.lecorve at irisa.fr)

*References:*
[1] B. Hutchinson and J. Droppo. Learning non-parametric models of
    pronunciation. In Proceedings of ICASSP, 2011.
[2] M. Bisani and H. Ney. Joint-sequence models for grapheme-to-phoneme 
    conversion. In Speech Communication, 2008.
[3] S. Hahn, P. Lehnen, and H. Ney. Powerful extensions to CRFs for
    grapheme to phoneme conversion. In Proceedings of ICASSP, 2011.
[4] I. Illina, D. Fohr, and D. Jouvet. Multiple
    pronunciation generation using grapheme-to-phoneme conversion based
    on conditional random fields. In Proceedings of SPECOM, 2011.
[5] J. D. Lafferty, A. McCallum, and F. C. N. Pereira.
    Conditional random fields: probabilistic models for segmenting and
    labeling sequence data. In Proceedings of ICML, 2001.

-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html
                                http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------


