[Corpora-List] Algorithm for orthography to IPA conversion in German

Amir Zeldes amir.zeldes at rz.hu-berlin.de
Wed May 11 09:03:41 UTC 2011


Hi Thomas,

Well, if you want to do it without any form of lexicon it's actually quite
difficult. I tried writing lexiconless transcription rules loosely based on
the Pompino-Marschall introduction suggested by Matias, and in many cases
the orthography doesn't give you all the information you need, especially
vowel quantity in certain positions, but also morpheme segmentation which
matters for [s] vs. [S] and other distinctions. I made a toy implementation
of the rules I could find in PHP here:

http://korpling.german.hu-berlin.de/~amir/phon.php 

The page also describes the steps of the derivation broadly and gives
examples of words that do and don't work and why. I mainly use it for
teaching, but if you'd like to have the rules I'd be happy to give
them to you.
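
To give a rough idea of what lexiconless rewrite rules look like, here is a
minimal Python sketch. It is not the rule set from the PHP page: the rule
table, the function name `transcribe` and the word-initial st/sp handling are
assumptions chosen only for illustration, and the table is far from complete.

```python
# Toy grapheme-to-phoneme rules for German (illustrative assumption,
# not the rules used on the PHP page mentioned above).
RULES = [
    ("sch", "ʃ"),
    ("ch", "x"),    # simplification: ignores the [ç]/[x] alternation
    ("ei", "aɪ"),
    ("au", "aʊ"),
    ("eu", "ɔʏ"),
    ("äu", "ɔʏ"),
    ("ie", "iː"),
    ("z", "ts"),
    ("w", "v"),
    ("v", "f"),
]
# Try longer graphemes first so that "sch" wins over "ch".
RULES.sort(key=lambda rule: -len(rule[0]))


def transcribe(word):
    """Map an orthographic word to a rough IPA string, left to right."""
    w = word.lower()
    out = []
    i = 0
    while i < len(w):
        # <st>/<sp> become [ʃt]/[ʃp] at the start of a morpheme. Without a
        # lexicon we only know where the word starts, so only that case fires.
        if i == 0 and w.startswith(("st", "sp")):
            out.append("ʃ" + w[1])
            i = 2
            continue
        for grapheme, phones in RULES:
            if w.startswith(grapheme, i):
                out.append(phones)
                i += len(grapheme)
                break
        else:
            # No rule matched: copy the letter unchanged. Single vowels get
            # no length mark, because spelling alone often does not decide it.
            out.append(w[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for example in ("Stein", "Haus", "Baustein"):
        print(example, transcribe(example))
```

With these rules "Stein" comes out as ʃtaɪn and "Haus" as haʊs, but "Baustein"
comes out as baʊstaɪn with [s], because the sketch cannot see the morpheme
boundary before "stein". That is the kind of case where segmentation matters.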

If you want something that really works reliably, I think you'll need a
lexicon with high coverage, ideally including compounds, something like
Krech et al. (2009), "Deutsches Aussprachewörterbuch". It also has extensive
information on orthography vs. standard transcription for Germany and
Austria. For computational resources, someone at the Bayerisches
Archiv für Sprachsignale may be able to help; I think they might have some
software that does transcription for German (probably not lexiconless). Hope
you find what you're looking for,

Best,

Amir

------------------
Institut für deutsche Sprache und Linguistik
Humboldt-Universität zu Berlin
Unter den Linden 6
D-10099 Berlin
 
Tel: +49-(0)30-2093-9720
 
URL:
http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/mitarbeiter-innen/amir/

> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Sebastian Nagel
> Sent: Tuesday, May 10, 2011 23:16
> To: corpora at uib.no
> Cc: Thomas Schmidt
> Subject: Re: [Corpora-List] Algorithm for orthography to IPA conversion in
> German
> 
> Hi Thomas,
> 
> try this:
>   http://tools.webmasterei.com/mbrolatester/
> and contact the webmaster who has an amusing blog
> with some really interesting NLP stuff.
> 
> Personally, a couple of years ago I found the link,
> got curious, and installed the MBROLA text-to-speech system that the
> link suggested (there are packages for Linux). It has a "pipelined"
> architecture, so it was quite easy to set up a pipeline which
> does the conversion:
>  + the heart is txt2pho (http://www.sk.uni-bonn.de/forschung/phonetik/sprachsynthese/txt2pho/)
>  + SAMPA to IPA conversion is done via the Perl module CXS from
> http://www.theiling.de/ipa/
> 
> Here is a minimal script (txt2pho and CXS must be installed) for
> conversion from the command line:
> 
> #!/bin/bash
> 
> TXT2PHO=<path_to_txt2pho>
> 
> perl -lpe 'print "." if /^\s*$/; print ".\n";' \
>     | recode -f u8..l1 \
>     | $TXT2PHO/pipefilt/pipefilt \
>     | $TXT2PHO/preproc/preproc $TXT2PHO/preproc/Rules.lst $TXT2PHO/preproc/Hadifix.abk \
>     | $TXT2PHO/txt2pho -m -p $TXT2PHO/data/ \
>     | perl -pe 'chomp; s/\s.+//; s/^_$//; print "\n" if /^$/;' \
>     | perl -MCXS -lne '$ipa=cxs2ipa($_); print $ipa'
> 
> Test:
> % echo -e "Haus\nHäuser\nChinaapfel\nPhonetik" | txt2ipa.sh
> 
> haʊs
> 
> hɔʏzɐ
> 
> çiːnaː
> apfl
> 
> foːneːtɪk
> 
> As you can see, I struggled with the word segmentation.
> But the transcription looks impressive (I think, though I'm
> not very familiar with phonetics).
> 
> Bye,
> Sebastian Nagel
> (from Konstanz)
> 
> 
> On 05/09/2011 10:47 AM, Thomas Schmidt wrote:
> > Dear all,
> >
> > I am looking for an algorithm / a tool / a set of rules which can help
> > me to automatically derive an IPA transcription for an orthographic
> > word (i.e. no lexicon lookup). Can anybody help (I'll post a summary)?
> >
> > Thanks,
> >
> > Thomas
> >
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

