[Corpora-List] Algorithm for orthography to IPA conversion in German
Sebastian Nagel
wastl.nagel at googlemail.com
Tue May 10 21:16:25 UTC 2011
Hi Thomas,
try this:
http://tools.webmasterei.com/mbrolatester/
and contact the webmaster, who has an amusing blog
with some really interesting NLP stuff.
Personally, a couple of years ago I found the link,
got curious, and installed the MBROLA text-to-speech system
that the page suggested (there are packages for Linux).
It has a "pipelined" architecture, so it was quite easy to
set up a pipeline which does the conversion:
+ the heart is txt2pho (http://www.sk.uni-bonn.de/forschung/phonetik/sprachsynthese/txt2pho/)
+ SAMPA to IPA conversion is done via the Perl module CXS from http://www.theiling.de/ipa/
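To give a feel for what the SAMPA-to-IPA step involves, here is a minimal sketch that maps a handful of X-SAMPA symbols to IPA with sed. It is not the CXS module and covers only a few symbols; the function name sampa2ipa_sketch is made up for illustration.

```shell
# Hypothetical stand-in for the CXS step: maps a few X-SAMPA symbols to IPA.
# Only U, O, Y, 6, C, I and the length mark ":" are handled; all else passes through.
sampa2ipa_sketch() {
  sed -e 's/U/ʊ/g' -e 's/O/ɔ/g' -e 's/Y/ʏ/g' \
      -e 's/6/ɐ/g' -e 's/C/ç/g' -e 's/I/ɪ/g' \
      -e 's/:/ː/g'
}

printf 'haUs\nhOYz6\nfo:ne:tIk\n' | sampa2ipa_sketch
```

For these three inputs the sketch prints haʊs, hɔʏzɐ and foːneːtɪk. A real converter needs the full symbol table, which is what CXS provides.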
Here is a minimal script (txt2pho and CXS must be installed) for conversion from the command line:
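#!/bin/bash
# txt2ipa.sh: reads German text on stdin, prints IPA on stdout.
# Set TXT2PHO to the txt2pho installation directory.
TXT2PHO=<path_to_txt2pho>
# Stages: insert "." lines between the input lines, recode UTF-8 to Latin-1,
# run the txt2pho front end, keep only the phoneme column of its output
# (a pause "_" ends the current line), then convert SAMPA to IPA with CXS.
perl -lpe 'print "." if /^\s*$/; print ".\n";' \
| recode -f u8..l1 \
| $TXT2PHO/pipefilt/pipefilt \
| $TXT2PHO/preproc/preproc $TXT2PHO/preproc/Rules.lst $TXT2PHO/preproc/Hadifix.abk \
| $TXT2PHO/txt2pho -m -p $TXT2PHO/data/ \
| perl -pe 'chomp; s/\s.+//; s/^_$//; print "\n" if /^$/;' \
| perl -MCXS -lne '$ipa=cxs2ipa($_); print $ipa'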
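The second-to-last stage is the least obvious one. txt2pho writes MBROLA .pho lines: one phoneme per line, followed by a duration and optional pitch points, with "_" marking a pause. The stage keeps only the phoneme symbol and starts a new output line at each pause. The sketch below re-expresses that logic in awk on an invented sample; the function name pho_to_sampa and the sample values are assumptions, not real txt2pho output.

```shell
# Re-expression (in awk) of the phoneme-joining stage; pho_to_sampa is a made-up name.
pho_to_sampa() {
  awk '{
    sub(/[ \t].*/, "")            # keep only the first column (the phoneme symbol)
    if ($0 == "_" || $0 == "")    # pause or empty line: end the current output line
      printf "\n"
    else
      printf "%s", $0             # append the phoneme to the current output line
  }'
}

# Invented .pho fragment for "Haus"; durations and pitch points are illustrative only.
printf '_ 50\nh 60\naU 180 50 110\ns 100\n_ 200\n' | pho_to_sampa
```

Because every pause ends a line, a pause that txt2pho places inside a long compound would split that word across two output lines.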
Test:
% echo -e "Haus\nHäuser\nChinaapfel\nPhonetik" | txt2ipa.sh
haʊs
hɔʏzɐ
çiːnaː
apfl
foːneːtɪk
As you can see, I struggled with the word segmentation:
"Chinaapfel" comes out split across two lines.
But the transcription looks impressive (I guess; I'm
not very familiar with phonetics).
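One possible workaround for the segmentation issue, sketched under the assumption that the input has one word per line: run the converter once per word and join whatever fragments it prints. The function name ipa_per_word is hypothetical, the converter is passed in as a command, and the sketch is untested against txt2pho itself.

```shell
# Hypothetical helper: convert each input word on its own and join the fragments.
# $1 is the converter command; it must read text on stdin and print IPA on stdout.
ipa_per_word() {
  while IFS= read -r word; do
    [ -n "$word" ] || continue                        # skip empty lines
    ipa=$(printf '%s\n' "$word" | "$1" | tr -d '\n')  # drop line breaks inside one word
    printf '%s\t%s\n' "$word" "$ipa"
  done
}

# Example call, assuming txt2ipa.sh is on the PATH:
#   printf 'Haus\nChinaapfel\n' | ipa_per_word txt2ipa.sh
```

This trades speed for simplicity, since the whole pipeline is started once per word. It also glues the parts of a split word back together without marking where the split was.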
Bye,
Sebastian Nagel
(from Konstanz)
On 05/09/2011 10:47 AM, Thomas Schmidt wrote:
> Dear all,
>
> I am looking for an algorithm / a tool / a set of rules which can help
> me to automatically derive an IPA transcription for an orthographic
> word (i.e. no lexicon lookup). Can anybody help (I'll post a summary)?
>
> Thanks,
>
> Thomas
>