Corpora: Diacritics

Fri Apr 20 13:44:40 UTC 2001

I don't know the UNIbet system, but I suspect that in practice it has been
rendered out of date by the SAM-PA system, which has sufficient "official"
backing to be accepted as an international standard.  The most accessible
reference on this that I know of is an appendix in D. Gibbon et al., eds.,
_Handbook of Standards and Resources for Spoken Language Systems_,
Mouton de Gruyter.  SAM-PA is intended for "broad phonetic" (roughly,
phonemic) transcription; it consists of a mapping of the main IPA symbols
into the ASCII character set, together with sets of conventions for
using these elements for the sounds of the various official languages of
EU member states and a few other languages.  For narrow phonetic transcription
this is insufficiently precise, of course, but for those purposes the IPA
itself has defined a numerical coding of its entire up-to-date system of
notations (and if I remember rightly the Gibbon volume reprints this too).
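
To make the idea of an ASCII mapping concrete, here is a minimal sketch in
Python.  It is an illustration under stated assumptions, not a reference
implementation: the table holds only a small subset of symbols as used in
the English SAM-PA conventions, the names SAMPA_TO_IPA and sampa_to_ipa are
hypothetical, and the conversion assumes each symbol is a single ASCII
character, which does not hold for every language-specific convention.

```python
# Illustrative subset of the SAMPA-to-IPA table (English conventions only).
# The full standard defines many more symbols; this is NOT the complete set.
SAMPA_TO_IPA = {
    "S": "ʃ",   # voiceless postalveolar fricative, as in "ship"
    "Z": "ʒ",   # voiced postalveolar fricative, as in "measure"
    "T": "θ",   # voiceless dental fricative, as in "thin"
    "D": "ð",   # voiced dental fricative, as in "this"
    "N": "ŋ",   # velar nasal, as in "thing"
    "@": "ə",   # schwa
    "{": "æ",   # vowel of "pat"
    "I": "ɪ",   # vowel of "pit"
    "U": "ʊ",   # vowel of "put"
    "V": "ʌ",   # vowel of "cut"
    ":": "ː",   # length mark
    '"': "ˈ",   # primary stress
    "%": "ˌ",   # secondary stress
}


def sampa_to_ipa(text):
    """Convert a SAMPA string to IPA one character at a time.

    Characters missing from the table (for example plain "s" or "t")
    are passed through unchanged, since SAMPA uses the same lowercase
    letter as the IPA for those sounds.
    """
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for word in ("TIN", "DIs", "SIp"):
        print(word, "->", sampa_to_ipa(word))
```

A one-to-one table of this kind suits broad transcription; narrow
transcription with diacritics would need a richer scheme, as noted above.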

G.R. Sampson, Professor of Natural Language Computing

School of Cognitive & Computing Sciences
University of Sussex
Falmer, Brighton BN1 9QH, GB

e-mail geoffs at cogs.susx.ac.uk
tel. +44 1273 678525
fax  +44 1273 671320
web http://www.grsampson.net