Corpora: diacritic marks

Steven Bird sb at unagi.cis.upenn.edu
Fri Apr 20 11:25:49 UTC 2001


Geoffrey Sampson wrote:
> I take it for granted that most languages which use
> diacritics use them for good reasons, ...

Just as an aside, note that this is not always true.  For certain
languages, the diacritics in the official orthography mainly benefit the
expatriate linguist who designed it, or serve the sociopolitical function
of distinguishing indigenous from colonial writing
[1].  It is hard to defend this practice when it leads to "diacritic
overload" and degrades reading fluency, as I demonstrated for a language of
Cameroon [2].  Thus, for at least this language, being forced to send an
email without diacritics would present *no* problem (and diacritics are
routinely omitted from handwritten personal correspondence).

Of course, once Unicode is supported and suitable fonts and keyboard
mappings are available, sending email in any diacritic-laden script will be
straightforward.

Steven Bird

[1] Orthography and identity in Cameroon
    Written Language and Literacy 4(2), in press
    http://www.ldc.upenn.edu/sb/home/publications.html#identity
[2] When marking tone reduces fluency: an orthography experiment in Cameroon
    Language and Speech 42, 83-115, 1999
    http://www.ldc.upenn.edu/sb/home/publications.html#lgsp42


