Corpora: Just dotting an i

Rich Foley Richard.Foley at urova.fi
Thu Apr 19 09:01:35 UTC 2001


I was pleasantly surprised this morning to see the discussion on
diacritics revived by Ramesh Krishnamurthy.

* First off, I will admit to a romantic weakness for ornate alphabets
such as Georgian and visually elegant syllabaries such as Thai. By
extension, owing to or despite strict Jesuit training in correct Greek
accentuation, I also have a general sympathy for the numerous flyspecks
that computers (and some linguists) seem intent on eradicating from
various orthographies.

* I once wrote a little program on a computer course that would replace
the double consonants and vowels (sign of length) in a Finnish text with
the corresponding single character and an acute accent (e.g., kaataa
'pour' -> kátá), à la Hungarian vowel orthography. If nothing else,
comparisons of input and output texts revealed that such a reform would
cut paper consumption by 10-15%.
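The reform described above can be sketched in a few lines. This is a hypothetical reconstruction, not the original program (whose language and details are not given). It assumes that any letter immediately repeated in NFC-normalized text marks length, and it attaches a combining acute accent (U+0301) to the single remaining letter. Unicode composes this into a precomposed character such as á where one exists. The names accent_long_segments, visible_length and character_saving are invented for illustration. The saving counts base characters and ignores combining marks, which is only a rough stand-in for paper consumption.

```python
import re
import unicodedata

COMBINING_ACUTE = "\u0301"

# A letter (not a digit or underscore) immediately followed by the same letter,
# e.g. "aa" in "kaataa" or "kk" in "kukka".
# Assumption: every such doubling marks length, as in Finnish orthography.
DOUBLED_LETTER = re.compile(r"([^\W\d_])\1")


def accent_long_segments(text: str) -> str:
    """Replace each doubled letter with one letter plus an acute accent."""
    # Normalize first so that ä and ö are single code points that can match \1.
    text = unicodedata.normalize("NFC", text)

    def to_accented(match: re.Match) -> str:
        # NFC composes e.g. a + U+0301 into á; letters with no precomposed
        # accented form (such as ä or t) keep the combining mark.
        return unicodedata.normalize("NFC", match.group(1) + COMBINING_ACUTE)

    return DOUBLED_LETTER.sub(to_accented, text)


def visible_length(text: str) -> int:
    """Count characters, ignoring combining marks such as U+0301."""
    return sum(1 for ch in text if not unicodedata.combining(ch))


def character_saving(text: str) -> float:
    """Fraction of characters saved by the reform (0.0 for empty text)."""
    before = visible_length(unicodedata.normalize("NFC", text))
    if before == 0:
        return 0.0
    return 1 - visible_length(accent_long_segments(text)) / before


if __name__ == "__main__":
    for word in ["kaataa", "kukka", "Määttä"]:
        print(word, "->", accent_long_segments(word))
    print(f"saving on 'kaataa': {character_saving('kaataa'):.0%}")
```

On a single word like kaataa the saving is large; on running text it depends entirely on the sample, so the 10-15% figure above reflects the original comparison, not this sketch.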

* With EU enlargement to embrace the Czech Republic, Estonia, Hungary,
Poland and Slovenia, computers in general and email programs in
particular had better quit while they are behind and learn to deal with
diacritics. It is interesting to note that German legal texts in the EU
(EUR-LEX database) use the digraphs ae and oe instead of a- and
o-umlaut. (I don't know if this was a political issue in its day, but
none of the other official languages seems to have compromised on
diacritics.) Eurosport is the only other forum where I have seen this
practice at work, with Finnish surnames like Hämäläinen or Määttä
rendered Haemaelaeinen, Maeaettae.
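The digraph substitution described above amounts to a simple character mapping. The following is a hypothetical illustration, not code taken from EUR-LEX or Eurosport, and the name to_digraphs is invented. It covers only a- and o-umlaut, as in the examples; German u-umlaut -> ue would be one more entry. Capitals map to Ae and Oe, which suits name-initial capitals but not text set entirely in capitals.

```python
import unicodedata

# Hypothetical mapping covering only the two umlauts mentioned above.
DIGRAPHS = str.maketrans({"ä": "ae", "ö": "oe", "Ä": "Ae", "Ö": "Oe"})


def to_digraphs(text: str) -> str:
    """Replace a- and o-umlaut with the digraphs ae and oe."""
    # NFC turns a decomposed a + U+0308 into the single code point ä first,
    # so the one-character keys in DIGRAPHS can match it.
    return unicodedata.normalize("NFC", text).translate(DIGRAPHS)


if __name__ == "__main__":
    for name in ["Hämäläinen", "Määttä"]:
        print(name, "->", to_digraphs(name))
```

The mapping is lossy: from the output alone a reader cannot tell whether ae stood for ä or for a genuine a + e sequence.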

* The advantage of triliteral roots notwithstanding, wouldn't it be
interesting if the linguistic (imperialist) tables were turned and
speakers of Hebrew and Arabic began wondering out loud why we clutter
English texts with all these vowels?


Rich Foley
University of Lapland


