Corpora: diacritic marks

Geoffrey Sampson geoffs at cogs.susx.ac.uk
Fri Apr 20 09:44:10 UTC 2001


I am quite surprised that speakers of various languages other than English
feel they need to defend their wish to use the normal diacritics of their
languages in e-mail.  I take it for granted that most languages which use
diacritics use them for good reasons, and that if I were a speaker of such
a language I would expect to be able to write in e-mail precisely as I write
on paper, not in some garbled approximation to normal writing.  The fact that
using diacritics in e-mail is so awkward in practice seems to me another
symptom of English-language arrogance in modern life, like the fact that
English-speaking tourists these days take for granted that people in other
European countries will speak English while commonly making little or no
effort themselves to speak other languages.

In the 19th century there were telegraph systems which coded letters on a
5 x 5 grid and therefore left out Q -- in one famous case a criminal
disguised as a Quaker got further than he should have because telegraphists
didn't understand the word "kwaker".  Within recent decades there
were systems which didn't include the semicolon among available punctuation
marks.  All right, in theory we can write English without Q's and
semicolons, but suppose modern technology were being produced by speakers
of a language which lacked these symbols and they told us to do without them;
native speakers of English would be outraged, rightly.  For Czech or
French to be written without accents seems to me much worse than writing
English without Q's or semicolons.  I think speakers of such languages should
not feel defensive but should complain loudly.  I remember meeting a Swede
who told me that when he and fellow Swedes working abroad exchange e-mail,
they use English because it seems easier than solving the diacritic problem.
This is an appalling indictment of current communication technology.


G.R. Sampson, Professor of Natural Language Computing

School of Cognitive & Computing Sciences
University of Sussex
Falmer, Brighton BN1 9QH, GB

e-mail geoffs at cogs.susx.ac.uk
tel. +44 1273 678525
fax  +44 1273 671320
web http://www.grsampson.net


