Corpora: a particular type of sloppiness

Alexandr Rosen rosen at chomsky.ruk.cuni.cz
Wed Apr 11 13:40:32 UTC 2001


> From: "Tadeusz Piotrowski" <tadpiotr at plusnet.pl>

> But in fact I wanted to report on an interesting type of sloppiness in a
> language with diacritics. Polish has nine diacritics, or eighteen, when
> capital letters are counted separately. The point is that very few people
> bother about diacritics in e-mails; they use what is sometimes called pidgin
> Polish, in which only the Latin (or English) characters are used. (You have
> to press two keys at the same time when you want to use diacritics, and only
> one when you do not. Economy of language...)
> A very (VERY) careful writer will use diacritics, or you can tell from the
> diacritics in a mail that somebody was writing offline. In fact, we have a
> nice gradation: a proper letter with diacritics, a proper letter without
> diacritics, a casual letter, etc. This device tells you a lot about the
> speaker(?)/writer.
> I wonder what people do with other diacritic-rich languages. German?
> French? Czech? Is it the same as in Polish?

I have always thought that the absence of diacritics in most Czech e-mails is
due to the writer's awareness of the danger of character codes becoming garbage
on the way, rather than due to the writer being lazy. In fact, in most cases (11
out of 15) you only need one keystroke to produce an accented lower-case
character on the standard Czech keyboard. A decent keyboard mapping table (not
the default one in Czech MS Windows) with Caps Lock on also produces accented
upper-case characters with a single keystroke.
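
To illustrate the danger mentioned above, here is a minimal Python sketch. It
is only an illustration under stated assumptions: the function names
strip_diacritics and garble are hypothetical, and the charset pair ISO-8859-2
(sender) and Windows-1250 (reader) is just one plausible mismatch, not a claim
about what any particular mail system does.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining accents removed ("pidgin" spelling).

    Each character is split into a base letter plus combining marks (NFD),
    and the marks are dropped. Letters without such a decomposition, e.g.
    Polish "ł", are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def garble(text: str, sent_as: str = "iso8859_2", read_as: str = "cp1250") -> str:
    """Simulate a message encoded in one charset and decoded in another.

    The default charsets are an assumption chosen for illustration; bytes
    with no meaning in the reader's charset become U+FFFD.
    """
    return text.encode(sent_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    sample = "Příliš žluťoučký kůň"
    print(strip_diacritics(sample))  # accents removed, plain ASCII letters
    print(garble(sample))            # some accented letters come out wrong
```

With these definitions, strip_diacritics("Příliš žluťoučký kůň") returns
"Prilis zlutoucky kun", which survives any ASCII-compatible transport, whereas
garble leaves the unaccented letters intact and silently replaces some of the
accented ones.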

I believe it is very unfortunate that we still don't have a reliable way of
using a Latin-based (or any other) writing system on the Internet, sloppily or
not.

Regards

Alexandr Rosen

Institute of Theoretical and Computational Linguistics
Faculty of Philosophy, Charles University, Prague

address: UTKL FF UK, Celetna 13, CZ 110 00 Praha 1, Czech Republic
tel.: +420-2-24491858, e-mail: alexandr.rosen at ff.cuni.cz
http://utkl.ff.cuni.cz/~rosen/
