Corpora: a particular type of sloppiness

Alexandr Rosen Alexandr.Rosen at ff.cuni.cz
Tue Apr 17 12:15:28 UTC 2001


> 	I am not a native speaker of Spanish, and have argued in
> published articles for the general elimination of accents and diacritics
> from Spanish (and would be brash enough to make the same argument for
> almost *any* language with diacritics, including Polish, Portuguese and
> Czech).  My reasons are low functional load for the diacritics in
> general (messages I receive in Spanish without diacritics are close to
> 100% legible, and very close indeed to the legibility of msgs with
> diacritics; I'd bet the same is true for Czech, and I know it is for
> Polish-- the ó [if that got butchered up, it's an 'o' with an acute
> accent over it], for example, is almost 100% predictable), also the
> general dropping of diacritics in handwriting, etc.  

I don't know about Spanish, but at least for Czech, I disagree. Writing Czech 
text without diacritics is just another way of butchering it up, although 
admittedly not as badly as letting the servers do it. 

The Czechs always use diacritics, except when technology does not know better: 
telegraph, SMS, e-mail. Then the writer must pay special attention to prevent 
misunderstanding. And proper names often would not make sense unless 
transliterated.

I think that by now we should have gotten past the stage where information 
technology forces such compromises on us. 

Regards,

Alexandr [Rosen]
