Corpora: a particular type of sloppiness

Marco Antonio Esteves da Rocha marcor at cce.ufsc.br
Wed Apr 11 05:08:14 UTC 2001


On Tue, 10 Apr 2001, Tadeusz Piotrowski wrote:

> These arguments are double-edged. There is sloppiness and there is
> sloppiness. My last name is structurally similar to that of Ken Litkowski,
> and yet he is (most likely) a native speaker of English, I am not. Will
> native speakers of English stand my sloppiness? Harold Somers' attitude
> shows some of them will not, if I used 'corpi', or 'corpus are', or 'corpora
> is'. Whatever. Well, actually, I think I might err rather by being
> hypercorrect than otherwise. No split infinitives for me... But even my
> non-native self looks with a patronizing (?), pitying (?), hurt(?) attitude
> at some of the mail here.... and there.  Should you (native speakers of
> English) /we (members of this list) struggle with the form, hoping the
> contents will be illuminating? Or -- the dustbin?
> 
> But in fact I wanted to report on an interesting type of sloppiness in a
> language with diacritics. Polish has nine diacritics, or eighteen, when
> capital letters are counted separately. The point is that very few people
> bother about diacritics in e-mails, they use what is sometimes called pidgin
> Polish: only the Latin (or English) characters are used. (You have to press
> two keys at the same time when you want to use diacritics, you press one
> when you do not. Economy of language...).
> A very (VERY) careful writer will use diacritics, or you can tell somebody
> was writing offline seeing diacritics in his/her mail. In fact, we have a
> nice gradation: a proper letter with diacritics, a proper letter without
> diacritics, a casual letter, etc. This device tells you a lot about the
> speaker(?)/writer.
> I wonder what do the people do with other diacritic-rich languages? German?
> French? Czech? Is it the same as in Polish?
> Regards
> Tadeusz Piotrowski
> ***************************************************************
>                                               mailing address
> Department of English
> Opole University                    Chrobrego 20
> Oleska 48                              PL-55-020 Zorawina (Zórawina)
> Opole
> POLAND
>               phone/fax (+48)71-3165847
>               mobile (+48)607159263
> 
>

Curious idea. What disturbs me in Portuguese is the absence of diacritics,
not their inclusion. It is difficult to be sure whether the people at the
other end of the message have the equipment and configuration needed to
actually see those diacritics on screen in their e-mail editor. In fact,
what appears on different screens around the world when you type
diacritics on your own equipment is quite unpredictable and may be
unreadable for the recipient. So people writing messages in Portuguese
often choose to leave them out, to be safe.
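
To make the unpredictability concrete, here is a minimal sketch in Python
of one way it can happen: the sender's software writes the character in
one encoding and the recipient's software reads the same bytes in another.
The function name render_as and the particular pair of encodings are
assumptions chosen for illustration, not a description of any specific
mail program.

```python
def render_as(text, sender_charset, recipient_charset):
    """Return what the recipient sees when the two ends disagree on charset.

    Hypothetical helper: encodes with the sender's charset, then decodes
    the same bytes with the recipient's charset.
    """
    raw_bytes = text.encode(sender_charset)
    return raw_bytes.decode(recipient_charset, errors="replace")


# Assumption: sender emits UTF-8, recipient's screen assumes Latin-1.
print(render_as("é", "utf-8", "latin-1"))  # the accented letter is garbled
print(render_as("e", "utf-8", "latin-1"))  # plain ASCII survives unchanged
```

The plain "e" comes through intact while the accented form does not,
which is the practical reason for leaving the diacritics out.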

But it makes me feel very uncomfortable. The feeling is not at all that
of using pidgin Portuguese but of writing in a different language,
especially because some very common words, such as the conjunction "e"
("and") and the verb form "e'" (equivalent to "is"), can only be
distinguished by the diacritic. This forces the writer to resort to
nonexistent spellings, such as "eh" instead of "e'", since these words
are almost certain to appear in any message longer than fifty words and
are crucial for understanding.
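
The collision can be shown with a short Python sketch. It assumes the
diacritics are removed by Unicode decomposition (NFD) followed by dropping
the combining marks; strip_diacritics is a hypothetical helper name, and
the sample sentence is made up for illustration.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The verb form and the conjunction become the same string.
print(strip_diacritics("é") == strip_diacritics("e"))
print(strip_diacritics("Ele é alto e magro"))
```

Once both words are reduced to a bare "e", nothing on the page tells the
reader which one was meant, hence workarounds like "eh".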

Imagine if you had to interpret a sentence in which there was no clear
graphic distinction between

Corpora and corpus is the same

AND

Corpora is corpus and the same

:) 

Marco Rocha


