Corpora: a particular type of sloppiness
Rene.Valdes at lhsl.com
Fri Apr 20 00:20:23 UTC 2001
Why should we equate informal text with email? Email is text written with a
keyboard that often lacks the keys needed for convenient input of
diacritics. Email is text written to be sent across multiple systems that
are very likely to turn your characters with diacritics into garbage. Isn't
a quick handwritten note informal text? A Spanish speaker would never leave
the diacritics out of such a note. When using a keyboard (like those on old
typewriters) on which it is easy to enter diacritics, no Spanish speaker
would ever consider leaving them out. I assume this is also the case for
German, Polish, Czech, Hungarian, Portuguese, and any other language using
such marks.
The problem is one of the hegemony of keyboards and systems designed with
English in mind, and of the desire of some English speakers to impose on
users of other languages a need for disambiguation that is alien to them,
one that would not even arise if we simply implemented appropriate means to
input diacritics and transmit them across the Internet.
-René (with an accent; otherwise it would be pronounced differently and
would indeed have a different meaning, with no chance of disambiguation)
Bruce Lambert <lambertb at uic.edu> on 04/19/2001 03:06:34 PM
To: Rene.Valdes at lhsl.com, corpora at hd.uib.no
Subject: Re: Corpora: a particular type of sloppiness
Isn't this question of diacritics, at some level at least, an empirical
one? That is, how does frequency of diacritic use vary in formal (e.g.,
Spanish newspaper text) vs. informal (e.g., email) text? I know there are
lots of subtleties that would need to be worked out to make any such
comparison valid, but it would be interesting nonetheless to see how common
or uncommon the use of diacritics is in various languages that use them.
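A minimal sketch of the simplest version of that count: the share of word
tokens that carry at least one diacritic. It assumes plain whitespace
tokenization and treats any character that Unicode NFD normalization splits
into a base letter plus a combining mark (á, ñ, ü and so on) as carrying a
diacritic. Letters with no NFD decomposition, such as Polish ł, are not
counted. The function names and sample sentences are hypothetical, and a
real formal vs. informal comparison would still need the subtleties
mentioned above (tokenization, mangled encodings, comparable genres).

```python
import unicodedata


def has_diacritic(token: str) -> bool:
    """True if any character in the token decomposes (NFD) into a base
    character followed by a combining mark, e.g. 'ó' -> 'o' + U+0301.
    Letters with no NFD decomposition (e.g. 'ł', 'ø') are NOT detected."""
    decomposed = unicodedata.normalize("NFD", token)
    return any(unicodedata.combining(ch) for ch in decomposed)


def diacritic_rate(text: str) -> float:
    """Share of whitespace-separated tokens (that contain at least one
    letter) carrying a diacritic. Assumption: whitespace tokenization
    is good enough for a first rough comparison."""
    tokens = [t for t in text.split() if any(ch.isalpha() for ch in t)]
    if not tokens:
        return 0.0
    return sum(has_diacritic(t) for t in tokens) / len(tokens)


if __name__ == "__main__":
    # Hypothetical samples: the same sentence with and without accents.
    formal = "El niño se cayó"
    informal = "El nino se cayo"
    print(diacritic_rate(formal))    # 0.5
    print(diacritic_rate(informal))  # 0.0
```

Using NFD avoids hand-listing accented letters per language; run over a
newspaper sample and an email sample, the two rates give a first, rough
answer to the frequency question, though a low email rate alone cannot
separate deliberate omission from diacritics lost in transit.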
I'm a pretty strong believer in context as a disambiguator, and human
beings are amazingly talented at correctly going beyond the information
given. So my hunch is that a great deal of text without diacritics can
still be unambiguously understood by the majority of readers. In fact, if
Spanish or Czech (or whatever language that uses diacritics) email messages
are often sent without diacritics, then I take this as an existence proof
that, to some extent, they are not needed for satisfactory comprehension.
-bruce
At 11:50 AM 4/19/01 -0700, Rene.Valdes at lhsl.com wrote:
>In support of Monika's argument, I'll offer the following two sentences:
>
> Ya termino. (I'm finishing soon.)
> Ya terminó. (It's already finished.)
>
>Without the diacritic, you would not be able to tell which one of these two
>meanings to assign to this sentence. I use diacritics whenever possible,
>even at the risk of having my text become garbage when it travels through
>cyberspace.
>
>Another interesting case is the very important distinction between año and
>ano, two nouns with quite different meanings.
>
>René Valdés
>San Diego, California
>USA
>
>Monika Merino wrote:
> As a native speaker of Spanish I can tell you that ALL Spanish speakers
> would face terrible comprehension problems without diacritics. In many
> cases, diacritics in Spanish are used to "distinguish" homonyms. Take for
> example these two cases:
> El niño *se* cayó (The boy fell down)
> *Sé* que será difícil entenderlo (I know it's going to be difficult to
> understand)
> In the first case we're talking about the reflexive pronoun *se* (used
> here with "caerse", "to fall down"), whereas in the second case we're
> talking about the first person singular conjugation of the verb "to
> know" (saber). Perhaps in isolated sentences like these two, and in the
> "relaxed" and rather artificial situation of "reading examples", these
> diacritics might not seem crucial for comprehension. But I can't imagine
> what it would be like to have a 5,000-word Spanish text with no
> diacritics! It would take ages for native speakers of a language with
> diacritics to get used to one without them! And anyway, what's the
> problem with diacritics?
> Monica Merino