Corpora: a particular type of sloppiness

Bruce Lambert lambertb at uic.edu
Thu Apr 19 22:06:34 UTC 2001


Isn't this question of diacritics, at some level at least, an empirical 
one? That is, how does frequency of diacritic use vary in formal (e.g., 
Spanish newspaper text) vs. informal (e.g., email) text? I know there are 
lots of subtleties that would need to be worked out to make any such 
comparison valid, but it would be interesting nonetheless to see how common 
or uncommon the use of diacritics is in various languages that use them.
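
A minimal sketch of how such a comparison might be started, assuming plain Unicode text as input. The function names (has_diacritic, diacritic_rate) are hypothetical, and the two sample strings are made up for illustration rather than drawn from any corpus. It simply reports the share of letters that carry a diacritic.

```python
import unicodedata


def has_diacritic(ch: str) -> bool:
    """True if the character's canonical decomposition contains a combining mark."""
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))


def diacritic_rate(text: str) -> float:
    """Fraction of alphabetic characters in `text` that carry a diacritic."""
    # Compose first so that a base letter plus a combining mark counts as one letter.
    composed = unicodedata.normalize("NFC", text)
    letters = [ch for ch in composed if ch.isalpha()]
    if not letters:
        return 0.0
    marked = sum(1 for ch in letters if has_diacritic(ch))
    return marked / len(letters)


# Illustrative strings only (assumed, not real newspaper or email data).
formal_sample = "El niño se cayó."
informal_sample = "El nino se cayo."

print(diacritic_rate(formal_sample), diacritic_rate(informal_sample))
```

Run over real formal and informal samples of comparable size, the two rates would give a first rough answer to the question, though it would not control for the subtleties mentioned above (topic, author, keyboard, mail software).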

I'm a pretty strong believer in context as a disambiguator, and human 
beings are amazingly talented at correctly going beyond the information 
given. So my hunch is that a great deal of text without diacritics can 
still be unambiguously understood by the majority of readers. In fact, if 
Spanish or Czech (or whatever language that uses diacritics) email messages 
are often sent without diacritics, then I take this as an existence proof 
that, to some extent, they are not needed for satisfactory comprehension.
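
One way to put a number on this hunch, sketched under the assumption that a word list for the language is available: strip the diacritics and count how many distinct words collapse into the same spelling. The names strip_diacritics and ambiguous_forms are hypothetical, and the tiny vocabulary below is just the examples from this thread, not a real lexicon.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Remove combining marks after canonical decomposition (e.g. 'año' becomes 'ano')."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def ambiguous_forms(vocabulary):
    """Map each stripped spelling to the distinct original words that collapse into it.

    Only spellings shared by more than one original word are returned.
    """
    groups = defaultdict(set)
    for word in vocabulary:
        lowered = unicodedata.normalize("NFC", word.lower())
        groups[strip_diacritics(lowered)].add(lowered)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# Toy vocabulary taken from the examples in this thread (assumed, not a real word list).
vocabulary = ["termino", "terminó", "año", "ano", "se", "sé", "niño", "cayó"]
print(ambiguous_forms(vocabulary))
```

This only finds the candidate collisions. Whether context resolves them for a reader is the empirical part, and the sketch does not address it.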

-bruce


At 11:50 AM 4/19/01 -0700, Rene.Valdes at lhsl.com wrote:

>In support of Monika's argument, I'll offer the following two sentences:
>
>      Ya termino.        (I'm finishing soon.)
>      Ya terminó.        (It's already finished.)
>
>Without the diacritic, you would not be able to tell which one of these two
>meanings to assign to this sentence.  I use diacritics whenever possible,
>even at the risk of having my text become garbage when it travels through
>cyberspace.
>
>Another interesting case is the very important distinction between año and
>ano, two nouns with quite different meanings.
>
>René Valdés
>San Diego, California
>USA
>
>Monika Merino wrote:
>    As a native speaker of Spanish I can tell you that ALL Spanish speakers
>    would face terrible comprehension problems without diacritics. In many
>    cases, diacritics in Spanish are used to "distinguish" homonyms. Take for
>    example these two cases:
>    El niño *se* cayó (The boy fell down)
>    *Sé* que será difícil entenderlo (I know it's going to be difficult to
>    understand)
>    In the first case we're talking about the reflexive pronoun "se",
>    whereas in the second case we're talking about the first person singular
>    conjugation of the verb "to know". Perhaps in isolated sentences like
>    these two and in the "relaxed" and rather artificial situation of
>    "reading examples", these diacritics might not seem crucial for
>    comprehension. But I can't imagine what it would be like to have a
>    5,000-word Spanish text with no diacritics! It would take ages for native
>    speakers of a language with diacritics to get used to one without them!
>    And anyway, what's the problem with diacritics?
>    Monica Merino


