accented fonts

David J Birnbaum djbpitt+ at pitt.edu
Thu Jul 29 19:43:27 UTC 1999


Dear SEELANGers,

A few (somewhat technical) thoughts about accented fonts:

1. Why Unicode Deliberately Excludes Accented Cyrillic Vowels.

Unicode is a character set (an inventory of characters, or informational
units) and not a font (an inventory of glyphs, or presentational units).
There is not necessarily a one-to-one mapping between characters and
glyphs. (For example, even if your font can render a single "fl" ligature
in English, we still think of this as two letters.)

Unicode deliberately does not encode accented vowels as independent
precomposed characters because it intends that these be represented as a
sequence of characters (plain vowel plus "floating diacritic"), which may
then be mapped to either one or two glyphs (depending on the font) for
rendering. Unicode includes precomposed accented letters only for legacy
purposes, that is, only when they were part of character sets in reasonably
wide use when Unicode was developed. Unicode opted for the "floating
diacritic" strategy because it is economical: it absolves Unicode from
having to find room in its inventory for every conceivable combination of
base character and diacritic. This strategy is sensible, but it carries a
cost, part of
which is that font-rendering engines that can process floating diacritics
have to be smarter than those that assume a one-to-one mapping between
characters and glyphs.
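To make the character/glyph distinction concrete, here is a minimal sketch in Python using the standard unicodedata module. It assumes stress is marked with U+0301 COMBINING ACUTE ACCENT, and code_points is a hypothetical helper name. It shows that a stressed Cyrillic "а" is stored as two characters, which NFC normalization cannot merge because Unicode has no precomposed form for it, whereas Latin "e" plus acute composes to the legacy precomposed U+00E9.

```python
import unicodedata

# Stressed Cyrillic "а": the plain vowel (U+0430) followed by the
# combining acute accent (U+0301), i.e. two characters.
STRESSED_A = "\u0430\u0301"

def code_points(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

if __name__ == "__main__":
    for line in code_points(STRESSED_A):
        print(line)
    # NFC composition cannot merge the pair: there is no precomposed
    # "Cyrillic a with acute", so the string stays two characters long.
    print(len(unicodedata.normalize("NFC", STRESSED_A)))  # 2
    # By contrast, Latin "e" + acute composes to the legacy U+00E9.
    print(len(unicodedata.normalize("NFC", "e\u0301")))   # 1
```

Whether the pair is then drawn as one glyph or as two overlapping glyphs is up to the font and the rendering engine, not the character data.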

2. What's a Slavist to Do: The Accented Font Approach.

There is no standard in wide use for encoding accented Cyrillic vowels,
which means that no matter what font you use (whether commercial or
home-grown), you won't be able to share documents (or publish them on the web)
unless the people with whom you want to share them have a font that follows
the same layout. You also may have trouble getting your software to treat
accented and unaccented vowels identically for searching purposes. If you
need real accented vowels enough to put up with these inconveniences, you
can buy or build a font that will include accented Cyrillic vowels.

3. What's a Slavist to Do: The Non-Accented Font Approach.

I usually use bold or italic to mark the accented vowel in Cyrillic
documents. I find this subtler than accent marks, but it is culturally
incorrect, since when Russians do mark stress (such as when printing the
comparative adjective bol'shaja), they do use an acute accent.

4. The Long View.

If we live long enough, we'll see systems that store text as Unicode
internally (with floating diacritics) and that rely on rendering engines
that can assemble accented letters on the fly.
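Storing text this way also eases the searching problem mentioned under point 2. A minimal sketch in Python, assuming stress is marked only with U+0301 COMBINING ACUTE ACCENT; strip_stress and stress_insensitive_contains are hypothetical helper names. The text is first normalized to NFC so that letters Unicode does encode precomposed (й, ё, Macedonian ѓ) keep their diacritics, and any leftover combining acutes are then deleted before comparing.

```python
import unicodedata

# Assumption: stress is marked with U+0301 COMBINING ACUTE ACCENT.
COMBINING_ACUTE = "\u0301"

def strip_stress(text: str) -> str:
    """Remove stress marks so stressed and plain spellings compare equal."""
    # NFC first: letters that Unicode encodes precomposed (й, ё, Macedonian ѓ)
    # stay single code points, so their diacritics are not touched below.
    composed = unicodedata.normalize("NFC", text)
    # Any acute still present has no precomposed form; treat it as stress.
    return composed.replace(COMBINING_ACUTE, "")

def stress_insensitive_contains(haystack: str, needle: str) -> bool:
    """True if needle occurs in haystack when stress marks are ignored."""
    return strip_stress(needle) in strip_stress(haystack)

if __name__ == "__main__":
    comparative = "бо\u0301льшая"  # bol'shaja, stressed on the first syllable
    print(strip_stress(comparative))  # большая
    print(stress_insensitive_contains("бо\u0301льшая часть", "большая"))  # True
```

Normalizing before stripping matters: decomposing with NFD and deleting every acute would also strip the acute that is part of the letter ѓ.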

Cheers,

David
________

Professor David J. Birnbaum
Department of Slavic Languages and Literatures
1417 Cathedral of Learning
University of Pittsburgh
Pittsburgh, PA 15260 USA
Voice: 1 412 624 5712
Fax: 1 412 624 9714
Email: djb at clover.slavic.pitt.edu
URL: http://clover.slavic.pitt.edu/~djb/
