Adding accents to Cyrillic text

Ralph Cleminson ralph.cleminson at PORT.AC.UK
Thu Nov 7 13:54:34 UTC 2002


> David Chaika wrote:
>
> > I have a Russian-language text that I downloaded from the net. I
> > want to put an acute accent mark over the stressed letters for
> > teaching purposes. Or even mark them in bold, I'm not particular. I
> > use Windows 2000/Word 2000. The text is unicode, but I may want to
> > print it in one of my cool cp1251 fonts, which don't have accented
> > Cyrillic characters either.
> >
> > My question is - how do y'all handle this? It is tedious to say the
> > least to go through the whole thing word by word, but that is doable
> > (for a short story of 20 pages, which I am working on) if I can
> > convert the Russian letter with a keystroke instead of fumbling
> > around on the keyboard. Zaranee bol'shoe spasibo!
>
> AFAIK Unicode doesn't include accented Cyrillic, so you will have to
> compose them using equation fields. Not to worry -- these are not math
> equations.
>
Unicode does not include Cyrillic vowels with a precomposed
acute accent, nor is it ever likely to, since the current
thinking at the Unicode Consortium is not to encode as a
character anything that can be decomposed into other
characters.  This would include such examples as "cyrillic
small letter a" plus "combining acute accent" (i.e. U+0430 +
U+0301; hexadecimal 0430 is decimal 1072).

To display accented characters in Unicode, therefore, one
simply adds the accent after the character, as in the
following example:

<?xml version="1.0"?>
<!DOCTYPE doc
        [<!ELEMENT doc (p+) >
         <!ELEMENT p (#PCDATA)>
]>

<doc>
<p>&#x041C;&#x043E;&#x0301;&#x0436;&#x043D;&#x043E;
&#x0441;&#x0442;&#x0430;&#x0301;&#x0432;&#x0438;&#x0442;&#x044C;
&#x0443;&#x0434;&#x0430;&#x0440;&#x0435;&#x0301;&#x043D;&#x0438;&#x0435;
&#x043D;&#x0430; &#x0432;&#x0441;&#x0435;
&#x0433;&#x043B;&#x0430;&#x0301;&#x0441;&#x043D;&#x044B;&#x0435;.
</p><p>
&#x0430;&#x0301; &#x0435;&#x0301; &#x0438;&#x0301;
&#x043E;&#x0301; &#x0443;&#x0301; &#x044B;&#x0301;
&#x044D;&#x0301; &#x044E;&#x0301; &#x044F;&#x0301;</p>

</doc>


For those (if there is anyone) who don't yet have an XML
browser, here is the same thing in HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<title>Cyrillic Stress Marks</title>
<STYLE type="text/css">
p {font-family: "TITUS Cyberbit Basic"; }
</STYLE>
</head>
<body>

<p>&#x041C;&#x043E;&#x0301;&#x0436;&#x043D;&#x043E;
&#x0441;&#x0442;&#x0430;&#x0301;&#x0432;&#x0438;&#x0442;&#x044C;
&#x0443;&#x0434;&#x0430;&#x0440;&#x0435;&#x0301;&#x043D;&#x0438;&#x0435;
&#x043D;&#x0430; &#x0432;&#x0441;&#x0435;
&#x0433;&#x043B;&#x0430;&#x0301;&#x0441;&#x043D;&#x044B;&#x0435;.
</p><p>
&#x0430;&#x0301; &#x0435;&#x0301; &#x0438;&#x0301;
&#x043E;&#x0301; &#x0443;&#x0301; &#x044B;&#x0301;
&#x044D;&#x0301; &#x044E;&#x0301; &#x044F;&#x0301;</p>
</body>
</html>

Note that in order to display these, you will have to have a
font installed on your system that includes the glyphs for
these characters.  The quality of display is also dependent
on the font; for example, in Times New Roman the acute
accent is not properly centred over the letter.
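
Both examples above rest on the same sequence: a base letter
followed by U+0301.  As a minimal sketch in Python (the
language and the function names describe and to_char_refs are
assumptions for illustration, nothing the examples depend on),
the following lists the code points in such a sequence, checks
that normalisation to NFC leaves it as two characters, and
generates the hexadecimal character references instead of
typing them by hand:

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

def to_char_refs(text):
    """Replace each non-ASCII character with a hexadecimal character reference.

    ASCII characters pass through unchanged, so '&' and '<' are not escaped.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text)

# Cyrillic small letter a followed by combining acute accent.
accented_a = "\u0430\u0301"
for code_point, name in describe(accented_a):
    print(code_point, name)

# NFC composes a sequence only where a precomposed character exists;
# there is none for this pair, so the result is still two code points.
print(len(unicodedata.normalize("NFC", accented_a)))  # 2

# The first word of the examples, with the stress on the first o.
word = "\u041c\u043e\u0301\u0436\u043d\u043e"
print(to_char_refs(word))
# &#x041C;&#x043E;&#x0301;&#x0436;&#x043D;&#x043E;
```

The output of to_char_refs can be pasted into the body of
either the XML or the HTML file.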

As for printing it in one of your cool CP1251 fonts, that is
probably not possible, since CP1251 is a closed system of
256 code positions, none of which is a combining accent or
a stressed vowel, and you can't add glyphs to it.  You can
of course design your own font, but if it has accented
Cyrillic characters in it, it won't be CP1251.  If you are
going down that road, then you can put precomposed Cyrillic
accented glyphs into the Private Use Area of Unicode: this
will ensure that these characters will be displayed the way
you want them, but also mean that no one else will be able
to use your file unless you also give them your font.
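
That CP1251 has no room for the accent can be checked
directly.  A small sketch in Python, using the cp1251 codec
from its standard library (the helper name fits_cp1251 is
made up for this illustration): the plain letter encodes,
the letter plus combining accent does not.

```python
def fits_cp1251(text):
    """Return True if every character in text has a code position in CP1251."""
    try:
        text.encode("cp1251")
        return True
    except UnicodeEncodeError:
        return False

print(fits_cp1251("\u0430"))        # plain Cyrillic a: True
print(fits_cp1251("\u0430\u0301"))  # a plus combining acute: False
```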

As to how you do this, it will have to be done manually,
since the characters that ought to be indicated as stressed
are by definition unmarked in the current Unicode text file,
and therefore cannot be recognised by the machine.  How
precisely you do it depends on your environment.  If you
have, or can make, a keyboard driver that links a particular
keystroke to U+0301, then all you have to do is position
the cursor after the vowel you want the accent on, hit that
key, and it will appear.  If you can't do that, then you need
to do something more complicated, as for example
suggested by Paul B. Gallagher, but I'm not sure to what
extent the result will be platform-independent.
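
Failing a keyboard driver, one possible workaround (a sketch
only, and not the method Paul B. Gallagher suggested) is to
type some otherwise unused character after each stressed
vowel and convert the text afterwards.  The backtick marker,
the vowel list and the function name add_stress below are all
assumptions made for illustration; the sketch works on a
string and leaves reading and writing the file aside.

```python
import re

COMBINING_ACUTE = "\u0301"
# The nine vowels that can carry a stress mark, in both cases.
VOWELS = "аеиоуыэюяАЕИОУЫЭЮЯ"
# Hypothetical marker typed by hand after each stressed vowel.
MARKER = "`"

# A vowel immediately followed by the marker.
_marked_vowel = re.compile(f"([{VOWELS}]){re.escape(MARKER)}")

def add_stress(text):
    """Replace the marker after a Cyrillic vowel with U+0301.

    A marker that does not follow a vowel is left untouched.
    """
    return _marked_vowel.sub(lambda m: m.group(1) + COMBINING_ACUTE, text)

stressed = add_stress("Мо`жно ста`вить ударе`ние")
print([f"U+{ord(ch):04X}" for ch in stressed[:3]])
```

The marked text still has to be produced by hand, but typing
one extra character per word is much less fumbling than
inserting the accent through a menu each time.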

R.M.Cleminson,
Professor of Slavonic Studies,
University of Portsmouth,
Park Building,
King Henry I Street,
Portsmouth PO1 2DZ
tel. +44 23 92 846143, fax: +44 23 92 846040
