Sorting Russian Word Lists in MS Word
Paul B. Gallagher
paulbg at PBG-TRANSLATIONS.COM
Tue Mar 28 01:11:50 UTC 2006
Jim Tonn wrote:
> Tony,
> The Timesse Russ font uses an encoding system known as CP1251 (or
> Windows-1251) to organize its Cyrillic letters. In this encoding system, Cyrillic
> takes the place of lesser-used characters such as accented Latin vowels
> (which you see when you switch back to a standard font). As far as Word is
> concerned, the text you are producing is made of these lesser-used
> characters and not of Cyrillic letters, and it attempts to sort based on its
> idea of how multilingual accented characters should be "alphabetized." In
> some cases Word's idea of sorting accented characters happens to coincide
> with the ordering of their Cyrillic counterparts, which is why you
> experience partially correct sorting with this font.
> My web site, www.convertcyrillic.com, can convert CP1251 to Unicode,
> which should be correctly sorted by Word. Unfortunately, although accented
> Cyrillic characters can be represented in Unicode, Fingertip's way of
> representing them is not standardized, so they cannot be easily converted.
> However, if this particular list of terms does not use accented characters,
> you should be able to convert and sort with few problems.
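To make Jim's point concrete, here is a minimal Python sketch of one set of bytes read two ways. It assumes that the "standard font" view corresponds to Windows-1252 (Western European); that is my assumption, not something stated in his message. The function names are made up for illustration; "cp1251" and "cp1252" are Python's built-in codec names.

```python
# Sketch: the same bytes read as CP1251 (Cyrillic) and as CP1252 (Western).
# Assumption: the "standard font" view corresponds to CP1252.

def as_western(text: str) -> str:
    """Show how CP1251-encoded Cyrillic looks when read as CP1252."""
    return text.encode("cp1251").decode("cp1252")

def as_cyrillic(garbled: str) -> str:
    """Reverse the mix-up: treat CP1252 text as CP1251 bytes."""
    return garbled.encode("cp1252").decode("cp1251")

word = "рука"
garbled = as_western(word)
print(garbled)               # ðóêà
print(as_cyrillic(garbled))  # рука
```

The basic Russian letters all survive this round trip, but a handful of CP1251 byte values have no CP1252 counterpart in Python's codec, so text containing certain other Cyrillic letters may raise an error rather than convert.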
On a closely related matter:
I just created a native Word file containing the following word list:
рука́
руки́
ру́ки
ру́ки прочь!
руководи́тель
Word happily sorted it correctly, and also displayed the accent marks
correctly over their respective vowels (though a bit higher than I would
like, as if leaving room in case the letter were capitalized).
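For anyone who wants to reproduce this ordering outside Word, the sketch below sorts the same five words in Python while ignoring the combining accent (U+0301). It is only an approximation under stated assumptions: it compares by Unicode code point rather than by real Russian collation rules (so "ё" would land after "я"), it assumes "й" and "ё" are stored as single precomposed letters, and the name sort_key is made up for illustration. I am not claiming this is how Word does it.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent

def sort_key(word: str) -> str:
    """Drop combining marks so the stress mark does not affect the order."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))

# The word list from this message, deliberately out of order.
words = [
    "руководи" + ACUTE + "тель",
    "ру" + ACUTE + "ки прочь!",
    "руки" + ACUTE,
    "рука" + ACUTE,
    "ру" + ACUTE + "ки",
]

ordered = sorted(words, key=sort_key)
for w in ordered:
    print(w)
```

Words that differ only in stress get the same key, so Python's stable sort leaves them in their input order relative to each other.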
I created the accent marks by positioning the cursor after the vowel and
adding the combining acute accent available via Insert | Symbol... at
position U+0301.
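The same construction (base vowel followed by the combining acute accent) can be sketched in Python. The helper add_stress and its zero-based index parameter are hypothetical, introduced only for this illustration; unicodedata.name is a standard-library call.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"

def add_stress(word: str, vowel_index: int) -> str:
    """Insert U+0301 right after the letter at vowel_index (0-based)."""
    cut = vowel_index + 1
    return word[:cut] + COMBINING_ACUTE + word[cut:]

stressed = add_stress("руки", 1)  # stress on the "у"

# List the code points to confirm the accent follows its vowel.
for ch in stressed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The stressed word is five code points long even though it displays as four letters, which is worth keeping in mind when counting characters or computing string lengths.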
Notes:
1) The file displayed correctly using Times New Roman, Arial, Arial
Unicode MS, and Tahoma, but not with Courier New (accents separated from
their vowels) or Verdana (accents displaced one letter to the right).
In Verdana, the desired effect can be achieved by positioning the cursor
to the *left* of the vowel before adding the accent. However, this is
deceptive: as far as Word is concerned, a word like "рука́" then actually
has an accented "к." That means a search for "а" finds the vowel whether
or not it looks accented (which may be desirable), but a search for a
plain "к" no longer finds the consonant -- see note 2 below.
In Courier New (a fixed-width font), placing the cursor to the left of
the vowel just gets you an accent to the left of the vowel. There seems
to be no way to get the accent to appear directly over the vowel.
Switching to an older non-Unicode font such as Svoboda FWF allows you to
search and select the accent mark independently of the vowel, but the
downside is that the accent mark appears as a Ukrainian "ґ" (g with hook).
2) For searching purposes, Word treats accented vowels as distinct from
unaccented ones ("у" is different from "у́") so for example if you search
for "рук" it finds only three words in this list, and if you search for
"ру́к" it finds only the other two. Interestingly, the accented vowel is
not treated as a vowel+accent or accent+vowel sequence, so if you search
for "ру" or "ук" or even "у" alone, Word finds none of the words
containing the accented vowel.
3) Once the accented vowel is created, the accent mark cannot be
independently searched or selected even if you switch to a defective
font like Courier New. You can only search/replace the integrated unit.
(But see the note above re Svoboda FWF.)
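If you need a search that ignores stress marks, one workaround is to do it outside Word. The Python sketch below is a minimal illustration, assuming the accents are stored as U+0301 after the vowel as described above; strip_stress and find_words are hypothetical names. Note that a plain code-point search in Python does not behave like the Word search in note 2: it happily finds "ру" inside "ру́ки" because the two letters are adjacent in the underlying text.

```python
ACUTE = "\u0301"  # combining acute accent

def strip_stress(text: str) -> str:
    """Remove every combining acute accent from the text."""
    return text.replace(ACUTE, "")

def find_words(query, words, ignore_stress=True):
    """Return the words containing query, optionally ignoring stress marks."""
    if ignore_stress:
        plain = strip_stress(query)
        return [w for w in words if plain in strip_stress(w)]
    return [w for w in words if query in w]

words = [
    "рука" + ACUTE,
    "руки" + ACUTE,
    "ру" + ACUTE + "ки",
    "ру" + ACUTE + "ки прочь!",
    "руководи" + ACUTE + "тель",
]

print(len(find_words("рук", words, ignore_stress=False)))  # 3
print(len(find_words("рук", words)))                       # 5
print(len(find_words("ру", words, ignore_stress=False)))   # 5
```

The first count matches what Word reports for "рук"; the last one does not match Word's result for "ру", which is the difference mentioned above.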
There's probably more to this, and I'm sure some other people on this
list will be interested enough to fiddle with it.
--
War doesn't determine who's right, just who's left.
--
Paul B. Gallagher
pbg translations, inc.
"Russian Translations That Read Like Originals"
http://pbg-translations.com
-------------------------------------------------------------------------
Use your web browser to search the archives, control your subscription
options, and more. Visit and bookmark the SEELANGS Web Interface at:
http://seelangs.home.comcast.net/
-------------------------------------------------------------------------