Sorting Russian Word Lists in MS Word

Paul B. Gallagher paulbg at PBG-TRANSLATIONS.COM
Tue Mar 28 01:11:50 UTC 2006


Jim Tonn wrote:

> Tony,
>    The Timesse Russ font uses an encoding system known as CP1251 (or Windows
> 1251) to organize its Cyrillic letters. In this encoding system, Cyrillic
> takes the place of lesser-used characters such as accented Latin vowels
> (which you see when you switch back to a standard font). As far as Word is
> concerned, the text you are producing is made of these lesser-used
> characters and not of Cyrillic letters, and it attempts to sort based on its
> idea of how multilingual accented characters should be "alphabetized." In
> some cases Word's idea of sorting accented characters happens to coincide
> with the ordering of their Cyrillic counterparts, which is why you
> experience partially correct sorting with this font.
>    My web site, www.convertcyrillic.com, can convert CP1251 to Unicode,
> which should be correctly sorted by Word. Unfortunately, although accented
> Cyrillic characters can be represented in Unicode, Fingertip's way of
> representing them is not standardized, so they cannot be easily converted.
> However, if this particular list of terms does not use accented characters,
> you should be able to convert and sort with few problems.
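
The mechanism Jim describes can be seen in a few lines of Python. This is only a sketch: it assumes that the "standard font" reading of the bytes corresponds to the Windows-1252 code page, and the sample word is chosen purely for illustration.

```python
# One Cyrillic word, stored as CP1251 bytes (one byte per letter).
word = "рука"
raw = word.encode("cp1251")

# Assumption: a standard Western font interprets the same bytes as Windows-1252,
# so each Cyrillic letter shows up as an accented Latin character.
as_latin = raw.decode("cp1252")

# Converting to Unicode means reinterpreting the original bytes as CP1251.
recovered = as_latin.encode("cp1252").decode("cp1251")

print([hex(b) for b in raw])
print(as_latin)
print(recovered)
```

A sort applied to the Latin reading orders the text by the rules for those accented Latin characters, which is why the result only partly agrees with Cyrillic alphabetical order.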

On a closely related matter:

I just created a native Word file containing the following word list:
	рука́
	руки́
	ру́ки
	ру́ки прочь!
	руководи́тель
Word happily sorted it correctly, and also displayed the accent marks 
correctly over their respective vowels (though a bit higher than I would 
like, as if leaving room in case the letter were capitalized).
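
For comparison outside Word, here is a minimal Python sketch that reproduces the same order for this list. It is not Word's algorithm, which is not described here. The `sort_key` helper is hypothetical: it compares the letters without their accents first and uses accent position only to break ties. It also assumes plain code-point order is acceptable for the letters, which holds for а through я but not for ё.

```python
ACUTE = "\u0301"  # combining acute accent

def sort_key(word):
    """Primary key: letters without accents. Secondary key: which letters carry one."""
    letters = []
    accent_flags = []
    for ch in word:
        if ch == ACUTE:
            if accent_flags:
                accent_flags[-1] = 1  # mark the preceding letter as accented
        else:
            letters.append(ch)
            accent_flags.append(0)
    return ("".join(letters), accent_flags)

words = [
    "руководи\u0301тель",
    "ру\u0301ки прочь!",
    "ру\u0301ки",
    "руки\u0301",
    "рука\u0301",
]

naive = sorted(words)                     # raw code-point order
by_letters = sorted(words, key=sort_key)  # accents ignored except for ties

for w in by_letters:
    print(w)
```

A naive code-point sort puts both forms of "ру́ки" first, because U+0301 has a lower code point than any Cyrillic letter. The keyed sort avoids that.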

I created the accent marks by positioning the cursor after the vowel and 
using Insert | Symbol... to add the combining acute accent (U+0301).
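
What ends up in the text is simply the vowel followed by a separate U+0301 code point. The Python sketch below lists the code points of such a word. The `accented_letters` helper is hypothetical and only reports which letter each accent follows. The second string shows what is stored if the accent is entered after the consonant instead.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent

def accented_letters(word):
    """Return the base letters that a combining acute accent follows."""
    return [word[i - 1] for i, ch in enumerate(word) if ch == ACUTE and i > 0]

after_vowel = "рука" + ACUTE          # accent entered after the final vowel
after_consonant = "рук" + ACUTE + "а"  # accent entered after the consonant

for ch in after_vowel:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(accented_letters(after_vowel))
print(accented_letters(after_consonant))

# Unicode has no single precomposed code point for Cyrillic "а" with acute,
# so NFC normalization leaves the pair as two code points.
print(len(unicodedata.normalize("NFC", "а" + ACUTE)))
```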

Notes:

1) The file displayed correctly using Times New Roman, Arial, Arial 
Unicode MS, and Tahoma, but not with Courier New (accents separated from 
their vowels) or Verdana (accents displaced one letter to the right).

In Verdana, the desired effect can be achieved by positioning the cursor 
to the *left* of the vowel before adding the accent. However, this is 
deceptive: as far as Word is concerned, a word like "рука́" then has an 
accented "к" rather than an accented "а." You can therefore find the "а" 
whether or not it looks accented (which may be desirable), but a search 
for a plain "к" no longer finds the "к" -- see note 2 below.

In Courier New (a fixed-width font), placing the cursor to the left of 
the vowel just gets you an accent to the left of the vowel. There seems 
to be no way to get the accent to appear directly over the vowel.

Switching to an older non-Unicode font such as Svoboda FWF allows you to 
search and select the accent mark independently of the vowel, but the 
downside is that the accent mark appears as a Ukrainian "ґ" (ghe with 
upturn).

2) For searching purposes, Word treats accented vowels as distinct from 
unaccented ones ("у" is different from "у́"). If you search for "рук," for 
example, it finds only three words in this list, and if you search for 
"ру́к" it finds only the other two. Interestingly, the accented vowel is 
not treated as a vowel+accent or accent+vowel sequence, so if you search 
for "ру" or "ук" or even "у" alone, Word finds none of the words 
containing the accented vowel.

3) Once the accented vowel is created, the accent mark cannot be 
independently searched or selected even if you switch to a defective 
font like Courier New. You can only search/replace the integrated unit. 
(But see the note above re Svoboda FWF.)
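
The search behavior in notes 2 and 3 can be imitated in Python by matching on whole units (a base letter plus any combining marks attached to it) rather than on individual code points. This is a sketch of the observed behavior only, not of how Word implements it. The `units` and `unit_find` names are hypothetical.

```python
import unicodedata

def units(text):
    """Split text into units: a base character plus any combining marks after it."""
    result = []
    for ch in text:
        if unicodedata.combining(ch) and result:
            result[-1] += ch  # attach the mark to the preceding base character
        else:
            result.append(ch)
    return result

def unit_find(haystack, needle):
    """True if needle's units occur as a contiguous run of haystack's units."""
    h, n = units(haystack), units(needle)
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))

words = [
    "рука\u0301",
    "руки\u0301",
    "ру\u0301ки",
    "ру\u0301ки прочь!",
    "руководи\u0301тель",
]

plain = [w for w in words if unit_find(w, "рук")]
accented = [w for w in words if unit_find(w, "ру\u0301к")]

print(plain)
print(accented)
print(unit_find("ру\u0301ки", "у"))  # unit-based match
print("у" in "ру\u0301ки")           # ordinary code-point substring match
```

An ordinary substring test works on code points, so it does find "у" inside "ру́ки". The unit-based comparison does not, which matches what Word was seen to do.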

There's probably more to this, and I'm sure some other people on this 
list will be interested enough to fiddle with it.

-- 
War doesn't determine who's right, just who's left.
--
Paul B. Gallagher
pbg translations, inc.
"Russian Translations That Read Like Originals"
http://pbg-translations.com
