Electronic Russian Dictionary/Spell Checker wanted...

Yoshimasa Tsuji yamato at YT.CACHE.WASEDA.AC.JP
Thu Jan 13 23:27:18 UTC 2000


Dear Kenneth,
      What are the rules again for figuring out syllable breaks
      in Russian words?

Well, I would answer as follows:
   1. splits can occur only somewhere between vowels. Thus, KGB cannot be
      split, not only because it is an abbreviation but also because it is
      written with fewer than two vowels, although in pronunciation it has
      three.
   2. split into components, especially into words, after prefixes,
       and before suffixes (etymology principle).
         e.g. spec-odezhda, kom-intern, zhelezno-dorozhnyj, pod"-ezd
   2a. "y" cannot be split from its preceding consonant. (<pred"-idushchij>
      had already gone out of fashion in Grot's day, but I have seen
      <pred-idushchij> in the late 1920s.)
   3. hyphenation rules proper such as
       a. abbreviations written entirely in upper case should not be split
       b. doubled identical consonants are to be split between the two
            Ros-sija (not Ro-ssija as pronunciation suggests)
       c. "j" (short i) preceded by a vowel usually stays with that vowel,
          so the break comes after it
              raj-onnyj
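
A minimal sketch of how rules 1, 2a and 3a-3c might be checked mechanically.
It is only an illustration: the names (candidate_breaks, show) are made up,
it works on Cyrillic spelling rather than the transliteration used above
(where "j" is ambiguous), and it does not attempt rule 2, which needs
knowledge of prefixes and word components. Reading rule 3b as "no break
directly before or directly after a doubled consonant" is my assumption.

```python
VOWELS = set("аеёиоуыэюя")

def candidate_breaks(word):
    """Return indices i where word[:i] + '-' + word[i:] passes rules 1, 2a, 3a-3c."""
    if word.isupper():                      # rule 3a: upper-case abbreviation
        return []
    w = word.lower()
    breaks = []
    for i in range(1, len(w)):
        left, right = w[:i], w[i:]
        # rule 1: each part must contain a written vowel
        if not (VOWELS & set(left)) or not (VOWELS & set(right)):
            continue
        # rule 2a: "y" (ы) stays with the preceding consonant
        if w[i] == "ы":
            continue
        # rule 3c: short i (й) after a vowel stays with that vowel
        if w[i] == "й" and w[i - 1] in VOWELS:
            continue
        # rule 3b (assumed reading): a doubled consonant is split only in the middle
        if i + 1 < len(w) and w[i] == w[i + 1] and w[i] not in VOWELS:
            continue                        # break would fall before the pair
        if i >= 2 and w[i - 1] == w[i - 2] and w[i - 1] not in VOWELS:
            continue                        # break would fall after the pair
        breaks.append(i)
    return breaks

def show(word):
    """Render every candidate break of a word as a hyphenated string."""
    return [word[:i] + "-" + word[i:] for i in candidate_breaks(word)]
```

For example, show("Россия") includes "Рос-сия" but not "Ро-ссия", and
show("районный") includes "рай-онный". The sketch also lets through breaks
such as "Росси-я" that leave a single letter on a line, since the rules
listed above do not address that case.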
These three rules are the most basic ones, but there are some
conventional rules as well:
      4. "st", "sk" are advised not to be split
             lenin-skij, etc.
      5. two or more consecutive consonants are to be split after the first,
         although v-cccv, vc-ccv, vcc-cv are all possible. (Please note that
         vccc-v is not allowed except in accordance with rule 2.)
            dozh-dja
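
Rules 4 and 5 can be sketched in the same spirit. The function below is a
hypothetical helper (the name cluster_splits and the KEEP_TOGETHER tuple are
mine): given the consonant cluster standing between two vowels, in Cyrillic
spelling, it lists how many consonants may stay on the first line, with the
preferred split (after the first consonant) listed first. Treating the
advice in rule 4 as a hard restriction is a simplification, and rule 2 is
again ignored.

```python
KEEP_TOGETHER = ("ст", "ск")    # rule 4: "st", "sk" are advised not to be split

def cluster_splits(cluster):
    """Return the permitted values of k, where k consonants of the cluster
    stay before the hyphen; the preferred split (k == 1) comes first."""
    n = len(cluster)
    if n < 2:
        raise ValueError("rule 5 is about two or more consecutive consonants")
    allowed = []
    for k in range(n):          # k == n would be vccc-v, which rule 5 excludes
        if k > 0 and cluster[k - 1:k + 1] in KEEP_TOGETHER:
            continue            # the break would fall inside "st" or "sk"
        allowed.append(k)
    # stable sort: k == 1 (split after the first consonant) moves to the front
    allowed.sort(key=lambda k: k != 1)
    return allowed
```

With "жд" (as in dozh-dja) the preferred result is k = 1, and with "нск" (as
in lenin-skij) the split inside "sk" is dropped, leaving n-sk as the
preferred break.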

The hyphenation rules were very complicated up to the 1930s, while there
were still pre-1918 intellectuals who preferred old-style hyphenation
(as to Grot's manual, see http://www.yt.cache.waseda.ac.jp/razbivka.html),
but the 1918 rules had allowed everyone to hyphenate just as they
liked. School teachers wanted to teach rules that were not as complicated
as those the old generation of compositors knew (they would have split
Schwarzen-egger), but not as simple as the 1918 decree. The result is the
1956 rules, which have remained the standard to date.
   However, apart from the fact that most software cannot reliably
hyphenate Russian words, there are too many ambiguities in the rules,
especially in rule 2, which depends heavily on the linguistic knowledge
of the user.

   If hyphenation rules exist to help readers pronounce the words
correctly even when they are split (pronunciation principle), "sam-izdat"
is certainly welcome, as the "m" there is "hard", but "drozh-zhi" is not,
because the "zh" there will not be pronounced "soft" unless you already
know that "zhi" follows. Where a consonant is pronounced soft, it should
not be separated from the subsequent consonant (the "r" in <Perm'> is one
of the rare cases where "r" is pronounced soft, so <v Pe-rmi> will help
readers better).

  As far as I know, hyphenation dictionaries ceased to be published
after WW2, and people have become much more tolerant of breaking the
rules, which after all did not help readers but always reminded them of
their ignorance of Church Slavonic and European languages.

Cheers,
Tsuji
