Pre-Revolutionary Russian texts

Paul B. Gallagher paulbg at PBG-TRANSLATIONS.COM
Thu May 31 01:42:23 UTC 2007


[Redirecting my reply to the list as others may be interested:]

Benjamin Sher wrote:

> I have a Russian poem (Zhabotinsky's "Bednaia Sharlotta") in PDF format
> which uses pre-Revolutionary orthography. It was originally published in
> 1902 and republished in 1930, the text from which the copy I have was
> scanned by the secretary at the Zhabotinsky Institute of Tel Aviv. I
> have no problem reading the text but I would like to have it in HTML
> format for my web site. What do others do in this case? There is no way
> to convert the PDF file to Doc or HTML or Text. 

Sure, there are several options.

The easiest way is to post the PDF itself or convert the relevant part 
to JPEG. This is platform- and font-independent.

The slowest and clumsiest way is to retype it by hand and then use 
standard techniques to convert it to HTML. Even if you're a very accurate 
typist and editor, there's still the risk that visitors may not have the 
relevant font(s) installed.
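If you do retype it, one common "standard technique" is to save the text as UTF-8 and wrap it in a minimal HTML page that declares that encoding, so browsers don't have to guess. Here is a minimal Python sketch, assuming blank lines separate stanzas and each verse line should stay on its own line; the function name `text_to_html` is hypothetical. It does nothing about the font risk; it only makes sure the characters arrive intact.

```python
import html


def text_to_html(text: str, title: str) -> str:
    """Wrap retyped plain text in a minimal UTF-8 HTML page (hypothetical helper).

    Assumption: blank lines separate stanzas, and single line breaks inside
    a stanza are verse lines, so they become <br>.
    """
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    stanzas = [s.strip() for s in text.split("\n\n") if s.strip()]
    body = "\n".join(
        "<p>" + html.escape(s).replace("\n", "<br>\n") + "</p>" for s in stanzas
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="ru">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body>\n{body}\n</body>\n"
        "</html>\n"
    )


print(text_to_html("Первая строка\nвторая строка\n\nНовая строфа", "Test"))
```

When you save the result, open the file with `encoding="utf-8"` so the bytes on disk match the `<meta charset>` declaration.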

A third way is to use an OCR program to convert to editable text (same 
risk as above). I happen to use ABBYY FineReader 
(<http://www.abbyy.com>), which is perfectly capable of reading and 
analyzing text in a variety of graphical formats, including (since 
version 7) PDF. I haven't tried it on pre-Revolutionary texts, but since 
the designers are Russian, it wouldn't surprise me a bit to find it can 
handle them (if you like, email it to me and I'll give it a quick 
run-through to see). If it can't, write to ABBYY and ask; they may have a 
solution that doesn't come in the standard bundle.

And of course you could download the text in modern orthography and 
edit it back into the old spelling. But you'd have to be very careful and 
have a good eye to keep modern spellings from creeping in...
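Only a small part of that re-editing is mechanical. As an illustration, here is a rough Python first pass that applies just two regular pre-1918 rules: a hard sign (ъ) after a word-final consonant, and і instead of и before a vowel or й. The function name is hypothetical, and the sketch deliberately leaves the hard part to you: yat', fita and izhitsa depend on the individual word and can't be restored by rule, and it makes no attempt at the exceptions to these two rules. Treat its output as a draft to proofread, not a result.

```python
import re

# Hypothetical helper: a rough first pass at pre-1918 spelling.
# It applies only two regular rules and leaves everything lexical
# (yat', fita, izhitsa, mir/mir) and all exceptions to the proofreader.
_CONSONANTS = "бвгджзклмнпрстфхцчшщ"  # й is not included: it takes no ъ
_VOWELS_OR_SHORT_I = "аеёиоуыэюяй"

# A consonant at the end of a word (\b is Unicode-aware for str patterns).
_FINAL_CONSONANT = re.compile(rf"([{_CONSONANTS}])\b", re.IGNORECASE)
# и/И immediately followed by a vowel or й.
_I_BEFORE_VOWEL = re.compile(rf"([иИ])(?=[{_VOWELS_OR_SHORT_I}])", re.IGNORECASE)


def to_pre_reform_draft(text: str) -> str:
    # Rule 1: a word-final consonant gets a hard sign, matching its case.
    text = _FINAL_CONSONANT.sub(
        lambda m: m.group(1) + ("Ъ" if m.group(1).isupper() else "ъ"), text
    )
    # Rule 2: и before a vowel or й becomes the decimal і, matching case.
    text = _I_BEFORE_VOWEL.sub(
        lambda m: "І" if m.group(1) == "И" else "і", text
    )
    return text


print(to_pre_reform_draft("Россия и дом"))  # Россія и домъ
```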


As for the obsolete characters, they are still available in Unicode, 
which practically all modern software supports. If you're on a Windows 
system, it has a nifty little applet called "Character Map" (probably 
under System Tools) that shows a table of all the characters along with 
their Unicode numbers. Here are most of them (the decimal I, Іі, is in 
the basic Cyrillic block at U+0406, U+0456):

Ѡѡ -- Omega -- U+0460, U+0461
Ѣѣ -- Yat' -- U+0462, U+0463
Ѥѥ -- Iotov E -- U+0464, U+0465
Ѧѧ -- Yus malyy -- U+0466, U+0467
Ѩѩ -- Iotov yus malyy -- U+0468, U+0469
Ѫѫ -- Yus bol'shoy -- U+046A, U+046B
Ѭѭ -- Iotov yus bol'shoy -- U+046C, U+046D
Ѯѯ -- Ksi -- U+046E, U+046F
Ѱѱ -- Psi -- U+0470, U+0471
Ѳѳ -- Fita -- U+0472, U+0473
Ѵѵ -- Izhitsa -- U+0474, U+0475
Ѷѷ -- Izhitsa w/double grave accent -- U+0476, U+0477
Ѹѹ -- Uk -- U+0478, U+0479
Ѻѻ -- Omega kruglaya -- U+047A, U+047B
Ѽѽ -- Omega/titlo -- U+047C, U+047D
Ѿѿ -- Ot -- U+047E, U+047F
Ҁҁ -- Koppa -- U+0480, U+0481
҂ -- thousands mark -- U+0482
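If you don't have Windows or Character Map handy, the code points can also be checked with Python's standard `unicodedata` module. The helper name `describe` below is just for illustration. The names it prints are the official Unicode character names, which differ from the transliterated names in the table (for example, "Yus malyy" is CYRILLIC CAPITAL/SMALL LETTER LITTLE YUS).

```python
import unicodedata


def describe(chars: str) -> list[str]:
    """Return 'char  U+XXXX  OFFICIAL NAME' for each character (illustrative helper)."""
    return [f"{ch}  U+{ord(ch):04X}  {unicodedata.name(ch, '<no name>')}" for ch in chars]


# Letters most often needed for pre-1918 Russian: yat', fita, izhitsa,
# and the decimal i, which lives in the basic Cyrillic block.
for line in describe("ѣѳѵі"):
    print(line)
```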

For HTML entities, see:

<http://www.magister.msk.ru/html/html_2.htm> (basic table)

<http://nesusvet.narod.ru/ico/books/cyrillic/charcodes.htm> (advice for 
webmasters)
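If you'd rather not depend on the page's declared encoding at all, each character can be written as a numeric character reference, which is what the entity tables linked above list. Python's standard library can generate these: `html.escape` handles `&`, `<` and `>`, and encoding to ASCII with the `xmlcharrefreplace` error handler turns every non-ASCII character into a decimal reference such as `&#1123;` for ѣ (U+0463). The function name is hypothetical. This only changes how the characters travel; the visitor still needs a font that contains them.

```python
import html


def to_char_refs(text: str) -> str:
    """Escape HTML special characters, then replace every non-ASCII character
    with a decimal numeric character reference (hypothetical helper)."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_char_refs("ѣ"))  # &#1123;
```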

-- 
War doesn't determine who's right, just who's left.
--
Paul B. Gallagher
pbg translations, inc.
"Russian Translations That Read Like Originals"
http://pbg-translations.com

More information about the SEELANG mailing list