FYI re input device

Yoshimasa Tsuji yamato at YT.CACHE.WASEDA.AC.JP
Mon Oct 9 14:21:57 UTC 2000


Hello,
  I would like to share with you my recent tips on inputting Russian
text.
  By far the most efficient way of inputting Russian text is to read
it with a flatbed scanner and let OCR software convert the image to text.
I have often talked about that already.
  Now for a more recent method. For three years I have been using a
stick scanner (a kind of handheld scanner that connects to a PC card
and has no power source of its own) with my notebook computer in
archives and libraries in Petersburg and Moscow. It has been very useful
because I usually did not have to ask for special permission for the
scanner, which looks like a big pencil. All I needed was permission to
bring my computer into the reading room, which is usually granted.
  The problem was twofold. One part was that ordinary OCR did not
understand pre-1918 spelling and made loads of mistakes. The other was
the poor scanning quality (mine is Fujitsu's Rapid Scan RS-20), which is
unavoidable with any handheld scanner: you won't find missing lines
until the final proofreading. And there were cases when part of the
text was right in the centre of a thick book, where even the smallest
scanner could not reach.

  After much hesitation I recently purchased a digital camera
(Ricoh's RDC-7) and have found it very satisfying: 7-point print on an
A4 page can be captured at about 300 dpi in a second or two, without
flash and with no mistakes whatsoever. Besides, the black/white
threshold is determined much better than with my previous scanners. The
captured images can be processed after you return from the library, and
OCR programs handle images created by digital cameras comfortably.

  The next problem is whether I can smuggle it into the reading room or
get permission to use it. When I manage to use it at Russian
institutions, I will let you know.

  As to the pre-1918 spelling: I compiled a dictionary from my own
archive of Russian text (some 5 megabytes in pre-1918 spelling, whose
correctness has been rigorously checked) and fed it to ABBYY's
FineReader. I now get 90% or better recognition from a poorly printed
Russian newspaper.
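  For those who want to try the same thing, here is a minimal sketch of
how such a word list might be extracted from a collection of texts. It
is only an illustration under my own assumptions, not the exact
procedure I used: it assumes the texts are stored as UTF-8 Unicode, and
it assumes your OCR program can import a plain one-word-per-line text
file as a user dictionary (check your FineReader version for the format
it actually accepts). The names build_wordlist and write_wordlist are
hypothetical. Words split by hyphenation at line ends, and hyphenated
compounds such as "кто-то", are not rejoined.

```python
import re
from collections import Counter

# The Unicode Cyrillic block U+0400..U+04FF includes the pre-1918 letters
# yat (Ѣ ѣ), decimal i (І і), fita (Ѳ ѳ) and izhitsa (Ѵ ѵ) as well as hard sign ъ.
WORD_RE = re.compile(r"[\u0400-\u04FF]+")


def build_wordlist(texts, min_count=1):
    """Hypothetical helper: count lower-cased word forms in the given texts
    and return those seen at least min_count times, most frequent first
    (ties broken by Unicode code point, not Russian alphabetical order)."""
    counts = Counter()
    for text in texts:
        for word in WORD_RE.findall(text):
            counts[word.lower()] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, count in ranked if count >= min_count]


def write_wordlist(words, path):
    """Hypothetical helper: write one word per line as UTF-8 (an assumed
    import format for the OCR program's user dictionary)."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")


if __name__ == "__main__":
    sample = "Міръ и миръ. Вѣра, вѣра и надежда."
    print(build_wordlist([sample]))
```

Setting min_count above 1 is one way to drop one-off typos from the
list, at the cost of losing rare but correct forms.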

  Incidentally, a digital camera may also be useful for archival
materials that are available only on microfilm. You can capture the
text from the reader's screen!

  If you are interested in technical details, please write to me
off-list.

Cheers,
Tsuji

------
P.S.
The problem will remain for poor students who may not be able to afford
notebook computers and digital cameras. They are not cheap.




More information about the SEELANG mailing list