dorsey film conversion questions and estimate

Pat Warren warr0120 at umn.edu
Fri Jan 9 14:19:43 UTC 2004


QUESTIONS:

I'd like to guess at how long it would take to convert the film, and how
large the output would be.

Can anyone tell me roughly what dimensions the original slips of the Dorsey
files are?

Are the slips individually filmed or several at once? John said there were
actually 20,000 shots on the film for the larger Dhegiha dictionary
material, so I guess individually.

Plus, if anyone (Mark?) knows the magnification, that would help. Even
knowing whether you'd use a 1 (9-16x), 2 (13-27x), or 3 (23-50x) lens on a
reader would help.

It will also help to know how clear the images tend to be. Are they very
dark? It doesn't matter whether they're positive or negative, 'cause they'd
be converted to positive anyway. But TIFF files (the best archival format,
with the best compression) only store data about black pixels, so the
lighter an image generally is, and the less fuzz and scratches, the smaller
the image file. For instance, one of the reasons (other than the large
size of the physical source) the Iapi Oaye files are so big is that the
filmed images are sometimes pretty dark and almost all have very dense text
- lots of black pixels. Images or drawings will generally be very large
files, but it sounds like most of the slips are text.

PRELIMINARY ESTIMATE:

A few tentative guesses for the 20,000 Dorsey Dhegiha slip images:

If more towards dark with dense text per slip:

3x5    - around 200 KB ea., 20,000 x 0.2 MB   = 4 GB,   6 CDs
5x7    - around 325 KB ea., 20,000 x 0.325 MB = 6.5 GB, 10 CDs
8.5x11 - around 500 KB ea., 20,000 x 0.5 MB   = 10 GB,  15 CDs

If more towards light with sparse text per slip:

3x5    - around 30 KB ea., 20,000 x 0.03 MB  = 600 MB, 1 CD
5x7    - around 50 KB ea., 20,000 x 0.05 MB  = 1 GB,   2 CDs
8.5x11 - around 75 KB ea., 20,000 x 0.075 MB = 1.5 GB, 3 CDs
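
For anyone who wants to redo these numbers once the real slip size and
image quality are known, here is a minimal Python sketch of the same
arithmetic. The per-image sizes are the guesses from the two lists above.
The 700 MB CD capacity and the decimal units (1 MB = 1000 KB) are
assumptions, chosen because they reproduce the CD counts given; the
function name is made up for illustration.

```python
import math

# Assumption: a 700 MB CD-R. The message does not state a capacity.
CD_CAPACITY_MB = 700


def estimate_storage(image_count, kb_per_image, cd_capacity_mb=CD_CAPACITY_MB):
    """Return (total size in MB, number of CDs), with 1 MB = 1000 KB."""
    total_mb = image_count * kb_per_image / 1000
    cds = math.ceil(total_mb / cd_capacity_mb)
    return total_mb, cds


# Guessed average file size in KB for each slip size and image quality.
SCENARIOS = {
    "dark, dense text": {"3x5": 200, "5x7": 325, "8.5x11": 500},
    "light, sparse text": {"3x5": 30, "5x7": 50, "8.5x11": 75},
}

if __name__ == "__main__":
    for label, sizes in SCENARIOS.items():
        print(label)
        for slip, kb in sizes.items():
            total_mb, cds = estimate_storage(20_000, kb)
            print(f"  {slip:7s} {kb:4d} KB each -> "
                  f"{total_mb / 1000:5.2f} GB, {cds} CD(s)")
```

Changing the image count or the per-image guess is then a one-line edit,
which should be handy once someone answers the questions above.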

For comparison, let's pretend that Iapi Oaye had been 20,000 pages instead
of 3,100. At an average of about 540 KB per image it would take up about
11 GB (20,000 x 0.54 MB = 10.8 GB), about the same as my worst case
scenario for 8.5x11 above. Iapi Oaye is a good worst case standard (though
the Dorsey slips surely will be nowhere near this size). Since the
originals of Iapi Oaye were very large with dense text, when it's squeezed
to fill the image capture area of the reader it has one of the highest
ratios of text per inch you'll find. My guess is that the Dorsey slips are
not very dense text, especially since they're handwritten, so I'd say that
it'd come in at under 5 CDs at the extreme, and quite possibly it would be
much smaller. Give me some answers to the questions above and I'll let you
know. The web pages to navigate the images would add almost nothing to the
total.

As far as scanning time:

A standard Canadian fiche has 14 rows of 14 images, and I can usually sit
still long enough at a time to scan about 7 rows of one of those. That's
about 2 hours (though I'll go check again soon). So about 100 scans in two
hours (that rate is the limit of the equipment; the two hours is my
equipment's limit - no ergonomics in that library). 20,000 / 100 scans per
session = 200 sessions (400 hours) / 4 sessions a week = 50 weeks / 4
weeks a month = about 12 months. I would process the images as I scanned
them (though what a challenge for a descriptive bibliography!), so it could
be done in about a year without even pushing very hard. Which I wouldn't
want to do, 'cause there's lots of other work to do too. So, I'd say it's
all doable. Plus I could post the output online as I go so people could
watch the progress.
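
Since the session rate is still to be rechecked, here is a small Python
sketch for rerunning the schedule arithmetic. The defaults are the figures
from the paragraph above (about 100 scans in a two-hour session, 4 sessions
a week, 4 weeks to a month); the function name and the dictionary layout
are made up for illustration.

```python
import math


def scanning_schedule(total_images, scans_per_session=100, hours_per_session=2,
                      sessions_per_week=4, weeks_per_month=4):
    """Estimate how long a scanning job takes at a fixed session rate."""
    # Round up: a partly filled last session is still a trip to the library.
    sessions = math.ceil(total_images / scans_per_session)
    hours = sessions * hours_per_session
    weeks = sessions / sessions_per_week
    months = weeks / weeks_per_month
    return {"sessions": sessions, "hours": hours,
            "weeks": weeks, "months": months}


if __name__ == "__main__":
    for key, value in scanning_schedule(20_000).items():
        print(f"{key}: {value}")
```

Passing a different scans_per_session or sessions_per_week shows how much
the total moves if the measured rate turns out better or worse.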

In case you're interested, with print sources I can do about 70 scans an
hour when I've got rhythm. If the source is small enough (trade paperback
or smaller) that I can fit two books on a 12.2x17.2 scanner, that's 280
pages an hour! But then I have to crop (manual), straighten (manual),
compress (macro) and convert (macro) everything to jpg. All told I can scan
and process about 3,000-4,000 pages a month without feeling too busy.
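
The 280 figure works out as follows, sketched in Python. The assumption
(not spelled out above) is that each open book shows two facing pages, so
one scan of two books captures four pages. The function name is made up for
illustration.

```python
def pages_per_hour(scans_per_hour=70, books_per_scan=2, pages_per_open_book=2):
    """Pages captured per hour when several open books share the scanner bed."""
    return scans_per_hour * books_per_scan * pages_per_open_book


if __name__ == "__main__":
    print(pages_per_hour())                  # two books on the bed
    print(pages_per_hour(books_per_scan=1))  # one larger book
```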

About readers, the Minolta MS6000, their cheaper microfilm scanner, has a
list price of around $5,500. Canon MS300 = $5,900. Some other innovative but
iffy systems come down as far as around $3,000. With those prices, plus the
inevitable service plan and software, I really prefer the public access
systems at the U's library here. I think there's 6 or 7 scanning stations
at Wilson library. If we really want to get it done fast, the Donnegan
Systems: M525 Microfilm Scanner, 100 images per minute (I'm only slightly
slower, though check the math above), is listed at only $53,000. But it
can't do fiche! (The system for fiche is only $59,900!)

So, whoever wants to send me a check for an even $60,000, we can get to
work. Or we can do it for free (well, the cost of postage to mail the film
here).

Anyone interested?
Pat


