microfilm digitization

Pat Warren warr0120 at umn.edu
Mon Jan 5 21:20:34 UTC 2004


Yes, it's all possible. I've spent the last two years working on digitizing
Dakota and Ojibwe texts from both print and microfilm. I just completed
converting Iapi Oaye from microfilm to a web-navigable format. The images
are archived as 500 dpi TIFF (actually better resolution than necessary for
microfilm, but necessary for OCR of printed materials) and converted to JPG
for web page display. If anyone wants to see the Iapi Oaye CDs let me know.
Out of the 70 years it was published I'm missing fewer than ten pages
(about 3100 images total). I'm hoping to distribute them more openly this
spring when I get better at working with XSLT processors and can make the
web pages work in more browser versions. Right now all the data and web
pages are in XML, so it only works in Internet Explorer 6.0 on a PC. It
might work on IE for Mac too but I haven't checked.
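
One possible way around the browser dependence, offered as an assumption
and not as a description of how the Iapi Oaye CDs are actually built, is to
render the XML to plain HTML ahead of time so the browser never has to run
an XSLT processor. The sketch below uses only the Python standard library.
The XML layout and the names in it (`newspaper`, `issue`, `page`, `date`,
`n`, `file`) are hypothetical, since the real schema isn't described here.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical index format; the real XML schema is not described in this post.
SAMPLE_XML = """
<newspaper title="Iapi Oaye">
  <issue date="1871-05">
    <page n="1" file="1871_05_01.jpg"/>
    <page n="2" file="1871_05_02.jpg"/>
  </issue>
</newspaper>
"""


def render_index(xml_text: str) -> str:
    """Turn the XML index into a static HTML page of links to page images."""
    root = ET.fromstring(xml_text)
    lines = ["<html><body>", f"<h1>{escape(root.get('title', ''))}</h1>"]
    for issue in root.findall("issue"):
        lines.append(f"<h2>{escape(issue.get('date', ''))}</h2>")
        lines.append("<ul>")
        for page in issue.findall("page"):
            # Escape attribute values so odd characters can't break the HTML.
            href = escape(page.get("file", ""), quote=True)
            label = escape(page.get("n", ""))
            lines.append(f'<li><a href="images/{href}">Page {label}</a></li>')
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_index(SAMPLE_XML))
```

The output is ordinary HTML with no XML or XSLT left in it, so in principle
any browser could open it straight from a CD.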

The University of Minnesota Wilson Library has all its microfilm print
stations hooked up to computers now, with capture software that can send
what you see on the reader to a printer or to a file. The 35mm film
scanners and slide scanners don't work with microfilm. You have to have a
reader with a parallel port output and software for requesting the image.
The equipment to do all this is still too pricey for personal purchase in
my opinion, so I'm happy to use the public equipment. My focus has been
setting up standards and methods that anyone can replicate if they have the
equipment. I work with great, trainable OCR software (ABBYY FineReader
7.0). I did lots of testing to find out what resolution gives the best
results (500 dpi) and what the best archiving formats are (TIFF for black
and white documents, 300 dpi JPG for greyscale or color).
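
The archiving rule of thumb above is simple enough to write down as code.
This is a minimal sketch in Python. The function names and the mode labels
("bw", "grey", "color") are made up for illustration and don't come from
any scanning software. Pairing TIFF with 500 dpi is a reading of the
numbers given in this message.

```python
from typing import Dict, Tuple, Union


def archive_settings(color_mode: str) -> Dict[str, Union[str, int]]:
    """Pick an archive format and resolution from the rule of thumb above.

    color_mode is a hypothetical label: "bw", "grey", or "color".
    """
    if color_mode == "bw":
        # Black and white documents: TIFF, at the 500 dpi that tested best.
        return {"format": "tiff", "dpi": 500}
    if color_mode in ("grey", "color"):
        # Greyscale or color documents: 300 dpi JPG.
        return {"format": "jpg", "dpi": 300}
    raise ValueError(f"unknown color mode: {color_mode!r}")


def pixel_size(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page of the given size in inches at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    settings = archive_settings("bw")
    print(settings, pixel_size(8.5, 11, settings["dpi"]))
```

For an 8.5 by 11 inch page at 500 dpi, `pixel_size` gives 4250 by 5500
pixels, which shows why the archived images get big quickly.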

If you're interested in jumping into a digitizing project, let me know.
This is what I'm committing much of my time to now. Don't waste time with
grants and don't spend money on overpriced digitizing services. Most of the
digitized material I've seen so far, like that from the LOC and the
National Library of Canada, is actually poor in quality and consistency,
and the interfaces are pretty unimpressive and confusing. I'm interested in
making all these materials available to anyone at as low a cost as
possible.

I posted a few of the images from Iapi Oaye so you can see the output.
Here are the URIs:

www.tc.umn.edu/~warr0120/images/1871_05_01.jpg
www.tc.umn.edu/~warr0120/images/1871_05_02.jpg
www.tc.umn.edu/~warr0120/images/1871_05_03.jpg
www.tc.umn.edu/~warr0120/images/1871_05_04.jpg

They're very large images so it may be a slow download at home.
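
Assuming the file names follow a year_month_page pattern (an interpretation
of the four names above, not something stated in this message), a short
Python sketch of how such names could be parsed for sorting or building
navigation links might look like this. `PageImage` and `parse_image_name`
are illustrative names only.

```python
import re
from typing import NamedTuple


class PageImage(NamedTuple):
    year: int
    month: int
    page: int


# Assumed pattern: YYYY_MM_PP.jpg at the end of the URI (not confirmed).
NAME_RE = re.compile(r"(\d{4})_(\d{2})_(\d{2})\.jpg$")


def parse_image_name(uri: str) -> PageImage:
    """Extract (year, month, page) from a URI ending in YYYY_MM_PP.jpg."""
    match = NAME_RE.search(uri)
    if match is None:
        raise ValueError(f"name does not match the assumed pattern: {uri!r}")
    year, month, page = (int(group) for group in match.groups())
    return PageImage(year, month, page)


if __name__ == "__main__":
    print(parse_image_name("www.tc.umn.edu/~warr0120/images/1871_05_03.jpg"))
```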

Let me know if you want the current (IE 6.0 for Windows only) version of
the Iapi Oaye CDs (images only; it'll probably be a few years before I've
got it converted to text, or maybe someone else will do it). It took 4 CDs
to fit it all, but keep in mind that the images are very large. I chose to
make them huge since the originals were newspaper sized, and I want them to
be easily readable. With normal 8.5 by 11 or smaller pages you'd be able to
fit a lot more onto a CD. I have lots of other samples of digitized print
sources too, and a few dissertations I got from fiche. In the next few
months I'll be posting a list of what I've got. I hope to find some nice
person at a university who can offer server space to distribute the files
so people can burn their own CDs. Full text versions are my main goal, but
so far I've got a lot of public domain sources digitized as images, only a
couple converted to full text, and it'll be a while before I get the
programming done to make those useful. Here's some of what I've got:

Dakotan:
-most of the BIA's Indian reader series in Lakota (Emil Afraid of Hawk and
Ann Nolan Clark)
-Buechel's grammar, Bible history
-Deloria's Dakota texts
-Dorsey's Omaha Ponca letters
-Hunfalvy's Dakota nyelv (Hungarian)
-Hunt's Bible history
-Pilling's biblio
-Riggs' grammar, dictionary, 1852 combo

Ojibwe:
-both Baraga grammars, both dictionaries
-Belcourt's Sauteux grammar
-Cuoq's grammar, dictionary
-Jones' Ojibwe texts
-Lemoine's dictionary
-Pilling's biblio
-Verwyst's exercises
-Wilson's Ojebway grammar

I think I now have around 25,000-30,000 pages of Dakota material and
15,000-20,000 pages of Ojibwe material scanned and usable in my nice
web-page format. I'm focusing now on encoding full text versions so they
can be integrated with each other. Right now I'm coding full text versions
of the Pilling and Pentland Algonquian bibliographies and finding ways to
combine them in a useful format. Next will be practicing combining a couple
of dictionaries. Then there's the possibility of hooking it all together
with texts linked to dictionaries and vice versa, having citations and
bibliographies linked to digital versions of the original sources...
endless possibilities that should save lots of research time. There's a lot
to this work, and I could go on for hours.
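
To make the linking idea a bit more concrete, here is a rough Python sketch
of a two-way index between texts and dictionary headwords. Everything in it
is a placeholder: the headwords, glosses, and text IDs are invented
stand-ins, not data from any of the sources listed above, and
`build_links` is a hypothetical name. Real matching would also have to deal
with differing orthographies and inflected forms, which this ignores.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_links(
    dictionary: Dict[str, str], texts: Dict[str, str]
) -> Tuple[Dict[str, List[str]], Dict[str, Set[str]]]:
    """Link each text to the headwords it contains, and each headword to texts."""
    text_to_entries: Dict[str, List[str]] = {}
    entry_to_texts: Dict[str, Set[str]] = defaultdict(set)
    for text_id, content in texts.items():
        # Naive tokenizing: lowercase and split on whitespace.
        tokens = content.lower().split()
        found = sorted({tok for tok in tokens if tok in dictionary})
        text_to_entries[text_id] = found
        for headword in found:
            entry_to_texts[headword].add(text_id)
    return text_to_entries, dict(entry_to_texts)


if __name__ == "__main__":
    # Placeholder data only; not real dictionary entries or texts.
    sample_dictionary = {"alpha": "placeholder gloss 1", "beta": "placeholder gloss 2"}
    sample_texts = {"doc1": "alpha gamma beta", "doc2": "Beta delta"}
    by_text, by_entry = build_links(sample_dictionary, sample_texts)
    print(by_text)
    print(by_entry)
```

With both mappings in hand, a text page could link each recognized word to
its dictionary entry, and a dictionary entry could list the texts where the
word turns up.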

I hope that sometime this year everything I've digitized (the public domain
stuff) will be freely available to all. I am very interested in working
with others on digitizing projects. I can give you a complete list of the
equipment, software, standards, and methods I use if you like. But I'm also
open to the possibility of just having microfilm sent here for me to scan.
I'm fast, I do good work, and I'd hate to see people spend time and money
for low quality output. I enjoy the digitizing work, and from there I can
set people up to train and run the OCR software and proof full text
versions themselves. I know that in the near future this work will be an
essential part of research. The best part is, if you do a good job, once
you digitize something you can make it immediately available to everyone
for free, and then every time anyone wants to work with the material, there
it is! Some of the people here at the U of MN have loved having all the
Ojibwe grammars on one CD and all the Ojibwe dictionaries on another. It
saves a lot of time, and makes things available that weren't really all
that available before.

Pat Warren


