microfilm digitization

bi1 at soas.ac.uk bi1 at soas.ac.uk
Mon Jan 12 18:40:28 UTC 2004


Pat
Interesting to see that you are working on the Iapi Oaye.  It is an
interesting collection of things.  There are a few pages of it here at the
British Museum Library.  Also in our library somewhere, but lost, there
is said to be a translation of the Pilgrim's Progress into Dakota.
Yours
Bruce
Date sent:      	Mon, 05 Jan 2004 15:20:34 CST
Send reply to:  	siouan at lists.colorado.edu
From:           	Pat Warren <warr0120 at umn.edu>
To:             	siouan at lists.colorado.edu
Subject:        	microfilm digitization

> Yes, it's all possible. I've spent the last two years working on digitizing
> Dakota and Ojibwe texts from both print and microfilm. I just completed
> converting Iapi Oaye from microfilm to a web-navigable format. The images
> are archived as 500 dpi TIFF (actually better resolution than necessary for
> microfilm, but necessary for OCR of printed materials) and converted
> to JPG for web page display. If anyone wants to see the Iapi Oaye CDs, let
> me know. Out of the 70 years it was published I'm missing fewer than ten
> pages (about 3100 images total). I'm hoping to distribute them more openly
> this spring when I get better at working with XSLT processors and can make
> the web pages work in more browser versions. As of right now, all the data
> and web pages are in XML, so at this point it only works in Internet
> Explorer 6.0 on a PC. It might work on IE for Mac too but I haven't
> checked.
>
> The University of Minnesota Wilson Library has all their microfilm print
> stations hooked up to computers now, with capture software that can send
> what you see on the reader to a printer or to a file. The 35mm film
> scanners and slide scanners don't work with microfilm. You have to have a
> reader with a parallel port output and software for requesting the image.
> The equipment to do all this is still too pricey for personal purchase in my
> opinion, so I'm happy to use the public equipment. My focus has been
> setting up standards and methods that anyone can replicate if they have the
> equipment. I work with great, trainable OCR software (ABBYY FineReader
> 7.0). I did lots of testing to find out what resolution you need to get the
> best results (500 dpi) and the best archiving format (TIFF for black and
> white documents, 300 dpi JPG for greyscale or color).
>
> If you're interested in jumping into a digitizing project, let me know.
> This is what I'm committing much of my time to now. Don't waste time with
> grants and don't spend money on overpriced digitizing services. Most of the
> digitized material I've seen so far, like that from the LOC and the
> National Library of Canada, is actually poor in quality and consistency,
> and the interfaces are pretty unimpressive and confusing. I'm interested in
> making all these materials available to anyone at as low a cost as
> possible.
>
> I posted a few of the images from Iapi Oaye so you can see the output.
> Here are the URIs:
>
> www.tc.umn.edu/~warr0120/images/1871_05_01.jpg
> www.tc.umn.edu/~warr0120/images/1871_05_02.jpg
> www.tc.umn.edu/~warr0120/images/1871_05_03.jpg
> www.tc.umn.edu/~warr0120/images/1871_05_04.jpg
>
> They're very large images so it may be a slow download at home.
>
> Let me know if you want the current (IE 6.0 for Windows only) version of
> the Iapi Oaye CDs (images only; it'll probably be a few years before I've
> got it converted to text, or maybe someone else will do it). It took 4 CDs
> to fit it all, but keep in mind that the images are very, very large. I
> chose to make them huge since the originals were newspaper sized, and I
> want them to be easily readable. With normal 8.5 by 11 or smaller pages
> you'd be able to fit a lot more onto a CD. I have lots of other samples too,
> of digitized print sources, and a few dissertations I got from fiche. In the
> next few months I'll be posting a list of what I've got. I hope to find
> some nice person at a university who can offer server space to distribute
> the files so people can burn their own CDs. I've got a lot of public domain
> sources digitized, though only a couple are converted to full text and it'll
> be a while before I get the programming done to make those useful. Full
> text versions are my main goal. Here's some of what I've got:
>
> Dakotan:
> -most of the BIA's Indian Reader series in Lakota (Emil Afraid of Hawk and
> Ann Nolan Clark)
> -Buechel's grammar, Bible history
> -Deloria's Dakota Texts
> -Dorsey's Omaha and Ponca Letters
> -Hunfalvy's Dakota nyelv (Hungarian)
> -Hunt's Bible history
> -Pilling's bibliography
> -Riggs's grammar, dictionary, 1852 combo
>
> Ojibwe:
> -both Baraga grammars, both dictionaries
> -Belcourt's Sauteux grammar
> -Cuoq's grammar, dictionary
> -Jones' Ojibwe Texts
> -Lemoine's dictionary
> -Pilling's bibliography
> -Verwyst's exercises
> -Wilson's Ojebway grammar
>
> I think I now have a total of around 25,000-30,000 pages of Dakota material
> and 15,000-20,000 pages of Ojibwe material scanned and usable in my nice
> web-page format. I'm focusing now on encoding full text versions so they
> can be integrated. Right now I'm coding full text versions of the Pilling
> and Pentland Algonquian bibliographies and finding ways to combine them in a
> useful format. Next will be practicing combining a couple of dictionaries.
> Then there's the possibility of hooking it all together, with texts linked
> to dictionaries and vice versa, and citations and bibliographies linked to
> digital versions of the original sources... endless possibilities that
> should save lots of research time. There's a lot to this work, and I could
> go on for hours.
>
> I hope that sometime this year everything I've digitized (the public domain
> stuff) will be freely available to all. I'm very interested in working
> with others on digitizing projects. I can give you a complete list of the
> equipment, software, standards, and methods I use if you like. But I'm also
> open to the possibility of just having microfilm sent here for me to scan.
> I'm fast, I do good work, and I'd hate to see people spend time and money
> for low quality output. I enjoy the digitizing work, and from there I can
> set people up to train and run the OCR software and proof full text
> versions themselves. I know that in the near future this work will be an
> essential part of research. The best part is, if you do a good job, once
> you digitize something you can make it immediately available to everyone
> for free, and then every time anyone wants to work with the material, there
> it is! Some of the people here at the U of MN have loved having all the
> Ojibwe grammars on one CD and all the Ojibwe dictionaries on another. It
> saves a lot of time, and makes things available that weren't really all
> that available before.
>
> Pat Warren
>
>


