[Corpora-List] Is the TEI a waste of time?

Lars Aronsson lars at aronsson.se
Fri Jun 27 17:56:20 UTC 2003


David Graff wrote:
> I agree wholeheartedly with these points.  However, it is possible to
> devote all due attention and care to the "niceties, whys and wherefores"
> without strict adherence to the full details of TEI specifications.

I'm new on this list and I'm not a linguist.  But I have run a website
of Scandinavian/Nordic literature ("Project Runeberg", runeberg.org)
since 1992, and I was among those who read the TEI P3 guidelines when
they first appeared.  My own conclusion was that TEI was too much and
I didn't see any immediate need for it in my application.  None of my
users have asked me to add TEI markup.  Some have asked me why I don't
use TEI, but that is a different thing.  I respond by asking for a
reason, and I never hear one.  If *you* can explain why TEI markup of
Project Runeberg's texts would make them *any more useful to you*, I
might well give it a shot.  This is an invitation.  The explanation
should detail what parts of TEI would be useful to you.

Since 1998, Project Runeberg has added facsimile images to the old
books and journals that we digitize.  Since this is a zero-budget
hobbyist project and the highest cost of digitization is proofreading,
we publish facsimile images together with raw OCR text.  This spring
we got a system working in which any reader can correct errors and
proofread these texts over the web, directly from the browser (earlier
we used proofreading by e-mail).  Usage is catching on and we are
now producing high quality texts at a good speed.  Also this spring,
we finished scanning two editions (20 + 38 volumes) of the Swedish
encyclopedia "Nordisk familjebok" (http://runeberg.org/nf/), which now
constitutes about half of our entire facsimile collection (45,000 of
100,000 pages).  The OCR text from the encyclopedia is 245 megabytes.
A typical page would be http://runeberg.org/nfbe/0408.html

I'd be interested in ways to make this collection more useful.


--
  Lars Aronsson (lars at aronsson.se)
  Project Runeberg - free Nordic literature - http://runeberg.org/


