LL-L "Technica" 2003.10.19 (03) [E]

Lowlands-L lowlands-l at lowlands-l.net
Sun Oct 19 17:25:46 UTC 2003


======================================================================
L O W L A N D S - L * 19.OCT.2003 (03) * ISSN 189-5582 * LCSN 96-4226
http://www.lowlands-l.net * lowlands-l at lowlands-l.net
Rules & Guidelines: http://www.lowlands-l.net/index.php?page=rules
Posting Address: lowlands-l at listserv.linguistlist.org
Server Manual: http://www.lsoft.com/manuals/1.8c/userindex.html
Archives: http://listserv.linguistlist.org/archives/lowlands-l.html
Encoding: Unicode (UTF-8) [Please switch your view mode to it.]
=======================================================================
You have received this because you have been subscribed upon request.
To unsubscribe, please send the command "signoff lowlands-l" as message
text from the same account to listserv at listserv.linguistlist.org or
sign off at http://linguistlist.org/subscribing/sub-lowlands-l.html.
=======================================================================
A=Afrikaans Ap=Appalachian B=Brabantish D=Dutch E=English F=Frisian
L=Limburgish LS=Lowlands Saxon (Low German) N=Northumbrian
S=Scots Sh=Shetlandic V=(West)Flemish Z=Zeelandic (Zeêuws)
=======================================================================

From: Sandy Fleming [sandy at scotstext.org]
Subject: "Technica"

> From: Andy Eagle <andy at SCOTS-online.org>
> Subject:  "Help needed"
>
> Does this still allow text searches or cut and paste with an Adobe
> PDF Reader?
> Something I assume researchers would appreciate.

As I said in my original message, there are problems with the idea of making
OCR'd texts available as exact copies of the original. The researcher can
never be sure that the text is faithful to the original if it's "cut and
paste"-able. The normal Scotstext OCR'd (and edited) texts will always be
available as the main text; the photoscans are just intended as copies for
those who need to check the original text.

> From: R. F. Hahn <sassisch at yahoo.com>
> Subject: Technica
>
> > I've tried it both ways for the first 14 pages of a book. See the
> > results at:
> >
> > http://www.scotstext.org/photoscans/the_winds_heart/
>
> Very nice!  It's slow downloading to older computers, though.

Can't be helped! These are scanned at a resolution of 300 dpi, then I've
reduced the palette to 16 colours (or I think I did do that in the end -
that's something I'll need to double check). I did try reducing to 2 colours
but the loss of anti-aliasing made it difficult to read. I can't think of
anything else I can do that would make good copies at a smaller size. Again,
since these are intended as support for the edited texts, it probably
doesn't matter.
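
As a rough illustration of why the palette matters, here is a minimal
sketch of the uncompressed size of one page at different colour depths.
The page dimensions (5 x 8 inches) are an assumption made up for the
example, not a measurement of the book, and real PNG files will be
smaller than these figures because of compression.

```python
def bits_per_pixel(colours):
    """Smallest number of bits that can index a palette of this size."""
    return max(1, (colours - 1).bit_length())


def raw_page_bytes(width_in, height_in, dpi, colours):
    """Uncompressed size in bytes of one scanned page (row padding ignored)."""
    pixels = int(width_in * dpi) * int(height_in * dpi)
    return pixels * bits_per_pixel(colours) // 8


# Hypothetical 5 x 8 inch page scanned at 300 dpi.
sixteen_colour = raw_page_bytes(5, 8, 300, 16)  # 4 bits per pixel
two_colour = raw_page_bytes(5, 8, 300, 2)       # 1 bit per pixel
print(sixteen_colour, two_colour)  # prints: 1800000 450000
```

Going from 16 colours to 2 cuts the raw data to a quarter, which is why
the 2-colour version is tempting despite the loss in legibility.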

I wonder if it would be acceptable to present the covers as JPGs to reduce
the size? Another possibility for the covers would be to "tidy" them into
flat colours and present them as PNGs. All that would be lost is texture,
but maybe the results would give the wrong impression of what sort of book
we're looking at, e.g. an old cloth bound book might end up looking like a
modern paperback.

I could crop all the blank margins in the textual pages, but again this
might give the wrong impression of what the book was like.

> That sounds like a good plan.  It would be nice for the user to have a
> link to the corresponding page facsimile on each hypertext page.  The
> facsimile should open in a new window to allow on-screen comparison.

Yes, I think that's a good way to do it.

> It would be interesting to see what some of the better libraries do with
> their special collections.  I believe the British Museum
> (http://www.thebritishmuseum.ac.uk/,
> http://www.thebritishmuseum.ac.uk/compass/) and the British Library
> (http://www.bl.uk/) display some ancient manuscripts in facsimile, and I

Leonardo da Vinci's notebook! Amazing! Shame that you need Shockwave - I
can't be bothered waiting for it to download!

I have a fair idea now where I want to go with ScotsteXt. Ultimately I hope
to be able to put it all into XML to give some sort of guarantee of
permanence, transformability and syndication. Each document (probably
counting single poems and book chapters as "documents" - not in
correspondence with "files") would be part of the "semantic web", carrying
semantic content as well as text. Thus if you ask for "helicon verses
written by Robert Burns or Alexander Montgomerie" you should get just that -
each poem would have semantic information such as its author and the poetic
form and suchlike, and similarly for prose.
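
A minimal sketch of how such a query might work once each document
carries its own metadata in XML. The element names (catalogue, document,
author, form) and the placeholder records are assumptions invented for
the example; no actual ScotsteXt schema is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: element names, ids and titles are placeholders,
# not real ScotsteXt records.
CATALOGUE = """
<catalogue>
  <document id="d1">
    <title>Placeholder poem A</title>
    <author>Robert Burns</author>
    <form>helicon</form>
  </document>
  <document id="d2">
    <title>Placeholder poem B</title>
    <author>Robert Burns</author>
    <form>sonnet</form>
  </document>
  <document id="d3">
    <title>Placeholder poem C</title>
    <author>Alexander Montgomerie</author>
    <form>helicon</form>
  </document>
  <document id="d4">
    <title>Placeholder poem D</title>
    <author>Placeholder Author</author>
    <form>helicon</form>
  </document>
</catalogue>
"""


def find_documents(catalogue_xml, form, authors):
    """Return ids of documents with the given form by any of the authors."""
    root = ET.fromstring(catalogue_xml.strip())
    wanted = set(authors)
    return [
        doc.get("id")
        for doc in root.iter("document")
        if doc.findtext("form") == form and doc.findtext("author") in wanted
    ]


matches = find_documents(
    CATALOGUE, "helicon", ["Robert Burns", "Alexander Montgomerie"]
)
print(matches)  # prints: ['d1', 'd3']
```

The point of the sketch is that the query filters on the metadata fields
rather than on words that happen to occur in the text.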

I'm hoping data mining will be a better way of finding things in
ScotsteXt than brute-force searches. Thus if you search for "gowans"
it should list texts most densely spattered with the word first, with
logarithmic damping to prevent very common words in the language from
swamping the more interesting keywords.
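
One way this ranking might be sketched, assuming "logarithmic damping"
means something like the inverse-document-frequency weighting used in
text retrieval (that reading is an assumption, as are the function names
and the tiny sample texts). Each query word is scored by its density in
a document, multiplied by the logarithm of how rare the word is across
the collection, so a word found in every document contributes nothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rank(query, documents):
    """Rank documents (title -> text) by damped keyword density."""
    tokens = {title: tokenize(text) for title, text in documents.items()}
    counts = {title: Counter(toks) for title, toks in tokens.items()}
    n_docs = len(documents)
    scores = {title: 0.0 for title in documents}
    for word in set(tokenize(query)):
        containing = sum(1 for c in counts.values() if word in c)
        if containing == 0:
            continue
        # Logarithmic damping: zero for a word present in every document.
        damping = math.log(n_docs / containing)
        for title, toks in tokens.items():
            if toks:
                scores[title] += damping * counts[title][word] / len(toks)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(title, score) for title, score in ranked if score > 0]


# Made-up sample texts, for illustration only.
sample = {
    "a": "the gowans grow by the burn the gowans",
    "b": "the gowans and the heather and the broom on the brae",
    "c": "the wind blows over the hill",
}
print([title for title, _ in rank("the gowans", sample)])  # prints: ['a', 'b']
```

Here "the" occurs in all three texts and so has no effect on the order,
while the text most densely spattered with "gowans" comes first.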

The photoscans could be included in the search if required - just search the
textual documents and give back a link to the corresponding photoscans.
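
A minimal sketch of that last step, assuming each search hit is known by
a book name and a page number. The file naming scheme (page_014.png and
so on) is hypothetical; only the photoscans base address is taken from
the site.

```python
from urllib.parse import urljoin

PHOTOSCAN_BASE = "http://www.scotstext.org/photoscans/"


def photoscan_url(book, page):
    """Build a link to one page facsimile (the file naming is hypothetical)."""
    return urljoin(PHOTOSCAN_BASE, f"{book}/page_{page:03d}.png")


def with_facsimiles(hits):
    """Attach a facsimile link to each (book, page) search hit."""
    return [
        {"book": book, "page": page, "facsimile": photoscan_url(book, page)}
        for book, page in hits
    ]


results = with_facsimiles([("the_winds_heart", 14)])
print(results[0]["facsimile"])
# prints: http://www.scotstext.org/photoscans/the_winds_heart/page_014.png
```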

Sandy
http://scotstext.org/

================================END===================================
* Please submit postings to lowlands-l at listserv.linguistlist.org.
* Postings will be displayed unedited in digest form.
* Please display only the relevant parts of quotes in your replies.
* Commands for automated functions (including "signoff lowlands-l") are
  to be sent to listserv at listserv.linguistlist.org or at
  http://linguistlist.org/subscribing/sub-lowlands-l.html.
=======================================================================


