LL-L "Text processing" 2003.03.02 (03) [E]

Lowlands-L admin at lowlands-l.net
Sun Mar 2 19:21:26 UTC 2003


======================================================================
 L O W L A N D S - L * 02.MAR.2003 (03) * ISSN 189-5582 * LCSN 96-4226
 http://www.lowlands-l.net  * admin at lowlands-l.net * Encoding: Unicode UTF-8
 Rules & Guidelines: http://www.lowlands-l.net/rules.htm
 Posting Address: lowlands-l at listserv.linguistlist.org
 Server Manual: http://www.lsoft.com/manuals/1.8c/userindex.html
 Archives: http://listserv.linguistlist.org/archives/lowlands-l.html
=======================================================================
 You have received this because you have been subscribed upon request.
 To unsubscribe, please send the command "signoff lowlands-l" as message
 text from the same account to <listserv at listserv.linguistlist.org> or
 sign off at <http://linguistlist.org/subscribing/sub-lowlands-l.html>.
=======================================================================
 A=Afrikaans Ap=Appalachian B=Brabantish D=Dutch E=English F=Frisian
 L=Limburgish LS=Lowlands Saxon (Low German) N=Northumbrian
 S=Scots Sh=Shetlandic V=(West)Flemish Z=Zeelandic (Zeêuws)
=======================================================================

From: Sandy Fleming [sandy at scotstext.org]
Subject: "Text processing"

There's a new version of the ScotsteXt site on the way. One
of the things that I've learned since making the present site
is how to use Perl/CGI.

This technology offers particularly powerful text processing
capabilities (the haiku generator on ScotsteXt is an example -
I just wish I could find the time to improve its dictionary!),
so that such things as concordances and glossaries should be
fairly easy to produce. Some ideas of the sort of results I
could provide:

Glossaries
Concordances
Hapax legomena (words of which only one instance is known)
Word frequency lists

Obviously a glossary is a very useful thing, no problem there.

However, I'm not keen on cluttering up the site with stuff
people won't want to use, and I'm not so sure the other kinds
of list are worth having in electronic media. Is a concordance
really worth having when you can search an electronic text for
instances of a particular word anyway?

Probably the last three lists I mentioned could be combined
into a concordance categorised by word frequency. Again, is
this really worth having or would it be imitating paper
publishing too closely? Are there better equivalents for
electronic media?
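
As a rough illustration of the combined list described above, here is a minimal sketch of a keyword-in-context concordance grouped by word frequency. It is written in Python rather than the Perl/CGI mentioned earlier, purely as an illustrative choice. The names `WORD`, `tokenize` and `concordance_by_frequency`, the context `width` and the tokenising rule are all assumptions, not anything taken from the ScotsteXt site.

```python
import re
from collections import defaultdict

# Assumed tokenising rule: runs of letters, with internal apostrophes kept
# (so "didna" and "o'er" stay whole). A real text may need a different rule.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(text):
    """Return the lower-cased words of the text, in order."""
    return [m.group(0).lower() for m in WORD.finditer(text)]


def concordance_by_frequency(text, width=2):
    """Map each frequency to {word: [keyword-in-context lines]}.

    `width` is the number of words of context shown on each side.
    """
    tokens = tokenize(text)
    contexts = defaultdict(list)
    for i, word in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        contexts[word].append(f"{left} [{word}] {right}".strip())

    # Group the words by how many context lines (occurrences) each one has.
    grouped = defaultdict(dict)
    for word, lines in contexts.items():
        grouped[len(lines)][word] = lines
    return dict(grouped)


if __name__ == "__main__":
    result = concordance_by_frequency("the cat sat on the mat", width=1)
    for frequency in sorted(result, reverse=True):
        for word, lines in sorted(result[frequency].items()):
            print(frequency, word, lines)
```

In this arrangement the entry for frequency 1 is the hapax list, the frequencies themselves give the word frequency list, and each word's context lines are its concordance entry, so one structure covers all three.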

One reason why I list the hapax especially is that I find
this very useful for checking scanned texts in a
nonspellcheckable language to see which words only occur
once - and hence might be errors. I don't know if this
would be useful to users of texts that are already proofed,
however.
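
The proofreading use of the hapax list could be sketched roughly as follows, again in Python as an illustrative choice rather than the Perl mentioned above. The function name `hapax_with_lines` and the tokenising rule are assumptions. The line number is reported so that a suspect word can be found in the scanned text.

```python
import re
from collections import Counter

# Assumed tokenising rule: runs of letters with internal apostrophes kept.
# Digits are not treated as letters, so a scan error such as "au1d" would
# be split in two; adjust the pattern if that matters for your texts.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def hapax_with_lines(text):
    """Return sorted (word, line_number) pairs for words that occur once."""
    counts = Counter()
    first_line = {}
    for number, line in enumerate(text.splitlines(), start=1):
        for match in WORD.finditer(line):
            word = match.group(0).lower()
            counts[word] += 1
            # Remember where the word was first seen.
            first_line.setdefault(word, number)
    return sorted((w, first_line[w]) for w, n in counts.items() if n == 1)


if __name__ == "__main__":
    scanned = "the auld hoose\nthe auld hoose\nthe auId hoose"
    for word, line_number in hapax_with_lines(scanned):
        print(f"line {line_number}: {word}")
```

In the sample, the misread "auId" (capital I for l) is the only word that occurs once, so it is the only one flagged. In a real text the list would also contain genuine rare words, so it narrows the search rather than identifying errors outright.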

Concordances and glossaries aside, are there any other
particularly useful tools or lists that can help readers
of electronic literary works? The field seems wide open
and yet I'm stuck for ideas!

I'm aware of various "data mining" tools such as statistical
searches that find the most relevant documents for a given
set of keywords, and I'll be giving these some thought. Has
anyone had any particularly useful or useless experiences in
using such things?
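
One common form of such a statistical search is TF-IDF ranking, which scores a document higher when a keyword is frequent in it but rare across the collection. The sketch below is a minimal Python version under stated assumptions: the function names, the tokenising rule and the "+ 1.0" smoothing of the inverse document frequency are illustrative choices, not a description of any particular tool.

```python
import math
import re
from collections import Counter

# Assumed tokenising rule: runs of letters with internal apostrophes kept.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(text):
    """Return the lower-cased words of the text, in order."""
    return [m.group(0).lower() for m in WORD.finditer(text)]


def rank_documents(documents, keywords):
    """Rank document names by summed TF-IDF of the keywords, best first.

    `documents` maps a name to its text; ties are broken by name.
    """
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    total = len(counts)
    query = tokenize(" ".join(keywords))
    scores = {}
    for name, count in counts.items():
        length = sum(count.values()) or 1  # avoid dividing by zero
        score = 0.0
        for keyword in query:
            containing = sum(1 for c in counts.values() if keyword in c)
            if containing == 0:
                continue  # keyword occurs nowhere in the collection
            # The added 1.0 keeps words found in every document from scoring zero.
            idf = math.log(total / containing) + 1.0
            score += (count[keyword] / length) * idf
        scores[name] = score
    return sorted(scores, key=lambda name: (-scores[name], name))


if __name__ == "__main__":
    docs = {
        "a": "the burn runs doon the brae",
        "b": "the hoose on the hill",
        "c": "a wee burn a wee brae a burn",
    }
    print(rank_documents(docs, ["burn"]))
```

With the sample collection, document "c" ranks first because "burn" makes up a larger share of its words, and "b" ranks last because it does not contain the keyword at all.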

Thanks in advance for any suggestions and ideas.

Sandy
http://scotstext.org/

==================================END===================================


