LL-L "Resources" 2003.10.04 (08) [E]

Lowlands-L lowlands-l at lowlands-l.net
Sun Oct 5 21:15:25 UTC 2003


======================================================================
L O W L A N D S - L * 05.OCT.2003 (08) * ISSN 189-5582 * LCSN 96-4226
http://www.lowlands-l.net * lowlands-l at lowlands-l.net
Rules & Guidelines: http://www.lowlands-l.net/index.php?page=rules
Posting Address: lowlands-l at listserv.linguistlist.org
Server Manual: http://www.lsoft.com/manuals/1.8c/userindex.html
Archives: http://listserv.linguistlist.org/archives/lowlands-l.html
Encoding: Unicode (UTF-8) [Please switch your view mode to it.]
=======================================================================
You have received this because you have been subscribed upon request.
To unsubscribe, please send the command "signoff lowlands-l" as message
text from the same account to listserv at listserv.linguistlist.org or
sign off at http://linguistlist.org/subscribing/sub-lowlands-l.html.
=======================================================================
A=Afrikaans Ap=Appalachian B=Brabantish D=Dutch E=English F=Frisian
L=Limburgish LS=Lowlands Saxon (Low German) N=Northumbrian
S=Scots Sh=Shetlandic V=(West)Flemish Z=Zeelandic (Zeêuws)
=======================================================================

From: Kenneth Rohde Christiansen <kenneth at gnu.org>
Subject: LL-L "Resources" 2003.10.04 (01) [E/LS]

To be honest, this is not an easy task. You can easily say "just do it
like this or like that", but you have to bear in mind how much memory it
will consume per page indexed (what about huge pages? If the system
starts swapping while it indexes huge pages, you're in trouble), how
quickly it can index, and how precise it has to be. There are a lot of
other things to think about, so please make a good analysis before
starting.
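
To illustrate the memory point, here is a minimal Python sketch, not a
finished indexer. It assumes a page can be read as a text stream and counts
word tokens in fixed-size chunks, so the whole page is never held in memory
at once. The function name count_tokens and the 64 KiB default chunk size
are hypothetical choices made for the example.

```python
import io
import re
from collections import Counter


def count_tokens(stream, chunk_size=64 * 1024):
    """Count lowercase word tokens from a text stream, one chunk at a time."""
    counts = Counter()
    leftover = ""  # partial word carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = leftover + chunk
        # A word may be split across two chunks, so hold back any word
        # characters at the very end until the next chunk arrives.
        tail = re.search(r"\w+\Z", text)
        if tail:
            leftover = tail.group(0)
            text = text[:tail.start()]
        else:
            leftover = ""
        counts.update(token.lower() for token in re.findall(r"\w+", text))
    if leftover:
        counts[leftover.lower()] += 1
    return counts
```

Calling count_tokens(io.StringIO(page_text)) gives the same counts as
tokenizing the whole string at once. Note that the counter itself still grows
with the vocabulary of the page, and a page consisting of one enormous
"word" would still be buffered in full, so this only addresses part of the
problem.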

Writing such a thing in Perl or PHP is plain stupid. They are good
programming languages for web pages and small scripts to automate stuff,
but really, you don't want to maintain code written in Perl, for
instance! I am doing that for a GNU/GNOME project and it is a pain: you
can code something quickly, but it is hard, if not impossible, to write
good maintainable code.

Personally I would code it in C# - even though this is a Microsoft
language it is a very good one that seems to be open. There is also a
good open source implementation (www.go-mono.com) done by my friends at
Ximian/Novell that runs on Linux, MacOS etc.

Find some of the various ways to solve the problem, describe them with
pros and cons, and let me read through it. I have implemented
web crawlers and search engines before.

Kenneth

> How about coming up with an automatic generator of alternative spelling,
> for instance written in Perl, PHP or such?  Such an "engine" would be very
> useful for many of us, for instance if it were available online for
> automated transformation into different orthographic systems (which sounds
> like a project right there).  Maybe Sandy, Mathieu and others could act as
> advisers on that.

----------

From: Jan Strunk <strunkjan at hotmail.com>
Subject: LL-L "Resources" 2003.10.04 (01) [E/LS]

Hello,

Ron wrote:
> I'd be happy to do some test driving and might be able to contribute a few
> URLs.
Thank you!

> How about coming up with an automatic generator of alternative spelling,
> for instance written in Perl, PHP or such?  Such an "engine" would be very
> useful for many of us, for instance if it were available online for
> automated transformation into different orthographic systems (which sounds
> like a project right there).  Maybe Sandy, Mathieu and others could act as
> advisers on that.
That's what I was thinking about. I might do it in Perl because that's the
language I have mainly used so far for statistical NLP, but at Stanford they
seem to use Java more. Any of them is OK, I guess. I was thinking of trying
to learn an automatic "transducer" that can translate all dialects into a
kind of "common representation", which would be stored in the index.
Occasional overgeneration of variant forms would probably not be too
problematic. If this works reasonably well, I could also try to build
translators from one dialect into another. But please don't expect me to
build these things in a month's time. Maybe it won't even work too well. Of
course, if it works, one could probably also produce a system for Scots...
The most important thing now is to get as many Low Saxon texts from the
internet as possible, in as many different writing systems as possible. Also,
could anyone who knows about spelling guidelines please send me the rules if
they are available online?
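
To make the "common representation" idea a bit more concrete, here is a
rough Python sketch. It is only an illustration under strong assumptions: it
uses a couple of hand-written rewrite rules instead of a learned transducer,
and the rules and sample spellings are made up for the example, not taken
from any published spelling guideline. The names RULES, normalize,
build_index and search are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical rewrite rules (pattern, replacement), applied in order.
# They stand in for whatever a learned transducer would produce.
RULES = [
    (re.compile(r"([aeiou])h"), r"\1\1"),  # vowel + h as length marker -> doubled vowel
    (re.compile(r"ß"), "ss"),              # collapse two ways of writing the same sound
]


def normalize(word):
    """Map one spelling variant to the common representation."""
    form = word.lower()
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


def build_index(docs):
    """Build an inverted index from normalized word form to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text):
            index[normalize(token)].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing any variant of the query word."""
    return sorted(index.get(normalize(query), set()))
```

With documents such as {"a": "dat Johr", "b": "dat Joor"}, a search for
either spelling finds both, because both are stored under the same
normalized key. Rules this crude will also merge some words that should stay
apart, which is the kind of overgeneration mentioned above.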

Thank you very much!

Jan Strunk
strunk at linguistics.ruhr-uni-bochum.de
jstrunk at stanford.edu

================================END===================================
* Please submit postings to lowlands-l at listserv.linguistlist.org.
* Postings will be displayed unedited in digest form.
* Please display only the relevant parts of quotes in your replies.
* Commands for automated functions (including "signoff lowlands-l") are
  to be sent to listserv at listserv.linguistlist.org or at
  http://linguistlist.org/subscribing/sub-lowlands-l.html.
=======================================================================


