LL-L "Resources" 2003.10.04 (12) [E]

Mon Oct 6 00:12:10 UTC 2003

======================================================================
L O W L A N D S - L * 05.OCT.2003 (12) * ISSN 189-5582 * LCSN 96-4226
http://www.lowlands-l.net * lowlands-l at lowlands-l.net
Rules & Guidelines: http://www.lowlands-l.net/index.php?page=rules
Posting Address: lowlands-l at listserv.linguistlist.org
Server Manual: http://www.lsoft.com/manuals/1.8c/userindex.html
Archives: http://listserv.linguistlist.org/archives/lowlands-l.html
Encoding: Unicode (UTF-8) [Please switch your view mode to it.]
=======================================================================
You have received this because you have been subscribed upon request.
To unsubscribe, please send the command "signoff lowlands-l" as message
text from the same account to listserv at listserv.linguistlist.org or
sign off at http://linguistlist.org/subscribing/sub-lowlands-l.html.
=======================================================================
A=Afrikaans Ap=Appalachian B=Brabantish D=Dutch E=English F=Frisian
L=Limburgish LS=Lowlands Saxon (Low German) N=Northumbrian
S=Scots Sh=Shetlandic V=(West)Flemish Z=Zeelandic (Zeêuws)
=======================================================================

From: Kenneth Rohde Christiansen <kenneth at gnu.org>
Subject: LL-L "Resources" 2003.10.04 (10) [E]

This seems like a very expensive function (it uses a lot of regular
expressions). Does it scale well? I know the regex implementation in Perl
is excellent, but how many words can you check per minute?

Have you tested it against a simple fuzzy matching algorithm? It would be
nice to know the result. Here is a bit of C code; I can port it to Perl
if you want.

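#include <stdbool.h>

/* Returns nonzero if str1 and str2 match, tolerating swapped adjacent
   characters and at most one omitted character. */
int CompareFuzzy (const char *str1, const char *str2)
{
    int i = 0; /* string iterator for str1 */
    int j = 0; /* string iterator for str2 */
    bool is_matching = true;

    while (str1 [i] && str2 [j] && is_matching)
    {
        is_matching = false;

        if (str1 [i] == str2 [j])
            is_matching = true;
        else
        {
            /* let's allow two adjacent characters to be swapped */

            if (str1 [i] == str2 [j+1] &&
                str1 [i+1] == str2 [j])
            {
                is_matching = true;
                i++; /* skip the second character of the swapped pair */
                j++;
            }

            /* let's allow one character to be omitted; the i == j test
               means only one omission is ever tolerated */

            if (!is_matching && i == j && str1 [i] == str2 [j+1])
            {
                is_matching = true;
                j++; /* str2 has an extra character */
            }

            if (!is_matching && i == j && str1 [i+1] == str2 [j])
            {
                is_matching = true;
                i++; /* str1 has an extra character */
            }
        }

        i++; /* advance to the next character pair */
        j++;
    }

    /* extra check that both strings were fully consumed; it can be
       removed if a prefix match is acceptable */

    if (str1 [i] != '\0' || str2 [j] != '\0')
        is_matching = false;

    return is_matching;
}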
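Since CompareFuzzy tolerates only swapped neighbours and a single omitted
character, a bounded edit distance may also be worth comparing against.
Below is a minimal sketch, assuming the optimal string alignment variant
of Damerau-Levenshtein distance (insertion, deletion, substitution and
adjacent swap each cost 1); the names OsaDistance and FuzzyWithin are
hypothetical. It costs O(len1 * len2) time and memory per comparison,
which is relevant to the scaling question above. Like CompareFuzzy, it
compares bytes, so a non-ASCII UTF-8 character such as "ê" may count as
more than one edit.

```c
#include <stdlib.h>
#include <string.h>

/* Optimal string alignment (restricted Damerau-Levenshtein) distance:
   the minimum number of insertions, deletions, substitutions and swaps
   of adjacent characters needed to turn a into b.
   Returns -1 if memory allocation fails. */
int OsaDistance (const char *a, const char *b)
{
    size_t n = strlen (a), m = strlen (b);
    size_t cols = m + 1;
    /* (n+1) x (m+1) table, stored row by row */
    int *d = malloc ((n + 1) * cols * sizeof *d);
    if (d == NULL)
        return -1;

    for (size_t i = 0; i <= n; i++)
        d [i * cols] = (int) i;  /* delete every char of a's prefix */
    for (size_t j = 0; j <= m; j++)
        d [j] = (int) j;         /* insert every char of b's prefix */

    for (size_t i = 1; i <= n; i++)
    {
        for (size_t j = 1; j <= m; j++)
        {
            int cost = (a [i-1] == b [j-1]) ? 0 : 1;
            int best = d [(i-1) * cols + j] + 1;        /* deletion */
            int ins  = d [i * cols + (j-1)] + 1;        /* insertion */
            int sub  = d [(i-1) * cols + (j-1)] + cost; /* substitution */
            if (ins < best) best = ins;
            if (sub < best) best = sub;

            /* swap of two adjacent characters */
            if (i > 1 && j > 1 && a [i-1] == b [j-2] && a [i-2] == b [j-1])
            {
                int swp = d [(i-2) * cols + (j-2)] + 1;
                if (swp < best) best = swp;
            }
            d [i * cols + j] = best;
        }
    }

    int result = d [n * cols + m];
    free (d);
    return result;
}

/* Hypothetical helper: nonzero if a and b are at most max_edits apart. */
int FuzzyWithin (const char *a, const char *b, int max_edits)
{
    int dist = OsaDistance (a, b);
    return dist >= 0 && dist <= max_edits;
}
```

For example, FuzzyWithin ("hoose", "house", 1) returns 1 (one
substitution), while FuzzyWithin ("kitten", "sitting", 2) returns 0,
since those two words are three edits apart.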

Kenneth

Sandy wrote:

> Below is a Perl subroutine I wrote a few months ago to enable me to match
> variant spellings in collections of Scots proverbs.

================================END===================================
* Please submit postings to lowlands-l at listserv.linguistlist.org.
* Postings will be displayed unedited in digest form.
* Please display only the relevant parts of quotes in your replies.
* Commands for automated functions (including "signoff lowlands-l") are
  to be sent to listserv at listserv.linguistlist.org or at
  http://linguistlist.org/subscribing/sub-lowlands-l.html.
=======================================================================