[Corpora-List] 155 *billion* (155,000,000,000) word corpus of American English

John F. Sowa sowa at bestweb.net
Fri May 13 16:45:37 UTC 2011


On 5/13/2011 11:54 AM, Michal Ptaszynski wrote:
> A word of explanation if someone was wondering about a difference between
> a "phrase" and a "pattern"...
> The patterns I am looking for are not (or "not always")
> something you could put into a dictionary. For example, in a sentence "Oh,
> what a beautiful day it is today, isn't it!", my method finds a pattern
> "Oh, what a * isn't it!", or  "what a * isn't it!"
> ... I also allow for more than one wildcard.
>
> I did some background search but, although it would be reasonable that
> such a method existed, I could not find any. If anyone knows about such a
> method developed earlier, I would be indebted.

SNOBOL is a pattern-matching language that allows arbitrary regular
expressions, which can contain any number of wildcards.
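For example, the wildcard patterns quoted above can be written directly
as regular expressions.  The following is a minimal Python sketch using
the standard `re` module rather than SNOBOL itself.  The helper name
`wildcard_to_regex` is hypothetical, and each `*` is assumed to stand
for a non-empty stretch of text:

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a phrase pattern in which each '*' matches a non-empty stretch of text."""
    # Escape the literal parts so punctuation such as ',' or '!' is matched literally,
    # then join them with a non-greedy capture group for each wildcard.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"(.+?)".join(literal_parts))

sentence = "Oh, what a beautiful day it is today, isn't it!"
for pat in ["Oh, what a * isn't it!", "what a * isn't it!", "what a * day * isn't it!"]:
    m = wildcard_to_regex(pat).search(sentence)
    # Print what each wildcard matched, or None if the pattern does not occur.
    print(pat, "->", m.groups() if m else None)
```

The literal pieces between wildcards play the role of SNOBOL's quoted
strings; a pattern with no `*` at all reduces to a plain phrase search.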

The simplest SNOBOL pattern is a quoted string, which could be
a word or phrase.

SNOBOL also supports named patterns, which can refer to one another
recursively.  That option takes it beyond regular expressions to a
subset of context-free grammars.  In fact, it's possible to use SNOBOL
pattern matching as a recursive-descent parser with backtracking.
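To illustrate the idea (not SNOBOL syntax), here is a small Python
sketch of named, recursive patterns with backtracking.  The combinator
names `lit`, `alt`, `seq`, and `ref` and the toy grammar are
hypothetical.  Each pattern yields every position where a match could
end, so a later failure can back up into an earlier alternative; `ref`
looks a name up only at match time, roughly the way SNOBOL4's deferred
pattern references allow recursion:

```python
from typing import Callable, Dict, Iterator

# A pattern takes the subject string and a start position and yields every
# position where a match could end; several candidates enable backtracking.
Pattern = Callable[[str, int], Iterator[int]]

def lit(text: str) -> Pattern:
    def match(subject: str, pos: int) -> Iterator[int]:
        if subject.startswith(text, pos):
            yield pos + len(text)
    return match

def alt(*choices: Pattern) -> Pattern:
    # Try each alternative in order; later ones are tried on backtracking.
    def match(subject: str, pos: int) -> Iterator[int]:
        for choice in choices:
            yield from choice(subject, pos)
    return match

def seq(*parts: Pattern) -> Pattern:
    # Match parts in order; for each way the first part can end, try the rest.
    def match(subject: str, pos: int) -> Iterator[int]:
        if not parts:
            yield pos
            return
        first, rest = parts[0], seq(*parts[1:])
        for mid in first(subject, pos):
            yield from rest(subject, mid)
    return match

# Named patterns live in a table and are looked up only when matched,
# so a pattern may refer to itself or to one defined later.
grammar: Dict[str, Pattern] = {}

def ref(name: str) -> Pattern:
    def match(subject: str, pos: int) -> Iterator[int]:
        yield from grammar[name](subject, pos)
    return match

# EXPR = TERM '+' EXPR | TERM ;  TERM = 'x' | '(' EXPR ')'
grammar["EXPR"] = alt(seq(ref("TERM"), lit("+"), ref("EXPR")), ref("TERM"))
grammar["TERM"] = alt(lit("x"), seq(lit("("), ref("EXPR"), lit(")")))

def full_match(name: str, subject: str) -> bool:
    # Anchored match: some way of matching must consume the whole subject.
    return any(end == len(subject) for end in grammar[name](subject, 0))

for s in ["x", "x+x", "(x+x)+x", "((x)", "x+"]:
    print(s, full_match("EXPR", s))
```

Balanced parentheses cannot be described by any regular expression,
which is why recursion is the step beyond them.  As with any naive
recursive-descent parser, a left-recursive rule such as
EXPR = EXPR '+' TERM would recurse forever, so the grammar is written
right-recursively.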

Since the first version of SNOBOL was implemented in 1962, SNOBOL
would count as prior art.  See

    http://www.snobol4.org/history.html

AWK is a simpler pattern-matching language, which also supports
arbitrary regular expressions with any number of wildcards.  Like
SNOBOL, AWK was designed and implemented at Bell Labs.  The basic
structure of an AWK program is an implicit loop over the input lines
that applies a sequence of condition-action statements, in which each
condition is a pattern and the action part does something with
whatever the pattern found.
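The following Python sketch mimics that structure; it is an emulation,
not AWK itself.  Each rule pairs a regular expression with an action,
in the spirit of AWK's /regex/ { action } statements, and every rule
is tried on every input line.  The rules, the small corpus, and the
function name `run` are hypothetical:

```python
import re
from collections import Counter

counts = Counter()
found = []

# Hypothetical condition-action rules: (compiled pattern, action on the match).
rules = [
    (re.compile(r"what an? (.+?) isn't it"), lambda m: found.append(m.group(1))),
    (re.compile(r"\bOh\b"), lambda m: counts.update(["Oh"])),
]

def run(lines, rules):
    for line in lines:                 # AWK supplies this loop implicitly
        for pattern, action in rules:  # every rule is tested against every line
            m = pattern.search(line)
            if m:                      # condition true: run the action
                action(m)

corpus = [
    "Oh, what a beautiful day it is today, isn't it!",
    "What a mess.",
    "Oh, what an odd idea, isn't it?",
]
run(corpus, rules)
print(found, counts["Oh"])
```

The matching is case-sensitive, so "What a mess." triggers neither
rule; the first rule's capture group shows what its wildcard found on
each line.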

AWK is freely available for download.  For more info, tutorials,
manuals, and software, type "AWK" into your favorite search engine.

John Sowa
