[Corpora-List] Grammar checker for English
Mike Maxwell
maxwell at ldc.upenn.edu
Thu Apr 14 02:39:00 UTC 2005
Corrin Lakeland wrote:
> On Tue, 12 Apr 2005 23:32, you wrote:
>
>> Does anyone have a technique or tool for checking the
>> grammatical correctness of a sentence?
>>
>> A full parser would be computationally too expensive,
>> so is there a computationally cheap method for this?
>
> I do not know of any systems which check if a sentence is
> well-formed without parsing it, although it is
> theoretically possible to do. However, there are many
> parsers that are quite efficient.
>
> ... I'm sure there is lots of other work in the field.
(I didn't see the original msg for some reason, but I'm
assuming it was posted to Corpora-List, hence a reply is
appropriate.)
Like Corrin, I don't know of any work done on testing
well-formedness without parsing. (Unlike him, I have a hard
time imagining how that would work--I suppose you could do
some sort of n-gram tests, but there would be no guarantee
that there wouldn't be an error at n+1, or for that matter
that back-off didn't lead to problems at larger n. But
maybe I just lack imagination :-).)
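
To make the n-gram worry concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy corpus, the function names, and the simplification that an n-gram counts as acceptable if it was seen at least once (there is no smoothing or back-off). It shows a sentence with a subject-verb agreement error that passes at n=2 and n=3 and is flagged only at n=4.

```python
def ngrams(tokens, n):
    """Return every contiguous n-token window, padded with boundary markers."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train(corpus, n):
    """Collect the set of n-grams attested in a corpus of sentences."""
    seen = set()
    for sentence in corpus:
        seen.update(ngrams(sentence.split(), n))
    return seen


def unseen_ngrams(sentence, seen, n):
    """List the n-grams of the sentence that never occurred in training."""
    return [g for g in ngrams(sentence.split(), n) if g not in seen]


# Toy training corpus (hypothetical); every sentence in it is grammatical.
CORPUS = [
    "the dog barks",
    "the cats bark",
    "the dog near the cats barks",
]

# Agreement error: singular subject "dog", plural verb form "bark".
BAD = "the dog near the cats bark"

for n in (2, 3, 4):
    print(n, unseen_ngrams(BAD, train(CORPUS, n), n))
```

Every bigram and trigram of the bad sentence is attested, so the check passes at those orders. The 4-gram check does flag something, but a larger corpus could easily attest that window too, and a subject placed further from its verb would push the error beyond any fixed n.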
At any rate, there is a considerable amount of work done on
parsing _restricted_ English, with the intention of finding
ungrammatical sentences where the standard of grammaticality
is precisely some computational grammar. One domain where
this has been used is in aircraft manuals, which must be
read by technicians who do not have English as their first
language. As I understand it, the version of simplified
English used in these manuals is restricted both as to its
vocabulary and its grammar. (I'm not sure how compound
nouns are treated, maybe there's just a limit on nesting.)
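
The idea of restricting both vocabulary and grammar can be sketched in a few lines of Python. This is only an illustration: the word list, the tag names, and the single sentence pattern are invented here and are not taken from any real simplified-English specification or from any actual checker, which would use a parser with a far larger grammar. A sentence is rejected if it uses a word outside the approved vocabulary or if its sequence of word categories is not one the grammar licenses.

```python
import re

# Hypothetical approved vocabulary; each word has exactly one category.
LEXICON = {
    "the": "DET", "a": "DET",
    "technician": "N", "valve": "N", "panel": "N",
    "opens": "V", "closes": "V", "removes": "V",
}

# Hypothetical grammar with one pattern: noun phrase, verb, noun phrase.
SENTENCE = re.compile(r"DET N V DET N")


def check(sentence):
    """Return a list of problems; an empty list means the sentence is accepted."""
    words = sentence.lower().rstrip(".").split()
    problems = [
        f"word not in approved vocabulary: {w}"
        for w in words
        if w not in LEXICON
    ]
    if problems:
        return problems
    # Vocabulary is fine, so test the tag sequence against the grammar.
    tags = " ".join(LEXICON[w] for w in words)
    if not SENTENCE.fullmatch(tags):
        return [f"structure not licensed by the grammar: {tags}"]
    return []


print(check("The technician opens the valve."))
print(check("The technician unscrews the valve."))
print(check("The valve the technician opens."))
```

Here "ungrammatical" means only "not accepted by this grammar", which is the sense of grammaticality at issue for restricted English.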
One of the simplified-English-for-aircraft checkers was done
by Boeing. I wrote most of the original grammar rules back
in the mid-1980s, without the intent of restricting it, so
that it covered all the constructions we could come up with
(from both generative grammars and descriptive texts like
Quirk, Greenbaum, Leech and Svartvik, plus testing
against various text corpora). I believe that after I left
in 1987, and the restricted English application came up,
many of the rules were removed so as to accept only the
desired restricted language. Phil Harrison wrote the
original parser in Lisp; I am told it was re-written in C
(or C++?) for speed, and that after the re-write its speed
was adequate for checking large manuals. (That was in the
late 1980s or early 1990s. Moore's Law has, I would
imagine, made its speed more adequate since then :-).)
--
Mike Maxwell
Linguistic Data Consortium
maxwell at ldc.upenn.edu