[Corpora-List] Grammar checker for English

Thu Apr 14 12:54:11 UTC 2005

The Educational Testing Service has a product, Criterion, which uses n-gram
techniques to identify common types of grammatical errors. A good reference
is:

Chodorow, Martin and Claudia Leacock. (2000). An unsupervised method for
detecting grammatical errors. In Proceedings of the 1st Annual Meeting of
the North American Chapter of the Association for Computational Linguistics,
140-147.

-----Original Message-----
From: Mike Maxwell [mailto:maxwell at ldc.upenn.edu]
Sent: Wednesday, April 13, 2005 10:39 PM
To: Corrin Lakeland
Cc: D.G.Damle; CORPORA at HD.UIB.NO
Subject: Re: [Corpora-List] Grammar checker for English

Corrin Lakeland wrote:
> On Tue, 12 Apr 2005 23:32, you wrote:
>
>> Does anyone have a technique or tool for checking the
>> grammatical correctness of a sentence?
>>
>> A full parser would be computationally too expensive,
>> so is there a computationally cheap method for this?
>
> I do not know of any systems which check if a sentence is
> well-formed without parsing it, although it is
> theoretically possible to do. However, there are many
> parsers that are quite efficient.
>
> ... I'm sure there is lots of other work in the field.

(I didn't see the original msg for some reason, but I'm
assuming it was posted to Corpora-List, hence a reply is
appropriate.)

Like Corrin, I don't know of any work done on testing
well-formedness without parsing.  (Unlike him, I have a hard
time imagining how that would work--I suppose you could do
some sort of n-gram tests, but there would be no guarantee
that there wouldn't be an error at n+1, or for that matter
that back-off didn't lead to problems at larger n.  But
maybe I just lack imagination :-).)
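
To make the n-gram idea above concrete, here is a minimal sketch in Python. It is not taken from any of the systems mentioned in this thread; the function names, the toy corpus, and the "flag any unseen bigram" rule are all assumptions chosen for illustration. It also shows the limitation raised above: an agreement error that spans more words than n goes unnoticed.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token list, padded at both ends."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

def train(sentences, n=2):
    """Count n-grams over a reference corpus assumed to be well-formed."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts

def suspicious_ngrams(sentence, counts, n=2, min_count=1):
    """Return the n-grams of the sentence seen fewer than min_count times."""
    return [g for g in ngrams(sentence.lower().split(), n)
            if counts[g] < min_count]

# Toy reference corpus, made up for this example.
corpus = [
    "the dog barks",
    "the dogs bark",
    "the cat barks",
    "the dogs that chase the cat bark",
]
counts = train(corpus)

# A local agreement error produces an unseen bigram and is flagged.
print(suspicious_ngrams("the dog bark", counts))   # [('dog', 'bark')]

# A long-distance agreement error ("dogs ... barks") is missed, because
# every bigram in the sentence occurs somewhere in the corpus.
print(suspicious_ngrams("the dogs that chase the cat barks", counts))   # []
```

A realistic checker would need a much larger corpus and some smoothing or back-off, since with real text most unseen n-grams are simply rare rather than wrong.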

At any rate, there is a considerable amount of work done on
parsing _restricted_ English, with the intention of finding
ungrammatical sentences where the standard of grammaticality
is precisely some computational grammar.  One domain where
this has been used is in aircraft manuals, which must be
read by technicians who do not have English as their first
language.  As I understand it, the version of simplified
English used in these manuals is restricted both as to its
vocabulary and its grammar.  (I'm not sure how compound
nouns are treated, maybe there's just a limit on nesting.)

One of the simplified-English-for-aircraft checkers was done
by Boeing.  I wrote most of the original grammar rules back
in the mid-1980s, without the intent of restricting it, so
that it covered all the constructions we could come up with
(from both generative grammars and descriptive texts like
Quirk, Greenbaum, Leech and Svartvik, plus testing
against various text corpora).  I believe that after I left
in 1987, and the restricted English application came up,
many of the rules were removed so as to accept only the
desired restricted language.  Phil Harrison wrote the
original parser in Lisp; I am told it was re-written in C
(or C++?) for speed, and that after the re-write its speed
was adequate for checking large manuals.  (That was in the
late 1980s or early 1990s.  Moore's Law has, I would
imagine, made its speed more adequate since then :-).)
--
	Mike Maxwell
	Linguistic Data Consortium
	maxwell at ldc.upenn.edu
