26.913, Review: Computational Linguistics: Leacock, Chodorow, Gamon, Tetreault (2014)

The LINGUIST List via LINGUIST linguist at listserv.linguistlist.org
Fri Feb 13 17:57:13 UTC 2015


LINGUIST List: Vol-26-913. Fri Feb 13 2015. ISSN: 1069-4875.

Subject: 26.913, Review: Computational Linguistics: Leacock, Chodorow, Gamon, Tetreault (2014)

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org

Editor for this issue: Sara Couture <sara at linguistlist.org>
================================================================


Date: Fri, 13 Feb 2015 12:56:36
From: Cornelia Tschichold [C.Tschichold at swansea.ac.uk]
Subject: Automated Grammatical Error Detection for Language Learners

 
Discuss this message:
http://linguistlist.org/pubs/reviews/get-review.cfm?subid=35954517


Book announced at http://linguistlist.org/issues/25/25-1603.html

AUTHOR: Claudia Leacock
AUTHOR: Martin Chodorow
AUTHOR: Michael Gamon
AUTHOR: Joel Tetreault
TITLE: Automated Grammatical Error Detection for Language Learners
SUBTITLE: Second Edition
SERIES TITLE: Synthesis Lectures on Human Language Technologies
PUBLISHER: Morgan & Claypool Publishers
YEAR: 2014

REVIEWER: Cornelia I. Tschichold, Swansea University

Review's Editor: Helen Aristar-Dry

SUMMARY

This slim book is an updated version of the 2010 edition of the same title.
The authors justify this update with the substantial expansion of the field
and the fact that error detection technology has become mainstream. Language
learners, especially learners of English, constitute a huge potential market
for tools that promise to improve the quality of their writing by detecting
grammatical errors. Such tools are now becoming more widespread and are
increasingly being used by educational institutions.

In the introduction, the authors define ‘grammatical errors’ as including not
only grammar errors, but also usage and punctuation errors. These are the
kinds of errors that tools such as the grammar checker in MS Word can deal
with. The second chapter gives a short overview of the field of grammar
checking tools starting with the early commercial tools (‘CorrecText’,
‘Grammatik’, ‘Epistle’, ‘Critique’) in the 1980s with their varying amounts of
string-matching and parsing. In the 1990s, these approaches gave way to
statistical methods, which were combined with the earlier error-tolerant
grammars or error rules.
Today most systems use a hybrid approach combining a statistical element that
relies on a large training corpus and some rules written to detect specific
error types, e.g. errors that are known to be typical for certain learner
groups.

Chapter 3 gives a short overview of the types of errors that grammar checkers
for language learners can be expected to deal with today. While learners will
have problems with different areas of the English language, partly depending
on their first language, a number of error types are quite frequent for almost
all learners of English, and much work has concentrated on these areas. The
article system of the English language is one such area; the use of
prepositions and collocations constitute two further sources of errors
commonly found in texts written by learners of English.

With Chapter 4, the book becomes slightly more technical. For the evaluation
and comparison of different error detection systems, researchers use the
measures ‘precision’ and ‘recall’ to quantify the results that grammar
checkers achieve for a given text. Both of these figures are calculated on the basis of
the numbers for the so-called true positives (errors correctly detected), the
false positives (errors detected where there are none), and the false
negatives (errors not detected). Precision and recall can also be combined
into an overall F-score. The measures are not perfect as they tend to show an
improvement of the system simply when there are more errors in the text. A
further issue is the precise definition of ‘error’. What counts as an error,
and how the error is best corrected, is not always easy to determine. In order
to address these issues, so-called Shared Tasks have been developed, mainly
with a view to making a better comparison between systems possible. In these
tasks, true and false positives and false negatives are defined, and every
occurrence in the corpus identified before the evaluation starts. In contrast
to these Shared Tasks, learner corpora contain naturally occurring data,
without ready error annotation. Often, more than one correction is possible,
and sometimes it is not possible to say with certainty what the learner was
trying to express.
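The arithmetic behind these measures can be sketched in a few lines of Python; the counts in the usage example below are invented purely for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and the balanced F-score from counts of
    true positives (errors correctly detected), false positives (errors
    detected where there are none), and false negatives (errors missed)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A hypothetical checker flags 40 errors, 30 of them real,
# in a text containing 60 true errors:
p, r, f = precision_recall_f1(tp=30, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.5 0.6
```

The example also shows why these figures are sensitive to error density: with the same detection behaviour, a text containing more errors yields more true positives relative to false positives, and the scores rise.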

Chapters 5 and 6 focus on data-driven approaches to article and preposition
errors, and on errors relating to collocations, respectively. Data-driven
systems look at the context around each token to determine the typical context
for that particular token, but the number of words inspected on each side can
vary. If the corpus is tagged or even parsed, the syntactic context can
further enrich the information gained from the surrounding words and tags. If
deemed necessary, semantic information can be gained from electronic
dictionaries or sources such as WordNet. This contextual information is then
used to train the system. The training data can either be derived from a
corpus of texts that show correct usage only (typical for the earlier
systems), from a corpus of correct usage with artificially introduced errors
(e.g. random substitutions, deletions, etc.), or from a corpus containing both
correct usage and real errors. Once the training is finished, the system can
check any new text against its model. Features that show an unusual pattern,
i.e. one that deviates too much from the model of correct usage, will be
flagged up as potential errors. It is then up to the user to decide whether
the feature really does constitute an error or not. To help the user make this
decision, a number of typical usage examples can be displayed.
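As a rough illustration (not any particular system's implementation), a data-driven checker might represent the context of a token as the words and part-of-speech tags in a fixed window around it, which then serve as training features:

```python
def context_features(tokens, tags, i, window=2):
    """Collect the surrounding words and POS tags for token i, padded with
    a boundary marker at sentence edges, as a feature dictionary."""
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target token itself is what we predict or check
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        tag = tags[j] if 0 <= j < len(tags) else "<PAD>"
        features[f"word[{offset:+d}]"] = word
        features[f"tag[{offset:+d}]"] = tag
    return features

sent = ["she", "is", "interested", "in", "linguistics"]
tags = ["PRP", "VBZ", "JJ", "IN", "NNS"]
print(context_features(sent, tags, 3))  # features around the preposition "in"
```

Varying the `window` parameter corresponds to the choice, mentioned above, of how many words on each side the system inspects.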

Like article and preposition errors, some collocation errors can be found by
comparing the learner’s text with native-speaker texts. Among the best systems
for the detection of collocation errors is one for Chinese learners that will
find typical mis-collocations such as “eat medicine” and suggest more
appropriate word combinations. Transfer errors caused by the learner’s first
language are very common in this area, so a good system needs to take this
into consideration.

Chapter 7 considers the final group of errors, those concerning spelling,
punctuation and verb forms.  Statistical approaches may be less well suited to
errors where the local context is very relevant. A verb form such as *writed
can be corrected without recourse to heuristics; the word form can just be
looked up in a list of over-regularized verb forms. After a list of potential
errors has been created (manually), a set of rules can be written for their
detection, and finally some filters added to prevent over-flagging. To
illustrate this approach, the grammar checker ‘Criterion’ has a
(bigram-derived) rule that the article ‘a’ followed by a plural noun is
usually wrong, but a filter then applies when this occurs in a sequence such
as ‘a systems analyst’. The ‘ESL Assistant’ has rules for the use of
modals and similar verb-related errors, prepositions, and other word
combinations that often lead to errors. Other systems for various languages
also use such error rules to identify potential problems. 
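The rule-plus-filter approach can be sketched as follows. The exception list and tag set here are hypothetical, and ‘Criterion’’s actual implementation is of course more sophisticated; the sketch only shows how a filter prevents over-flagging:

```python
# Hypothetical exception list: trigrams where "a" + plural noun is acceptable.
EXCEPTIONS = {"a systems analyst", "a sports car", "a savings account"}

def flag_a_plus_plural(tokens, tags):
    """Flag positions where the article 'a' precedes a plural noun (NNS),
    unless a filter recognizes the trigram as a known exception."""
    flags = []
    for i in range(len(tokens) - 1):
        if tokens[i].lower() == "a" and tags[i + 1] == "NNS":
            trigram = " ".join(tokens[i:i + 3]).lower()
            if trigram not in EXCEPTIONS:  # filter step: suppress known good uses
                flags.append(i)
    return flags

print(flag_a_plus_plural(["She", "is", "a", "systems", "analyst"],
                         ["PRP", "VBZ", "DT", "NNS", "NN"]))  # [] -- filtered
print(flag_a_plus_plural(["He", "wrote", "a", "letters"],
                         ["PRP", "VBD", "DT", "NNS"]))        # [2] -- flagged
```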

One of the most problematic aspects in the detection of verb form errors is
the fact that a wrong word class (part-of-speech) is often attached to a verb
during the tagging process precisely because there is an error in the text,
i.e. one resulting in another possible word. Some of the Shared Tasks now
include this type of error, so more attention may be paid to these problems in
the future.

Other punctuation and spelling errors are typically treated using the same
methods as in spell checkers aimed at native speakers, partly because we do
not know whether the errors non-native writers make are significantly
different in quality from those that native speaker writers make.

In Chapter 8, the authors take a closer look at the issues surrounding the
annotation of learner corpora for errors. To create a gold standard, a large
manually annotated corpus is needed in order to make objective evaluation
across different systems a realistic possibility. The ideal annotation would
be multi-layered, allowing for more than one correction, and arrived at by
agreement among several annotators. Annotator agreement can be low, however,
especially if a correction needs to be supplied as well. In the context of
automatic error detection systems, relatively simple annotation systems are
typically used, e.g. annotation schemes that are limited to the types of
errors the system can actually correct. Crowdsourcing is now opening up the
possibility of making more comprehensive error annotation more affordable both
in terms of time and money. The exploitation of online revision logs of wiki
texts or on language learning websites is another possibility mentioned that
is worth exploring in this context.

In the last chapter the authors take a look at some interesting developments
in the field since the first edition of this book came out. The first of these
topics is the Shared Tasks that have become available in the field of
automatic grammatical error correction. The first Shared Task used in the 2011
competition was restricted to a set of 13 errors concerning articles and
preposition use in a relatively small corpus. Since then the scope of errors
has widened, and more and more teams are taking part in the competitions.
Major issues in these Shared Task competitions are wrong annotations and
errors that are detected but were not originally flagged, both of which make
the whole procedure quite labour-intensive. How to treat multiple errors is another
unresolved issue for Shared Tasks. Progress in machine translation systems is
the authors’ second reason for devoting a chapter to new developments in the
field. Some language pairs are problematic for article generation in the
target language text, and much work has gone into improving the post-editing
process for such pairs, i.e. determining definiteness from the source
language, so that the appropriate article can be generated in the target
language. Two methods developed for machine translation systems promise some
potential for re-use in error correction systems. The noisy channel model
treats error correction as a translation task from English with errors into
English without errors. This may be particularly useful if there are multiple
errors in a single sentence. As with any data-driven method, a large corpus is
required to train the system. The round trip model involves translating the
text into the writer’s native language first, then translating this back into
the target language. One experiment used this to correct preposition errors by
French speakers in English. Using multiple pivot languages can further improve
the result. The third potentially interesting development is the use of
crowdsourcing for error correction, divided up into identification, correction
and verification. The response time and the quality can be very good, so this
is an area with much promise for automatic error detection.

The chapter ends on the assumption that feedback is beneficial for writing
quality, but this is not universally accepted. The majority of linguists
probably agree that good feedback is beneficial. With automated systems, the
feedback quality is lower than it is for humans, but there is still evidence
of improved writing quality. The important point seems to be that writers are
happy to accept feedback from automated systems as long as it is largely
reliable. Systems should therefore be optimized for precision and avoid false
positives, even if this means some errors remain undetected. 

The conclusion the authors draw is that automatic error correction has been
most successful with data-driven approaches. Many error types have not
received much attention yet, and as the systems become more widely used, the
complexity of the problem becomes more and more obvious. A number of avenues
remain to be fully explored, including taking into consideration the writer’s
first language, techniques used in machine translation, and findings from the
field of second language acquisition research. 

EVALUATION

The intended readership of this book includes researchers in the field and
more generally people with an interest in the area of computer-assisted
language learning and NLP (Natural Language Processing).  Apart from this
group of people, the book could also be recommended for anyone with an
interest in how these automated error detection tools work. Given the fact
that more and more exam texts written in English, whether by first or second
language writers, will be evaluated by tools such as those described in this
book, it would be highly advisable for many educators to familiarize
themselves with the basics of how these tools work. Despite the occasional
slightly more technical passage, this book is very readable for readers even
without any background in computational linguistics.

In hindsight, the order of topics as they appear in the chapters seems almost
in reverse and probably makes more sense from the point of view of the authors
than for people outside the field. The chapter on the history of the field is
logically positioned early on, but issues such as the definition of what
constitutes an error, given in the introduction, become much easier to
understand once the reader has learnt something about the difficulties the
tools described in this book actually face and the techniques used by them.
From this definition, the authors work their way ‘downwards’ all the way to
the list of learner corpora given in the appendix, thus dissecting the problem
layer by layer.

Four years ago, when the first edition of this book appeared, the lack of
Shared Tasks made a comparison between different (commercial) systems
practically impossible. In this edition, the authors clearly explain the need
for and the value of these Shared Tasks. To the reader, the advances described
here also reveal that we are still at the beginning of the development of
grammar checkers for language learners and non-native writers. The brief look
into the three areas that may provide some key impetus to the field is one of
the most interesting aspects of the book and can offer a glimpse of what the
future may hold for automatic essay evaluation and error detection.

To conclude, I can highly recommend this book to any reader who is interested
in a short, readable overview of the state of the art in automated grammatical
error detection, written by some of the most influential researchers in this
field.


ABOUT THE REVIEWER

Cornelia Tschichold wrote her MA thesis on grammar checking for non-native
speakers, before going on to work on English computational phraseology for her
PhD. She now works at Swansea University, where she teaches courses on English
linguistics. Her research interests include the acquisition of English
vocabulary and phraseology, and computer-assisted language learning.







