17.740, Diss: Computational Ling: Sofkova Hashemi: 'Automatic..'

linguist at LINGUISTLIST.ORG
Fri Mar 10 19:57:03 UTC 2006


LINGUIST List: Vol-17-740. Fri Mar 10 2006. ISSN: 1068 - 4875.

Subject: 17.740, Diss: Computational Ling: Sofkova Hashemi: 'Automatic..'

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews (reviews at linguistlist.org) 
        Sheila Dooley, U of Arizona  
        Terry Langendoen, U of Arizona  

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Takako Matsui <tako at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================  

1)
Date: 08-Mar-2006
From: Sylvana Sofkova Hashemi < sylvana at ling.gu.se >
Subject: Automatic Detection of Grammar Errors in Primary School Children's Texts. A Finite State Approach. 

	
-------------------------Message 1 ---------------------------------- 
Date: Fri, 10 Mar 2006 14:54:41
From: Sylvana Sofkova Hashemi < sylvana at ling.gu.se >
Subject: Automatic Detection of Grammar Errors in Primary School Children's Texts. A Finite State Approach. 
 

Institution: Göteborg University 
Program: Department of Linguistics 
Dissertation Status: Completed 
Degree Date: 2003 

Author: Sylvana Sofkova Hashemi

Dissertation Title: Automatic Detection of Grammar Errors in Primary School
Children's Texts. A Finite State Approach. 

Dissertation URL:  http://www.ling.gu.se/~sylvana/

Linguistic Field(s): Computational Linguistics
                     Language Acquisition
                     Syntax

Subject Language(s): Swedish (swe)


Dissertation Director(s):
Robin Cooper

Dissertation Abstract:

This thesis concerns the analysis of grammar errors in Swedish texts written by
primary school children and the development of a finite state system for finding
such errors. Grammar errors are more frequent in this group of writers than among
adults, and the distribution of error types differs in children's texts.
In addition, other writing errors above the word level are discussed, including
punctuation errors and spelling errors that result in existing words.

The method used in the implemented tool FiniteCheck involves subtraction of
finite state automata that represent grammars with varying degrees of detail,
creating a machine that classifies phrases in a text containing certain kinds of
errors. The current version of the system handles errors concerning agreement in
noun phrases and the selection of finite and non-finite verb forms. At the lexical
level, we attach all lexical tags to words rather than use a tagger, which could
eliminate information in incorrect text that might be needed later to find the
error. At higher levels, structural ambiguity is treated by parsing order,
grammar extension and some other heuristics.
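
The subtraction idea described above can be sketched in a few lines of Python.
This is only an illustrative toy, not the FiniteCheck implementation: the
lexicon, the tag names (e.g. "det.utr"), the two patterns and the function names
are all assumptions made for this example, and regular expressions stand in for
finite state automata. A broad grammar accepts any determiner-adjective-noun
sequence, a narrow grammar accepts only those with gender agreement, and a
phrase is flagged when it belongs to the broad language but not the narrow one.

```python
import re
from itertools import product

# Hypothetical toy lexicon: every word keeps ALL of its possible tags,
# since no tagger is used to disambiguate. Tag format: category.gender.
LEXICON = {
    "en": {"det.utr"},
    "ett": {"det.neu"},
    "bil": {"noun.utr"},
    "hus": {"noun.neu"},
    "stor": {"adj.utr"},
    "stort": {"adj.neu"},
}

# Broad grammar: determiner, optional adjectives, noun; gender is ignored.
BROAD = re.compile(r"det\.\w+( adj\.\w+)* noun\.\w+")
# Narrow grammar: same structure, but all genders must agree.
NARROW = re.compile(
    r"det\.utr( adj\.utr)* noun\.utr|det\.neu( adj\.neu)* noun\.neu"
)


def tag_readings(words):
    """Return every tag sequence the words allow (raises KeyError on unknown words)."""
    choices = [sorted(LEXICON[w]) for w in words]
    return [" ".join(tags) for tags in product(*choices)]


def is_agreement_error(words):
    """True if the phrase is in the broad language but not in the narrow one."""
    readings = tag_readings(words)
    in_broad = any(BROAD.fullmatch(r) for r in readings)
    in_narrow = any(NARROW.fullmatch(r) for r in readings)
    return in_broad and not in_narrow
```

Both patterns describe well-formed structure only; no error pattern is written
down. For instance, is_agreement_error(["en", "hus"]) flags the phrase because
the determiner is common gender and the noun is neuter, while
["ett", "stort", "hus"] passes.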

The simple finite state technique of subtraction has the advantage that the
grammars one needs to write to find errors are always positive: they describe the
valid rules of Swedish rather than the structure of errors.
The rule sets remain quite small and practically no prediction of errors is
necessary.

The linguistic performance of the system is promising: for the error types
implemented, its results are comparable to those of other Swedish grammar checking
tools when tested on a small adult text not previously analyzed by the system. The
performance of the other Swedish tools was also tested on the children's data
collected for this study, revealing quite low recall rates. This motivates
adapting grammar checking techniques to children, whose errors differ from those
of adult writers and pose a greater challenge to current grammar checkers, which
are oriented towards texts written by adults.

The robustness and modularity of FiniteCheck make it possible to perform both
error detection and diagnostics. Moreover, the grammars can in principle be
reused for other applications that do not necessarily have anything to do with
error detection, such as extracting information in a given text or even parsing.

Key Words: grammar errors, spelling errors, punctuation, children's writing,
Swedish, language checking, light parsing, finite state technology





-----------------------------------------------------------
LINGUIST List: Vol-17-740