23.88, Confs: Computational Linguistics/Canada

Thu Jan 5 18:56:02 UTC 2012

LINGUIST List: Vol-23-88. Thu Jan 05 2012. ISSN: 1069 - 4875.

Subject: 23.88, Confs: Computational Linguistics/Canada

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin-Madison
         Monica Macaulay, U of Wisconsin-Madison
         Rajiv Rao, U of Wisconsin-Madison
         Joseph Salmons, U of Wisconsin-Madison
         Anja Wanner, U of Wisconsin-Madison
         <reviews at linguistlist.org>

Homepage: http://linguistlist.org

The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.

Editor for this issue: Amy Brunett <brunett at linguistlist.org>



Date: 22-Dec-2011
From: Robert Dale [Robert.Dale at mq.edu.au]
Subject: Shared Task in Preposition and Determiner Error Correction

-------------------------Message 1 ---------------------------------- 
Date: Thu, 05 Jan 2012 13:55:46
From: Robert Dale [Robert.Dale at mq.edu.au]
Subject: Shared Task in Preposition and Determiner Error Correction

Shared Task in Preposition and Determiner Error Correction 
Short Title: HOO2012 at BEA 

Date: 07-Jun-2012 - 07-Jun-2012 
Location: Montreal, Canada 
Contact: Robert Dale 
Contact Email: Robert.Dale at mq.edu.au 
Meeting URL: http://www.correcttext.org 

Linguistic Field(s): Computational Linguistics 

Meeting Description: 

HOO 2012, a shared task hosted by the Building Educational Applications Workshop at NAACL 2012, focusses on the correction of preposition and determiner errors in a large collection of non-native speaker texts. 

Preliminary Call for Participation:


The HOO (Helping Our Own) Exercise [Dale and Kilgarriff 2010] is concerned with correcting textual errors. The HOO Pilot Shared Task run in 2011 (see [Dale and Kilgarriff 2011]) looked at a diverse range of error types in a small set of documents. HOO 2012, which will be hosted by the Building Educational Applications Workshop (see http://www.cs.rochester.edu/~tetreaul/naacl-bea7.html) at NAACL 2012, focusses on the correction of preposition and determiner errors in a large collection of non-native speaker texts. Prepositions and determiners are widely recognized to be amongst the most challenging aspects of English lexico-syntax for non-native speakers to deal with: see [Leacock et al. 2010] for a review.

The Task:

The goal of this task is to provide a forum for the comparative evaluation of approaches to the correction of errors in the use of prepositions and determiners by non-native speakers of English. Although these have already been the focus of a considerable body of research in natural language processing, so far it has been hard to compare the results delivered by different teams as a consequence of different data sets and slightly different task descriptions. This shared task provides a common dataset and a shared evaluation framework as a means of overcoming these problems.

The HOO-2012 Prepositions and Determiners Shared Task follows on from the HOO-2011 Shared Task Pilot Round, held as part of the 2011 European Natural Language Generation Workshop. That task had a much broader focus on all kinds of errors in non-native speaker writing, and used a much smaller dataset (see http://www.clt.mq.edu.au/research/projects/hoo/). The evaluation framework for HOO-2012 is an enhancement of the scheme developed for HOO-2011, taking advantage of what was learned in that exercise.

The Data:

The data to be used for the task is drawn from the Cambridge Learner Corpus (CLC), and contains exam scripts written by students undertaking the First Certificate in English (FCE) exams; it is used with the kind permission of Cambridge University Press.

The data we are using has been converted from the mark-up provided in the released version of the CLC FCE data to use the HOO annotation scheme. The data to be released for training consists of 1000 exam scripts extracted from the FCE dataset. A further subset of 100 exam scripts will be released for testing and evaluation at the appropriate point in the schedule. We are endeavouring to obtain fresh data for this stage of the exercise, but in the event that this turns out not to be possible, we will use part of the published FCE dataset that was held back from the training data. For this reason, we would ask participants not to use the originally-released FCE dataset for training purposes, but only the HOO-formatted subset that we release independently.


Evaluation:

The evaluation methodology is essentially the same as that used in the HOO Pilot Round, but limited to preposition and determiner errors only. Tools will be provided which compute detection (lenient recognition, with at least one character overlap), recognition (exact extent recognition of an error), and correction (provision of an appropriate replacement string in addition to recognition).
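
As a rough illustration of the three levels of match described above, here is a minimal Python sketch. It is not the official HOO evaluation tool: the Edit class, its character-offset fields, the function names, and the single-gold-correction comparison are all assumptions made for the example, and the actual tools may define spans and acceptable corrections differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Edit:
    """Hypothetical error annotation (not the HOO format itself)."""
    start: int        # character offset where the error begins
    end: int          # character offset one past the end of the error
    correction: str   # replacement string ("" would mean deletion)


def recognized(gold: Edit, system: Edit) -> bool:
    """Recognition: the system edit has exactly the gold extent."""
    return gold.start == system.start and gold.end == system.end


def detected(gold: Edit, system: Edit) -> bool:
    """Detection: lenient match, at least one character of overlap.

    Assumption of this sketch: identical extents also count, so that
    zero-width spans (e.g. a missing determiner) can be detected.
    """
    overlap = max(gold.start, system.start) < min(gold.end, system.end)
    return overlap or recognized(gold, system)


def corrected(gold: Edit, system: Edit) -> bool:
    """Correction: recognition plus a matching replacement string.

    Assumption: one gold correction per error, compared exactly.
    """
    return recognized(gold, system) and gold.correction == system.correction


def count_matches(gold_edits: List[Edit], system_edits: List[Edit]) -> Dict[str, int]:
    """Count the gold edits matched by any system edit at each level."""
    levels = {"detection": detected, "recognition": recognized, "correction": corrected}
    return {
        name: sum(any(match(g, s) for s in system_edits) for g in gold_edits)
        for name, match in levels.items()
    }


# Made-up offsets and strings, purely for illustration.
gold = [Edit(10, 12, "in"), Edit(30, 33, "a"), Edit(50, 52, "at")]
system = [Edit(10, 12, "in"), Edit(31, 35, "a"), Edit(50, 52, "on")]
print(count_matches(gold, system))
```

In this made-up example the first system edit matches at all three levels, the second only overlaps the gold span (detection only), and the third has the right extent but a different replacement string (detection and recognition, but not correction).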

What You Should Do Now:

Register your interest by sending your email address to Robert.Dale at mq.edu.au. You'll be added to the HOO Google Groups list, where over the next few weeks some fine-tuning of the shared task will be discussed. Formal registration for the task will be as indicated in the schedule below.


Schedule:

The current schedule for HOO-2012 is as follows.

Friday 27th January: Website registration for participation in HOO-2012 opens; development data for the Shared Task released.
Friday 6th April: Test data for evaluation released.
Friday 13th April: Deadline for submissions from teams for evaluation.
Monday 23rd April: Results of evaluation released.
Friday 4th May: Final versions of team reports for proceedings due.


Robert Dale, Macquarie University


R. Dale and A. Kilgarriff [2010] Helping Our Own: Text massaging for computational linguistics as a new shared task. In Proceedings of the 6th International Natural Language Generation Conference, pages 261-265, Dublin, Ireland, 7th-9th July 2010.

R. Dale and A. Kilgarriff [2011] Helping Our Own: The HOO 2011 Pilot Shared Task. In Proceedings of the 13th European Workshop on Natural Language Generation, Nancy, France, 28th-30th September 2011.

C. Leacock, M. Chodorow, M. Gamon, and J. Tetreault [2010] Automated Grammatical Error Detection for Language Learners. Synthesis Lectures on Human Language Technologies. Morgan and Claypool.
