[Corpora-List] HOO-2012 at BEA: Preposition and Determiner Error Correction -- Call for Registration

Fri Jan 27 21:33:48 UTC 2012

HOO-2012 at BEA: Preposition and Determiner Error Correction

Context

The HOO (Helping Our Own) Shared Task is concerned with
correcting textual errors. HOO 2012, which will be hosted by the
Building Educational Applications Workshop at NAACL 2012,
focusses on the correction of preposition and determiner errors
in a large collection of non-native speaker texts. These are
widely recognized to be amongst the most challenging aspects of
English lexico-syntax for non-native speakers to deal with: see
[Leacock et al. 2010] for a review.

The Task

The goal of this task is to provide a forum for the comparative
evaluation of approaches to the correction of errors in the use
of prepositions and determiners by non-native speakers of
English. Although these have already been the focus of a
considerable body of research in natural language processing, so
far it has been hard to compare the results delivered by
different teams as a consequence of different data sets and
slightly different task descriptions. This shared task provides a
common dataset and a shared evaluation framework as a means of
overcoming these problems.

The HOO-2012 Preps and Dets Shared Task follows on from the
HOO-2011 Shared Task Pilot Round, held as part of the 2011
European Workshop on Natural Language Generation. That task had a
much broader focus on all kinds of errors in non-native speaker
writing, and used a much smaller dataset. The evaluation framework
for HOO-2012 is an enhancement of the scheme developed for
HOO-2011, taking advantage of what was learned in that exercise.

The Data

The data to be used for the task is derived from the Cambridge
Learner Corpus (CLC) described in [Yannakoudakis et al. 2011]. The
data, which contains exam scripts written by students undertaking
the First Certificate in English (FCE) exams, is jointly provided
by Cambridge ESOL and Cambridge University Press.

The data we are using has been converted from the mark-up
provided in the released version of the CLC FCE data to use the
HOO annotation scheme.

What You Should Do Now

If you would like to participate in HOO 2012, you need to
formally register in order to obtain the data and evaluation
tools. To formally register, send the following information to
info at correcttext.org:

- Name of institution or other label appropriate for your team
- Name of contact person for your team
- Email address of contact person for your team

The HOO Google Group will be used for discussions regarding the
data and the task more generally. If you are not already a
member of the HOO Google Groups list, please also indicate the
email addresses you would like added to this list (the contact
email address will not be explicitly added unless requested).

Schedule

The current schedule for HOO-2012 is as follows.

    Friday 27th January: Development data for the Shared Task released.
    Friday 6th April: Test data for evaluation released.
    Friday 13th April: Deadline for submissions from teams for evaluation.
    Monday 23rd April: Results of evaluation released.
    Friday 4th May: Final versions of team reports for proceedings due.

See the HOO 2012 website at www.correcttext.org/hoo2012 for more
information.

Organizers

Robert Dale and Ilya Anisimoff, Macquarie University

References

R. Dale and A. Kilgarriff [2010] Helping Our Own: Text massaging
for computational linguistics as a new shared task. In
Proceedings of the 6th International Natural Language Generation
Conference, pages 261-265, Dublin, Ireland, 7th-9th July 2010.

R. Dale and A. Kilgarriff [2011] Helping Our Own: The HOO 2011
Pilot Shared Task. In Proceedings of the 13th European Workshop
on Natural Language Generation, Nancy, France, 28th-30th
September 2011.

C. Leacock, M. Chodorow, M. Gamon, and J. Tetreault [2010]
Automated Grammatical Error Detection for Language
Learners. Synthesis Lectures on Human Language
Technologies. Morgan and Claypool.

H. Yannakoudakis, T. Briscoe and B. Medlock [2011] A New Dataset
and Method for Automatically Grading ESOL Texts. In Proceedings
of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, Portland, Oregon, USA,
19th-24th June 2011.
