[Corpora-List] error tagging

Timothy Baldwin tbaldwin at csli.stanford.edu
Fri Sep 26 17:49:31 UTC 2003


> I am interested in error tagging and I am looking for corpora which are (or are being) error tagged. Do you know of any? And do you know of any available error tagset?

One more recent effort I know of is the SST Corpus, a 1-million-word corpus
of transcribed English speech by Japanese learners of English. Various error
types are tagged, although I can't find a description of the full tagset
online. There are a couple of papers in English on the corpus, notably:

Tono, Y., Kaneko, T., Isahara, H., Saiga, T. and Izumi, E. (2001) The Standard
Speaking Test (SST) Corpus: A 1 million-word spoken corpus of Japanese
learners of English and its implications for L2 lexicography. In Lee, S. (ed.)
ASIALEX 2001 Proceedings: Asian Bilingualism and the Dictionary. The Second
ASIALEX International Congress, August 8-10, 2001, Yonsei University, Korea,
pp. 257-262.

There is a web page with some documentation and a copy of this paper at:

http://leo.meikai.ac.jp/~tono/sst/

There was also a paper at this year's ACL:

Izumi, E., Uchimoto, K., Saiga, T., Supnithi, T. and Isahara, H. (2003)
Automatic error detection in the Japanese learners' English spoken data. In
Companion Volume to the Proceedings of the 41st Annual Meeting of the
Association for Computational Linguistics (ACL '03), pp. 145-148.

which is also available online at:

http://acl.ldc.upenn.edu/acl2003/posterdemo/pdf/Izumi.pdf



Tim

*-----------------------------------*

Timothy Baldwin
Senior research engineer
Multiword Expression project
CSLI LinGO Lab


Contact details:

Email: tbaldwin at csli.stanford.edu
Tel:   (+1)-650-723-0515
Fax:   (+1)-650-723-2166

*-----------------------------------*


