Corpora: error tagging of learners' English

John Milton lcjohn at ust.hk
Fri Nov 17 12:01:13 UTC 2000


Care to share how the coding was done, and an example of tagged text? I
annotated about 100,000 words of written HK IL by first POS-tagging it
(with CLAWS), mapping a set of error tags (describing mostly
morpheme-level errors) onto the keyboard, and then, guided by the CLAWS
tags, going through and inserting the error tags manually. Then I
concordanced on the error tags to identify errors at higher constituent
levels. Of course, this is fraught
with subjectivity since you have to project what the L2 writer would have
written had s/he used an acceptable structure, which is often quite
different from what an NS might have written, and there are usually
multiple possibilities... tricky stuff this...
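
For anyone who wants to try the concordancing step, here is a minimal
sketch in Python. It is an illustration, not the actual setup used for the
HK data: the word_TAG layout with CLAWS7-style tags, the angle-bracket
error tags placed right after the offending word, and the tag names AGR
and CNT are all assumptions made up for this example.

```python
# Hypothetical format: each token is word_POS (CLAWS7-style tags), and an
# error tag such as <AGR> follows immediately after the word it describes.
SAMPLE = (
    "He_PPHS1 go_VV0 <AGR> to_II school_NN1 every_AT1 day_NNT1 ._. "
    "She_PPHS1 have_VH0 <AGR> many_DA2 informations_NN2 <CNT> ._."
)


def concordance(tagged_text, error_tag, width=3):
    """Return (left, keyword, right) context for each hit on error_tag."""
    tokens = tagged_text.split()
    target = f"<{error_tag}>"
    hits = []
    for i, tok in enumerate(tokens):
        if tok != target or i == 0:
            continue
        # The word carrying the error is the token just before the tag.
        key = tokens[i - 1]
        left = tokens[max(0, i - 1 - width):i - 1]
        right = tokens[i + 1:i + 1 + width]
        hits.append((" ".join(left), key, " ".join(right)))
    return hits


if __name__ == "__main__":
    # Print a simple keyword-in-context display for one error tag.
    for left, key, right in concordance(SAMPLE, "AGR"):
        print(f"{left:>40}  [{key}]  {right}")
```

Lining up the hits like this makes it easier to see when several
morpheme-level tags cluster around the same larger constituent, though
deciding what the writer was aiming for still has to be done by hand.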

Your example reminds me of the instructor who objected to his students
having access to a concordancer because, among thousands of concordanced
examples of English text, he found a single line in French that read something
like "Réservations et Informations Pour le Passager", and was convinced
that his students would use this to challenge the countability rule...

John


