[Corpora-List] Looking for studies of preprocessing for syntactic annotation

Arne Skjærholt arnskj at ifi.uio.no
Thu Apr 18 14:24:48 UTC 2013


Hello all,
I'm looking for studies of preprocessing for syntactic annotation, in
particular of how the quality of the preprocessing affects
inter-annotator agreement and annotation time. Studies of dependency
annotation are of most interest to me, but phrase structure and other
formats are interesting as well.

From the articles I've found so far, it seems that virtually all
treebanking efforts use some kind of syntactic preprocessing step, but
few have done systematic studies quantifying the impact of
preprocessing and how that impact scales with preprocessing quality.

The only relevant literature I've found so far (full references at the
end of this mail):
- Fort and Sagot (2010): in-depth study of how pre-annotation quality
  affects POS annotation
- Marcus et al. (1993): report gains for POS annotation, but give no
  figures for syntactic bracketing
- Chiou et al. (2001): report annotation speed gains as parser F1
  increases; agreement is not studied
- Tanaka et al. (2005): report speed and agreement gains for
  discriminant-based HPSG parse disambiguation when certain
  discriminants are automatically rejected or selected before
  annotation

If anyone is aware of more work along similar lines, I'd be very
grateful for pointers.

Regards,
Arne

Fort and Sagot (2010): "Influence of Pre-annotation on POS-tagged
Corpus Development". Proc. 4th Linguistic Annotation Workshop.

Marcus, Santorini and Marcinkiewicz (1993): "Building a Large
Annotated Corpus of English: The Penn Treebank". Computational
Linguistics.

Chiou, Chang and Palmer (2001): "Facilitating Treebank Annotation
Using a Statistical Parser". Proc. 1st Conf. on Human Language
Technology Research.

Tanaka, Bond, Oepen and Fujita (2005): "High Precision Treebanking:
Blazing Useful Trees Using POS Information". Proc. 43rd ACL.

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora