how to report mor tagging accuracy rate

Huang Y.J. ihappylearning at gmail.com
Thu May 31 09:37:46 UTC 2012


I am currently using CLAN to do frequency analysis of syntactic
structures in English. My corpus is about 2,000,000 words of
children's novels (about 50 books in multiple files).

Because I will often need part-of-speech (POS) information, I am
considering using MOR and POST to tag my corpus. However, because the
corpus keeps growing in size and variety, I will not be able to run
POSTTRAIN to improve the tagging accuracy.

So one question that concerns me is the tagging accuracy rate. I have
read "Enriching CHILDES for Morphosyntactic Analysis", and the
document claims an accuracy rate of close to 94-95% (pp. 10-11 and
pp. 14-15).

Is there any formal report that supports this accuracy rate for
tagging with MOR without running POSTTRAIN? If not, how can I be sure
that my corpus reaches a certain standard of accuracy? That is, is
there any way to test the tagging accuracy of my corpus, or any
document I can cite to support the validity of the MOR tagging
accuracy rate?
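
To make the question concrete: the kind of check I have in mind is
hand-correcting a random sample of tagged words and comparing it with
the automatic tags. Below is a minimal Python sketch, assuming the two
tag sequences have already been extracted and aligned word by word.
The function names and the example tags are only my own illustration
and are not part of CLAN; the interval is a standard Wilson score
interval for a proportion.

```python
import math


def tagging_accuracy(gold_tags, auto_tags):
    """Share of words whose automatic tag matches the hand-checked tag."""
    if len(gold_tags) != len(auto_tags):
        raise ValueError("tag sequences must be aligned word by word")
    if not gold_tags:
        raise ValueError("sample is empty")
    correct = sum(g == a for g, a in zip(gold_tags, auto_tags))
    return correct / len(gold_tags)


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval (about 95% for z=1.96) for an accuracy rate."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical hand-checked sample of four words (tags are placeholders).
gold = ["n", "v", "det", "n"]
auto = ["n", "v", "det", "adj"]
print(tagging_accuracy(gold, auto))

# Hypothetical larger sample: 95 of 100 sampled words tagged correctly.
low, high = wilson_interval(95, 100)
print(round(low, 3), round(high, 3))
```

With a sample this small the interval is wide, so I assume a few
thousand hand-checked words drawn at random across the books would be
needed before the rate could be reported with any confidence.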

Many thanks.
Huang Y.J.
