[Corpora-List] Question about evaluation

Emad Mohamed emohamed at umail.iu.edu
Sun Dec 2 22:13:55 UTC 2012


Hello Corpora members,
I have a corpus of 80,000 words in which each word is assigned either the
class S or the class E. Class S occurs 72,000 times, while class E occurs
only 8,000 times (10% of the tokens).
I'm wondering what the best way to evaluate the classifier's performance
would be. I have randomly selected a dev set (5%) and a test set (10%).
I'm mainly interested in predicting which words are class E.
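
As a rough illustration of why accuracy alone is a weak measure with a 90/10 split, the sketch below (plain Python; `binary_metrics` is a hypothetical helper, not from any library) scores a majority-class baseline that labels every word S, using the class counts given above. It reaches 90% accuracy while finding none of the E words, which suggests precision, recall and F1 for class E are the more informative numbers. With a random 10% test split, roughly 800 E tokens would be expected in the test set.

```python
def binary_metrics(gold, pred, positive="E"):
    """Accuracy plus precision/recall/F1 for one class treated as positive."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    tn = len(gold) - tp - fp - fn
    accuracy = (tp + tn) / len(gold)
    # Precision is 0/0 when nothing is predicted positive; 0.0 is an assumed convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Class counts from the post: 72,000 S tokens and 8,000 E tokens.
gold = ["S"] * 72_000 + ["E"] * 8_000
# Majority-class baseline: every word is labelled S.
baseline = ["S"] * len(gold)
print(binary_metrics(gold, baseline))
# {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

In practice `gold` and `pred` would be the test-set labels and the classifier's output; any real classifier should at least beat this baseline on E recall and F1.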

I've read this page:
webdocs.cs.ualberta.ca/~eisner/measures.html
but I'm still a little confused. Is specificity used in linguistics
papers? Should I report these measures for each of the two classes
separately, or as a single overall number? Does the choice make a difference?
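
On the last two questions: in a two-class task, specificity for E (TN / (TN + FP)) is exactly recall for S, so reporting precision, recall and F1 for both classes already covers it. The sketch below is a hypothetical plain-Python helper run on a small invented example (not data from the corpus). It prints per-class scores alongside a macro average (the mean of the per-class F1 values, which weights the rare class E equally with S) and a micro average (pooled counts, which for single-label data equals accuracy and is therefore dominated by S).

```python
def confusion_counts(gold, pred, label):
    """TP, FP, FN, TN when `label` is treated as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    tn = len(gold) - tp - fp - fn
    return tp, fp, fn, tn


def prf(tp, fp, fn):
    """Precision, recall and F1; 0.0 is used for undefined (0/0) cases."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def report(gold, pred, labels=("S", "E")):
    """Per-class scores plus macro- and micro-averaged F1."""
    per_class = {}
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        tp, fp, fn, tn = confusion_counts(gold, pred, label)
        precision, recall, f1 = prf(tp, fp, fn)
        per_class[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            # Specificity = TN / (TN + FP) with `label` as the positive class.
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "support": tp + fn,  # number of gold tokens with this label
        }
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    # Macro: unweighted mean over classes, so rare E counts as much as S.
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    # Micro: pool counts first; for single-label data this equals accuracy.
    micro_f1 = prf(tp_sum, fp_sum, fn_sum)[2]
    return per_class, macro_f1, micro_f1


# Invented toy labels (10 tokens), not taken from the corpus.
gold = ["S"] * 8 + ["E"] * 2
pred = ["S"] * 7 + ["E", "E", "S"]

per_class, macro_f1, micro_f1 = report(gold, pred)
for label, m in per_class.items():
    print(label, {k: round(v, 3) for k, v in m.items()})
print("macro F1:", round(macro_f1, 4), "micro F1:", round(micro_f1, 4))
```

Given the interest in class E, the E row is probably the headline figure, with the macro average as a secondary summary; the micro average mostly restates accuracy.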

Thank you so much.

-- 
Emad Mohamed
aka Emad Nawfal
Université du Québec à Montréal
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

