<div dir="ltr">Thank you all for the valuable advice. <br>I think I'll go with making one of them the positive class and the other the negative one, and measure precision, recall and the F-score.<br>Thank you again.<br>

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Dec 3, 2012 at 6:54 AM, Detmar Meurers <span dir="ltr"><<a href="mailto:dm@sfs.uni-tuebingen.de" target="_blank">dm@sfs.uni-tuebingen.de</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

an option you could consider is to sample an equal number of both, so<br>

that you get a random baseline of 50%.<br>

<br>

Given the low frequency of class E, you could just take all<br>

instances of class E and then randomly select the same number of<br>

instances of class S.<br>

<br>

Then you can report 10-fold cross-validation results for this balanced<br>

data set.<br>
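<br>
A minimal sketch of the balanced sampling described above, assuming the<br>
data is held as parallel Python lists of instances and labels. The name<br>
balanced_sample, the fixed seed, and the toy data (the 72,000 / 8,000<br>
split from the original question scaled down to 720 / 80) are<br>
illustrative assumptions, not part of the original setup:<br>
<pre>
```python
import random


def balanced_sample(items, labels, minority="E", majority="S", seed=0):
    """Keep every minority instance plus an equal-sized random majority sample."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    pairs = list(zip(items, labels))
    minority_pairs = [pair for pair in pairs if pair[1] == minority]
    majority_pairs = [pair for pair in pairs if pair[1] == majority]
    # Draw as many majority instances as there are minority instances.
    sampled = rng.sample(majority_pairs, len(minority_pairs))
    balanced = minority_pairs + sampled
    rng.shuffle(balanced)  # mix the classes before splitting into folds
    return balanced


# Toy data with the same 9:1 ratio as in the question (hypothetical values).
labels = ["S"] * 720 + ["E"] * 80
items = list(range(len(labels)))

balanced = balanced_sample(items, labels)

# Ten disjoint folds for cross-validation: fold i takes every tenth item.
folds = [balanced[i::10] for i in range(10)]
```
</pre>
Each fold serves once as the held-out set while the other nine are used<br>
for training. The folds here are not stratified, so the class ratio<br>
within a single fold is only approximately 50%.<br>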

<br>

While this is useful to get a good grip on the performance of your<br>

features and classifier setup, in case you want to test the<br>

performance for a real-world application, you'll want to take into<br>

account that one class is much more prominent in the data that the<br>

real-world application needs to deal with. Depending on what the<br>

application is supposed to do, you'd then maximize precision or recall<br>

for the class you're most interested in.<br>
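<br>
As a rough illustration of per-class precision and recall, here is a<br>
small sketch that treats class E as the positive class. The function<br>
name precision_recall_f1 and the toy label lists are assumptions made<br>
for the example only:<br>
<pre>
```python
def precision_recall_f1(gold, predicted, positive="E"):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when the class is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy gold labels with a 9:1 imbalance (hypothetical data).
gold = ["S"] * 90 + ["E"] * 10

# A classifier that always answers S is 90% accurate but never finds an E.
always_s = ["S"] * 100
accuracy = sum(1 for g, p in zip(gold, always_s) if g == p) / len(gold)
baseline_scores = precision_recall_f1(gold, always_s)

# A classifier that finds 5 of the 10 E words and raises 5 false alarms.
mixed = ["S"] * 85 + ["E"] * 5 + ["E"] * 5 + ["S"] * 5
mixed_scores = precision_recall_f1(gold, mixed)
```
</pre>
The always-S baseline shows why plain accuracy is misleading on the<br>
unbalanced data: it scores 90% accuracy while precision, recall and F1<br>
for class E are all zero.<br>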

<br>

Best,<br>

Detmar<br>

<br>

--<br>

Prof. Dr. Detmar Meurers, Universität Tübingen       <a href="http://purl.org/dm" target="_blank">http://purl.org/dm</a><br>

Seminar für Sprachwissenschaft, Wilhelmstr. 19, 72074 Tübingen, Germany<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

<br>

On Sun, Dec 02, 2012 at 05:13:55PM -0500, Emad Mohamed wrote:<br>

> Hello Corpora members,<br>

> I have a corpus of 80,000 words in which each word is assigned either the<br>

> class S or the class E. Class S occurs 72,000 times while class E occurs<br>

> 8,000 times only.<br>

> I'm wondering what the best way to evaluate the classifier performance<br>

> should be. I have randomly selected a dev set (5%) and a test set (10%).<br>

> I'm mainly interested in predicting which words are class E.<br>

><br>

> I've read this page:<br>

> <a href="http://webdocs.cs.ualberta.ca/~eisner/measures.html" target="_blank">webdocs.cs.ualberta.ca/~eisner/measures.html</a><br>

> but I'm still a little bit confused. Do we use specificity in linguistics<br>

> papers? Should I report these measures for each of the two classes or as<br>

> a general number? Does this make sense / a difference?<br>

><br>

> Thank you so much.<br>

><br>

> --<br>

> Emad Mohamed<br>

> aka Emad Nawfal<br>

> Université du Québec à Montréal<br>

<br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br><div dir="ltr"><font size="1">Emad Mohamed<br>aka Emad Nawfal<br><span dir="auto">Université du Québec à Montréal</span><br></font></div><br>

</div>