John,<div><br></div><div>I agree with you: the difference is really impressive. The review's author (James Berardinelli) actually gave 2 stars. In my opinion, you have to take language variance into account. In some texts an author tends to express opinion by grammatical means, and in others grammatical features are of little use. In some texts an author tends to use lexical means, and in others lexical features are of little use. And so on. To counteract this variance, we can use an aggregate vote: majority or average.</div>
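<div><br></div><div>To make the two aggregate votes concrete, here is a minimal sketch. It assumes that several classifiers (for example, one trained on grammatical features and one on lexical features) each return a star rating for the same review. The function names and the ratings are hypothetical illustrations, not output from the demo.</div>
<pre>
```python
from collections import Counter
from statistics import mean


def majority_vote(ratings):
    """Return the most frequent rating; ties go to the rating seen first."""
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(ratings).most_common(1)[0][0]


def average_vote(ratings, step=0.5):
    """Return the mean rating, rounded to the nearest multiple of step."""
    # Note: Python's round() sends exact halves to the even neighbour.
    return round(mean(ratings) / step) * step


# Hypothetical ratings from three classifiers for one review.
ratings = [2.5, 1.5, 1.5]
print(majority_vote(ratings))  # 1.5
print(average_vote(ratings))   # 2.0 (mean 1.83, rounded to a half star)
```
</pre>
<div>A majority vote ignores how far apart the ratings are, while the average lets a single extreme classifier pull the result. Which one works better would have to be checked on held-out reviews.</div>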
<div><br></div><div>How do I extract grammatical features? Leech and Svartvik consider 9-10 grammatical rules, for example fronted negation (Not every house is so beautiful!) or repetition (It is a big big house). In data mining, whenever I identify a pattern in a text that corresponds to a particular rule, I increment the corresponding feature value. It doesn't matter much whether an individual identification is correct; the approach is robust because it relies on an aggregate vote.</div>
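<div><br></div><div>A minimal sketch of this counting scheme for the two rules mentioned above, assuming plain regular expressions as the pattern matcher. The patterns are deliberately rough, hypothetical approximations, not the engine's actual rules. Occasional false matches are tolerated, as described above.</div>
<pre>
```python
import re

# Hypothetical, deliberately rough patterns for two Leech and Svartvik rules.
RULES = {
    # A sentence that opens with a negator, e.g. "Not every house is so beautiful!"
    "fronted_negation": re.compile(r"(?:^|[.!?]\s+)(?:not|never|no)\b", re.IGNORECASE),
    # The same word twice in a row, e.g. "It is a big big house."
    "repetition": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
}


def grammar_features(text):
    """Count matches of each rule; every match increments that rule's feature."""
    return {name: len(pattern.findall(text)) for name, pattern in RULES.items()}


text = "Not every house is so beautiful! It is a big big house. Never again."
print(grammar_features(text))
# {'fronted_negation': 2, 'repetition': 1}
```
</pre>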
<div><br></div><div>Now, your semantic example. As I already mentioned, my dictionaries do not include slang words. That's why the word "crap" in "This demo is a lot of crap" is not found in the dictionaries, and the sentence is rated "neutral". In the second case, "This demo gets rid of the crap", you get "low_neg". This is not because "crap" magically acquired a meaning, but because the sentence supplies other words that carry one.</div>
<div><br></div><div>My engine uses words from 4 dictionaries: WordNet-Affect, the Dictionary of Affect, the positive/negative General Inquirer (GI) lists, and Levin verbs. Some dictionaries list distinct emotion words such as "good" or "bad". Others, such as the negative GI list, rely on lexical affinity, which, according to Pang, means that a word tends to carry a particular emotional orientation. In your example, the resulting "low_neg" rating arises by accident. The emotional meaning comes not from the word "crap", as one would expect, but from the words "get" and "rid", which are taken from the negative GI list. I assume that, according to GI, these words express a negative meaning most of the time. You can try an example consisting of the single word "rid" to see the rating it gets.</div>
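<div><br></div><div>The following toy sketch reproduces the two ratings under stated assumptions. The lexicon is a tiny hypothetical stand-in for the merged dictionaries: it assumes, as explained above, that "get" and "rid" score negative via the negative GI list and that "crap" is absent. The lemma table replaces a real lemmatizer. The averaging rule, the thresholds, and every label other than "neutral" and "low_neg" are invented for illustration and are not the engine's.</div>
<pre>
```python
import re

# Hypothetical stand-in for the merged dictionaries. "get" and "rid" are
# assumed to score negative (negative GI); the slang word "crap" is absent.
LEXICON = {"get": -1.0, "rid": -1.0, "good": 1.0}

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"gets": "get", "got": "get"}


def label(score):
    """Map an average score to a label (thresholds are made up)."""
    if score <= -0.5:
        return "high_neg"
    if score < 0:
        return "low_neg"
    if score == 0:
        return "neutral"
    if score < 0.5:
        return "low_pos"
    return "high_pos"


def rate(sentence):
    """Average dictionary scores over all words of the sentence."""
    words = [LEMMAS.get(w, w) for w in re.findall(r"[a-z]+", sentence.lower())]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return "neutral"  # no dictionary word found
    return label(sum(scores) / len(words))


print(rate("This demo is a lot of crap."))      # neutral
print(rate("This demo gets rid of the crap."))  # low_neg (score -2/7)
```
</pre>
<div>Because "crap" contributes nothing, the second rating is driven entirely by "get" and "rid", which is the accident described above.</div>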
<div><br></div><div>However, your example reminds me of the connection between semantic and grammatical meaning. In this case, the word "rid" plays the role of a negation, like "not", "never", or "except", and I think all such words could be included in the system dictionary. I discussed a similar issue earlier in connection with implicit negation (<a href="http://mailman.uib.no/public/corpora/2007-October/005412.html" target="_blank">http://mailman.uib.no/public/corpora/2007-October/005412.html</a>).</div>
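<div><br></div><div>One way to act on this idea is a windowed polarity flip. This is a common negation heuristic, sketched here as an illustration rather than as the engine's method. The sketch assumes a hypothetical lexicon in which "crap" has been added as a negative entry. It treats "rid" as a negator alongside "not", "never", and "except", and lets a negator reverse the score of any lexicon word up to three tokens after it. The window size is an arbitrary choice.</div>
<pre>
```python
import re

# Hypothetical lexicon; "crap" is added here as if slang were covered.
LEXICON = {"crap": -1.0, "good": 1.0, "beautiful": 1.0}

# Explicit negators plus "rid" as an implicit one.
NEGATORS = {"not", "never", "except", "rid"}

WINDOW = 3  # arbitrary: a negator affects words up to 3 tokens after it


def polarity(sentence):
    """Sum word scores, reversing any score that closely follows a negator."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    total = 0.0
    last_negator = None  # index of the most recent negator seen so far
    for i, tok in enumerate(tokens):
        if tok in NEGATORS:
            last_negator = i
            continue
        score = LEXICON.get(tok, 0.0)
        if last_negator is not None and i - last_negator <= WINDOW:
            score = -score  # within the window: reverse the orientation
        total += score
    return total


print(polarity("This demo is a lot of crap."))      # -1.0
print(polarity("This demo gets rid of the crap."))  # 1.0
print(polarity("This house is not good."))          # -1.0
```
</pre>
<div>With this rule the second sentence comes out positive rather than negative. A fixed window is crude, since it also flips words the negator does not actually govern, so a parser-based negation scope would be a natural refinement.</div>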
<div><br></div><div>Alexander </div>
<div><br><div class="gmail_quote">2011/12/16 John F. Sowa <span dir="ltr">&lt;<a href="mailto:sowa@bestweb.net" target="_blank">sowa@bestweb.net</a>&gt;</span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Alexander,<br>
<br>
I tried the default movie review with both the naive Bayes and<br>
the SVM options. For most of the options, both versions evaluated<br>
the review as average (2.5 stars).<br>
<br>
But with the option "with a dataset containing grammar features",<br>
naive Bayes dropped to 1.5 stars, and SVM dropped to 0.5 stars.<br>
<br>
That's an impressive difference. Could you say something more about<br>
how the system uses grammar features to derive those results.<br>
<br>
In particular, it would be helpful to select some sentences from<br>
that default text for which the grammar features make a significant<br>
difference and say how that difference was derived.<br>
<br>
For the semantic demo, I tried the following two sentences:<br>
<br>
"This demo is a lot of crap." Rating: neutral.<br>
<br>
"This demo gets rid of the crap." Rating: low-neg.<br>
<br>
That's the reverse of what one might expect.<br>
<br>
John<div><br>
_______________________________________________<br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>
</div><a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
</blockquote></div><br></div>