George,<div><br></div><div>> ... male ... coordinating conjunctions and contracted forms of the definite article</div><div>> ... female ... personal pronouns (e.g. us, we, etc.)</div><div><br></div><div>Wonderful! This is completely in line with Biber (1988), where the main dimension is interactional vs. informational (with the definite article a strong indicator of 'nouniness', i.e. informational style), and, surprise, surprise (see e.g. Deborah Tannen's "You Just Don't Understand"), women are more interactional and men more informational (or, in Tannen's terminology, women do more rapport talk and men more report talk). As in English, so in Greek.</div>
<div><br></div><div>Pronouns are the litmus paper of text type.</div><div><br></div><div>Adam</div><div><br><div class="gmail_quote">On 4 November 2010 23:00, Georgios Mikros <span dir="ltr"><<a href="mailto:gmikros@isll.uoa.gr">gmikros@isll.uoa.gr</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Dear Diana,<br>
I recently tried to predict the gender of 14 authors (7 male and 7 female)<br>
in a newspaper corpus of Modern Greek. I used a variety of stylometric<br>
variables, including the 100 most frequent words, word length, letter frequencies<br>
and lexical "richness" indices such as Yule's K, lexical density, text entropy,<br>
etc. The classifier was an artificial neural<br>
network (a multilayer perceptron) with the above variables as input and the<br>
author's gender as output. 10-fold cross-validation yielded a<br>
precision of 0.85. The most "useful" category of variables for gender<br>
discrimination was the most frequent words. Interestingly, the<br>
words that predicted male gender included many coordinating conjunctions and<br>
contracted forms of the definite article, whereas the "female" words<br>
included many personal pronouns (e.g. us, we, etc.). I have just finished the<br>
analysis and have nothing written up yet. However, if you are<br>
interested, I could send you a copy once I have written the full paper.<br>
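A minimal Python sketch of two of the feature types mentioned above, relative frequencies of the most frequent words and Yule's K, may help anyone wanting to try a similar setup. It is not the feature extractor used in this study: the naive whitespace tokenizer and the function names are hypothetical, and Yule's K follows the standard definition K = 10^4 (Σ i² V_i − N) / N², where N is the number of tokens and V_i the number of word types occurring exactly i times.<br>

```python
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization; real Greek text
    # would need proper handling of punctuation, accents and clitics.
    return text.lower().split()


def most_frequent_words(texts, n=100):
    # Shared feature vocabulary: the n most frequent words across all texts.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def mfw_features(text, vocabulary):
    # Relative frequency of each vocabulary word in a single text.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty texts
    return [counts[word] / total for word in vocabulary]


def yules_k(text):
    # Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2, where N is the token
    # count and V_i the number of types occurring exactly i times.
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return 0.0
    freq_of_freqs = Counter(Counter(tokens).values())
    m2 = sum(i * i * v for i, v in freq_of_freqs.items())
    return 10_000 * (m2 - n) / (n * n)
```

The resulting vectors could then be fed to a multilayer perceptron (for example scikit-learn's MLPClassifier) evaluated with 10-fold cross-validation; that step is not shown here.<br>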
Best<br>
George Mikros<br>
University of Athens<br>
Greece<br>
<div class="im"><br>
-----Original Message-----<br>
From: <a href="mailto:corpora-bounces@uib.no">corpora-bounces@uib.no</a> [mailto:<a href="mailto:corpora-bounces@uib.no">corpora-bounces@uib.no</a>] On Behalf Of<br>
</div><div class="im">Diana Maynard<br>
Sent: Thursday, November 04, 2010 12:16 PM<br>
To: Adam Kilgarriff<br>
Cc: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
Subject: Re: [Corpora-List] For students: "CL in Action"<br>
<br>
</div><div><div></div><div class="h5">I wondered that as well.<br>
<br>
On another note, I guess its success depends critically on at<br>
least two things:<br>
(1) how good the gender guesser is (I didn't see any statistics on that,<br>
but I didn't search extensively).<br>
<br>
(2) (related to (1)) the proportion of American names in the Twitter<br>
corpus, since I think the guesser used is based solely on American first<br>
names, which could have some impact. Even the differences in<br>
first-name gender between the US and Britain are not insignificant.<br>
<br>
On a related note, has anyone done the reverse and used vocabulary<br>
selection to help identify the gender of the speaker, with any success?<br>
I'm sure people must have played with this idea.<br>
<br>
I'm interested in techniques for improving the recognition of a person's gender: in<br>
my experience, pre-built lists of male and female names combined with simple<br>
frequency information are often not accurate enough. Again, I haven't<br>
searched extensively for this, but if anyone happens to know offhand<br>
about it, I'd be interested.<br>
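A minimal Python sketch of the name-list-plus-frequency baseline described above; it is not the guesser used for the Twitter corpus. The input format (a mapping from lowercased first name to male and female counts, e.g. drawn from a national birth-name register) and the min_count and threshold values are assumptions. Returning "unknown" for unseen, rare or ambiguous names avoids silently mislabelling names missing from an American-only list, though it cannot fix names whose gender differs between countries.<br>

```python
def guess_gender(first_name, counts, min_count=5, threshold=0.9):
    """Return 'male', 'female' or 'unknown' for a first name.

    counts maps a lowercased first name to a (male_count, female_count)
    pair; this input format and both default thresholds are assumptions.
    """
    male, female = counts.get(first_name.lower(), (0, 0))
    total = male + female
    # Only decide when the name is attested often enough to trust.
    if total >= min_count:
        if female / total >= threshold:
            return "female"
        if male / total >= threshold:
            return "male"
    # Unseen names (e.g. non-American names in an American-only list),
    # rare names and ambiguous names are left undecided.
    return "unknown"


if __name__ == "__main__":
    # Hypothetical toy counts, for illustration only (not real statistics).
    toy_counts = {"name_a": (98, 2), "name_b": (55, 45)}
    print(guess_gender("Name_A", toy_counts))  # male
    print(guess_gender("Name_B", toy_counts))  # unknown
    print(guess_gender("Name_C", toy_counts))  # unknown
```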
Diana<br>
<br>
On 04/11/2010 09:51, Adam Kilgarriff wrote:<br>
</div></div><div class="im">> Cool!<br>
><br>
> So, what is it about 3? (see<br>
> <a href="http://labs.buradayiz.webfactional.com/gender/query/query?words=1+2+3+4+5+6+7+8+9" target="_blank">http://labs.buradayiz.webfactional.com/gender/query/query?words=1+2+3+4+5+6+7+8+9</a>)<br>
> You must have a theory<br>
><br>
> adam<br>
<br>
<br>
</div><div><div></div><div class="h5">_______________________________________________<br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>================================================<br>Adam Kilgarriff <a href="http://www.kilgarriff.co.uk">http://www.kilgarriff.co.uk</a> <br>
Lexical Computing Ltd <a href="http://www.sketchengine.co.uk">http://www.sketchengine.co.uk</a><br>Lexicography MasterClass Ltd <a href="http://www.lexmasterclass.com">http://www.lexmasterclass.com</a><br>
Universities of Leeds and Sussex <a href="mailto:adam@lexmasterclass.com">adam@lexmasterclass.com</a><br>================================================<br>
</div>