[Corpora-List] For students: "CL in Action"

Adam Kilgarriff adam at lexmasterclass.com
Fri Nov 5 05:59:19 UTC 2010


George,

> ... male ... coordinating conjunctions and contracted forms of the
> definite article
> ... female ...  personal pronouns (i.e. us, we etc).

Wonderful! Completely in line with Biber 1988: the main dimension is
interactional vs informational (with the definite article as a strong
indicator of 'nouniness', i.e. informational), and - surprise, surprise (see
e.g. Deborah Tannen's "You Just Don't Understand") - women are more
interactional, men more informational (or, in Tannen's terminology, women do
more rapport, men more report). Like English, like Greek.

Pronouns are the litmus paper of text type.

Adam

On 4 November 2010 23:00, Georgios Mikros <gmikros at isll.uoa.gr> wrote:

> Dear Diana,
> I recently tried to predict the gender of 14 authors (7 males and 7
> females) in a newspaper corpus of Modern Greek. I used a variety of
> stylometric variables, including the 100 most frequent words, word length,
> letter frequencies and lexical "richness" indices such as Yule's K, lexical
> density, text entropy etc. The classifier was an artificial neural network
> (multilayer perceptron) with the above variables as input and the author's
> gender as output. 10-fold cross-validation gave a precision of 0.85. The
> most "useful" category of variables for gender discrimination was the most
> frequent words. Interestingly, the words that predict male gender included
> many coordinating conjunctions and contracted forms of the definite
> article, whereas the "female" words included many personal pronouns (i.e.
> us, we etc). I have just finished the analysis and don't have anything
> written up yet. However, if you are interested, I could send you a copy
> once I have written the full paper.
> Best
> George Mikros
> University of Athens
> Greece
>
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Diana Maynard
> Sent: Thursday, November 04, 2010 12:16 PM
> To: Adam Kilgarriff
> Cc: corpora at uib.no
> Subject: Re: [Corpora-List] For students: "CL in Action"
>
> I wondered that as well.
>
> On another note, I guess the success of it depends critically on at
> least two things:
> (1) how good the gender guesser is (I didn't see any statistics on that,
> but I didn't search extensively).
>
> (2) (which is related) the proportion of American names in the Twitter
> corpus (since I think the guesser used is based solely on American first
> names), which could have some impact. Even the differences between
> first name gender in the US and Britain are not insignificant.
>
> On a related note, has anyone done the reverse and used vocabulary
> selection to help identify the gender of the speaker, with any success?
> I'm sure people must have played with this idea.
>
> I'm interested in techniques to improve person gender recognition - in
> my experience, using pre-built lists of male and female names and simple
> frequency information is often not accurate enough. Again, I haven't
> searched extensively for this, but if anyone happens to know offhand
> about it I'd be interested.
> Diana
>
> On 04/11/2010 09:51, Adam Kilgarriff wrote:
> > Cool!
> >
> > So, what is it about 3?  (see
> > http://labs.buradayiz.webfactional.com/gender/query/query?words=1+2+3+4+5+6+7+8+9)
> > You must have a theory
> >
> > adam
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
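
For readers who want to try something along the lines George describes, here
is a minimal sketch of how a few of the stylometric variables he lists
(relative frequencies of the most frequent words, mean word length, Yule's K
and word entropy) might be computed. It is not his code: the function names,
the tokenizer and the choice of word-level entropy are all assumptions made
for illustration. Letter frequencies, lexical density (which needs
part-of-speech information or a function-word list) and the neural network
with 10-fold cross-validation are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercased runs of letters. The pattern is
    # Unicode-aware in Python 3, so Greek text is handled as well.
    return re.findall(r"[^\W\d_]+", text.lower())


def yules_k(tokens):
    # Yule's K = 10^4 * (sum_m m^2 * V_m - N) / N^2, where N is the number
    # of tokens and V_m the number of word types occurring exactly m times.
    n = len(tokens)
    if n == 0:
        return 0.0
    freq_of_freqs = Counter(Counter(tokens).values())
    s2 = sum(m * m * v_m for m, v_m in freq_of_freqs.items())
    return 1e4 * (s2 - n) / (n * n)


def word_entropy(tokens):
    # Shannon entropy (in bits) of the word frequency distribution.
    n = len(tokens)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())


def top_words(corpus_tokens, k=100):
    # The k most frequent words over the whole corpus (a list of token lists).
    counts = Counter(tok for doc in corpus_tokens for tok in doc)
    return [word for word, _ in counts.most_common(k)]


def features(tokens, vocabulary):
    # One feature vector per text: relative frequency of each vocabulary
    # word, then mean word length, Yule's K and word entropy.
    n = len(tokens) or 1
    counts = Counter(tokens)
    vec = [counts[w] / n for w in vocabulary]
    vec.append(sum(len(t) for t in tokens) / n)
    vec.append(yules_k(tokens))
    vec.append(word_entropy(tokens))
    return vec
```

A typical use would be to tokenize every article, call top_words on the
whole corpus to fix the vocabulary, and then pass the vectors from features,
together with the author gender labels, to whatever classifier is being
evaluated.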


-- 
================================================
Adam Kilgarriff
http://www.kilgarriff.co.uk
Lexical Computing Ltd                   http://www.sketchengine.co.uk
Lexicography MasterClass Ltd      http://www.lexmasterclass.com
Universities of Leeds and Sussex       adam at lexmasterclass.com
================================================