[Corpora-List] Chinese Name Gender Recognition

Mark Lewellen lewellen at erols.com
Wed Dec 21 18:39:14 UTC 2005


Good point, Yorick. However, the results of
Jun's method could also reflect a situation in
which the majority of name occurrences can be
analyzed this way while a minority cannot.
(I believe this to be the case.)

There are many common Chinese given names that are
reliably male or female, as well as rare or novel names
whose gender can at best be guessed at. Another Zipfian
distribution! In this case one with a very long tail, since
almost anything can be used as a name. (There are wild
examples in the literature, such as names that reflected
political slogans during the Cultural Revolution, or
disgusting names meant to ward off evil demons.)
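
The head-versus-tail argument can be made measurable. The sketch below is a hypothetical helper (`reliable_coverage` and its `threshold`/`min_count` cut-offs are illustrative assumptions, not anything from this thread): given one (given name, gender) pair per name token, it reports what share of tokens belong to names that are frequent and overwhelmingly one gender. The toy counts are invented.

```python
from collections import Counter, defaultdict

def reliable_coverage(occurrences, threshold=0.9, min_count=3):
    """Share of name tokens whose given name is reliably one gender.

    occurrences: iterable of (given_name, gender) pairs, one per token.
    A name counts as reliable if it occurs at least min_count times and
    its majority gender accounts for at least `threshold` of those tokens.
    """
    by_name = defaultdict(Counter)          # given name -> gender counts
    for name, gender in occurrences:
        by_name[name][gender] += 1

    total = covered = 0
    for counts in by_name.values():
        n = sum(counts.values())
        top = counts.most_common(1)[0][1]   # tokens of the majority gender
        total += n
        if n >= min_count and top / n >= threshold:
            covered += n
    return covered / total if total else 0.0


# Invented toy counts for illustration only, not corpus statistics.
toy = ([("伟", "M")] * 10 + [("丽", "F")] * 8     # frequent, unambiguous head
       + [("文", "M")] * 2 + [("文", "F")] * 2     # frequent but ambiguous
       + [("明", "M"), ("明", "F")]                # too rare to judge
       + [("一鸣", "M"), ("子涵", "F")])           # long-tail singletons
print(reliable_coverage(toy))
```

Lowering `min_count` or `threshold` trades reliability for coverage; whatever stays uncovered is the long tail, where any method can only guess.
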

I think that sometimes, given the great success of
statistical methods, we expect them to work magic in
every instance. However, I think this is a problem
domain that can only yield a partial solution.

A similarly problematic domain is identifying the
Chinese characters of a name given only its romanization
(when multiple romanizations of multiple Chinese
languages/dialects are considered).

In such problem domains, it would be useful to
present confidence measures (in human terms: "I'm sure
this is a female name", or "Could possibly be male; there is
no data to back this up, but it could be associated with
'male' traits.")
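
One way to produce such human-readable statements is to combine a classifier's probability with how much evidence stood behind it. This is a minimal sketch under stated assumptions: `describe`, its cut-offs (`sure`, `min_evidence`, 0.6) and the wording of the labels are hypothetical choices, and the input probability is assumed to come from some model trained elsewhere.

```python
def describe(p_female, n_observations, sure=0.95, min_evidence=20):
    """Turn a model's P(female) and an evidence count into a hedged verbal label.

    p_female: probability from some classifier (hypothetical input).
    n_observations: how often the name was seen in training data.
    sure / min_evidence: illustrative cut-offs, not calibrated values.
    """
    p_male = 1.0 - p_female
    gender, p = ("female", p_female) if p_female >= p_male else ("male", p_male)
    if n_observations >= min_evidence and p >= sure:
        return f"I'm sure this is a {gender} name"
    if n_observations == 0:
        # Only indirect evidence, e.g. 'male'/'female' traits of the characters.
        return f"Could possibly be {gender}; no data to back this up"
    if p >= 0.6:
        return f"Probably {gender} ({p:.0%}, {n_observations} observations)"
    return f"Undecided ({p:.0%} {gender}, {n_observations} observations)"


print(describe(0.98, 500))
print(describe(0.30, 0))
print(describe(0.55, 40))
```

Keeping the evidence count separate from the probability is the point: a 70% guess from zero observations and a 70% estimate from thousands of observations deserve different wording.
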

Mark

> -----Original Message-----
> From: Yorick Wilks [mailto:yorick at dcs.shef.ac.uk] 
> Sent: Wednesday, December 21, 2005 12:39 PM
> To: lewellen at erols.com
> Cc: 'Jun Lang'; 'Xiaofei Lu'; corpora at uib.no
> Subject: Re: [Corpora-List] Chinese Name Gender Recognition
> 
> 
> If Jun's method gets 70% name-gender right, that alone
> suggests there is some real gender bias in the symbols that
> a statistical method, algorithm, or even native speaker
> could indeed model, and does!
> Yorick Wilks
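
Whether 70% by itself rules out chance depends on how many names it was measured on and on what "chance" means. A minimal sketch of an exact one-sided binomial test follows; the test-set sizes in the example are hypothetical (the thread does not report Jun's), and if the genders in the data are unbalanced, `p0` should be the majority-class rate rather than 0.5.

```python
from math import comb

def binomial_p_value(correct, n, p0):
    """One-sided exact binomial test.

    Probability of getting at least `correct` right out of `n` test names
    if the classifier's true accuracy were only p0 (the chance rate).
    """
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k)
               for k in range(correct, n + 1))


print(binomial_p_value(7, 10, 0.5))     # 70% on 10 names -> 0.171875
print(binomial_p_value(70, 100, 0.5))   # 70% on 100 names
```

On 10 names, 70% is unremarkable; on 100 names against a 50% chance rate it is very unlikely to be luck, which is the sense in which the score would reflect a real bias.
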
> 
> 
> 
> On 21 Dec 2005, at 15:49, Mark Lewellen wrote:
> 
> > Since Chinese given names are not limited to a set of
> > lexical items that are prototypically 'names' (i.e. they
> > can be just about any lexical item), Chinese given names,
> > as you probably know, often give no clue about gender.
> > There has been some discussion on 'traits' that are
> > more feminine or masculine and would be reflected in names,
> > but there remains a lot of ambiguity.  I doubt there is any
> > statistical method, algorithm, or even native speaker that
> > can make up for that problem!
> >
> > Mark Lewellen
> >
> >
> >> -----Original Message-----
> >> From: owner-corpora at lists.uib.no
> >> [mailto:owner-corpora at lists.uib.no] On Behalf Of Jun Lang
> >> Sent: Tuesday, December 13, 2005 7:31 AM
> >> To: 'Xiaofei Lu'
> >> Cc: corpora at uib.no
> >> Subject: [Corpora-List] Re: [Corpora-List] Chinese Name
> >> Gender Recognition
> >>
> >>
> >> Yeah! There are many names that could be used for both male
> >> and female. It is a difficult problem. I have done some simple
> >> research on this topic, and I am now trying to get more and more
> >> data. Since the parameter space is huge, decision trees cannot
> >> reach a final result quickly. I want to try a Bayes model again.
> >>
> >> Can you give me some ideas about it?  Thanks a lot!
> >>
> >> Best wishes,
> >> Jun Lang
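
As one possible answer to Jun's request for ideas on a Bayes model, here is a minimal sketch of multinomial naive Bayes over the characters of the given name, with additive (Laplace) smoothing. Assumptions: the surname has already been stripped, each character is treated as an independent feature, and the class name `CharNaiveBayes` and the toy training names are hypothetical, not Jun's setup.

```python
import math
from collections import Counter, defaultdict

class CharNaiveBayes:
    """Naive Bayes over the characters of a Chinese given name (sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive smoothing constant
        self.class_counts = Counter()            # gender -> number of names
        self.char_counts = defaultdict(Counter)  # gender -> char -> count
        self.vocab = set()

    def fit(self, names, genders):
        for name, g in zip(names, genders):
            self.class_counts[g] += 1
            for ch in name:
                self.char_counts[g][ch] += 1
                self.vocab.add(ch)
        return self

    def log_posteriors(self, name):
        """Unnormalized log P(gender) + sum of log P(char | gender)."""
        total = sum(self.class_counts.values())
        v = len(self.vocab) + 1                  # +1 reserves mass for unseen chars
        scores = {}
        for g, n_g in self.class_counts.items():
            denom = sum(self.char_counts[g].values()) + self.alpha * v
            s = math.log(n_g / total)
            for ch in name:
                s += math.log((self.char_counts[g][ch] + self.alpha) / denom)
            scores[g] = s
        return scores

    def predict_proba(self, name):
        scores = self.log_posteriors(name)
        m = max(scores.values())                 # subtract max for stability
        exp = {g: math.exp(s - m) for g, s in scores.items()}
        z = sum(exp.values())
        return {g: e / z for g, e in exp.items()}

    def predict(self, name):
        scores = self.log_posteriors(name)
        return max(scores, key=scores.get)


# Invented toy training data (given names only), for illustration.
model = CharNaiveBayes().fit(
    ["伟", "志强", "建国", "丽", "美玲", "秀英"],
    ["M", "M", "M", "F", "F", "F"])
print(model.predict("志伟"), model.predict_proba("志伟"))
```

Training is a single counting pass, so it scales to large data more easily than tree induction, and `predict_proba` gives a graded score rather than a hard label. The independence assumption is a simplification, and a name used equally for both genders will still come out near 50/50.
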
> >>
> >> -----Original Message-----
> >> From: Xiaofei Lu [mailto:xflu at ling.ohio-state.edu]
> >> Sent: December 13, 2005 13:56
> >> To: Jun Lang
> >> Subject: Re: [Corpora-List] Chinese Name Gender Recognition
> >>
> >> Interesting. What is the baseline, and how do you establish it?
> >> Many names can be either male or female, can't they?
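
Xiaofei's two questions suggest two reference points for any accuracy figure: a floor (always guess the more frequent gender) and a ceiling imposed by names used for both genders. A minimal sketch, assuming labelled (name, gender) data; the function names and toy labels are hypothetical.

```python
from collections import Counter, defaultdict

def majority_baseline(train_labels, test_labels):
    """Accuracy of always guessing the gender most frequent in training."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority)
    return majority, hits / len(test_labels)

def name_only_ceiling(names, labels):
    """Best accuracy any name-only classifier could reach on this data.

    Each distinct name is credited with its most frequent gender, so
    names used for both genders cap the score below 100%. Computed on
    the evaluation data itself, so it is an optimistic upper bound.
    """
    by_name = defaultdict(Counter)
    for name, y in zip(names, labels):
        by_name[name][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_name.values())
    return correct / len(labels)


# Invented toy labels, for illustration only.
print(majority_baseline(["M", "M", "F", "M"], ["M", "F", "M", "M", "F"]))
print(name_only_ceiling(["文", "文", "文", "伟", "丽"], ["M", "F", "M", "M", "F"]))
```

A reported accuracy such as 70% is easiest to interpret when placed between these two numbers.
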
> >>
> >> On Tue, 13 Dec 2005, Jun Lang wrote:
> >>
> >>
> >>> Hi all Corpora members,
> >>>
> >>>    I am currently studying Chinese name gender recognition. The
> >>> input is a Chinese name and the output is the corresponding gender.
> >>> I used a decision tree method, but in the end the accuracy is only
> >>> about 70%.
> >>>
> >>>    Do you know of any other method that can achieve higher accuracy?
> >>> And has anybody done any similar research?
> >>>
> >>>    Thanks a lot!
> >>>
> >>>
> >>>
> >>> Best wishes,
> >>>
> >>> Bill_Lang(Jun Lang): Ph.D Candidate
> >>>
> >>> Information Retrieval Laboratory
> >>>
> >>> Harbin Institute of Technology
> >>>
> >>> Mail: bill_lang at gmail.com
> >>>
> >>> Homepage: http://ir.hit.edu.cn/~bill_lang
> 


