[Corpora-List] Chinese Name Gender Recognition

Xiaofei Lu xflu at ling.ohio-state.edu
Thu Dec 22 18:45:29 UTC 2005


Are you planning to look at context at all? The pronoun resolution idea 
should definitely help. Plus, looking at the context in which a personal 
name appears may help a bit, too, e.g., in cases where one or more names 
appear after things like "member(s) of the women's team".

Xiaofei
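
A minimal sketch of the context idea, assuming a simple proximity vote rather than real pronoun resolution: it counts gendered pronouns and title words within a fixed window around each mention of a name. The cue lists, the function names, and the window size are illustrative assumptions, not part of any system discussed in this thread.

```python
# Illustrative cue lists (assumptions for this sketch); extend as needed.
FEMALE_CUES = ["她", "女士", "小姐", "太太", "女队"]
MALE_CUES = ["他", "先生", "男队"]


def count_cues(text, cues):
    """Count non-overlapping occurrences of every cue string in text."""
    return sum(text.count(cue) for cue in cues)


def guess_gender_from_context(text, name, window=20):
    """Vote over gendered cues within `window` characters of each mention.

    Returns "female", "male", or "unknown" (no mention, no cues, or a tie).
    Windows of nearby mentions may overlap, so a cue can be counted twice.
    """
    female = male = 0
    start = text.find(name)
    while start != -1:
        lo = max(0, start - window)
        hi = start + len(name) + window
        snippet = text[lo:hi]
        female += count_cues(snippet, FEMALE_CUES)
        male += count_cues(snippet, MALE_CUES)
        start = text.find(name, start + 1)
    if female > male:
        return "female"
    if male > female:
        return "male"
    return "unknown"
```

This is deliberately crude: a bare character match on 他 also fires inside words such as 其他 ("other"), and a pronoun near a name does not necessarily refer to that name, which is exactly the gap that proper pronoun resolution would close.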


On Thu, 22 Dec 2005, Heng Ji wrote:

>
> I believe your IR idea will boost the performance. Besides, you may want to
> try applying pronoun reference resolution before gender disambiguation,
> since Chinese personal pronouns are clearly distinguished by gender. If
> you could link a pronoun in the context to the name candidate, that
> might help. In addition, a few gender-specific title words in the context
> would be useful too.
>
> I would guess that lexical information alone can accurately recognize name
> gender for people born before 1980, but it might not be enough for names
> given later: many of those have been made intentionally gender-neutral. :)
> So you may want to incorporate time-frame information in your system.
>
> Heng
>
> On Thu, 22 Dec 2005, Jun Lang wrote:
>
>> Hi Mark Lewellen,
>> 	Thanks for your interest in this problem.
>> 	Yes. After doing some baseline research, I found that there are many
>> related problems in gender recognition based on Chinese names. Using
>> only the name may not be enough to achieve better results. I am considering
>> combining it with other resources to disambiguate the gender. For example, I
>> could use a search engine to look for gender-indicating words to improve the
>> final accuracy.
>> 	What do you think about it?
>> 
>> Thanks!
>> 
>> Have a nice Christmas Eve and Christmas Day!
>> 
>> Best wishes,
>> Bill_Lang(Jun Lang): Ph.D Candidate
>> Information Retrieval Laboratory
>> Harbin Institute of Technology
>> Mail: bill_lang at gmail.com
>> Homepage: http://ir.hit.edu.cn/~bill_lang
>> 
>> 
>> -----Original Message-----
>> From: Mark Lewellen [mailto:lewellen at erols.com]
>> Sent: Wednesday, December 21, 2005 11:49 PM
>> To: 'Jun Lang'; 'Xiaofei Lu'
>> Cc: corpora at uib.no
>> Subject: RE: [Corpora-List] Chinese Name Gender Recognition
>> 
>> Since Chinese given names are not limited to a set of
>> lexical items that are prototypically 'names' (i.e. they
>> can be just about any lexical item), Chinese given names,
>> as you probably know, often give no clue about gender.
>> There has been some discussion on 'traits' that are
>> more feminine or masculine and would be reflected in names,
>> but there remains a lot of ambiguity.  I doubt there is any
>> statistical method, algorithm, or even native speaker that
>> can make up for that problem!
>> 
>> Mark Lewellen
>> 
>>> -----Original Message-----
>>> From: owner-corpora at lists.uib.no
>>> [mailto:owner-corpora at lists.uib.no] On Behalf Of Jun Lang
>>> Sent: Tuesday, December 13, 2005 7:31 AM
>>> To: 'Xiaofei Lu'
>>> Cc: corpora at uib.no
>>> Subject: [Corpora-List] Re: [Corpora-List] Chinese Name
>>> Gender Recognition
>>> 
>>> 
>>> Yeah! There are many names that can be used for both male and female.
>>> It is a difficult problem. So far I have done some simple research on
>>> this topic, and recently I have been trying to get more data. Since the
>>> parameter space is huge, decision trees cannot reach a final result
>>> quickly. I want to use a Bayes model again.
>>> 
>>> Can you give me some ideas about it?  Thanks a lot!
>>> 
>>> Best wishes,
>>> Jun Lang
>>> 
>>> -----Original Message-----
>>> From: Xiaofei Lu [mailto:xflu at ling.ohio-state.edu]
>>> Sent: December 13, 2005 13:56
>>> To: Jun Lang
>>> Subject: Re: [Corpora-List] Chinese Name Gender Recognition
>>> 
>>> Interesting. What is the baseline, and how do you establish it?
>>> Many names can be either male or female, can't they?
>>> 
>>> On Tue, 13 Dec 2005, Jun Lang wrote:
>>> 
>>>> Hi all Corpora members,
>>>>
>>>>    I am now studying Chinese name gender recognition. The input is a
>>>> Chinese name and the output is the corresponding gender. I used the
>>>> decision tree method, but the final accuracy is only about 70%.
>>>>
>>>>    Do you know of any other method that can achieve higher accuracy?
>>>> And has anybody done similar research?
>>>>
>>>>    Thanks a lot!
>>>>
>>>> Best wishes,
>>>> Bill_Lang(Jun Lang): Ph.D Candidate
>>>> Information Retrieval Laboratory
>>>> Harbin Institute of Technology
>>>> Mail: bill_lang at gmail.com
>>>> Homepage: http://ir.hit.edu.cn/~bill_lang
>>>>
>
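
The Bayes model mentioned in the thread is not described in any detail, so the following is only a sketch of one common way to set it up: a naive Bayes classifier that treats each character of the given name as an independent feature, with add-alpha smoothing. The class name `NameGenderNB`, the assumption that surnames have already been stripped, and the tiny training list are all invented for illustration and are not real data.

```python
import math
from collections import Counter, defaultdict


class NameGenderNB:
    """Character-level naive Bayes over given names (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # add-alpha smoothing constant
        self.char_counts = defaultdict(Counter)  # gender -> character -> count
        self.class_counts = Counter()            # gender -> number of names
        self.vocab = set()                       # all characters seen in training

    def fit(self, names, genders):
        for name, gender in zip(names, genders):
            self.class_counts[gender] += 1
            for ch in name:
                self.char_counts[gender][ch] += 1
                self.vocab.add(ch)
        return self

    def log_scores(self, name):
        """Return log P(gender) + sum of log P(char | gender) per gender."""
        total = sum(self.class_counts.values())
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        scores = {}
        for gender, n in self.class_counts.items():
            denom = sum(self.char_counts[gender].values()) + self.alpha * v
            score = math.log(n / total)
            for ch in name:
                score += math.log((self.char_counts[gender][ch] + self.alpha) / denom)
            scores[gender] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Toy given names, made up for this example only.
train_names = ["丽", "芳", "秀兰", "美玲", "伟", "强", "建国", "志强"]
train_genders = ["F", "F", "F", "F", "M", "M", "M", "M"]
model = NameGenderNB().fit(train_names, train_genders)
```

Calling `model.predict("丽芳")` or `model.predict("志伟")` then returns the label with the higher score. A model like this cannot resolve names that are truly used for both genders; comparing the two values from `log_scores` and abstaining when they are close is one way to hand such cases over to contextual evidence.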

