[Corpora-List] Gender dataset

Amaç Herdağdelen amac at herdagdelen.com
Fri Apr 13 15:02:44 UTC 2012


Hi Kiran,

I combined the 1990 US Census data with the US Social Security
Administration's popular baby name statistics for every year from 1960
to 2010:

https://github.com/amacinho/Name-Gender-Guesser

The repository also contains some simple Python scripts that may help
you get started. If you want an evaluation of the name-based
heuristics, have a look at this manuscript:

http://clic.cimec.unitn.it/amac/twitter_ngram/Herdagdelen2012-RTC-draft.pdf  
(Section 3, page 8).
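
To make the name-based heuristic concrete, here is a minimal sketch of the
general idea: count how often a name was given to girls and to boys, and
only guess when one side clearly dominates. The counts, the function names
and the 0.9 threshold below are all made up for illustration; they are not
taken from the repository's scripts or data files.

```python
from collections import defaultdict

# Hypothetical toy rows of (name, sex, count). These numbers are invented;
# the real figures come from the Census and SSA files in the repository.
SAMPLE_ROWS = [
    ("mary", "F", 9000),
    ("mary", "M", 40),
    ("john", "M", 8000),
    ("john", "F", 30),
    ("jordan", "M", 600),
    ("jordan", "F", 400),
]


def build_counts(rows):
    """Sum the counts per name and sex, ignoring case."""
    counts = defaultdict(lambda: {"M": 0, "F": 0})
    for name, sex, n in rows:
        counts[name.lower()][sex] += n
    return dict(counts)


def guess_gender(name, counts, threshold=0.9):
    """Return "F" or "M" if one sex holds at least `threshold` of the
    occurrences of `name`; return None for unknown or ambiguous names."""
    entry = counts.get(name.strip().lower())
    if entry is None:
        return None
    total = entry["M"] + entry["F"]
    if total == 0:
        return None
    share_f = entry["F"] / total
    if share_f >= threshold:
        return "F"
    if 1 - share_f >= threshold:
        return "M"
    return None


COUNTS = build_counts(SAMPLE_ROWS)
for first_name in ("Mary", "John", "Jordan", "Zyx"):
    print(first_name, guess_gender(first_name, COUNTS))
```

Raising the threshold makes the guesser abstain on more names (such as
"Jordan" in the toy data) in exchange for fewer wrong guesses.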

There is also an older Perl module:

http://search.cpan.org/~edaly/Text-GenderFromName-0.32/GenderFromName.pm

by Jon Orwant and Eamon Daly, which has a fuzzy search option (based on
phonetic similarity of the names, I believe).
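
I have not checked how the Perl module does its phonetic matching, so the
following is only an illustration of the idea, using classic American
Soundex as a stand-in. The function names are hypothetical, and the module
itself may well use a different algorithm.

```python
import string

# Standard American Soundex digit groups for consonants.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex key of `name` ("" if no ASCII letters)."""
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeat across a vowel is kept
    return (encoded + "000")[:4]


def fuzzy_candidates(name, known_names):
    """Return the known names that share a Soundex key with `name`."""
    key = soundex(name)
    return [n for n in known_names if soundex(n) == key]


# A misspelled "Jonh" matches every known name with the same key.
print(fuzzy_candidates("Jonh", ["john", "mary", "joan"]))
```

A misspelled or unseen name can then be mapped to the known names that
share its key, and a gender guess made from those. Note that a key as
coarse as Soundex also merges names that differ in gender, so the
candidates still need to be checked against the counts.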

Best,

Amaç



