[Corpora-List] Gender dataset
Amaç Herdağdelen
amac at herdagdelen.com
Fri Apr 13 15:02:44 UTC 2012
Hi Kiran,
I combined the 1990 Census data with the US Social Security
Administration's statistics on popular baby names for every year
between 1960 and 2010:
https://github.com/amacinho/Name-Gender-Guesser
The repository also contains some simple Python scripts that may
help you get started. If you want an evaluation of the name-based
heuristics, you can have a look at this manuscript:
http://clic.cimec.unitn.it/amac/twitter_ngram/Herdagdelen2012-RTC-draft.pdf
(Section 3, page 8).
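
To give a rough idea of what a name-based heuristic can look like, here
is a minimal sketch. It is not the code from the repository: the function
names, the (name, sex, count) record layout, the 0.9 threshold and the
toy counts are all assumptions made up for illustration.

```python
from collections import defaultdict


def build_counts(records):
    """Aggregate (name, sex, count) records into per-name totals.

    The record layout is an assumption for this sketch; sex is 'M' or 'F'.
    """
    counts = defaultdict(lambda: {"M": 0, "F": 0})
    for name, sex, count in records:
        counts[name.strip().lower()][sex] += count
    return counts


def guess_gender(name, counts, threshold=0.9):
    """Return 'male', 'female', 'ambiguous' or 'unknown' for a first name.

    A name is labelled only if at least `threshold` of its recorded
    bearers share one sex; the 0.9 default is an arbitrary choice.
    """
    entry = counts.get(name.strip().lower())
    if entry is None:
        return "unknown"
    total = entry["M"] + entry["F"]
    if total == 0:
        return "unknown"
    male_share = entry["M"] / total
    if male_share >= threshold:
        return "male"
    if male_share <= 1 - threshold:
        return "female"
    return "ambiguous"


# Toy counts, invented for the example (not real Census or SSA figures).
toy_records = [
    ("Mary", "F", 1000),
    ("Mary", "M", 5),
    ("John", "M", 900),
    ("Jordan", "M", 60),
    ("Jordan", "F", 40),
]
counts = build_counts(toy_records)
print(guess_gender("Mary", counts))
print(guess_gender("Jordan", counts))
```

Raising the threshold trades coverage for precision: more names end up
as "ambiguous", but the remaining labels are more reliable.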
There is also an older Perl module:
http://search.cpan.org/~edaly/Text-GenderFromName-0.32/GenderFromName.pm
by Jon Orwant and Eamon Daly, which has an option for fuzzy search --
based on phonetic similarity of the names, I believe.
Best,
Amaç