[Corpora-List] Request People Name Corpus (English)
Amaç Herdağdelen
amac at herdagdelen.com
Mon Jun 14 18:04:52 UTC 2010
Hello,
I wasn't aware of Kantrowitz's Names Corpus, so very recently (about two
days ago) I compiled a list of common English (American) names based on
data provided by the US Census Bureau (1990 census) and the Social Security
Administration (popular baby names between 1960 and 2010).
I have released these datasets and the code on GitHub. I also provide another
script that helps you guess the gender of an unknown name X by searching
for phrases like "X himself", "X herself", "X and his *", and "X and her *"
via the Yahoo! BOSS API and comparing the hit counts. It's a very naive
method, but I find its performance quite acceptable for my purposes.
The project is here: http://github.com/amacinho/Name-Gender-Guesser
Feel free to use and improve it!
Amaç Herdağdelen
On Mon, 14 Jun 2010 19:18:27 +0200, Nathan Schneider <nathan at cmu.edu>
wrote:
> Mark Kantrowitz's Names Corpus distributed with NLTK sounds like what
> you're looking for (at least for English):
> http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml
>
> Nathan
>
> On Mon, Jun 14, 2010 at 12:53 PM, Waleed Oransa <woransa at gmail.com>
> wrote:
>> Hello all,
>> I am looking for a people name corpus in English, categorized by gender.
>> Do you know if such a corpus exists? Some websites have such data (e.g.
>> baby names), but I thought I would check with you first, since extracting
>> the names from the web takes some effort, besides possible copyright
>> issues. I appreciate your help.
>> Of course, a similar parallel corpus would also be fine, especially an
>> English-Arabic one.
>> Thank you!
>> Waleed
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora