[Corpora-List] Looking for Igbo, Hausa, and Yoruba Corpora

Fink, Clayton R. finkcr1 at jhuapl.edu
Sat Feb 25 19:31:37 UTC 2012


There's a BBC Hausa service and a Yoruba-language Wikipedia, so there 
are some possibilities for those languages. Igbo seems to be a real 
problem, though, in terms of finding text corpora.

I'm mostly interested in training language ID models that I can use 
on names. I have some small corpora of first names and surnames scraped 
from the Web, but it would be useful to have some larger corpora to 
work from.
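
For concreteness, here is a minimal sketch of one way a language ID model for names could be set up: a character-trigram naive Bayes classifier with add-one smoothing. This is an illustration only, not a description of an existing system. The function names, the TOY_NAMES list, and the choice of trigrams are all assumptions made for the example, and the handful of names stands in for a real name corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: a few well-known names per language.
# A real model would be trained on a proper corpus of names.
TOY_NAMES = [
    ("Chukwuemeka", "igbo"), ("Ngozi", "igbo"),
    ("Chinedu", "igbo"), ("Nkechi", "igbo"),
    ("Oluwaseun", "yoruba"), ("Adebayo", "yoruba"),
    ("Babatunde", "yoruba"), ("Olufemi", "yoruba"),
    ("Abubakar", "hausa"), ("Danjuma", "hausa"),
    ("Garba", "hausa"), ("Musa", "hausa"),
]


def char_ngrams(name, n=3):
    """Return the character n-grams of a name, padded at both ends."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(labeled_names, n=3):
    """Count character n-grams per language label."""
    counts = defaultdict(Counter)
    for name, lang in labeled_names:
        counts[lang].update(char_ngrams(name, n))
    return counts


def classify(name, counts, n=3):
    """Pick the language whose n-gram counts best explain the name.

    Scores are sums of add-one-smoothed log probabilities, with a
    uniform prior over languages.
    """
    vocab = set()
    for lang_counts in counts.values():
        vocab.update(lang_counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams

    best_lang, best_score = None, float("-inf")
    for lang, lang_counts in counts.items():
        total = sum(lang_counts.values())
        score = sum(
            math.log((lang_counts[gram] + 1) / (total + vocab_size))
            for gram in char_ngrams(name, n)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


model = train(TOY_NAMES)
print(classify("Chukwudi", model))
print(classify("Oluwafemi", model))
```

With only a dozen training names the output says little about real accuracy, which is why larger corpora matter here: the per-language n-gram counts are only as good as the text they are estimated from.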

Thanks,

Clay

-- 
Clay Fink
Senior Software Engineer
The Johns Hopkins University Applied Physics Laboratory

240-228-4220

