[Corpora-List] Looking for Igbo, Hausa, and Yoruba Corpora
Fink, Clayton R.
finkcr1 at jhuapl.edu
Sat Feb 25 19:31:37 UTC 2012
There's a BBC Hausa service and a Yoruba-language Wikipedia, so there
are some possibilities for those languages. Igbo seems to be a real
problem, though, in terms of finding text corpora.
I'm mostly interested in training language ID models that I can apply
to names. I have some small corpora of first names and surnames scraped
from the Web, but it would be useful to have some larger corpora to
work from.
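As a rough illustration of language ID on names, here is a minimal
sketch that assumes a character-trigram naive Bayes model with add-one
smoothing. This is one common approach, not necessarily the one used
here. The function names and the TOY_NAMES lists are hypothetical
placeholders, not the scraped corpora mentioned above, and lists this
small would be far too little for a usable model.

```python
import math
from collections import Counter

# Toy placeholder data (an assumption for illustration only).
TOY_NAMES = {
    "igbo": ["Chukwuemeka", "Ngozi", "Obinna", "Chinwe", "Nnamdi"],
    "hausa": ["Abubakar", "Danjuma", "Aminu", "Garba", "Musa"],
    "yoruba": ["Oluwaseun", "Adebayo", "Babatunde", "Olufemi", "Ayodele"],
}


def char_ngrams(name, n=3):
    """Return the character n-grams of a name, with boundary padding."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(names_by_lang, n=3):
    """Count n-grams per language; return counts and vocabulary size."""
    counts = {
        lang: Counter(g for name in names for g in char_ngrams(name, n))
        for lang, names in names_by_lang.items()
    }
    vocab = set().union(*counts.values())
    return counts, len(vocab)


def score(name, counts, vocab_size, n=3):
    """Smoothed log-likelihood of the name under each language."""
    scores = {}
    for lang, c in counts.items():
        total = sum(c.values())
        # Add-one smoothing; the extra +1 reserves mass for unseen n-grams.
        scores[lang] = sum(
            math.log((c[g] + 1) / (total + vocab_size + 1))
            for g in char_ngrams(name, n)
        )
    return scores


def identify(name, counts, vocab_size, n=3):
    """Return the language with the highest log-likelihood for the name."""
    scores = score(name, counts, vocab_size, n)
    return max(scores, key=scores.get)


counts, vocab_size = train(TOY_NAMES)
print(identify("Chukwuemeka", counts, vocab_size))
```

With larger name lists per language, the same code could be evaluated
on held-out names to see how well trigrams separate the three languages.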
Thanks,
Clay
--
Clay Fink
Senior Software Engineer
The Johns Hopkins University Applied Physics Laboratory
240-228-4220