[Corpora-List] sports-related resources / corpora
Adrien Barbaresi
adrien.barbaresi at ens-lyon.fr
Thu Sep 5 20:12:08 UTC 2013
Hi William,
I wrote a specialized crawler and corpus builder for the French sports
newspaper L'Équipe. The proof of concept is open source and available
here: https://code.google.com/p/equipe-crawler/
Its purpose is to enable others to build their own version of the corpus,
since crawling the website is not explicitly forbidden by the rights holders.
As I have already built a corpus with this tool, I could easily produce
derivatives such as n-gram lists if you are interested.
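For readers unfamiliar with this kind of derivative, here is a minimal sketch of how an n-gram frequency list can be computed from a set of documents. It is not the tool's own code: the function names `ngram_counts` and `ngram_list` are hypothetical, and the lowercasing and plain whitespace tokenization are simplifying assumptions (a real French corpus would need a proper tokenizer, e.g. to split elided forms such as "l'équipe").

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count contiguous n-grams (as tuples) in a single token sequence."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # A sequence of k tokens has k - n + 1 n-grams (none if k < n).
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_list(documents: Iterable[str], n: int,
               top: int = 10) -> List[Tuple[Tuple[str, ...], int]]:
    """Aggregate n-gram counts over all documents and return the most frequent.

    Assumption: lowercase + whitespace tokenization, for illustration only.
    """
    totals: Counter = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        totals.update(ngram_counts(tokens, n))
    return totals.most_common(top)


if __name__ == "__main__":
    docs = ["le match de football", "le match de rugby"]
    for gram, freq in ngram_list(docs, 2):
        print(" ".join(gram), freq)
```

Counting per document and then merging the counters keeps n-grams from spanning document boundaries, which is usually what one wants for a list derived from separate articles.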
Regards
--
Adrien Barbaresi
http://perso.ens-lyon.fr/adrien.barbaresi/
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora