[Corpora-List] sports-related resources / corpora

Adrien Barbaresi adrien.barbaresi at ens-lyon.fr
Thu Sep 5 20:12:08 UTC 2013


Hi William,

I wrote a specialized crawler and corpus builder for the French sports
newspaper L'Équipe. The proof of concept is open source and available
here: https://code.google.com/p/equipe-crawler/
Its purpose is to enable others to build their own version of the corpus,
as crawling the website is not explicitly forbidden by the rights holders.

As I have built a corpus using this tool, I could easily produce
derivatives such as n-gram lists if you are interested.

Regards,

-- 
Adrien Barbaresi
http://perso.ens-lyon.fr/adrien.barbaresi/
