[Corpora-List] Teaching corpora for Romance languages

Carlos Rodriguez crodriguezp at gmail.com
Tue Apr 26 14:25:20 UTC 2005


Hi all,
I am trying to coordinate the compilation, adaptation and licensing of
various language resources (corpora, treebanks, ontologies) for
non-commercial use in teaching computational linguistics and Natural
Language Processing programming techniques in Romance languages, using
the Natural Language ToolKit (NLTK, http://nltk.sf.net). NLTK is a
Python-based platform that already provides, along with its processing
modules and for didactic purposes, sample data for English from the
Brown corpus and the Penn Treebank, among other sources. We will soon
have some Spanish and Catalan datasets, interfaces and tutorial
translations available, but it would be great to also have Portuguese,
French, Italian, and so on. Teaching resources of this kind are scarce
for languages other than English, and this initiative can help fill
that gap.
If anyone is interested in providing and licensing corpora and other
resources (formatted according to internationally and scientifically
accepted standards), please contact me at CRodriguezP at gmail.com.

Thanks,

Carlos Rodríguez
-----------------
IIMAS-National Autonomous University (Mexico)


