Corpora: Italian corpus
Philip Resnik
resnik at umiacs.umd.edu
Wed Feb 21 13:08:13 UTC 2001
> I'm looking for a large corpus of Italian, and also either a smaller
> POS-tagged corpus or a lemmatiser/POS-tagger. I'm planning to
> participate in the Senseval-2 word sense disambiguation competition for
> Italian and these are the resources that our system needs.
For taggers, have a look at
http://www.comp.lancs.ac.uk/computing/research/ucrel/public/1610.html
which summarizes the replies to the same query a year or two ago. I'm
sure some things have changed, but I know that the Italian TreeTagger
is still available.
Regarding corpora, I've been collecting a corpus of English-Italian
pairs of translated Web pages. I'll post to the list when I have
something to make available, which I hope will be quite soon, and the
information will be available at http://umiacs.umd.edu/~resnik/strand.
Philip
----------------------------------------------------------------
Philip Resnik, Assistant Professor
Department of Linguistics and Institute for Advanced Computer Studies
1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
University of Maryland           Linguistics phone: (301) 405-8903
College Park, MD 20742 USA       Fax: (301) 405-7104
http://umiacs.umd.edu/~resnik    E-mail: resnik at umiacs.umd.edu