Corpora: Italian corpus

Philip Resnik resnik at
Wed Feb 21 13:08:13 UTC 2001

>   I'm looking for a large corpus of Italian, and also either a smaller
>   POS-tagged corpus or a lemmatiser/POS-tagger.  I'm planning to
>   participate in the Senseval-2 word sense disambiguation competition for
>   Italian and these are the resources that our system needs.

For taggers, have a look at

which summarizes the replies to the same query a year or two ago.  I'm
sure some things have changed, but I know that the Italian TreeTagger
is still available.

Regarding corpora, I've been collecting a corpus of English-Italian
pairs of translated Web pages.  I'll post to the list when I have
something to make available, which I hope will be quite soon, and the
information will be available at


  Philip Resnik, Assistant Professor
  Department of Linguistics and Institute for Advanced Computer Studies

  1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
  University of Maryland           Linguistics phone: (301) 405-8903
  College Park, MD 20742 USA       Fax: (301) 405-7104
                                   E-mail: resnik at
