Corpora: Italian corpus

Philip Resnik resnik at umiacs.umd.edu
Wed Feb 21 13:08:13 UTC 2001


>   I'm looking for a large corpus of Italian, and also either a smaller
>   POS-tagged corpus or a lemmatiser/POS-tagger.  I'm planning to
>   participate in the Senseval-2 word sense disambiguation competition for
>   Italian and these are the resources that our system needs.

For taggers, have a look at

  http://www.comp.lancs.ac.uk/computing/research/ucrel/public/1610.html

which summarizes the replies to the same query a year or two ago.  I'm
sure some things have changed, but I know that the Italian TreeTagger
is still available.

Regarding corpora, I've been collecting a corpus of English-Italian
pairs of translated Web pages.  I'll post to the list when I have
something to make available, which I hope will be quite soon, and the
information will be available at http://umiacs.umd.edu/~resnik/strand.

  Philip

  ----------------------------------------------------------------
  Philip Resnik, Assistant Professor
  Department of Linguistics and Institute for Advanced Computer Studies

  1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
  University of Maryland           Linguistics phone: (301) 405-8903
  College Park, MD 20742 USA       Fax   : (301) 405-7104
  http://umiacs.umd.edu/~resnik    E-mail: resnik at umiacs.umd.edu


