[Corpora-List] Part-of-speech tagger
Miles Osborne
miles at inf.ed.ac.uk
Tue Nov 12 12:26:03 UTC 2002
Quoting Afsaneh Fazly <afsaneh at cs.toronto.edu>:
>
> Greetings,
>
> I need to build a part-of-speech tagger for a new language
> (for which there is no PoS-tagger available). For this, I need
> to hand-annotate a minimum amount of text. I would like to know
> how much text (minimum of course) I need to hand-tag. Also,
> for this much text, what is the reasonable size of the tagset
> used for annotation?
>
> Regards,
>
> Afsaneh
>
This is a question about the sample complexity of POS tagging. CiteSeer is
overloaded right now, but the following paper is a good place to look:

Shlomo Argamon-Engelson and Ido Dagan (1999). Committee-Based Sample Selection
for Probabilistic Classifiers. Journal of Artificial Intelligence Research 11.
http://www.cs.washington.edu/research/jair/volume11/argamon99a.ps
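To give a flavour of what committee-based selection involves, here is a minimal
sketch in Python, not the paper's exact procedure. It assumes you already have a
committee of taggers (for instance, several taggers trained on bootstrap
resamples of the text tagged so far; that is an assumption of this sketch) and
scores each untagged sentence by the average per-token vote entropy of the
committee's tags. The sentences the committee disagrees on most are the ones
worth hand-tagging next, which is how selection cuts down the amount of text
you need to annotate. All function names here are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def vote_entropy(votes: Sequence[str]) -> float:
    """Entropy (in bits) of the committee's tag votes for a single token."""
    k = len(votes)
    counts = Counter(votes)
    return -sum((c / k) * math.log2(c / k) for c in counts.values())


def sentence_disagreement(committee_tags: List[List[str]]) -> float:
    """Average per-token vote entropy for one sentence.

    committee_tags[m][i] is the tag that committee member m assigns to token i.
    """
    n_tokens = len(committee_tags[0])
    total = sum(
        vote_entropy([member[i] for member in committee_tags])
        for i in range(n_tokens)
    )
    return total / n_tokens


def select_for_annotation(
    pool: List[Tuple[str, List[List[str]]]], batch_size: int
) -> List[str]:
    """Return ids of the batch_size sentences the committee disagrees on most.

    pool holds (sentence_id, committee_tags) pairs for untagged sentences.
    """
    ranked = sorted(pool, key=lambda item: sentence_disagreement(item[1]), reverse=True)
    return [sentence_id for sentence_id, _ in ranked[:batch_size]]
```

With three taggers, a sentence where all three agree on every token scores 0
and is never picked before a sentence where they split three ways on some
token. After each batch is hand-tagged, the committee would be retrained and
the remaining pool rescored.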
Also, at this year's CoNLL (CoNLL-2002), there was a paper on creating a POS
tagger in a single day:
http://ilk.kub.nl/~signll/conll02/
Miles