[Corpora-List] Part-of-speech tagger

Miles Osborne miles at inf.ed.ac.uk
Tue Nov 12 12:26:03 UTC 2002


Quoting Afsaneh Fazly <afsaneh at cs.toronto.edu>:

>
> Greetings,
>
>   I need to build a part-of-speech tagger for a new language
> (for which there is no PoS-tagger available). For this, I need
> to hand-annotate a minimum amount of text. I would like to know
> how much text (minimum of course) I need to hand-tag. Also,
> for this much text, what is the reasonable size of the tagset
> used for annotation?
>
> Regards,
>
> Afsaneh
>
>
>

This is a question about the sample complexity of POS tagging, that is, how
much annotated training data a tagger needs to reach a given accuracy.
CiteSeer is overloaded right now, but the following paper is a good place to
look:

http://www.cs.washington.edu/research/jair/volume11/argamon99a.ps

Shlomo Argamon-Engelson and Ido Dagan (1999). Committee-Based Sample Selection
for Probabilistic Classifiers. Journal of Artificial Intelligence Research, 11.
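Committee-based sample selection tries to keep the amount of hand-tagging small by annotating only the sentences that a committee of taggers disagrees on most. A minimal sketch of that general idea, assuming a hypothetical interface in which each committee member is just a function from a token list to a tag list, with disagreement scored as vote entropy averaged over tokens. The toy taggers `all_nouns` and `suffix_rule` are made up for illustration, and the paper's own committee construction and selection criteria are not reproduced here.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical tagger interface: tokens in, one tag per token out.
Tagger = Callable[[Sequence[str]], List[str]]


def vote_entropy(committee: Sequence[Tagger], sentence: Sequence[str]) -> float:
    """Average per-token vote entropy of the committee's predictions.

    0.0 means all members agree on every token; higher values mean more
    disagreement, i.e. a sentence that is more worth hand-tagging.
    """
    if not sentence:
        return 0.0
    k = len(committee)
    predictions = [tagger(sentence) for tagger in committee]
    total = 0.0
    for i in range(len(sentence)):
        # Count how many members voted for each tag on token i.
        votes = Counter(pred[i] for pred in predictions)
        for count in votes.values():
            p = count / k
            total -= p * math.log(p)
    return total / len(sentence)


def select_for_annotation(committee: Sequence[Tagger],
                          pool: List[List[str]],
                          batch_size: int) -> List[List[str]]:
    """Return the batch_size sentences the committee disagrees on most."""
    ranked = sorted(pool, key=lambda s: vote_entropy(committee, s), reverse=True)
    return ranked[:batch_size]


# Toy committee members (hypothetical, for illustration only).
def all_nouns(tokens: Sequence[str]) -> List[str]:
    return ["N"] * len(tokens)


def suffix_rule(tokens: Sequence[str]) -> List[str]:
    return ["V" if t.endswith("ed") else "N" for t in tokens]


committee = [all_nouns, suffix_rule, suffix_rule]
pool = [["dogs", "bark"], ["she", "walked", "home"]]
print(select_for_annotation(committee, pool, batch_size=1))
# prints [['she', 'walked', 'home']]
```

Only the second sentence is picked, because the members split on "walked" and agree everywhere else. In an annotation loop, the selected sentences would be hand-tagged, added to the training set, the committee retrained, and the selection repeated.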

Also, at this year's CoNLL (CoNLL-2002), there was a paper on building a POS
tagger in a single day:

http://ilk.kub.nl/~signll/conll02/

Miles


