[Corpora-List] Part-of-speech tagger

Rob Freeman rjfreeman at email.com
Fri Nov 15 00:02:38 UTC 2002


Hello Afsaneh,

I was away from my mail for a few days and so missed this thread.

As others have pointed out, you don't _need_ to hand-tag anything at all. Of
course, the tags you end up with are then (selected from?) the tags the
algorithm gives you, and that raises the question of "which tag is the one true tag".

Personally I eschew tags altogether as a subjective and largely irrelevant
generalization of a structure which is much more complex and dynamic than any
one characterization can portray.

If you are interested, you could try my "classless" parsing algorithm. You
just need a moderate amount of _untagged_ text, which is indexed and then
sifted for relevant structure at parse time.
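To give a rough feel for "index untagged text, then sift it at query time", here is a minimal sketch. It is NOT the classless algorithm described above, whose details the post does not give. It assumes a simple scheme of my own choosing: each word is indexed by its immediate (left, right) neighbours, and when a word is queried, the other words that share contexts with it are ranked on the fly instead of being looked up in a fixed tag set. The function names `build_index` and `substitutes` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(tokens):
    """Index raw, untagged tokens by their immediate (left, right) neighbours.

    Hypothetical illustration only: returns two maps,
    word -> set of contexts, and context -> set of words seen in it.
    """
    contexts_of = defaultdict(set)  # word -> {(left, right), ...}
    fillers_of = defaultdict(set)   # (left, right) -> {word, ...}
    padded = ["<s>"] + tokens + ["</s>"]  # sentinel boundaries
    for i in range(1, len(padded) - 1):
        ctx = (padded[i - 1], padded[i + 1])
        contexts_of[padded[i]].add(ctx)
        fillers_of[ctx].add(padded[i])
    return contexts_of, fillers_of


def substitutes(word, contexts_of, fillers_of):
    """Rank other words by how many contexts they share with `word`.

    The grouping is computed at query time from the index,
    so no predefined word classes (tags) are needed.
    """
    counts = Counter()
    for ctx in contexts_of.get(word, ()):
        for other in fillers_of[ctx]:
            if other != word:
                counts[other] += 1
    return counts.most_common()


if __name__ == "__main__":
    text = "the cat sat on the mat . the dog sat on the rug . a cat ran ."
    contexts_of, fillers_of = build_index(text.split())
    print(substitutes("cat", contexts_of, fillers_of))
```

On this toy text, `substitutes("cat", ...)` prints `[('dog', 1)]`, because both words occur between "the" and "sat". A real system would use far more text and richer contexts; the point is only that the grouping comes out of the raw text at query time rather than out of a hand-tagged corpus.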

Have a look at my English demo (based on 12 million words of very raw text)
at:

http://www.chaoticlanguage.com

Cheers,

Rob Freeman

On Tuesday 12 November 2002 9:52 am, Afsaneh Fazly wrote:
> Greetings,
>
>   I need to build a part-of-speech tagger for a new language
> (for which there is no PoS-tagger available). For this, I need
> to hand-annotate a minimum amount of text. I would like to know
> how much text (minimum of course) I need to hand-tag. Also,
> for this much text, what is the reasonable size of the tagset
> used for annotation?
>
> Regards,
>
> Afsaneh
