Re: [Corpora-List] Current state of the art of POS tagging/evaluation?
Barbara Plank
barbara at ilanga.net
Fri May 4 07:21:28 UTC 2007
Hi Orion,
Ratnaparkhi's MXPOST tagger:
http://www.cogsci.ed.ac.uk/~jamesc/taggers/MXPOST.html
Barbara
>
> We are looking to evaluate POS-taggers for English, to establish which
> to use for future tagging of the Oxford English Corpus. Taggers we
> are aware of, and hope to evaluate, include
>
> CLAWS
> RASP
> EngCG
> Connexor
> TreeTagger
> Brill
>
> We would appreciate pointers for any of the following:
>
> * other taggers that we should consider
> * papers describing comparative evaluation exercises
> * data to use as a 'gold standard': we are aware of the BNC
> sampler and the Penn Treebank, though we are also aware of the roles
> these datasets have played as training and development material for
> various taggers. The OEC is web-sourced and covers a wide range of
> text types, so ideally we would evaluate the taggers on a similar dataset.
>
> Since tagger performance, for many taggers, depends on the quality and
> volume of training text, we'd also appreciate pointers on how that can
> be brought into the evaluation, to give us a good idea of which
> tagger will perform best on our dataset.
>
> I would be particularly pleased to find a top-quality tagger with
> freely modifiable source code.
>
> Many thanks; off-list replies will be summarized, but on-list replies
> may prove interesting.
>
> --
> Orion Montoya
> Data & Development Editor
> Dictionaries
> Oxford University Press
>
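One common way to bring training-data volume into a tagger evaluation is a learning curve: retrain on growing slices of the training data and score every resulting model against the same held-out gold standard, using token-level accuracy. The Python sketch below only illustrates that procedure. It uses a hypothetical most-frequent-tag baseline (not any of the taggers named above), toy sentences with Penn Treebank-style tags, and an assumed fallback tag of NN for unseen words. In a real comparison each tagger's own training and tagging tools would replace `train_baseline` and `tag_words`.

```python
from collections import Counter, defaultdict

# Hypothetical data format: each sentence is a list of (token, gold_tag) pairs.

def train_baseline(sentences):
    """Most-frequent-tag baseline: map each word form to its commonest tag."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_words(model, words, unknown_tag="NN"):
    """Tag each word; unseen words get a fixed fallback tag (an assumption)."""
    return [model.get(w.lower(), unknown_tag) for w in words]

def token_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    total = correct = 0
    for pred_tags, gold_sent in zip(predicted, gold):
        for pred, (_, gold_tag) in zip(pred_tags, gold_sent):
            total += 1
            correct += pred == gold_tag
    return correct / total if total else 0.0

def learning_curve(train, test, sizes):
    """Retrain on the first n training sentences for each n; score on test."""
    results = []
    for n in sizes:
        model = train_baseline(train[:n])
        preds = [tag_words(model, [w for w, _ in s]) for s in test]
        results.append((n, token_accuracy(preds, test)))
    return results

# Toy, hypothetical sentences purely for illustration.
train = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
]
test = [[("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]

for n, acc in learning_curve(train, test, [1, 3]):
    print(f"{n} training sentences: accuracy {acc:.3f}")
```

With one training sentence, "sleeps" is unseen and falls back to NN, so accuracy is 0.667; with all three it reaches 1.000. On real data the interesting part is how steeply each tagger's curve rises and where it flattens, measured on a gold standard drawn from the same kind of text as the target corpus.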