Re: [Corpora-List] Current state of the art of POS tagging/evaluation?

Barbara Plank barbara at ilanga.net
Fri May 4 07:21:28 UTC 2007


Hi Orion,

Ratnaparkhi's MXPOST tagger:
http://www.cogsci.ed.ac.uk/~jamesc/taggers/MXPOST.html

Barbara

> 
> We are looking to evaluate POS-taggers for English, to establish which
> to use for future tagging of the Oxford English Corpus.  Taggers we
> are aware of, and hope to evaluate, include
> 
> 	CLAWS
> 	RASP
> 	EngCG
> 	Connexor
> 	TreeTagger
> 	Brill
> 
> We would appreciate pointers for any of the following:
> 
> 	* other taggers that we should consider
> 	* papers describing comparative evaluation exercises
> 	* data to use as 'gold standard': we are aware of the BNC
> sampler and the Penn TreeBank, though we are also aware of the roles
> these datasets have played as training and development material, for
> various taggers.  The OEC is web-sourced and covers a wide range of
> text types, so ideally we would evaluate the taggers on a dataset
> like that.
> 
> Since tagger performance, for many taggers, depends on the quality and
> volume of training text, we'd also appreciate pointers on how that can
> be brought into the evaluation, to give us a good idea of which
> tagger will perform best on our dataset.
> 
> I would be particularly pleased to find a top-quality tagger with
> freely modifiable source code.
> 
> Many thanks; offlist replies will be summarized, but on-list replies
> may prove interesting.
> 
> --
> Orion Montoya
> Data & Development Editor
> Dictionaries
> Oxford University Press
> 


