Re: [Corpora-List] Current state of the art of POS tagging/evaluation?
    Barbara Plank 
    barbara at ilanga.net
       
    Fri May  4 07:21:28 UTC 2007
    
    
  
Hi Orion,
One other tagger to consider is Ratnaparkhi's MXPOST tagger:
http://www.cogsci.ed.ac.uk/~jamesc/taggers/MXPOST.html
Barbara
> 
> We are looking to evaluate POS-taggers for English, to establish which
> to use for future tagging of the Oxford English Corpus.  Taggers we
> are aware of, and hope to evaluate, include
> 
> 	CLAWS
> 	RASP
> 	EngCG
> 	Connexor
> 	TreeTagger
> 	Brill
> 
> We would appreciate pointers for any of the following:
> 
> 	* other taggers that we should consider
> 	* papers describing comparative evaluation exercises
> 	* data to use as a 'gold standard': we are aware of the BNC
> Sampler and the Penn Treebank, though we are also aware of the roles
> these datasets have played as training and development material for
> various taggers.  The OEC is web-sourced and covers a wide range of
> text types, so ideally we would evaluate the taggers on a dataset
> like that.
> 
> Since tagger performance, for many taggers, depends on the quality and
> volume of training text, we'd also appreciate pointers on how that can
> be brought into the evaluation, to give us a good idea of which
> tagger will perform best on our dataset.
> 
> I would be particularly pleased to find a top-quality tagger with
> freely modifiable source code.
> 
> Many thanks; offlist replies will be summarized, but on-list replies
> may prove interesting.
> 
> --
> Orion Montoya
> Data & Development Editor
> Dictionaries
> Oxford University Press
> 
    
    