[Corpora-List] Current state of the art of POS tagging/evaluation?

Eric Atwell eric at comp.leeds.ac.uk
Wed May 9 13:14:23 UTC 2007


Thank you, Orion, for inspiring Corpora contributors to give us pointers
to a wide range of taggers:

CLAWS http://www.comp.lancs.ac.uk/ucrel/claws/

RASP http://www.informatics.susx.ac.uk/research/nlp/rasp/

EngCG http://www.ling.helsinki.fi/~avoutila/cg/

Connexor http://www.connexor.com/software/tagger/

TreeTagger http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Brill tagger http://www.cs.jhu.edu/~brill/

FreeLing http://www.lsi.upc.edu/~nlp/freeling/

TnT http://www.coli.uni-saarland.de/~thorsten/tnt/

Ratnaparkhi's MXPOST tagger
http://www.cogsci.ed.ac.uk/~jamesc/taggers/MXPOST.html

Hans van Halteren’s tagger hvh at let.ru.nl

Acopost HMM-tagger  http://acopost.sourceforge.net/

MTB Memory Based Tagger http://ilk.uvt.nl/mbt/

There are yet more taggers, e.g.

Qtag http://www.english.bham.ac.uk/staff/omason/software/qtag.html

Xerox HMM tagger
http://www.xrce.xerox.com/competencies/content-analysis/fsnlp/tagger.en.html

... and also adapted versions of some of the above, e.g. our Amalgam
taggers were simply the result of re-training Brill’s tagger on a range
of rival corpora, to produce taggers for the alternative tagsets used
in the Brown / ICE / London-Lund / LOB / PoW / SEC / UPenn corpora ...
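
(For the curious, the retraining step looks roughly like this in NLTK's
later reimplementation of Brill's learner; for Amalgam we used Brill's
original code, so treat this as a sketch of the idea rather than our
actual setup. The corpus, slice sizes and rule limit are arbitrary, and
the API is that of current NLTK releases.)

    # data: nltk.download('brown')
    from nltk.corpus import brown
    from nltk.tag import UnigramTagger
    from nltk.tag.brill import fntbl37
    from nltk.tag.brill_trainer import BrillTaggerTrainer

    # Any corpus tagged with the target tagset will do; Brown stands in
    # here for one of the rival corpora (LOB, ICE, UPenn, ...).
    sents = brown.tagged_sents(categories='news')
    train, test = sents[:3000], sents[3000:3500]

    # Brill learning corrects the output of an initial tagger; a unigram
    # baseline is the usual choice.
    baseline = UnigramTagger(train)
    trainer = BrillTaggerTrainer(baseline, fntbl37(), trace=0)
    tagger = trainer.train(train, max_rules=100)
    print(round(tagger.accuracy(test), 3))

Re-training for a corpus in another tagset is then just a matter of
substituting that corpus for Brown above.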

... and others have extended or reimplemented the software, e.g.

Mu-TBL  generalised transformation-based learning taggers
http://www.ling.gu.se/~lager/

GPoSTTL http://www.imsc.res.in/~golam/gposttl/

If you want open-source, extensible taggers with documentation and even
a tutorial, try NLTK:

NLTK Python versions of regexp, n-gram, backoff, Brill, HMM taggers
http://nltk.sourceforge.net/
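
(By way of illustration, a minimal back-off chain in current NLTK; the
tagger classes are real, but the corpus, slice sizes and default tag
are arbitrary choices.)

    # data: nltk.download('brown')
    from nltk.corpus import brown
    from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger

    sents = brown.tagged_sents(categories='news')
    train, test = sents[:3000], sents[3000:3500]

    # Each tagger backs off to the one before it when it has no answer.
    t0 = DefaultTagger('NN')               # last resort: guess 'NN'
    t1 = UnigramTagger(train, backoff=t0)  # most frequent tag per word
    t2 = BigramTagger(train, backoff=t1)   # condition on the previous tag
    print(round(t2.accuracy(test), 3))
    print(t2.tag('the jury praised the administration'.split()))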

As an evaluation dataset, I suggest you use a "multicorpus": a
collection of parts of several existing tagged corpora, so that you
won't be biased in favour of one specific corpus (which may have been
used to train a specific tagger).
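
(A sketch of assembling such a multicorpus from tagged corpora that
happen to ship with NLTK, mapped onto a common coarse tagset so they
are comparable; the corpora and slice sizes here are arbitrary, and the
explicit scoring loop is only there because external taggers will not
share NLTK's interface.)

    # data: nltk.download(['brown', 'treebank', 'conll2000',
    #                      'universal_tagset'])
    from nltk.corpus import brown, treebank, conll2000

    multicorpus = (
        list(brown.tagged_sents(tagset='universal')[:500])
        + list(treebank.tagged_sents(tagset='universal')[:500])
        + list(conll2000.tagged_sents(tagset='universal')[:500])
    )

    def accuracy(tag_sentence, gold_sents):
        """Token-level accuracy of a tagging function against gold data."""
        correct = total = 0
        for sent in gold_sents:
            words = [w for w, _ in sent]
            for (_, gold), (_, pred) in zip(sent, tag_sentence(words)):
                correct += (gold == pred)
                total += 1
        return correct / total

Each candidate tagger's output would of course have to be mapped onto
the same common tagset before scoring, which is a non-trivial piece of
the evaluation in itself.
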
However, before you download and try out all these taggers, I suggest
you consider what tag-set you want to use, and what criteria matter to
you in choosing one. Accuracy scores for rival taggers are at least
partly dependent on the tag-set: for example, our Amalgam taggers all
used the same Brill algorithm, yet scored a range of accuracies, from
91% to 97%, across the different tag-sets. You should also consider
what the tagging is for: for dictionary building, a basic minimal
tag-set may be sufficient, but for detailed grammatical studies of,
say, world-wide variation in adverb usage, you may need subtler
subcategorisation.
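
(You can see the tag-set effect for yourself in a few lines of NLTK;
this uses a simple unigram tagger rather than Brill, so the figures
will differ from our 91%-97% range, but the direction of the effect is
the same: the coarser the tag-set, the higher the score.)

    # data: nltk.download(['brown', 'universal_tagset'])
    from nltk.corpus import brown
    from nltk.tag import UnigramTagger

    # Same algorithm, same text; only the tagset changes.
    for tagset in (None, 'universal'):  # full Brown tags vs 12 coarse tags
        sents = brown.tagged_sents(categories='news', tagset=tagset)
        train, test = sents[:3000], sents[3000:3500]
        score = UnigramTagger(train).accuracy(test)
        print(tagset or 'brown', round(score, 3))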

For some suggestions of criteria to consider in choosing your tagset,
see

Atwell, E. Development of tag sets for part-of-speech tagging. In: Anke
Lüdeling & Merja Kytö (eds.) Corpus Linguistics: An International
Handbook. Mouton de Gruyter. 2007.

Atwell, E.; Demetriou, G.; Hughes, J.; Schiffrin, A.; Souter, C.;
Wilcock, S. A comparative evaluation of modern English corpus
grammatical annotation schemes. ICAME Journal, vol. 24, pp. 7-23. 2000.

Atwell, E.; Demetriou, G.; Hughes, J.; Souter, C.; Wilcock, S.
Comparing linguistic interpretation schemes for English corpora. In:
Thorsten Brants (ed.) Proceedings of the COLING-2000 Workshop on
Linguistically Interpreted Corpora (LINC-2000), pp. 1-10. 2000.

Preprints of these and other papers are downloadable from
http://www.comp.leeds.ac.uk/eric/publications.html


On Thu, 3 May 2007, Orion Buckminster Montoya wrote:

> We are looking to evaluate POS-taggers for English, to establish which
> to use for future tagging of the Oxford English Corpus.  Taggers we
> are aware of, and hope to evaluate, include
>
> 	CLAWS
> 	RASP
> 	EngCG
> 	Connexor
> 	TreeTagger
> 	Brill
>
> We would appreciate pointers for any of the following:
>
> 	* other taggers that we should consider
> 	* papers describing comparative evaluation exercises
> 	* data to use as 'gold standard': we are aware of the BNC
> sampler and the Penn TreeBank, though we are also aware of the roles
> these datasets have played as training and development material for
> various taggers.  The OEC is web-sourced and covers a wide range of
> text types, so ideally we shall evaluate on a similarly varied dataset.
>
> Since performance, for many taggers, depends on the quality and
> volume of training text, we'd also appreciate pointers on how that can
> be brought into the evaluation, to give us a good idea of which
> tagger will perform best on our dataset.
>
> I would be particularly pleased to find a top-quality tagger with
> freely modifiable source code.
>
> Many thanks; off-list replies will be summarized, but on-list replies
> may prove interesting.
>
> --
> Orion Montoya
> Data & Development Editor
> Dictionaries
> Oxford University Press
>
>

-- 
Eric Atwell,
Senior Lecturer, Language research group, School of Computing
Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
TEL: 0113-3435430  FAX: 0113-3435468  WWW/email: google Eric Atwell

