[Corpora-List] POS-tagger maintenance and improvement

Adam Kilgarriff adam at lexmasterclass.com
Wed Feb 25 11:15:35 UTC 2009


All,

My lexicography colleagues and I use POS-tagged corpora every day, and we
very frequently spot systematic errors.  (This is for a range of languages,
but particularly English.)  We would dearly like to be in a dialogue with
the developers of the POS-tagger and/or the relevant language models, so
that the tagger+model could be improved in response to our feedback.  (We
have been using standard models rather than training our own.)

However, for the taggers and language models we use (mainly TreeTagger,
also CLAWS), and for the other market leaders, all of which seem to come
from universities, the developers appear to have little motivation to keep
improving their taggers, since incremental improvements do not make for
good research papers.  So there is nowhere for our feedback to go, nor any
real prospect of these taggers/models improving.

Am I too pessimistic?  Are there ways of improving language models other
than developing bigger and better training corpora - not an exercise we have
the resources to invest in?  Are there commercial taggers I should be
considering (as, in the commercial world, there is motivation for
incremental improvements and responding to customer feedback)?
Responses and ideas are most welcome.

Adam Kilgarriff
-- 
================================================
Adam Kilgarriff
http://www.kilgarriff.co.uk
Lexical Computing Ltd                   http://www.sketchengine.co.uk
Lexicography MasterClass Ltd      http://www.lexmasterclass.com
Universities of Leeds and Sussex       adam at lexmasterclass.com
================================================
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

