[Corpora-List] POS-tagger maintenance and improvement

Jana Diesner janadiesner at gmx.net
Wed Feb 25 16:26:46 UTC 2009


Dear Adam,

We did a systematic study on the impact of various variables (the technical
decisions that one has to make when implementing a POS tagger) on POS
tagging accuracy. 

The report might provide more detailed information on possible error
sources and the respective loss or gain in accuracy, and it addresses the
difficulties of doing an error analysis with systematic rigor.

URL for the report:
http://reports-archive.adm.cs.cmu.edu/anon/isr2008/CMU-ISR-08-131R.pdf

Best regards, Jana

Jana Diesner
Carnegie Mellon University
School of Computer Science
Center for Computational Analysis of Social and Organizational Systems
Web: http://www.andrew.cmu.edu/user/jdiesner/

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Adam Kilgarriff
Sent: Wednesday, February 25, 2009 6:16 AM
To: Corpora List
Cc: Sue Atkins; Valerie GRUNDY; Patrick Hanks
Subject: [Corpora-List] POS-tagger maintenance and improvement

All,

My lexicography colleagues and I use POS-tagged corpora all the time, every
day, and very frequently spot systematic errors.  (This is for a range of
languages, but particularly English.)   We would dearly like to be in a
dialogue with the developers of the POS-tagger and/or the relevant language
models so the tagger+model could be improved in response to our feedback.
(We have been using standard models rather than training our own.)   However,
it seems that for the taggers and language models we use (mainly TreeTagger,
also CLAWS), and also for other market leaders, all of which seem to be from
universities, the developers have little motivation to keep improving their
taggers, since incremental improvements do not make for good research
papers. So there is nowhere for our feedback to go, nor any real prospect of
these taggers/models improving.

Am I too pessimistic?  Are there ways of improving language models other
than developing bigger and better training corpora - not an exercise we have
the resources to invest in?  Are there commercial taggers I should be
considering (as, in the commercial world, there is motivation for
incremental improvements and responding to customer feedback)?


Responses and ideas most welcome.

Adam Kilgarriff
-- 
================================================
Adam Kilgarriff
http://www.kilgarriff.co.uk              
Lexical Computing Ltd                   http://www.sketchengine.co.uk
Lexicography MasterClass Ltd      http://www.lexmasterclass.com
Universities of Leeds and Sussex       adam at lexmasterclass.com
================================================
