[Corpora-List] POS-tagger maintenance and improvement
Jana Diesner
janadiesner at gmx.net
Wed Feb 25 16:26:46 UTC 2009
Dear Adam,
We did a systematic study on the impact of various variables (the technical
decisions that one has to make when implementing a POS tagger) on POS
tagging accuracy.
The report may provide more detailed information on possible error sources
and the respective loss or gain in accuracy. It also addresses the
difficulties of doing an error analysis with systematic rigor.
URL for the report:
http://reports-archive.adm.cs.cmu.edu/anon/isr2008/CMU-ISR-08-131R.pdf
Best regards, Jana
Jana Diesner
Carnegie Mellon University
School of Computer Science
Center for Computational Analysis of Social and Organizational Systems
Web: http://www.andrew.cmu.edu/user/jdiesner/
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Adam Kilgarriff
Sent: Wednesday, February 25, 2009 6:16 AM
To: Corpora List
Cc: Sue Atkins; Valerie GRUNDY; Patrick Hanks
Subject: [Corpora-List] POS-tagger maintenance and improvement
All,
My lexicography colleagues and I use POS-tagged corpora all the time, every
day, and very frequently spot systematic errors. (This is for a range of
languages, but particularly English.) We would dearly like to be in a
dialogue with the developers of the POS-tagger and/or the relevant language
models so the tagger+model could be improved in response to our feedback.
(We have been using standard models rather than training our own.) However,
it seems that for the taggers and language models we use (mainly TreeTagger,
also CLAWS), and also for other market leaders, all of which seem to come
from universities, the developers have little motivation to keep improving
their taggers, since incremental improvements do not make for good research
papers. So there is nowhere for our feedback to go, nor any real prospect of
these taggers/models improving.
Am I too pessimistic? Are there ways of improving language models other
than developing bigger and better training corpora - not an exercise we have
the resources to invest in? Are there commercial taggers I should be
considering (as, in the commercial world, there is motivation for
incremental improvements and responding to customer feedback)?
Responses and ideas most welcome.
Adam Kilgarriff
--
================================================
Adam Kilgarriff
http://www.kilgarriff.co.uk
Lexical Computing Ltd http://www.sketchengine.co.uk
Lexicography MasterClass Ltd http://www.lexmasterclass.com
Universities of Leeds and Sussex adam at lexmasterclass.com
================================================