[Corpora-List] POS-tagger maintenance and improvement
Eckhard Bick
eckhard.bick at mail.dk
Wed Feb 25 12:06:55 UTC 2009
Hello,
This is an interesting observation.
Maybe one explanation for the lack of response to user feedback is that
it is much harder to make incremental changes to probabilistic /
machine-learned systems than to rule-based ones. If a corpus user
identifies systematic errors, this can, in a rule-based parser, be used
to remove errors, add rules, or introduce new lexical sets and
categories. In an ML system, the same fix would have to be made by
paying somebody to annotate the changes into a treebank and then
retraining on it, which is, as you say, unlikely.
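
As a rough illustration of the rule-based side, a user-reported
systematic error can be turned into one contextual rule plus one
lexical set. This is only a minimal Python sketch, not CG or AGFL
syntax. The names (RetagRule, apply_rules, NOUN_VERB_AMBIGUOUS), the
Penn-Treebank-style tags and the example error itself are assumptions
made up for the illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple

Token = Tuple[str, str]  # (word form, POS tag)

# Hypothetical lexical set: words a user reports as wrongly tagged nouns.
NOUN_VERB_AMBIGUOUS: FrozenSet[str] = frozenset({"email", "google", "text"})


@dataclass(frozen=True)
class RetagRule:
    """Retag a word from a lexical set when the preceding tag matches."""
    lexical_set: FrozenSet[str]
    wrong_tag: str
    prev_tag: str
    new_tag: str

    def matches(self, tokens: Sequence[Token], i: int) -> bool:
        if i == 0:
            return False  # the rule needs a left context
        word, tag = tokens[i]
        return (
            word.lower() in self.lexical_set
            and tag == self.wrong_tag
            and tokens[i - 1][1] == self.prev_tag
        )


def apply_rules(tokens: Sequence[Token], rules: Sequence[RetagRule]) -> List[Token]:
    """Return a corrected copy; the first matching rule wins per token."""
    corrected = list(tokens)
    for i in range(len(corrected)):
        for rule in rules:
            if rule.matches(corrected, i):
                corrected[i] = (corrected[i][0], rule.new_tag)
                break
    return corrected


# Assumed feedback: "email" after infinitival "to" is tagged NN, should be VB.
RULES = [RetagRule(NOUN_VERB_AMBIGUOUS, wrong_tag="NN", prev_tag="TO", new_tag="VB")]

tagged = [("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("email", "NN"), ("him", "PRP")]
print(apply_rules(tagged, RULES))
```

Handling a further reported error then means adding one more rule or
one more word to the set, with no re-annotation or retraining.
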
Though my view is probably biased, I think this might be an example of
the side effects of using trained systems for corpus work rather than
rule-based ones (like AGFL or CG, to name a couple).
Best regards,
Eckhard Bick
Adam Kilgarriff wrote:
> All,
>
> My lexicography colleagues and I use POS-tagged corpora all the time,
> every day, and very frequently spot systematic errors. (This is for a
> range of languages, but particularly English.) We would dearly like
> to be in a dialogue with the developers of the POS-tagger and/or the
> relevant language models so the tagger+model could be improved in
> response to our feedback. (We have been using standard models rather
> than training our own.) However it seems, for the taggers and
> language models we use (mainly TreeTagger, also CLAWS) and also for
> other market leaders, all of which seem to be from Universities, the
> developers have little motivation for continuing the improvement of
> their tagger, since incremental improvements do not make for good
> research papers, so there is nowhere for our feedback to go, nor any
> real prospect of these taggers/models improving.
>
> Am I too pessimistic? Are there ways of improving language models
> other than developing bigger and better training corpora - not an
> exercise we have the resources to invest in? Are there commercial
> taggers I should be considering (as, in the commercial world, there is
> motivation for incremental improvements and responding to customer
> feedback)?
> Responses and ideas most welcome
>
> Adam Kilgarriff
> --
> ================================================
> Adam Kilgarriff
> http://www.kilgarriff.co.uk
> Lexical Computing Ltd http://www.sketchengine.co.uk
> Lexicography MasterClass Ltd http://www.lexmasterclass.com
> Universities of Leeds and Sussex adam at lexmasterclass.com
> ================================================
> ------------------------------------------------------------------------
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
--
Eckhard Bick,
cand.med., dr.phil.
University of Southern Denmark
e-mail: eckhard.bick at mail.dk
web: http://beta.visl.sdu.dk