[Corpora-List] POS-tagger maintenance and improvement

Helmut Schmid schmid at ims.uni-stuttgart.de
Thu Feb 26 08:06:45 UTC 2009


Hi Adam,

As the developer of the TreeTagger, I would like to emphasize that I am
still maintaining this software and that any feedback and suggestions
for improvements are highly welcome! I am also very interested in
collaborations for training the TreeTagger on new languages.

Best regards,
  Helmut Schmid

Adam Kilgarriff schrieb:
> All,
>  
> My lexicography colleagues and I use POS-tagged corpora all the time,
> every day, and very frequently spot systematic errors. (This is for a
> range of languages, but particularly English.) We would dearly like
> to be in a dialogue with the developers of the POS-taggers and/or the
> relevant language models so that the tagger+model could be improved in
> response to our feedback. (We have been using standard models rather
> than training our own.) However, it seems that for the taggers and
> language models we use (mainly TreeTagger, also CLAWS), and also for
> other market leaders, all of which seem to be from universities, the
> developers have little motivation to keep improving their taggers,
> since incremental improvements do not make for good research papers.
> So there is nowhere for our feedback to go, nor any real prospect of
> these taggers/models improving.
>  
> Am I too pessimistic?  Are there ways of improving language models 
> other than developing bigger and better training corpora - not an 
> exercise we have the resources to invest in?  Are there commercial 
> taggers I should be considering (as, in the commercial world, there is 
> motivation for incremental improvements and responding to customer 
> feedback)?
> Responses and ideas are most welcome.
>  
> Adam Kilgarriff
> -- 
> ================================================
> Adam Kilgarriff                        http://www.kilgarriff.co.uk
> Lexical Computing Ltd                  http://www.sketchengine.co.uk
> Lexicography MasterClass Ltd           http://www.lexmasterclass.com
> Universities of Leeds and Sussex       adam at lexmasterclass.com
> ================================================
> ------------------------------------------------------------------------
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>   

