[Corpora-List] POS-tagger maintenance and improvement

John F. Sowa sowa at bestweb.net
Wed Feb 25 17:57:27 UTC 2009


Adam and Eric,

AK> Am I too pessimistic?  Are there ways of improving language
 > models other than developing bigger and better training corpora
 > -- not an exercise we have the resources to invest in?  Are
 > there commercial taggers I should be considering (as, in the
 > commercial world, there is motivation for incremental improvements
 > and responding to customer feedback)?

At our company (VivoMind Intelligence, Inc.) we have been getting
good results by using a high-speed analogy engine.  For some slides
that illustrate three applications, see

    http://www.jfsowa.com/talks/pursue.pdf

All three of those applications processed plain text with no tagging.
The last slide of that talk has URLs of related papers.

EA> As others have commented, TreeTagger models for other languages
 > are also derived from a PoS-tagged corpus, which suggests the only
 > way to eradicate systematic errors is to "correct" the tagging
 > in the training corpus, or perhaps to use a different corpus
 > altogether.

We have obtained good results by using multiple agents, which use
different methods, data, or paradigms.  Systematic errors caused
by one agent can be corrected by evidence from other agents.
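
As a rough illustration of that idea, here is a minimal sketch in which
three taggers label the same tokens and a per-token majority vote
overrides the one that errs systematically. Majority voting is only
one simple way to combine evidence and is an assumption of this sketch,
not a description of the VivoMind agents; the names vote, agent_a,
agent_b, and agent_c and the tag sequences are hypothetical.

```python
from collections import Counter


def vote(tag_sequences):
    """Combine per-token tags from several taggers by majority vote."""
    lengths = {len(seq) for seq in tag_sequences}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same tokens")
    combined = []
    for tags in zip(*tag_sequences):
        # Pick the most frequent tag for this token; on a tie, the tag
        # proposed by the earliest-listed tagger wins.
        combined.append(Counter(tags).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three independent taggers for one sentence.
tokens = ["Time", "flies", "like", "an", "arrow"]
agent_a = ["NN", "NNS", "IN", "DT", "NN"]  # mis-tags "flies" as a plural noun
agent_b = ["NN", "VBZ", "IN", "DT", "NN"]
agent_c = ["NN", "VBZ", "VB", "DT", "NN"]  # mis-tags "like" as a verb

for token, tag in zip(tokens, vote([agent_a, agent_b, agent_c])):
    print(token, tag)
```

Each tagger makes a different mistake here, so the other two outvote
it on that token. If all agents shared the same training corpus and
the same bias, voting would not help, which is why the agents need to
differ in method, data, or paradigm.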

For the slides of another talk that discusses that approach, see

    http://www.jfsowa.com/talks/using.pdf

John Sowa

