[Corpora-List] POS-tagger maintenance and improvement
John F. Sowa
sowa at bestweb.net
Wed Feb 25 17:57:27 UTC 2009
Adam and Eric,
AK> Am I too pessimistic? Are there ways of improving language
> models other than developing bigger and better training corpora
> -- not an exercise we have the resources to invest in? Are
> there commercial taggers I should be considering (as, in the
> commercial world, there is motivation for incremental improvements
> and responding to customer feedback)?
At our company (VivoMind Intelligence, Inc.) we have been getting
good results by using a high-speed analogy engine. For some slides
that illustrate three applications, see
http://www.jfsowa.com/talks/pursue.pdf
All three of those applications processed plain text with no tagging.
The last slide of that talk has URLs of related papers.
EA> As others have commented, TreeTagger models for other languages
> are also derived from a PoS-tagged corpus, which suggest the only
> way to eradicate systematic errors is to "correct" the tagging
> in the training corpus, or perhaps to use a different corpus
> altogether.
We have obtained good results by using multiple agents, which use
different methods, data, or paradigms. Systematic errors caused
by one agent can be corrected by evidence from other agents.
For the slides of another talk that discusses that approach, see
http://www.jfsowa.com/talks/using.pdf
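As a rough illustration of the general idea that evidence from several agents can override one agent's systematic error, here is a minimal sketch of per-token majority voting over the output of several taggers. It is not the VivoMind method, which the slides describe. The function name combine_tags, the three toy taggers, and their tag sequences are all hypothetical and made up for the example.

```python
from collections import Counter


def combine_tags(predictions):
    """Combine tag sequences from several taggers by per-token majority vote.

    predictions: list of tag sequences, one per tagger, all for the same tokens.
    Ties are broken in favour of the tagger listed first.
    (Hypothetical helper, for illustration only.)
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must tag the same tokens")

    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first tag, in tagger order, that reached the top count.
        combined.append(next(tag for tag in votes if counts[tag] == best))
    return combined


# Made-up output from three hypothetical taggers for "the can rusts".
tokens = ["the", "can", "rusts"]
tagger_a = ["DT", "MD", "VBZ"]   # systematically tags "can" as a modal
tagger_b = ["DT", "NN", "VBZ"]
tagger_c = ["DT", "NN", "NNS"]   # systematically tags "rusts" as a plural noun

result = combine_tags([tagger_a, tagger_b, tagger_c])
for token, tag in zip(tokens, result):
    print(token, tag)
```

Each toy tagger makes one systematic error, and each error is outvoted by the other two. Plain voting only helps when the agents' errors are largely independent; agents trained on the same corpus with the same method will tend to share their mistakes.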
John Sowa
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora