[Corpora-List] POS-tagger maintenance and improvement

Jimmy O'Regan joregan at gmail.com
Thu Feb 26 21:40:16 UTC 2009


2009/2/26 Linas Vepstas <linasvepstas at gmail.com>:
> BTW, I am *very* interested in automatically learning
> new disjuncts (link-grammar rules) via corpus statistics
> -- I think this is an excellent line of research, PhD level,
> for this parser, or any other NLP system, POS tagger, etc.

Marcin Miłkowski (LanguageTool) has a blog post about using Wikipedia
edits as a corpus of errors:
http://morfologik.blogspot.com/2007/01/wikipedia-history-diff-as-revision.html

He has done more work since then towards automating rule construction;
it might be worth your while getting in contact with him.
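
As a rough illustration of the general idea (a minimal sketch of my own, not
Marcin's code and not anything from LanguageTool): given two revisions of the
same text, a word-level diff yields candidate (error, correction) pairs. The
function name and the example sentences are hypothetical, and a real pipeline
would need to filter out vandalism, content changes, and markup edits.

```python
import difflib

def candidate_corrections(old_text, new_text):
    """Return (before, after) word spans that were replaced between two revisions."""
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False so frequent tokens are not treated as junk on long texts
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only in-place replacements are kept; pure insertions and deletions
        # are more likely content changes than corrections (an assumption).
        if tag == "replace":
            pairs.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return pairs

old_rev = "Their is a problem with this sentence ."
new_rev = "There is a problem with this sentence ."
print(candidate_corrections(old_rev, new_rev))  # [('Their', 'There')]
```

Counting how often each pair recurs across many revisions would give the
kind of corpus statistics from which rules might be proposed.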

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


