Corpora: parser recommendation

Mon Jan 22 16:39:07 UTC 2001

Hello Tamy and list members,

 It is likely that some of the taggers in the various reference
 lists were produced by training. For example, Brill's tagger
 was trained on the Wall Street Journal. If your material is from a
 different domain and you are going to evaluate the results, accuracy
 could be lower than the reported figures. You might therefore like
 to try several taggers. See also the discussion of POS tagger
 evaluation on this mailing list, in early 2000 as far as I remember.
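
 To make the suggestion of trying several taggers concrete, here is a
 minimal sketch of how one might compare them on a small hand-tagged
 sample from one's own domain. Everything in it is an assumption made
 for illustration: the two toy "taggers" are stand-ins for real systems
 (each is just a function from a list of words to a list of tags), the
 sample sentences are invented, and per-token accuracy is only one of
 several possible measures.

```python
from typing import Callable, Dict, List, Tuple

# A gold-standard sentence is a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]
# Hypothetical interface: a tagger maps a list of words to a list of tags.
Tagger = Callable[[List[str]], List[str]]


def accuracy(tagger: Tagger, gold: List[TaggedSentence]) -> float:
    """Per-token accuracy of `tagger` against hand-tagged sentences."""
    correct = 0
    total = 0
    for sentence in gold:
        words = [word for word, _ in sentence]
        predicted = tagger(words)
        if len(predicted) != len(words):
            raise ValueError("tagger must return one tag per word")
        for (_, gold_tag), predicted_tag in zip(sentence, predicted):
            correct += int(gold_tag == predicted_tag)
            total += 1
    return correct / total if total else 0.0


def compare(taggers: Dict[str, Tagger],
            gold: List[TaggedSentence]) -> Dict[str, float]:
    """Score every tagger on the same in-domain sample."""
    return {name: accuracy(tagger, gold) for name, tagger in taggers.items()}


# Toy stand-ins for real taggers (illustration only).
def all_nouns(words: List[str]) -> List[str]:
    return ["NN"] * len(words)


def suffix_rule(words: List[str]) -> List[str]:
    return ["VBG" if word.endswith("ing") else "NN" for word in words]


# Invented hand-tagged sample standing in for your own domain.
sample = [
    [("protein", "NN"), ("binding", "VBG"), ("site", "NN")],
    [("cells", "NNS"), ("dividing", "VBG")],
]

scores = compare({"all_nouns": all_nouns, "suffix_rule": suffix_rule}, sample)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
```

 Scoring all candidates on the same sample keeps the numbers comparable
 with each other, even though they may not match the figures reported
 for the domain each tagger was trained on.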

 In general, I think it would benefit the statistical NLP community
 if we could gather in one place evaluations of statistical tools
 on domains and genres different from those they were trained on.

 Such information could be a basis for further research on the
 'flexibility' of trained systems, and could also serve as a guide
 for people looking for a system for a particular need.

   Yuval Krymolowski
   Bar-Ilan University
   Israel