Corpora: corpus testing

Paul Llido pllideau at yahoo.com
Tue Oct 16 17:47:36 UTC 2001


Hello Corpus list,

I have collected a set of email messages for
my *corpus* (30,000 sentences about specific software
support). I'd like to know whether:

1. this is a workable size?
2. this size is useful for training the Brill tagger?
3. I can build a gold standard
   out of it for testing?
4. the size is enough for both a supervised test
   and an unsupervised test?

I'd also like to ask for advice on testing.
As far as I know, one first creates a gold standard
and then batches the data into supervised and
unsupervised sections. Is that all there is to
preparing the test material, leaving aside the
statistical measures and the question of what I
should be testing for?
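
To make the batching step concrete, here is a rough Python sketch of
what I have in mind. It assumes that "batching" simply means shuffling
the sentences once and holding a fixed share of them out for testing.
The function name split_corpus, the 10% held-out share, the seed, and
the placeholder sentences are all my own assumptions for illustration;
none of them come from the Brill tagger or any other tool.

```python
import random


def split_corpus(sentences, test_fraction=0.1, seed=0):
    """Shuffle a copy of the sentences and split off a held-out part.

    Returns (training_part, held_out_part). The 10% default is an
    arbitrary illustrative choice, not a recommendation.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(sentences)             # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the 30,000 corpus sentences.
sentences = ["sentence %d" % i for i in range(30000)]
train, test = split_corpus(sentences)
```

With these settings the held-out part has 3,000 sentences and the rest
has 27,000. The held-out sentences would be the ones to annotate by
hand as the gold standard, if I understand the procedure correctly.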

I'll post the replies...

Many thanks,
Paul Llido

=====
**********************************************************
************************************* *** Paul C Llido ***
** quae sursum sunt quaerite ****** pllideau at yahoo.com ***
**********************************************************



