Corpora: corpus testing
Paul Llido
pllideau at yahoo.com
Tue Oct 16 17:47:36 UTC 2001
Hello Corpus list,
I have collected a volume of email messages for
my *corpus* (30,000 sentences about specific software
support). I'd like to know whether:
1. this is a workable size?
2. this size is useful for training the Brill tagger?
3. I can build a gold standard
out of it for testing?
4. the size is enough for both a supervised test
and an unsupervised test?
I'd also like to ask for advice on testing.
As far as I know, one first creates a gold standard
and then divides the data into supervised and
unsupervised sections. Is that all there is to
preparing the material for testing, leaving aside
the statistical measures and the question of what I
should be testing for?
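
To make the preparation step concrete, here is a minimal sketch in Python of one way the split might look. Everything in it is an assumption for illustration: the helper name split_corpus, the 90/10 ratio, the fixed seed, and the placeholder sentences are all hypothetical, and nothing here is prescribed by the Brill tagger. The idea is only that the held-out portion of the gold standard is never seen during training.

```python
import random


def split_corpus(sentences, test_fraction=0.1, seed=0):
    """Shuffle sentences reproducibly and split them into (train, test).

    Hypothetical helper: the name, the default 10% test fraction, and the
    seed are illustrative assumptions, not part of any tagger's API.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(sentences)              # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # private RNG, reproducible order
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled sentences are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the 30,000 hand-checked sentences.
corpus = [f"sentence {i}" for i in range(30000)]
train, test = split_corpus(corpus)
print(len(train), len(test))
```

With these assumed settings, 27,000 sentences would go to training and 3,000 would be held out. For an unsupervised run, the same training sentences could presumably be used with their tags stripped, while the held-out gold-standard portion scores both runs.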
I'll post the replies...
Many thanks,
Paul Llido
=====
**********************************************************
************************************* *** Paul C Llido ***
** quae sursum sunt quaerite ****** pllideau at yahoo.com ***
**********************************************************