Corpora: Text Classification System
Gabriela Cavaglia
Gabriela.Cavaglia at itri.brighton.ac.uk
Thu Jan 17 17:27:20 UTC 2002
Dear List members,
Can anyone point me to a free Text Classification system?
(More details of what I want it for below.)
Thank you in advance for any help.
Gabriela Cavaglia
PhD Student
ITRI
Measuring Corpus homogeneity
=====================================================
My thesis project is to measure corpus homogeneity. As part of that
project, I have developed methods for unsupervised classification of
documents based on text-internal evidence. I now want a supervised
classification system which I can use to evaluate the unsupervised
classification I have developed.
To date, the corpus I have used for the experiments consists of 107
documents from the BNC (about 2 million words). The idea is to use the
BNC Index information and part of the corpus documents to produce a
training sample, and to use the rest of the corpus documents as a test
corpus. I would then like to compare the results of the unsupervised
classification against those from the supervised classification.
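
The evaluation set-up described above could be sketched roughly as
follows. This is only an illustration under assumptions: the function
names (split_corpus, purity, rand_index), the 50/50 split and the toy
labels are hypothetical and are not taken from the thesis work, and the
supervised classifier itself is not shown. Purity and the Rand index are
swapped in here as two common ways of comparing two labelings of the
same documents.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def split_corpus(doc_ids, train_fraction=0.5, seed=0):
    """Shuffle document ids and split them into (train, test) lists."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def purity(clusters, labels):
    """Share of documents that carry the majority label of their cluster."""
    by_cluster = defaultdict(list)
    for cluster, label in zip(clusters, labels):
        by_cluster[cluster].append(label)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in by_cluster.values())
    return majority / len(labels)


def rand_index(a, b):
    """Share of document pairs that both labelings treat alike
    (both put the pair in one group, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical example: 107 document ids, half for training, rest for test.
train_ids, test_ids = split_corpus(range(107))

# Toy labelings for five test documents (invented for illustration only).
unsupervised = [0, 0, 1, 1, 1]
supervised = ["news", "news", "fiction", "fiction", "news"]
print(purity(unsupervised, supervised))
print(rand_index(unsupervised, supervised))
```

Purity depends on which labeling is treated as the reference, whereas
the Rand index is symmetric, so reporting both may give a fuller picture
of how far the two classifications agree.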
=====================================================