Corpora: Text Classification System

Gabriela Cavaglia Gabriela.Cavaglia at itri.brighton.ac.uk
Thu Jan 17 17:27:20 UTC 2002


Dear List members,

Can anyone point me to a free Text Classification system?
(More details of what I want it for below.)

Thank you in advance for any help

Gabriela Cavaglia
PhD Student
ITRI

Measuring Corpus homogeneity
=====================================================

My thesis project is to measure corpus homogeneity.  As part of that
project, I have developed methods for unsupervised classification of
documents based on text-internal evidence.  I now want a supervised
classification system which I can use to evaluate the unsupervised
classification I have developed.

To date, the corpus I have used for the experiments consists of 107
documents from the BNC (about 2 million words). The idea is to use the
BNC Index information and part of the corpus documents to produce a
training sample and use the rest of the corpus documents as a test
corpus. I would like to compare the results of the unsupervised
classification against those from the supervised classification.
=====================================================
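
For illustration only, the evaluation setup described above could be
sketched roughly as follows. Everything here is an assumption made for
the example: the document ids, the category names, the 50/50 split, and
the choice of "purity" as the agreement measure are all hypothetical and
are not taken from the BNC Index or from the thesis work itself.

```python
import random
from collections import Counter, defaultdict


def split_by_category(categories, train_fraction=0.5, seed=0):
    """Split document ids into training and test lists, category by category.

    `categories` maps a document id to its category label (hypothetical
    stand-in for the information held in the BNC Index).
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for doc_id, category in sorted(categories.items()):
        by_category[category].append(doc_id)

    train, test = [], []
    for category in sorted(by_category):
        docs = by_category[category]
        rng.shuffle(docs)
        # Keep at least one training document per category; a category
        # with a single document therefore contributes nothing to the test set.
        cut = max(1, round(len(docs) * train_fraction))
        train.extend(docs[:cut])
        test.extend(docs[cut:])
    return train, test


def purity(clusters, categories, doc_ids):
    """Share of documents that carry the majority category of their cluster.

    `clusters` maps a document id to an unsupervised cluster id;
    `categories` maps it to a reference label, which could be an index
    category or the output of a supervised classifier.
    """
    counts = defaultdict(Counter)
    for doc_id in doc_ids:
        counts[clusters[doc_id]][categories[doc_id]] += 1
    matched = sum(c.most_common(1)[0][1] for c in counts.values())
    return matched / len(doc_ids)


# Toy data with made-up ids and labels.
categories = {"d1": "x", "d2": "x", "d3": "y", "d4": "x"}
clusters = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}
print(purity(clusters, categories, ["d1", "d2", "d3", "d4"]))  # 0.75
```

Purity is only one possible way to compare the two classifications; it
rewards many small clusters, so a measure that corrects for chance
agreement may be preferable in practice.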


