Corpora: Text Classification System

Bruce L. Lambert, Ph.D. lambertb at uic.edu
Thu Jan 17 17:45:53 UTC 2002


A simple Google search turned up:

http://www-a2k.is.tokushima-u.ac.jp/member/kita/NLP/nlp_tools.html

At 05:27 PM 1/17/02 +0000, Gabriela Cavaglia wrote:
>Dear List members,
>
>Can anyone point me to a free Text Classification system?
>(More details of what I want it for below.)
>
>Thank you in advance for any help
>
>Gabriela Cavaglia
>PhD Student
>ITRI
>
>Measuring Corpus homogeneity
>=====================================================
>
>My thesis project is to measure corpus homogeneity.  As part of that
>project, I have developed methods for unsupervised classification of
>documents based on text internal evidence.  I now want a supervised
>classification system which I can use to evaluate the unsupervised
>classification I have developed.
>
>To date, the corpus I have used for the experiments consists of 107
>documents from the BNC (about 2 million words). The idea is to use the
>BNC Index information and part of the corpus documents to produce a
>training sample and use the rest of the corpus documents as a test
>corpus. I would like to compare the results of the unsupervised
>classification against those from the supervised classification.
>=====================================================
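
The evaluation described in the quoted message (split the documents into
a training sample and a test corpus, then compare the unsupervised result
against the supervised one) could be sketched roughly as below. This is
only an illustration under stated assumptions: the function names
split_documents and rand_index, the 70/30 split, and the toy labels are
all hypothetical and do not come from the thesis or from the BNC Index.
The Rand index is used here as one possible agreement measure because it
does not require cluster numbers to match class names; the original
message does not say which measure is intended.

```python
import random
from itertools import combinations


def split_documents(doc_ids, train_fraction=0.7, seed=0):
    """Shuffle document ids reproducibly and split them into train and test lists."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def rand_index(labels_a, labels_b):
    """Fraction of document pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two
    documents together, or both keep them apart. The label names
    themselves do not need to match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same documents")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two documents")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Assumed split of a 107-document corpus into training and test ids.
train_ids, test_ids = split_documents(range(107))

# Hypothetical labels for six test documents (not real BNC data).
supervised = ["news", "news", "fiction", "fiction", "science", "science"]
unsupervised = [0, 0, 1, 1, 1, 2]
score = rand_index(supervised, unsupervised)
print(f"Rand index: {score:.2f}")
```

A score of 1.0 would mean the two classifications group every pair of
test documents the same way. Other measures, such as cluster purity or a
chance-corrected index, may suit the task better.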


