[Corpora-List] text categorisation - newspaper

Tony Rose tr at acl.icnet.uk
Wed Jun 18 08:54:12 UTC 2003


> We would like to find information about other projects concerning the
> categorization of newspaper text -- in particular, we are interested in
> the topic sets that have been used in similar projects. For example, if
> somebody has the list of topics used in the AP text cat collection, and
> could send us a copy, that would be extremely useful.

The Reuters Corpus comes complete with code sets for topics, industries and
geography, and is freely available from:
http://about.reuters.com/researchandstandards/corpus/

> More in general, we would be grateful for any sort of advice/information
> that seems relevant (e.g., pointers to other text cat work on Italian,
> etc.)

You can find further details of the coding scheme, the
categorisation/coding process, inter-coder consistency, etc. here:
http://about.reuters.com/researchandstandards/corpus/LREC_camera_ready.pdf

Cheers,
Tony


