[Corpora-List] Analysing Reuters Corpus Using Wordsmith Version 3

Ute Römer ute.roemer at uni-koeln.de
Fri Jun 11 16:51:40 UTC 2004


Tony,

> > Btw, have you (or anyone else) done a proper word count of the corpus?
> > (the RC distributors told me they hadn't) -- Using MP2.2 would of course
> > be a solution to that problem since it does a word count whenever you
> > load a corpus anyway.
>
> FYI you can find lots more statistics on the corpus at:
>
> http://about.reuters.com/researchandstandards/corpus/statistics/index.asp

Yes, I've seen the statistics on the Reuters pages, thanks. You offer a lot
of diagrams on interesting features such as the distribution of stories
across days or the POS distribution, but unfortunately there is no
word/token count for the entire corpus (or maybe I missed that information).
Maybe somebody else has done such a word count?

Best... Ute


