[Corpora-List] Brown Corpus

Steven Bird sb at ldc.upenn.edu
Tue Jun 14 19:33:14 UTC 2005


Note that this version of the Brown Corpus contains 500 files, each
averaging around 200 lines of text.  Perhaps these were as big as they
could handle back in 1961.  I think it would make matters simpler if
the file structure were rationalized now, so that, e.g.:

Brown Corpus file names
Existing     -> Proposed
ca01 .. ca44 -> a
cb01 .. cb26 -> b
etc

(NB this is how things are being restructured in NLTK-Lite, a new,
streamlined version of NLTK that will be released later this month.)
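
As a rough illustration of the renaming proposed above, here is a minimal
Python sketch that groups files by their category letter and concatenates
each group into a single file.  It is not the NLTK-Lite code.  The helper
names (proposed_name, merge_by_category), the directory arguments, and the
assumption that every file name has the form "c" + category letter + two
digits are all hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: "c", one category letter, two digits (e.g. ca01).
_NAME = re.compile(r"^c([a-z])(\d{2})$")


def proposed_name(existing):
    """Map an existing file name such as 'ca01' to its category letter 'a'."""
    match = _NAME.match(existing)
    if match is None:
        raise ValueError(f"unexpected Brown Corpus file name: {existing!r}")
    return match.group(1)


def merge_by_category(src_dir, dst_dir):
    """Concatenate the files of each category into one file per category.

    Returns a dict mapping each new file name to the list of source
    file names that were merged into it, in sorted order.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Group source files by category letter, keeping ca01, ca02, ... in order.
    groups = defaultdict(list)
    for path in sorted(src.iterdir()):
        if path.is_file() and _NAME.match(path.name):
            groups[proposed_name(path.name)].append(path)

    # Write one merged file per category, ensuring each part ends in a newline.
    for category, paths in groups.items():
        with open(dst / category, "w", encoding="utf-8") as out:
            for path in paths:
                text = path.read_text(encoding="utf-8")
                out.write(text if text.endswith("\n") else text + "\n")

    return {category: [p.name for p in paths] for category, paths in groups.items()}
```

Calling merge_by_category("brown", "brown_merged") on a directory holding
ca01 .. ca44 and cb01 .. cb26 would leave two files, a and b, in the
output directory.  Files whose names do not fit the assumed pattern are
skipped rather than merged.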

-Steven Bird


On Tue, 2005-06-14 at 17:27 +0100, Lou Burnard wrote:
> By one of those uncanny coincidences, I am planning to include an
> XMLified version of the Brown corpus on the next edition of the BNC Baby
> corpus sampler. The version I have is derived from the GPLd version
> distributed as part of the NLTK tool set (http://nltk.sourceforge.net)
> and includes POS tagging; there is also a version which has been
> enhanced to include Wordnet semantic tagging but I am not clear as to
> the rights in that.
>
> Lou Burnard
>
>
> Xiao, Zhonghua wrote:
> > The plain text version of Brown is available here:
> > http://dingo.sbs.arizona.edu/~hammond/ling696f-sp03/browncorpus.txt
> >
> > Richard
> > ________________________________
> >
> > From: owner-corpora at lists.uib.no on behalf of Jörg Schuster
> > Sent: Tue 14/06/2005 14:39
> > To: CORPORA at hd.uib.no
> > Subject: [Corpora-List] Brown Corpus
> >
> >
> >
> > Hello,
> >
> > where can the Brown Corpus be downloaded or purchased?
> >
> > Jörg Schuster


