[Corpora-List] BNC raw text
Grzegorz Chrupała
pitekus at gmail.com
Thu Sep 22 09:25:03 UTC 2005
Hi Robert,
Why don't you just extract the plain text from the marked-up files? It
should be pretty trivial if you use an SGML library.
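A minimal sketch of that extraction, under stated assumptions. Python 3 no longer ships an SGML parser (`sgmllib` was removed), so this swaps in the lenient `html.parser.HTMLParser` from the standard library. It assumes the BNC files mark words and punctuation with unclosed tags such as `<w NN1>` and `<c PUN>`, keep metadata inside `<teiHeader>`, and use `<p>`, `<s>` and `<u>` for paragraphs, sentences and utterances. Check these against your own copy of the CD. The class name, the function name `sgml_to_text` and the sample document are hypothetical.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect character data, skipping everything inside <teiHeader>."""

    # Assumption: these elements start a new paragraph, sentence or utterance.
    BREAK_TAGS = ("p", "s", "u")

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters;
        # names outside the HTML5 set are left in the text unchanged.
        super().__init__(convert_charrefs=True)
        self.header_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <teiHeader> arrives as "teiheader".
        if tag == "teiheader":
            self.header_depth += 1
        elif tag in self.BREAK_TAGS and not self.header_depth:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag == "teiheader" and self.header_depth:
            self.header_depth -= 1

    def handle_data(self, data):
        # Tag attributes (e.g. the POS codes in <w NN1>) never reach this
        # method, so only the running text is kept.
        if not self.header_depth:
            self.chunks.append(data)

    def text(self):
        # Collapse whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.chunks).splitlines())
        return "\n".join(line for line in lines if line)


def sgml_to_text(sgml):
    """Return the untagged text of one SGML document as a string."""
    parser = PlainTextExtractor()
    parser.feed(sgml)
    parser.close()  # flush any buffered trailing text
    return parser.text()


# Made-up sample in the assumed BNC style, not taken from the corpus.
sample = """<bncDoc id=A00><teiHeader><fileDesc>Example header</fileDesc></teiHeader>
<text><p><s n=001><w AT0>The <w NN1>cat <w VVD>sat<c PUN>.
<s n=002><w PNP>It <w VVD>purred<c PUN>.
</p></text></bncDoc>"""

print(sgml_to_text(sample))
```

To process the whole CD, you would read each file with whatever encoding your copy uses and write the returned string to a matching `.txt` file. A lenient HTML parser is used rather than a validating SGML one because it tolerates the omitted end tags without needing the corpus DTD.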
Best,
--
Grzegorz Chrupała ♦ pithekos.net
On 22/09/05, Robert Rittman <robert.rittman at gmail.com> wrote:
> I am working with the British National Corpus - World Edition CD-ROM. The CD
> does not contain the raw text of the 4,000+ documents. It only contains
> tagged text in SGML format (including metadata). Does anyone know where I
> can obtain the raw (untagged) text in plain text format?
> Thank you,
> Robert Rittman
> PhD Candidate
> School of Communication, Information and Library Studies
> Rutgers University