[Corpora-List] BNC raw text

Grzegorz Chrupała pitekus at gmail.com
Thu Sep 22 09:25:03 UTC 2005


Hi Robert,
Why don't you just extract the plain text from the marked-up files? It
should be fairly trivial if you use an SGML library.
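For example, here is a minimal sketch in Python. It uses the standard library's `html.parser.HTMLParser` as a lenient stand-in for a real SGML parser, not a full SGML implementation. It assumes the metadata sits in a `<teiHeader>` element and the running text sits inside `<text>` (written) or `<stext>` (spoken) elements. Check those element names against your copy of the CD. The names `BNCTextExtractor`, `sgml_to_plain_text` and `convert_tree` are made up for illustration, and so is the `latin-1` default encoding.

```python
from html.parser import HTMLParser
from pathlib import Path


class BNCTextExtractor(HTMLParser):
    """Collect character data from BNC-style SGML, skipping the header.

    Assumption: running text lives inside <text> or <stext>, and metadata
    lives outside those elements (e.g. in <teiHeader>).
    """

    BODY_TAGS = {"text", "stext"}
    # Hypothetical choice: start a new line at paragraph, sentence and
    # utterance boundaries.
    BREAK_TAGS = {"p", "s", "u"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &eacute; into characters.
        super().__init__(convert_charrefs=True)
        self.in_body = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TEXT> and <text> both match.
        if tag in self.BODY_TAGS:
            self.in_body += 1
        elif tag in self.BREAK_TAGS and self.in_body:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BODY_TAGS and self.in_body:
            self.in_body -= 1

    def handle_data(self, data):
        # Word/punctuation tags like <w NN1> or <c PUN> are simply dropped;
        # only the text between tags is kept.
        if self.in_body:
            self.chunks.append(data)


def sgml_to_plain_text(sgml: str) -> str:
    """Return the untagged text of one document, one unit per line."""
    parser = BNCTextExtractor()
    parser.feed(sgml)
    parser.close()
    text = "".join(parser.chunks)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def convert_tree(src_dir: str, dst_dir: str, encoding: str = "latin-1") -> int:
    """Write a .txt file for every file under src_dir; return the count.

    The default encoding is an assumption; check what your files use.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    count = 0
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        out = dst / path.relative_to(src).with_suffix(".txt")
        out.parent.mkdir(parents=True, exist_ok=True)
        plain = sgml_to_plain_text(path.read_text(encoding=encoding))
        out.write_text(plain, encoding="utf-8")
        count += 1
    return count
```

Entities that HTML does not define are left in the output as `&name;`, so you may need a small lookup table for any corpus-specific ones.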
Best,
--
Grzegorz Chrupała ♦ pithekos.net

On 22/09/05, Robert Rittman <robert.rittman at gmail.com> wrote:
> I am working with the British National Corpus - World Edition CD-ROM. The CD
> does not contain the raw text of the 4,000+ documents. It only contains
> tagged text in SGML format (including metadata). Does anyone know where I
> can obtain the raw (untagged) text in plain text format?
>  Thank you,
>  Robert Rittman
> PhD Candidate
> School of Communication, Information and Library Studies
> Rutgers University

