[Corpora-List] Java Document Parsing for BNC

David J. Brooks D.J.Brooks at cs.bham.ac.uk
Mon Feb 23 23:15:42 UTC 2004


Dear List Members,

NOTE: By "parsing", I mean simply reading a BNC document into the machine,
not performing syntactic analysis.

Does anyone have, or know of, a reliable and easy-to-use set of Java
libraries for parsing British National Corpus documents?  I'm after
something equivalent to the SAX or JAXP XML parsing libraries that follows
(at least to some extent) the DOM model.  Ideally, I would like to be able
to access all parts of a document, not just the words and punctuation.
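
To make the request concrete, here is a rough sketch of the kind of access
I have in mind, written against the standard JAXP DOM API.  It assumes the
text is already available as well-formed XML, which may not hold for the
BNC as distributed.  The class name BncDomSketch, the method names, and the
sample markup are all invented for illustration and only loosely resemble
BNC tagging.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: lists every element (with attributes) and every
// non-blank text node of an XML document, in document order.
class BncDomSketch {

    /** Parses the XML string with JAXP and returns one line per node visited. */
    static List<String> describe(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            // Parser configuration, I/O and well-formedness errors all end up here.
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    /** Depth-first walk over elements and text nodes; other node types are skipped. */
    private static void walk(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            StringBuilder line = new StringBuilder("element " + node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                line.append(' ').append(attr.getNodeName())
                    .append('=').append(attr.getNodeValue());
            }
            out.add(line.toString());
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add("text " + text);
            }
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, out);
        }
    }

    public static void main(String[] args) {
        // Invented sample markup, not taken from a real BNC file.
        String sample = "<doc id=\"X01\"><s n=\"1\">"
            + "<w type=\"AT0\">The </w><w type=\"NN1\">corpus</w>"
            + "<c type=\"PUN\">.</c></s></doc>";
        for (String line : describe(sample)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is that the markup, the attributes, and the text
are all reachable from one tree.  A SAX handler would see the same
information as a stream of events instead, which may suit large files
better.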

Thanks in advance,
David


