[Corpora-List] Converting the LDC NANTC to XML

Scott James Cederberg cederber at csli.stanford.edu
Thu Jun 12 20:16:14 UTC 2003


Hello corpora folks,

      I'm attempting to convert the LDC North American News Text
      Corpus (NANTC; LDC95T21) to XML, using the OSX tool (descended
      from James Clark's SX).

      Has anyone else done this?  One thing that stands in the way is
      that we don't have a DTD for the NANTC SGML format; does anyone
      have one?

      Any help/pointers/advice appreciated.

						Scott Cederberg
						CSLI
						Stanford University


