[Corpora-List] Parsed corpus file format

Michael Maxwell maxwell at umiacs.umd.edu
Thu Jul 3 16:44:59 UTC 2008


Lou Burnard wrote:
> The TEI is a good way of defining your XML schema in a way that
> leverages the work done by lots of others before you in defining
> standards-based formats for such data. It comes as a suite of modules
> defining XML elements and attributes which can be used buffet-style to
> define a schema that is interoperable with any other TEI-derived schema.
> I gave a paper at the last LREC highlighting this aspect of it, but there's
> no shortage of other information about it -- it has
> been around even longer than the wacky chaps!

Do you have a link to that paper?  I looked on your website, but couldn't
find it at a quick glance...

Also, I'd be interested in hearing more about the problems Linas Vepstas
had with using XML to represent trees, and why "triples" or sexprs seem
better.  I would have thought that XML would be ideal (apart maybe from a
bit of verboseness), and that triples would be procrustean.  But then, I
haven't had to represent syntactic trees in a long time (and I did use
sexprs then--but only because we were using a parser written in LISP, and
XML hadn't been invented yet; which shows you how long ago that was...)
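To make the comparison concrete, here is a minimal Python sketch (not from the original thread) that serializes one hypothetical parse tree, for "dogs bark", in three ways: as an s-expression, as nested XML, and as a flat list of triples. The tree, the use of category labels as XML element names, and the triple vocabulary (`label`, `word`, `child`, `position`, node ids `n0`, `n1`, ...) are all illustrative assumptions. They are not TEI markup and not any RDF standard.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical parse tree for "dogs bark": internal nodes are (label, children)
# tuples; leaves are plain word strings.
TREE = ("S", [("NP", [("N", ["dogs"])]), ("VP", [("V", ["bark"])])])


def to_sexpr(node):
    """Render the tree as a LISP-style s-expression."""
    if isinstance(node, str):
        return node
    label, children = node
    return "(" + " ".join([label] + [to_sexpr(c) for c in children]) + ")"


def to_xml(node):
    """Render the tree as nested XML elements named after their labels.

    Assumes a preterminal has exactly one word, stored as the element's text.
    """
    label, children = node
    elem = ET.Element(label)
    for child in children:
        if isinstance(child, str):
            elem.text = child
        else:
            elem.append(to_xml(child))
    return elem


def to_triples(tree):
    """Flatten the tree into (subject, predicate, object) triples.

    Child order is implicit in XML and s-expressions, but here it must be
    recorded explicitly with extra 'position' triples.
    """
    triples = []
    ids = itertools.count()

    def walk(node):
        nid = f"n{next(ids)}"
        if isinstance(node, str):
            triples.append((nid, "word", node))
            return nid
        label, children = node
        triples.append((nid, "label", label))
        for pos, child in enumerate(children):
            cid = walk(child)
            triples.append((nid, "child", cid))
            triples.append((cid, "position", pos))
        return nid

    walk(tree)
    return triples


if __name__ == "__main__":
    print(to_sexpr(TREE))
    print(ET.tostring(to_xml(TREE), encoding="unicode"))
    for t in to_triples(TREE):
        print(t)
```

In this toy case the s-expression and the XML map one-to-one onto the tree, while the triple version needs 19 statements plus invented node ids just to preserve structure and word order. That is roughly the "procrustean" concern above, though a real triple store schema might handle ordering differently.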

   Mike Maxwell
   CASL/ U MD


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
