[Corpora-List] Parsed corpus file format
Michael Maxwell
maxwell at umiacs.umd.edu
Thu Jul 3 16:44:59 UTC 2008
Lou Burnard wrote:
> The TEI is a good way of defining your XML schema in a way that
> leverages the work done by lots of others before you in defining
> standards-based formats for such data. It comes as a suite of modules
> defining XML elements and attributes which can be used buffet-style to
> define a schema that is interoperable with any other TEI-derived schema.
> I gave a paper at the last LREC highlighting this aspect of it, but
> there's no shortage of other information about it -- it has been around
> even longer than the wacky chaps!
Do you have a link to that paper? I looked on your website, but couldn't
find it at a quick glance...
Also, I'd be interested in hearing more about the problems Linas Vepstas
had with using XML to represent trees, and why "triples" or sexprs seem
better. I would have thought that XML would be ideal (apart maybe from a
bit of verboseness), and that triples would be procrustean. But then, I
haven't had to represent syntactic trees in a long time (and I did use
sexprs then--but only because we were using a parser written in LISP, and
XML hadn't been invented yet; which shows you how long ago that was...)
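
To make the comparison concrete, here is a minimal Python sketch that writes one toy parse tree in each of the three notations. Nothing in it comes from the thread: the example sentence, the function names (to_sexpr, to_xml, to_triples) and the triple layout (node id, relation, value) are assumptions chosen purely for illustration, and the corpus formats under discussion may look quite different.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical toy tree: (label, child, ...) tuples, with words as plain strings.
TREE = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBZ", "barks")))

def to_sexpr(node):
    """Render the tree as a LISP-style bracketed string."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "(" + " ".join([label] + [to_sexpr(c) for c in children]) + ")"

def to_xml(node):
    """Render the tree as nested XML elements, one element per node."""
    label, *children = node
    elem = ET.Element(label)
    for child in children:
        if isinstance(child, str):
            elem.text = child          # word under a preterminal
        else:
            elem.append(to_xml(child))
    return elem

def to_triples(tree):
    """Flatten the tree into (node_id, relation, value) triples.

    Node ids are assigned in preorder. Sibling order survives only as
    the order of the list; the triples themselves do not encode it.
    """
    triples = []
    ids = itertools.count()

    def visit(node):
        node_id = next(ids)
        label, *children = node
        triples.append((node_id, "label", label))
        for child in children:
            if isinstance(child, str):
                triples.append((node_id, "word", child))
            else:
                triples.append((node_id, "child", visit(child)))
        return node_id

    visit(tree)
    return triples

if __name__ == "__main__":
    print(to_sexpr(TREE))
    print(ET.tostring(to_xml(TREE), encoding="unicode"))
    for triple in to_triples(TREE):
        print(triple)
```

In this sketch the sexpr and XML forms keep the nesting visible in the text itself, while the triple form needs explicit node ids and one extra record per edge, and it would need yet another relation to record sibling order.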
Mike Maxwell
CASL/ U MD
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora