[Corpora-List] Is the TEI a waste of time? / Lack of TEI software tools

Burnard Towers lou.burnard at computing-services.oxford.ac.uk
Sat Jul 5 22:00:13 UTC 2003


Is there any software anywhere which *doesn't* operate on some sort of
internal non-XML format?
Is there, or has there ever been, any software anywhere which *didn't*
convert an external form into something more compact and efficient before
processing it?

XML is a serialization of a tree structure. It's hard to imagine software
which wouldn't store and process such structures non-serially! XML uses
UTF-8 or UTF-16 by default to encode character data. It's highly probable
that any efficient software would pack such character data into shorter
representations.
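
To make that concrete, here is a minimal sketch in Python of what "convert
on the way in" can look like. The sample markup, the <w> element name and
the to_internal() helper are all invented for illustration; this is not how
any particular corpus tool works, just one plausible shape for the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment, invented for this illustration.
SAMPLE = "<s><w>corpus</w><w>tools</w><w>corpus</w></s>"


def to_internal(xml_text):
    """Parse serialized XML into an in-memory tree, then pack the word
    tokens into a compact form: a vocabulary plus a list of integer ids."""
    root = ET.fromstring(xml_text)  # serial text -> tree structure
    vocab = {}                      # word string -> small integer id
    ids = []                        # the text as a sequence of ids
    for w in root.iter("w"):
        # Reuse the id of a word already seen, else assign the next one.
        ids.append(vocab.setdefault(w.text, len(vocab)))
    return vocab, ids


vocab, ids = to_internal(SAMPLE)
print(vocab)  # {'corpus': 0, 'tools': 1}
print(ids)    # [0, 1, 0]
```

The XML is only touched once, at load time; everything after that works on
the integer ids, which is the kind of internal format being discussed.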

So what?



> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
> Behalf Of Oliver Mason
> Sent: 04 July 2003 12:21
> To: corpora at uib.no
> Subject: [Corpora-List] Is the TEI a waste of time? / Lack of TEI
> software tools
>
>
> I would guess that most software uses internal, non-XML formats, as they
> are generally easier to process from a programmer's point of view and
> more efficient computationally; and if you've got large corpora, time and
> space efficiency are quite important.  My own approach has always been
> that TEI-style markup is fine for exchanging data, but when it is being
> indexed and prepared for processing it'll be converted into some
> tool-specific form.
>
> So, yes, the TEI is important, as it means that there is a standard for
> the data that's coming in, even though corpus processing software will
> typically not operate directly on that.  Corpus tools should accept TEI
> marked-up data, but might convert it into their own format.
>
> Oliver
>
> PS Of course I'm not denying that it is possible to write concordancing
>    software that works with XML data.


