[Corpora-List] Is the TEI a waste of time? / Lack of TEI software tools

Oliver Mason O.Mason at bham.ac.uk
Fri Jul 4 11:20:58 UTC 2003


I would guess that most software uses internal, non-XML formats, as they
are generally easier to process from a programmer's point of view and
more efficient computationally; and if you've got large corpora, time and
space efficiency are quite important.  My own approach has always been
that TEI-style markup is fine for exchanging data, but when it is being
indexed and prepared for processing it'll be converted into some
tool-specific form.
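
As a rough illustration of that conversion step, here is a minimal sketch
in Python.  Everything in it is an assumption made for the example: the
sample fragment is invented, the use of <s> elements as sentence units is
only one possible TEI-style encoding, and the "tool-specific form" (a flat
token list plus a word-to-positions index) is just one simple choice, not
the format of any particular corpus tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical TEI-style fragment; real TEI documents carry a header,
# possibly namespaces, and much richer markup.
SAMPLE = """<text>
  <body>
    <p>
      <s n="1">The corpus is large</s>
      <s n="2">The index is small</s>
    </p>
  </body>
</text>"""


def build_index(xml_string):
    """Convert marked-up text into a flat token list and an inverted index.

    Returns (tokens, index), where index maps each lower-cased word to the
    list of positions at which it occurs in tokens.
    """
    root = ET.fromstring(xml_string)
    tokens = []
    index = defaultdict(list)
    for sentence in root.iter("s"):
        # itertext() also collects text nested inside child elements.
        for word in "".join(sentence.itertext()).split():
            index[word.lower()].append(len(tokens))
            tokens.append(word)
    return tokens, dict(index)


def concordance(tokens, index, word, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for pos in index.get(word.lower(), []):
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append((left, tokens[pos], right))
    return lines


if __name__ == "__main__":
    tokens, index = build_index(SAMPLE)
    for left, keyword, right in concordance(tokens, index, "is"):
        print(f"{left:>20}  [{keyword}]  {right}")
```

The XML is parsed once at import time; after that, lookups only touch the
token list and the index, which is where the efficiency argument comes in.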

So, yes, the TEI is important, as it means that there is a standard for
the data that's coming in, even though corpus processing software will
typically not operate directly on it.  Corpus tools should accept TEI
marked-up data, but might convert it into their own format.

Oliver

PS Of course I'm not denying that it is possible to write concordancing
   software that works with XML data.
