[Corpora-List] Is the TEI a waste of time? / Lack of TEI software tools

Sylvain Loiseau liste_linguistique at toucheraveclesyeux.com
Sat Jul 5 09:38:34 UTC 2003


> I would guess that most software uses internal, non-XML formats, as they
> are generally easier to process from a programmer's point of view and
> more efficient computationally; and if you've got large corpora time and
> space efficiency are quite important.  My own approach has always been
> that TEI-style markup is fine for exchanging data, but when it is being
> indexed and prepared for processing it'll be converted into some
> tool-specific form.

But I think that many people use the TEI as a ready-to-use encoding scheme,
so they do not have to develop their own DTD or format, and then exploit
their data with XSLT or other tools that don't require programming.
Standardised software and formats are costly in CPU time, but they make it
possible to exploit corpora without strong development skills. The lack of
such skills is perhaps an important bottleneck for growing interest in the
TEI in linguistics (this ease of use was the explicit aim of XSLT, if I
remember correctly). If the TEI is a waste of time for many people, it is
perhaps because of this lack of tools.

A simple framework that lets you plug in and run SAX handlers for
processing tasks, such as a concordancer (or a format conversion), would be
of general interest, I think. Tools written that way are more reusable and
quicker to write.
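A minimal sketch of what such a plug-and-run framework might look like, using Python's standard `xml.sax` module. All names here (`Fanout`, `KWICHandler`, `ElementCounter`, `run_handlers`) are hypothetical, not an existing tool. It assumes namespace processing is off (so TEI elements are matched by their plain names, e.g. `p`) and that keywords are matched as whitespace-separated tokens inside `<p>` elements. Every plugged-in handler receives the same events from a single streaming parse, so a concordancer and a statistics pass can share one read of the corpus.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Fanout(ContentHandler):
    """Hypothetical dispatcher: forwards each SAX event to every plugged-in handler."""

    def __init__(self, *handlers):
        super().__init__()
        self.handlers = handlers

    def startElement(self, name, attrs):
        for h in self.handlers:
            h.startElement(name, attrs)

    def endElement(self, name):
        for h in self.handlers:
            h.endElement(name)

    def characters(self, content):
        for h in self.handlers:
            h.characters(content)


class KWICHandler(ContentHandler):
    """Hypothetical keyword-in-context concordancer over the text of <p> elements."""

    def __init__(self, keyword, width=3):
        super().__init__()
        self.keyword = keyword
        self.width = width      # number of context words on each side
        self.in_p = False
        self.buffer = []
        self.lines = []

    def startElement(self, name, attrs):
        if name == "p":
            self.in_p = True
            self.buffer = []

    def characters(self, content):
        # Text of inline children (e.g. <hi>) is also collected while inside <p>.
        if self.in_p:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "p":
            self.in_p = False
            words = "".join(self.buffer).split()
            for i, w in enumerate(words):
                if w == self.keyword:
                    left = " ".join(words[max(0, i - self.width):i])
                    right = " ".join(words[i + 1:i + 1 + self.width])
                    self.lines.append(f"{left} [{w}] {right}".strip())


class ElementCounter(ContentHandler):
    """Hypothetical second plug-in: counts element occurrences by name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def run_handlers(xml_bytes, *handlers):
    """Parse the document once, feeding every handler the same event stream."""
    xml.sax.parseString(xml_bytes, Fanout(*handlers))
    return handlers


if __name__ == "__main__":
    sample = (b"<TEI><text><body>"
              b"<p>the corpus is large and the corpus is tagged</p>"
              b"<p>a small corpus</p>"
              b"</body></text></TEI>")
    kwic = KWICHandler("corpus", width=2)
    counter = ElementCounter()
    run_handlers(sample, kwic, counter)
    for line in kwic.lines:
        print(line)
    print(counter.counts)
```

Adding a new processing task then means writing one more small `ContentHandler` subclass and passing it to `run_handlers`; the parsing and dispatching code does not change.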

With best regards,
Sylvain Loiseau


