[Corpora-List] Is the TEI a waste of time?

Marco Baroni baroni at sslmit.unibo.it
Fri Jun 27 12:47:23 UTC 2003


> TEI's role, in a world with XML in
> it, is much harder to delineate

But nowadays TEI is a form of XML, right? The way I see it, it's an XML
language to represent linguistic data, like MathML is an XML language
to represent mathematical formulas. So, ideally, if you have
TEI-encoded data, you should be able to use any general purpose XML
tool on them... right?
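
To make that concrete, here is a minimal sketch of a general-purpose XML parser (Python's standard xml.etree.ElementTree) reading TEI-style markup with no TEI-specific code at all. The fragment is invented for illustration, the function name words is hypothetical, and the namespace is the TEI P5 one, which is an assumption: older TEI documents carry no namespace.

```python
import xml.etree.ElementTree as ET

# Invented TEI-style fragment, for illustration only.
# The namespace is the TEI P5 one (an assumption; older TEI has none).
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><s n="1"><w>Corpora</w> <w>are</w> <w>useful</w></s></p>
  </body></text>
</TEI>"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def words(tei_xml):
    """Return the text of every <w> element, in document order."""
    root = ET.fromstring(tei_xml)
    return [w.text for w in root.iterfind(".//tei:w", NS)]


if __name__ == "__main__":
    print(words(TEI_SAMPLE))
```

Nothing here knows about TEI beyond the element name and the namespace, which is the point: any XML-aware tool can get this far.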

Imho, TEI's usefulness (as with any standard) depends on how successful
it is, on how many people use it.

If everybody used TEI, then we would not have to spend time worrying
about the format of our input and output data, data exchange would be a
trivial issue, and one could write all sorts of TEI-aware tools knowing
that they will be useful to many people. In this TEI-conformant world,
the time you spend TEI-encoding the data would definitely be
well-spent, since you would save a lot of time later when dealing with
other people's data, and you would get access to all sorts of useful
tools that can immediately understand your data. (And TEI seems to be
flexible enough that a minimal TEILite-encoding does not look like sooo
much work...)

Obviously, this is not the current situation, and in the real world the
presence of TEI-encoding can be a (minor) hassle, since many tools you
may want to use (POS taggers, morphological analyzers, machine learning
packages, databases, command-line programs, your own scripts) are not
TEI-compatible, and TEI is not the easiest format to deal with (as
compared to, e.g., tab-delimited text...)
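
As a rough illustration of the kind of glue script this hassle usually means, here is a sketch that flattens word-level TEI-style markup into tab-delimited lines (sentence number, word form, lemma) that a non-TEI tool could read. The sample fragment, the function name tei_to_tsv and the choice of columns are all assumptions made for the example, and real TEI files would need more care (nested elements, punctuation, missing attributes).

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace (an assumption; older TEI documents have no namespace).
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <s n="1"><w lemma="corpus">Corpora</w> <w lemma="be">are</w></s>
    <s n="2"><w lemma="tool">Tools</w> <w lemma="help">help</w></s>
  </body></text>
</TEI>"""


def tei_to_tsv(tei_xml):
    """Flatten <s>/<w> markup into 'sentence<TAB>form<TAB>lemma' lines."""
    root = ET.fromstring(tei_xml)
    rows = []
    for s in root.iter("{%s}s" % TEI_NS):
        for w in s.iter("{%s}w" % TEI_NS):
            # Missing text or attributes become empty fields.
            fields = [s.get("n", ""), w.text or "", w.get("lemma", "")]
            rows.append("\t".join(fields))
    return "\n".join(rows)


if __name__ == "__main__":
    print(tei_to_tsv(SAMPLE))
```

The conversion is lossy by design: everything TEI encodes beyond these three columns is thrown away, which is part of why going back and forth is a hassle.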

I suppose that the best way for people in favor of TEI to convince
others to adopt the standard would be to provide all sorts of cool
TEI-conformant tools: programs helping (manual and automated)
TEI-encoding, programs that perform all sorts of linguistic and
statistical analyses of TEI-encoded data, indexers and fast searching
engines for TEI-encoded corpora, TEI-db's, input/output conversion
tools...

Sara/Xara seems to be an excellent example of this sort of tool, but,
as far as I know, it only runs under Windows and it is more of a
self-contained program than something one could use in combination with
other tools...

Regards,

Marco

---
Marco Baroni
SSLMIT, University of Bologna
http://sslmit.unibo.it/~baroni


