[Corpora-List] Is the TEI a waste of time?
geoffrey.williams
geoffrey.williams at wanadoo.fr
Fri Jun 27 09:04:21 UTC 2003
Whilst trying to observe and not react for a while so as to cut my email writing time, I cannot but reply to this thread with an unequivocal no: the TEI is definitely not a waste of time, but a cornerstone of corpus linguistics.
It is obvious that for the designer of a small corpus, one million tokens or fewer, markup is a considerable investment of time. However, if one holds, as I do, that a corpus is not a simple mass of data, but a carefully compiled selection of texts, then we need a means to treat them as texts, to store both their general features and their particularities. This the TEI does.
In my own work in the field of English for Academic Purposes, I tend not to use the corpus header but a standard individual header, so as to store all the bibliographic information and sociolinguistic parameters associated with the text. The depth of markup depends on my needs, and time, for an individual text. In this way I can move with ease from a fully annotated single text to a more lightly marked up corpus. This is possible because of the encoding possibilities of the TEI.
Education is very much part of the answer. Easy access to vast amounts of downloadable data has meant that a number of "corpus linguists" neither know nor care about the niceties of corpus creation, and the whys and wherefores of selecting and marking up data. Ease of access has become the main criterion, potentially to the detriment of the discipline itself. Easy solutions do not necessarily answer the most pertinent questions.
It is true that all this takes time, but if we throw out all that is time-consuming drudgery from corpus linguistics, we may find that we have thrown out our text baby with the corpus bathwater and are only left with ready-made corpora for ready-made answers.
Back to some time-consuming markup.
Geoffrey
***********************************************************
Dr. Geoffrey C. Williams,
Département Langues Etrangères Appliquées
U.F.R. Lettres et Sciences Humaines
4, rue Jean Zay
B.P. 92116
56321 LORIENT Cedex
FRANCE
tél : 33 (0) 2 97 87 29 68
fax : 33 (0) 2 97 87 29 70
email : Geoffrey.Williams at univ-ubs.fr
http://www.univ-ubs.fr/crellic
***************************************************
----- Original Message -----
From: "Mcenery, Tony" <eiaamme at exchange.lancs.ac.uk>
To: "Simpson, Rita" <ritacsim at umich.edu>; "Christopher Brewster" <C.Brewster at dcs.shef.ac.uk>; <corpora at uib.no>
Sent: Thursday, June 26, 2003 4:54 PM
Subject: RE: [Corpora-List] Is the TEI a waste of time?
Dear Rita,
Yes, I have some sympathy with the point you make. The thing that has attracted me to the TEI in the past, though, is that once the effort is made to get to grips with it (and it is daunting), there is usually a well-thought-through solution contained in it for almost any problem you come across in encoding a corpus! With that said, it is a clear theme of the posts so far that there is, at the very least, an advocacy issue related to the TEI in corpus linguistics, which is interesting.
Best,
T
-----Original Message-----
From: Simpson, Rita [mailto:ritacsim at umich.edu]
Sent: 26 June 2003 14:09
To: Christopher Brewster; corpora at uib.no
Subject: RE: [Corpora-List] Is the TEI a waste of time?
Interesting question...
> There are two issues here:
> 1. Ignorance and confusion. Most people have only a vague idea what TEI
> is or does or what it is good for. There would need to be an effort to
> (re-) educate the potential users of TEI. Does TEI do something
> different from XML? Absurd question I know but that is the kind of
> confusion which I suspect exists.
>
> 2. Complexity. When it was introduced many people reacted against it as
> too complex. Now they have all adopted xml, rdf etc. which are much more
> complicated to use. So potential users' perception would now be ripe for
> a re-presentation of TEI.
>
Related to both of these issues is that of the documentation available
to educate people & help potential users understand what TEI is, does,
& is good for. A research assistant & I have recently been poring over a
couple chapters of the TEI guidelines, looking for guidelines & relevant
examples to add some markup to our already (mostly) TEI-conformant
corpus markup scheme. Although the documentation is extensive, it is
inadequate in many ways: it is missing examples, and it is not very good
at giving a larger picture to people who aren't sure if they need/want
the TEI at all or who just need some pointers to a few relevant
sections. If the only people who can read the documentation and make use
of it are information/library science people who are specifically
trained in that area, then it's no wonder linguists & others who are in
the business of building corpora are not using it or promoting it.
Rita Simpson
------------------------------------------------------------------------
Project Director, Michigan Corpus of Academic Spoken English (MICASE)
English Language Institute
University of Michigan
------------------------------------------------------------------------