[Corpora-List] Is the TEI a waste of time?

Elisabeth Burr elisabeth.burr at uni-duisburg.de
Fri Jun 27 13:30:26 UTC 2003


I agree with every word Geoffrey Williams says, and I see the same dangers
for corpus linguistics.

Best
Elisabeth Burr

At 11:04 27.06.2003 +0200, you wrote:
>Whilst trying to observe and not react for a while so as to cut my email 
>writing time, I cannot but reply to this string with an unequivocal, no, 
>TEI is definitely not a waste of time, but a cornerstone of corpus linguistics.
>
>It is obvious that for the small corpus designer, one million tokens or 
>fewer, markup is a considerable investment in time. However, if one 
>holds, as I do, that a corpus is not a simple mass of data, but a 
>carefully compiled selection of texts, then we need a means to treat them 
>as texts, to store both their general features and their particularities. 
>This the TEI does.
>
>In my own work in the field of English for Academic Purposes, I tend not 
>to use the corpus header but a standard individual header so as to stock 
>all the bibliographic information and sociolinguistic parameters associated 
>with the text. The depth of markup depends on my needs, and time, for an 
>individual text. In this way I can move with ease from a fully annotated 
>single text to a more lightly marked up corpus. This is possible because 
>of the encoding possibilities of the TEI.
>
>Education is very much part of the answer. Easy access to vast amounts of 
>downloadable data has meant that a number of "corpus linguists" neither 
>know nor care about the niceties of corpus creation, and the whys and 
>wherefores of selecting and marking up data. Ease of access has become the 
>main criterion, potentially to the detriment of the discipline itself. 
>Easy solutions do not necessarily answer the most pertinent questions.
>
>It is true that all this takes time, but if we throw out all that is 
>time-consuming drudgery from corpus linguistics, we may find that we have 
>thrown out our text baby with the corpus bathwater and are only left with 
>ready-made corpora for ready-made answers.
>
>Back to some time-consuming markup.
>
>Geoffrey
>
>***********************************************************
>
>Dr. Geoffrey C. Williams,
>Département Langues Etrangères Appliquées
>U.F.R. Lettres et Sciences Humaines
>4, rue Jean Zay
>B.P. 92116
>56321 LORIENT Cedex
>FRANCE
>
>tél : 33 (0) 2 97 87 29 68
>fax : 33 (0) 2 97 87 29 70
>
>email : Geoffrey.Williams at univ-ubs.fr
>
>http://www.univ-ubs.fr/crellic
>
>***************************************************
>
>
>----- Original Message -----
>From: "Mcenery, Tony" <eiaamme at exchange.lancs.ac.uk>
>To: "Simpson, Rita" <ritacsim at umich.edu>; "Christopher Brewster" 
><C.Brewster at dcs.shef.ac.uk>; <corpora at uib.no>
>Sent: Thursday, June 26, 2003 4:54 PM
>Subject: RE: [Corpora-List] Is the TEI a waste of time?
>
>
>Dear Rita,
>
>Yes, I have some sympathy with the point you make. The thing that has 
>attracted me to the TEI in the past, though, is once the effort is made to 
>get to grips with it (and it is daunting) there is usually a well thought 
>through solution contained in it for almost any problem situation you come 
>across in encoding a corpus! With that said, it is a clear theme of the 
>posts so far that there is, at the very least, an advocacy issue related 
>to the TEI in corpus linguistics, which is interesting.
>
>Best,
>
>T
>
>-----Original Message-----
>From: Simpson, Rita [mailto:ritacsim at umich.edu]
>Sent: 26 June 2003 14:09
>To: Christopher Brewster; corpora at uib.no
>Subject: RE: [Corpora-List] Is the TEI a waste of time?
>
>
>Interesting question...
>
> > There are two issues here:
> > 1. Ignorance and confusion. Most people have only a vague idea what TEI
> > is or does or what it is good for. There would need to be an effort to
> > (re-)educate the potential users of TEI. Does TEI do something
> > different from XML? Absurd question I know but that is the kind of
> > confusion which I suspect exists.
> >
> > 2. Complexity. When it was introduced many people reacted against it as
> > too complex. Now they have all adopted XML, RDF etc., which are much more
> > complicated to use. So potential users' perception would now be ripe for
> > a re-presentation of TEI.
> >
>
>Related to both of these issues is that of the documentation available
>to educate people & help potential users understand what TEI is, does,
>& is good for. A research assistant & I have recently been poring over a
>couple chapters of the TEI guidelines, looking for guidelines & relevant
>examples to add some markup to our already (mostly) TEI-conformant corpus
>markup scheme. Although the documentation is extensive, it is inadequate
>in many ways, missing examples, not very good at giving a larger picture
>to people who aren't sure if they need/want the TEI at all or who just need
>some pointers to a few relevant sections. If the only people who can
>read the documentation and make use of it are information/library science
>people who are specifically trained in that area, then it's no wonder
>linguists & others who are in the business of building corpora are not
>using it or promoting it.
>
>Rita Simpson
>
>-------------------------------------------------------------------------
>Project Director, Michigan Corpus of Academic Spoken English (MICASE)
>English Language Institute
>University of Michigan
>-------------------------------------------------------------------------

HD Dr. Elisabeth Burr
Romanistik
Institut für Fremdsprachliche Philologien
Fakultät 2: Geisteswissenschaften
Universität Duisburg-Essen
Standort Duisburg
Geibelstr. 41
47048 Duisburg

http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/


