[Corpora-List] XML encoding database of tagged documents

Lou lou.burnard at computing-services.oxford.ac.uk
Mon Jun 5 19:29:02 UTC 2006


I know of no application of TEI which uses more than a very small 
proportion of the 600+ elements it defines in total (probably a bit more 
than 1%, but certainly less than 10%!). The point about the TEI standard 
is that it is designed to be modular and customisable, so that you can 
use it to develop interchangeable resources. If I've understood your 
intended application right, you're talking about a kind of standoff 
annotation, which would allow you to create pseudo documents consisting 
of pointers into a separate text file: this is what the <span> element 
provides (probably not <milestone>s, since they are embedded within the 
text itself). A document containing such pointers is still, I think, a 
text document, and so can be described by a suitable subset of TEI.  
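
To make the standoff idea concrete, here is a minimal sketch in Python. It 
is not TEI and not anyone's actual implementation: the names Span, 
span_for, covered_text, codes_at and overlaps, the sample sentence, and the 
codes PRICE and CONTRAST are all hypothetical, and the character-offset 
pointers are an assumption made for illustration. The text is left 
untouched, the user-defined codes live in a separate list of pointers into 
it, overlapping codes need no special handling, and comments are attached 
to the codes rather than to the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A user-defined code pointing at characters [start, end) of a text."""
    code: str
    start: int
    end: int


def span_for(text: str, code: str, phrase: str) -> Span:
    """Build a Span covering the first occurrence of phrase in text."""
    start = text.index(phrase)  # raises ValueError if phrase is absent
    return Span(code, start, start + len(phrase))


def covered_text(text: str, span: Span) -> str:
    """Return the stretch of text that the span points at."""
    return text[span.start:span.end]


def codes_at(spans: list[Span], offset: int) -> list[str]:
    """List every code whose span covers the given character offset."""
    return [s.code for s in spans if s.start <= offset < s.end]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


# The source text is stored once and never modified.
text = "The product was cheap but the service was slow."

# Standoff layer: codes kept apart from the text (hypothetical sample data).
spans = [
    span_for(text, "PRICE", "product was cheap"),
    span_for(text, "CONTRAST", "cheap but the service was slow"),
]

# Comments belong to the codes, not to the document.
comments = {"PRICE": "Respondent mentions cost."}
```

Because both spans cover the word "cheap", codes_at(spans, 
text.index("cheap")) reports both codes, which is the overlapping case that 
is awkward to express with markup embedded in the text itself.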

However, we probably shouldn't burden readers of this list with a 
theological debate! If you'd like to send me a sample of the kind of 
thing you have in mind, I'd be glad to make more concrete suggestions 
off list.

Another XML-based standard you might consider in this context is Topic 
Maps, which perform a similar kind of annotation function.

best wishes

Lou

Normand Peladeau wrote:

> Well!  TEI is a great standard but is much more than what I need. 
> Maybe 99% of what they propose would not be very useful for the kind 
> of application I am trying to do.
>
> I don't need to keep information about the text structure or about 
> linguistic or typographic features. The only elements that I need to 
> keep inside the documents are user defined codes attached to text 
> segments. Those codes can be overlapping (the "milestone" element 
> proposed by TEI may offer a solution for this, but I'm not entirely 
> sure it handles all the situations pretty well, so some tests will be 
> needed). As for comments, they are not attached to the document itself 
> but to the user defined codes, so I'm not sure they are equivalent to 
> the TEI <note> element.
>
> I have some clients in the market research industry and in legal firms 
> who are doing manual annotations of documents in databases and are not 
> at all interested in the kind of information normally provided by a 
> TEI compliant document.  What I am looking for is a more basic set of 
> XML standards that are used to import and export databases containing 
> documents (but also numerical data, dates, etc.) and where the only 
> relevant elements in the documents are the user defined codes attached 
> to text segments (sometimes overlapping).
>
> Normand


