[Corpora-List] XML annotation guidelines

Simpson, Rita ritacsim at umich.edu
Fri Jun 6 13:35:15 UTC 2003


> Dear Corporist Colleagues,
> 
> We are in the process of converting our corpus of transcribed
> academic speech from SGML to XML and adding further annotation.
> Can anyone point us to some standards or (preferably) precedents
> for XML-ized annotation of:
> 
> 1) POS tagging
> and
> 2) pragmatic markup (e.g., text segments manually identified as 'narrative',
> 'disagreement', 'request', etc.)
> 
> Within the TEI guidelines (P4), we've found some suggestions for POS
> tagging (but nothing yet for something like our pragmatic categories), e.g.:
> 
> <s type="sentence">
>    <w ana="at">The</w>
>    <w ana="nn1">victim</w>
>    <m ana="gen">'s</m>
>    <w ana="nn2">friends</w>
> ...
> </s>
> 
> But somehow this seems a bit more verbose than it needs to be.
> Is this format standard, or are there other XML-style annotation
> formats in use?
> 
> Thanks much for any leads. We'd especially appreciate getting 
> pointers to specific sections of the TEI guidelines that we may be
> overlooking, or references to any user-friendly documentation
> (other than the TEI) -- the XCES seems to be lacking in this 
> respect at present.
> 
> Sincerely,
> 
> Rita Simpson & the MICASE team
> English Language Institute
> University of Michigan
> 
> 
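For readers who want to see what the TEI-style sample above amounts to as data, here is a minimal sketch in Python. It assumes the fragment is well-formed once the elided "..." tokens are left out, and it treats both <w> and <m> elements as tokens. The function names tagged_tokens and to_inline are hypothetical, and the word_TAG output is only an illustrative compact rendering, not a TEI or XCES format.

```python
import xml.etree.ElementTree as ET

# The sample sentence from the message; the elided "..." tokens are omitted
# so that the fragment parses as well-formed XML.
SAMPLE = """
<s type="sentence">
   <w ana="at">The</w>
   <w ana="nn1">victim</w>
   <m ana="gen">'s</m>
   <w ana="nn2">friends</w>
</s>
"""


def tagged_tokens(xml_text):
    """Return (text, tag) pairs for each <w> or <m> child of the <s> element."""
    sentence = ET.fromstring(xml_text.strip())
    return [(el.text, el.get("ana")) for el in sentence if el.tag in ("w", "m")]


def to_inline(pairs):
    """Render pairs as word_TAG tokens (illustrative only, not a TEI format)."""
    return " ".join(f"{text}_{tag}" for text, tag in pairs)


pairs = tagged_tokens(SAMPLE)
print(pairs)
print(to_inline(pairs))
```

The element-per-token form is longer on the page, but each token and its tag can be read back with a standard XML parser and no custom tokenizing, which is one practical argument for accepting the verbosity.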