[Corpora-List] annotation of aligned texts
Nancy Ide
ide at cs.vassar.edu
Sun Jul 21 23:41:41 UTC 2002
On Thursday, July 18, 2002, at 09:59 AM, Pamela Forner wrote:
> We are working with parallel texts aligned at word level and we are now
> facing the problem of encoding the alignment information. We’d like the
> annotation to be as conformant as possible to XCES standards for
> parallel texts alignment, but we only found examples at sentence level.
> Could anybody provide further information about XCES standards or is
> anybody aware of other accepted conventions for annotation of texts
> aligned at word level?
It is true that the current (CES) documentation has examples only at the
sentence level. However, we now have on-line (although as yet
unannounced) a suite of XCES schemas to replace the DTDs. Using these,
you can link to anything you want, whether it is tagged (for words,
this would be with <w> tags as per the XCES doc conventions) or not (in
which case you use offset information in the xlink). Please have a look
at the new XCES schemas at http://www.xml-ces.org.
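As a rough illustration of the two linking styles described above, here
is a minimal Python sketch that builds a word-level alignment in which
each link points at its words through an xlink:href. The element names
(linkGrp, link, align), the type="words" attribute, and the file names
are hypothetical, modeled loosely on CES-style sentence alignment rather
than taken from the XCES schemas; check the schemas at
http://www.xml-ces.org for the actual names. Id-based pointers assume
words tagged as <w id="...">. For untagged text the sketch borrows the
RFC 5147 "char=" fragment syntax for character offsets, which is
likewise an assumption and not necessarily the XCES convention.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace; registering it makes output use the "xlink:" prefix.
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def ref_by_id(doc, word_id):
    """Pointer to a word tagged <w id="..."> in doc (shorthand fragment)."""
    return f"{doc}#{word_id}"


def ref_by_offset(doc, start, end):
    """Pointer to untagged text by character offsets.

    Assumption: uses the RFC 5147 char= fragment syntax for plain text;
    the XCES schemas may define a different offset convention.
    """
    return f"{doc}#char={start},{end}"


def build_alignment(pairs):
    """Build a hypothetical <linkGrp> with one <link> per aligned word pair.

    Each pair is (source_ref, target_ref); each <link> gets two <align>
    children, each carrying an xlink:href attribute.
    """
    grp = ET.Element("linkGrp", {"type": "words"})
    for src, tgt in pairs:
        link = ET.SubElement(grp, "link")
        for ref in (src, tgt):
            ET.SubElement(link, "align", {f"{{{XLINK}}}href": ref})
    return grp


# One pair linked through <w> ids, one through character offsets.
pairs = [
    (ref_by_id("en.xml", "w1"), ref_by_id("it.xml", "w1")),
    (ref_by_offset("en.txt", 4, 9), ref_by_offset("it.txt", 4, 10)),
]
alignment = build_alignment(pairs)
print(ET.tostring(alignment, encoding="unicode"))
```

Pointing at <w> ids survives later edits to the text, while offsets
leave the original files untouched but break if the text changes.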
The schemas have not yet been made fully public for two reasons: (1) the
new schemas for spoken data are not yet finalized; and (2) there are
some problems with various XML Schema parsers, which are unfortunately
inconsistent in their ability to handle data encoded according to the
W3C specifications. This means that our use of various features is not
always accepted by a given parser, and we want to be able to make
concrete recommendations about parsers etc. before going public.
However, the XCES schemas as they now exist on the web site are
reasonably robust, and there should be no problem with "upward
compatibility" once we announce the official versions.
Please contact me or suderman at cs.vassar.edu (the schema developer) if
you have any problems with or questions about the schemas; we are eager
to help anyone who is using them!
Nancy Ide
=======================================================
Nancy Ide
Professor and Chair
Department of Computer Science, Vassar College
Poughkeepsie, NY 12604-0520 USA
Tel: +1 845 437-5988 Fax: +1 845 437-7498
ide at cs.vassar.edu
Associate Researcher (Chercheur Associé)
Equipe Langue et Dialogue, LORIA/CNRS
Campus Scientifique - BP 239
54506 Vandoeuvre-les-Nancy FRANCE
Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
ide at loria.fr
=======================================================