[Corpora-List] Experience with linguistic annotation in separate files?
Scott James Cederberg
cederber at csli.Stanford.EDU
Fri Apr 18 03:19:57 UTC 2003
Hello,
Does anyone out there have some experience working with
corpora that have linguistic annotation (e.g. for part of
speech, syntax, multiword expressions, or word senses) kept in
files separate from the text itself?
This is the system recommended by the CES and XCES corpus
encoding standards, and the TEI guidelines also provide a
mechanism for putting tags in one document that indicate links
to another document.
I'm trying to work out how best to give software access to
corpus annotation stored in such a format. Ideally, that
access would rely on standard XML technologies and tools,
such as XPath and XSLT.
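To make the question concrete, here is a rough sketch of the kind
of access I have in mind. It is only an illustration: the element
names, the "id" and "ref" attributes, and the tokens_with_pos
helper are made up for this example and are not taken from CES,
XCES, or TEI. It uses Python's standard xml.etree.ElementTree
module, which supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical base document: the text itself, with tokens that carry only an id.
BASE_XML = """
<text>
  <w id="w1">Time</w>
  <w id="w2">flies</w>
  <w id="w3">quickly</w>
</text>
"""

# Hypothetical stand-off annotation document: each <ann> points at a token id
# in the base document instead of wrapping the token itself.
POS_XML = """
<annotations type="pos">
  <ann ref="w1" pos="NN"/>
  <ann ref="w2" pos="VBZ"/>
  <ann ref="w3" pos="RB"/>
</annotations>
"""


def tokens_with_pos(base_xml, pos_xml):
    """Join stand-off part-of-speech annotations to the tokens they reference."""
    base = ET.fromstring(base_xml)
    annotations = ET.fromstring(pos_xml)
    # Index the base tokens by id so that each link can be resolved directly.
    words = {w.get("id"): w.text for w in base.iter("w")}
    # Follow each annotation's ref attribute back into the base document.
    return [(words[a.get("ref")], a.get("pos")) for a in annotations.findall("./ann")]


pairs = tokens_with_pos(BASE_XML, POS_XML)
print(pairs)

# An XPath-style attribute predicate selects annotations by tag value.
verbs = ET.fromstring(POS_XML).findall(".//ann[@pos='VBZ']")
verb_refs = [a.get("ref") for a in verbs]
print(verb_refs)
```

The join is done in Python here because the two documents are
parsed separately; whether the same lookup can be expressed purely
in XPath or XSLT across documents is exactly what I am unsure
about.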
Any suggestions on how best to do this, pointers to software
or APIs that work with modular annotation, etc. would be
invaluable.
Thanks for your help.
Scott
--
Scott Cederberg
Researcher
Infomap Project
Computational Semantics Lab
Center for the Study of Language and Information (CSLI)
Stanford University
http://infomap.stanford.edu/