[Corpora-List] Experience with linguistic annotation in separate files?

Fri Apr 18 03:19:57 UTC 2003

Hello,

	Does anyone out there have experience working with corpora
	whose linguistic annotation (e.g. for part of speech, syntax,
	multiword expressions, or word senses) is kept in files
	separate from the text itself?

	This is the system recommended by the CES and XCES corpus
	encoding standards, and the TEI guidelines also provide a
	mechanism for putting tags in one document that indicate links
	to another document.

	I'm trying to get my head around how best to enable software
	to access corpus annotation in such a format.  Ideally such
	access could be provided using standard XML technologies and
	tools, such as XPath and XSLT.
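
	To make the question concrete, here is a minimal sketch of
	the kind of access I have in mind.  The element and attribute
	names (w, id, ann, target, pos) are made up for illustration
	and are not actual CES/XCES or TEI markup, and Python's
	standard xml.etree.ElementTree module is only one possible
	choice; it supports a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical base text: each token carries an id (not real XCES markup).
TEXT_XML = """<text>
  <w id="w1">Time</w>
  <w id="w2">flies</w>
</text>"""

# Hypothetical stand-off annotation: each entry points at a token id.
POS_XML = """<annotation type="pos">
  <ann target="w1" pos="NN"/>
  <ann target="w2" pos="VBZ"/>
</annotation>"""


def find_token(text_root, token_id):
    """Look up one token element by id using ElementTree's XPath subset."""
    return text_root.find(f".//w[@id='{token_id}']")


def merge_annotations(text_xml, ann_xml):
    """Return (token, pos) pairs by resolving annotation targets to token ids."""
    text_root = ET.fromstring(text_xml)
    ann_root = ET.fromstring(ann_xml)
    # Index the stand-off annotations by the token id they point at.
    pos_by_id = {a.get("target"): a.get("pos") for a in ann_root.iter("ann")}
    # Walk the tokens in document order; unannotated tokens get None.
    return [(w.text, pos_by_id.get(w.get("id"))) for w in text_root.iter("w")]


pairs = merge_annotations(TEXT_XML, POS_XML)
second = find_token(ET.fromstring(TEXT_XML), "w2").text
```

	In a real corpus the two documents would live in separate
	files and the links would follow whatever pointer syntax the
	encoding scheme prescribes; the sketch only shows the join
	step that any such tool would need.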

	Any suggestions on how best to do this, pointers to software
	or APIs that work with modular annotation, etc. would be
	invaluable.

	Thanks for your help.

							Scott

--
Scott Cederberg
Researcher

Infomap Project
Computational Semantics Lab
Center for the Study of Language and Information (CSLI)
Stanford University

http://infomap.stanford.edu/