<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=iso-8859-1" http-equiv="Content-Type">
<style>
pre {
white-space: pre-wrap; /* css-3 */
white-space: -moz-pre-wrap; /* Mozilla, since 1999 */
white-space: -pre-wrap; /* Opera 4-6 */
white-space: -o-pre-wrap; /* Opera 7 */
word-wrap: break-word; /* Internet Explorer 5.5+ */
}
body {
font-family: verdana;
font-size: 13px;
color: #192228;
}
</style>
</head>
<body bgcolor="#ffffff" text="#000000">
<div>
<div>Hi,</div><div><br /></div><div>I am in the process of compiling a series of domain corpora, and once the current text-gathering phase is complete, I will of course need to store the texts somehow. The texts need to be annotated with, e.g., parts of speech and possibly phrase boundaries for term extraction purposes. </div><div><br /></div><div>My questions are: Would it be wiser to store the texts as XML or in a relational database format?</div><div>Does a generally accepted XML schema for corpus annotation exist? And do tools exist for annotating and searching such files?</div><div>How do you store your corpora?</div><div><br /></div><div>Any thoughts or ideas regarding these questions are very welcome :)</div><div><br /></div><div>Best,</div><div>Tine Lassen</div><div>Copenhagen Business School</div>
</div>
</body>
</html>