<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=iso-8859-1" http-equiv="Content-Type">

  <style>

     pre { 

  white-space: pre-wrap;       /* css-3 */

  white-space: -moz-pre-wrap;  /* Mozilla, since 1999 */

  white-space: -pre-wrap;      /* Opera 4-6 */

  white-space: -o-pre-wrap;    /* Opera 7 */

  word-wrap: break-word;       /* Internet Explorer 5.5+ */

}


body {

        font-family: verdana;

        font-size: 13px;

        color: #192228;

}


  </style>

</head>

<body bgcolor="#ffffff" text="#000000">


<div>

<div>Hi,</div><div><br /></div><div>I am in the process of compiling a series of domain corpora, and once the current text-gathering phase is complete, I will need to store the texts somehow. The texts need to be annotated with, e.g., parts of speech and possibly phrase boundaries for term extraction purposes.</div><div><br /></div><div>My questions are: Would it be wiser to store the texts as XML or in a relational database format?</div><div>Does a generally accepted XML schema for corpus annotation exist? And do tools exist for annotating and searching such files?</div><div>How do you store your corpora?</div><div><br /></div><div>Any thoughts or ideas regarding these questions are very welcome :)</div><div><br /></div><div>Best,</div><div>Tine Lassen</div><div>Copenhagen Business School</div>

</div>


</body>

</html>