<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; } @font-face { font-family: SimSun; } @font-face { font-family: Calibri; } @font-face { font-family: Tahoma; } p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; } a:link, span.MsoHyperlink { color: blue; text-decoration: underline; } a:visited, span.MsoHyperlinkFollowed { color: purple; text-decoration: underline; } span.EmailStyle17 { font-family: Calibri, sans-serif; color: rgb(31, 73, 125); } .MsoChpDefault { font-family: Calibri, sans-serif; } @page WordSection1 { margin: 70.85pt; }--></style>

</head>

<body dir="ltr" style="font-size:10pt;color:#000000;background-color:#FFFFFF;font-family:Tahoma,Geneva,sans-serif;">

<p>Marc Brysbaert wrote:<br>

</p>

<p><br>

</p>

<p>>> <span style="color: rgb(31, 73, 125); font-family: Calibri, sans-serif; font-size: 11pt;">For what it is worth, in my experience word frequency lists and N-gram lists are not a problem. </span></p>

<div><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);"><br>

</span></div>

<div><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);">I agree. I've distributed word frequency data (http://www.wordfrequency.info) and n-gram data (http://www.ngrams.info) from COCA and COHA for several years now, and I've never had any issues.</span></div>

<p><br>

</p>

<p>>> <span style="color: rgb(31, 73, 125); font-family: Calibri, sans-serif; font-size: 15px; background-color: rgb(255, 255, 255);">The big problem we are encountering is that currently there is no guidance about whether corpora can be shared. As a result,

 nearly all corpora assembled remain next to inaccessible, meaning that everyone has to collect their own corpus. This is a lot of needless work and also means that little cumulative work can be done.</span><br>

</p>

<p><br>

</p>

<p>I've also been distributing "full-text" data from the 450-million-word COCA and the 1.9-billion-word GloWbE (http://corpus.byu.edu/glowbe) for a while now, and again I've had no problems so far. There is a "twist", though: the full-text data has been slightly altered to avoid copyright problems:<br>

</p>

<p><br>

</p>

<p><a href="http://corpus.byu.edu/full-text/limitations.asp">http://corpus.byu.edu/full-text/limitations.asp</a><br>

</p>

<p><br>

</p>

<p>Best,<br>

</p>

<p><br>

</p>

<p>Mark D.<br>

</p>

<p><br>

</p>

<div id="Signature">

<div style="font-family:Tahoma; font-size:13px">

<div style="font-family:Tahoma; font-size:13px">

<p>============================================<br>

Mark Davies<br>

Professor of Linguistics / Brigham Young University<br>

<a tabindex="0" href="http://davies-linguistics.byu.edu/">http://davies-linguistics.byu.edu/</a></p>

<p>** Corpus design and use // Linguistic databases **<br>

** Historical linguistics // Language variation **<br>

** English, Spanish, and Portuguese **<br>

============================================<br>

</p>

</div>

</div>

</div>

</body>

</html>