<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; } @font-face { font-family: SimSun; } @font-face { font-family: Calibri; } @font-face { font-family: Tahoma; } p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; } a:link, span.MsoHyperlink { color: blue; text-decoration: underline; } a:visited, span.MsoHyperlinkFollowed { color: purple; text-decoration: underline; } span.EmailStyle17 { font-family: Calibri, sans-serif; color: rgb(31, 73, 125); } .MsoChpDefault { font-family: Calibri, sans-serif; } @page WordSection1 { margin: 70.85pt; }--></style>
</head>
<body dir="ltr" style="font-size:10pt;color:#000000;background-color:#FFFFFF;font-family:Tahoma,Geneva,sans-serif;">
<p>Marc Brysbaert wrote:<br>
</p>
<p><br>
</p>
<p>>> <span style="color: rgb(31, 73, 125); font-family: Calibri, sans-serif; font-size: 11pt;">For what it is worth, in my experience word frequency lists and N-gram lists are not a problem. </span></p>
<div><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);"><br>
</span></div>
<div><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);">I agree. I've distributed COCA/COHA word frequency data (http://www.wordfrequency.info) and n-gram data (http://www.ngrams.info) for several years now, and I've never
had any issues.</span></div>
<p><br>
</p>
<p>>> <span style="color: rgb(31, 73, 125); font-family: Calibri, sans-serif; font-size: 15px; background-color: rgb(255, 255, 255);">The big problem we are encountering is that currently there is no guidance about whether corpora can be shared. As a result,
nearly all corpora assembled remain next to inaccessible, meaning that everyone has to collect their own corpus. This is a lot of needless work and also means that little cumulative work can be done.</span><br>
</p>
<p><br>
</p>
<p>I've also been distributing "full-text" data from the 450-million-word COCA and the 1.9-billion-word GloWbE (http://corpus.byu.edu/glowbe) for a while now, again with no problems so far. There is a "twist", though: the full-text data has
been slightly altered to avoid copyright problems:<br>
</p>
<p><br>
</p>
<p><a href="http://corpus.byu.edu/full-text/limitations.asp">http://corpus.byu.edu/full-text/limitations.asp</a><br>
</p>
<p><br>
</p>
<p>Best,<br>
</p>
<p><br>
</p>
<p>Mark D.<br>
</p>
<p><br>
</p>
<div id="Signature">
<div style="font-family:Tahoma; font-size:13px">
<div style="font-family:Tahoma; font-size:13px">
<p>============================================<br>
Mark Davies<br>
Professor of Linguistics / Brigham Young University<br>
<a tabindex="0" href="http://davies-linguistics.byu.edu/">http://davies-linguistics.byu.edu/</a></p>
<p>** Corpus design and use // Linguistic databases **<br>
** Historical linguistics // Language variation **<br>
** English, Spanish, and Portuguese **<br>
============================================<br>
</p>
</div>
</div>
</div>
</body>
</html>