<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri","sans-serif";}
span.EmailStyle19
{mso-style-type:personal-compose;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoPlainText">We have just released a new corpus at <a href="http://corpus.byu.edu/">
corpus.byu.edu</a>, which may be of interest to some of you:<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"><b><a href="http://corpus2.byu.edu/glowbe/">GloWbE<span style="font-weight:normal">: Corpus of
</span>Glo<span style="font-weight:normal">bal </span>W<span style="font-weight:normal">eb-</span>B<span style="font-weight:normal">ased
</span>E<span style="font-weight:normal">nglish</span></a></b><o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">This new corpus is <b>1.9 billion words in size</b>, and is based on 1.8 million web pages (including blogs) from
<b>20 different English-speaking countries</b> (US, UK, NZ, India, Hong Kong, etc.). GloWbE is four to five times as large as COCA and about 20 times as large as the BNC, so it yields much richer data for some low-frequency constructions.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">The real power of GloWbE, though, is the ability to see the frequency of any word, phrase, or grammatical construction in each of the 20 different countries. You can also compare any feature across two sets of dialects, such as British
and American English (more than 775 million words of text for these two dialects alone). Or you can limit your search to one or two countries, such as Australia (148 million words), South Africa (45 million), or Singapore (43 million), and you will still
be searching the largest online corpus for most of these 20 countries.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<div style="border:none;border-bottom:solid windowtext 1.0pt;padding:0in 0in 1.0pt 0in">
<p class="MsoPlainText">This new corpus of World English adds nicely to the other corpora from corpus.byu.edu, which allow you to
<b><a href="http://corpus.byu.edu/variation.asp">examine variation</a></b> in English in ways that are perhaps not possible with other corpora:<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">-- historical: COHA, TIME, COCA (recent change), Google Books (Advanced)<o:p></o:p></p>
<p class="MsoPlainText">-- genres: COCA and BYU-BNC<o:p></o:p></p>
<p class="MsoPlainText">-- dialects: GloWbE, and side-by-side comparisons of corpora<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
</div>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:black">============================================<br>
Mark Davies<br>
Professor of Linguistics / Brigham Young University<br>
<a href="http://davies-linguistics.byu.edu/">http://davies-linguistics.byu.edu/</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:black">** Corpus design and use // Linguistic databases **<br>
** Historical linguistics // Language variation **<br>
** English, Spanish, and Portuguese **<br>
============================================<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>