<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:inherit;
panose-1:0 0 0 0 0 0 0 0 0 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
h3
{mso-style-priority:9;
mso-style-link:"Heading 3 Char";
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:13.5pt;
font-family:"Times New Roman","serif";
font-weight:bold;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.Heading3Char
{mso-style-name:"Heading 3 Char";
mso-style-priority:9;
mso-style-link:"Heading 3";
font-family:"Cambria","serif";
color:#4F81BD;
font-weight:bold;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Dear Alexander,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>The 1000 most frequent words of most languages are mainly function words, and their frequency distribution can be predicted with reasonable accuracy using Zipf’s law. In a number of experiments we conducted in the early 2000s for Modern Greek [1], we found that 90% of the 1000 most frequent words do not change even when we nearly triple the size of the corpus (from 13 million to 33 million words) and considerably change its composition in terms of topics and genres. So we are probably dealing with a lexical core which, owing to the grammatical character of its constituents (function words), should be similar across most languages.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Best<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>George Mikros<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[1] Mikros, G., Hatzigeorgiu, N., & Carayannis, G. (2005). Basic quantitative characteristics of the Modern Greek Language using the Hellenic National Corpus. Journal of Quantitative Linguistics, 12(2-3), 167-184. 
doi: 10.1080/09296170500172478<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>____________________________<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>George K. Mikros<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Associate Professor of Computational and Quantitative Linguistics<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Department of Italian Language and Literature<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>School of Philosophy<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>National and Kapodistrian University of Athens<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Panepistimioupoli Zografou, GR-15784<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Athens, Greece<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Tel: +30 210 7277491, +30 6976111742<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Email: <a href="mailto:gmikros@isll.uoa.gr">gmikros@isll.uoa.gr</a> <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Web: <a href="http://users.uoa.gr/~gmikros/">http://users.uoa.gr/~gmikros/</a> <o:p></o:p></span></p><p class=MsoNormal><span 
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> corpora-bounces@uib.no [mailto:corpora-bounces@uib.no] <b>On Behalf Of </b>Alexander Osherenko<br><b>Sent:</b> Monday, October 10, 2011 2:23 PM<br><b>To:</b> corpora@uib.no<br><b>Subject:</b> [Corpora-List] Are frequency lists of the most languages equivalent?<o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Hi all,<o:p></o:p></p><div><h3 style='margin:0cm;margin-bottom:.0001pt;mso-line-height-alt:9.0pt;vertical-align:baseline;border-style:initial;border-color:initial;outline-width:0px;outline-style:initial;outline-color:initial;font-style:inherit'><span style='font-size:12.0pt;font-family:"inherit","serif";color:#333333;background:white'><o:p> </o:p></span></h3><p style='margin:0cm;margin-bottom:.0001pt;mso-line-height-alt:9.0pt;vertical-align:baseline;border-style:initial;border-color:initial;outline-width:0px;outline-style:initial;outline-color:initial;font-style:inherit'><span style='font-size:10.0pt;font-family:"inherit","serif";background:white'>I am wondering whether the frequency lists of most languages can be considered equivalent. 
For instance, consider an English frequency list such as the BNC frequency list (<a href="http://www.linkedin.com/redirect?url=http%3A%2F%2Fwww%2Ekilgarriff%2Eco%2Euk%2Fbnc-readme%2Ehtml&urlhash=KPiq&_t=tracking_anet" target="_blank"><span style='color:#006699;border:none windowtext 1.0pt;padding:0cm;text-decoration:none'>http://www.kilgarriff.co.uk/bnc-readme.html</span></a>) and a German frequency list (<a href="http://www.linkedin.com/redirect?url=http%3A%2F%2Fgerman%2Eabout%2Ecom%2Flibrary%2Fblwfreq01%2Ehtm&urlhash=99CW&_t=tracking_anet" target="_blank"><span style='color:#006699;border:none windowtext 1.0pt;padding:0cm;text-decoration:none'>http://german.about.com/library/blwfreq01.htm</span></a>). The English frequency list starts with the definite article "the", and the German one with the definite article "der". Hence, a literal translation of the word "the" into German yields the word "der".<br><br>Of course, direct translation is not always enough. However, I would not be surprised if, say, 70%-80% of the most frequent words in most languages could be considered equivalent. Note that I am not saying the words must also be ordered in the same way. For example, the word "of" always comes before the word "appear". 
Nevertheless, I expect that the words "of" and "appear" are both present among the most frequent words of most languages, in whatever order, even if a particular language uses the word "appear" more often than the word "of".<o:p></o:p></span></p><p style='margin:0cm;margin-bottom:.0001pt;mso-line-height-alt:9.0pt;vertical-align:baseline;border-style:initial;border-color:initial;outline-width:0px;outline-style:initial;outline-color:initial;font-style:inherit'><span style='font-size:10.0pt;font-family:"inherit","serif";background:white'><o:p> </o:p></span></p><p style='margin:0cm;margin-bottom:.0001pt;mso-line-height-alt:9.0pt;vertical-align:baseline;border-style:initial;border-color:initial;outline-width:0px;outline-style:initial;outline-color:initial;font-style:inherit'><span style='font-size:10.0pt;font-family:"inherit","serif";background:white'>Alexander<o:p></o:p></span></p></div></div></body></html>