<p>Hi there,</p>
<p>This discussion is quite interesting, but all in all the distinction between a language and a dialect is not really a matter of technical definition. It boils down to what people think about the varieties they use at a particular point in time, which is obviously influenced by current politics, culture, tradition, history etc. (the 7 criteria proposed by Bell are actually quite helpful in establishing a framework for such discussions).</p>
<p>Nevertheless, the original question hasn't been answered yet: are corpora helpful in making such distinctions? The common belief is that a language (or standard language) is the preferred dialect, chosen as a result of political, cultural and practical choices made by language users at a particular point in time. Hence, as the standard variety is associated with power, it is a prestigious variety used in all domains of life. It follows that other varieties, which are very often vernacular varieties, are not used in all domains of life (e.g. only in L-domains), so they may logically be viewed as less prestigious (e.g. Ranamal in Norway, which is not used in public education, political debates etc.).</p>
<p>So we have a problem here: if we decide to collect corpora for the varieties in question (i.e. the ones we want to pronounce either languages or dialects), those corpora will represent language data used in different domains of life. That may make them a poor basis for comparison, and insufficient for making such distinctions in cases where mutual intelligibility and a plethora of political, cultural, social and historical factors also fail to distinguish between languages and dialects. So the question remains: how can corpora be useful, and what approach (in terms of exact methodology and corpus composition criteria) should we take so that corpora are helpful in distinguishing between languages and dialects? What say you?</p>
<p>Best regards,</p>
<p>Lukasz Grabowski</p>