[Corpora-List] Can corpora help to distinguish a dialect and a language?

Mike Maxwell maxwell at umiacs.umd.edu
Tue Feb 16 02:50:01 UTC 2010


Angus B. Grieve-Smith wrote:
> Paula Newman wrote:
>> Possibly a better definition of the distinction between a dialect and a
>> language would focus on the amount of difference between the purported 
>> base language and the dialect, possibly indicated by mutual 
>> intelligibility.
>
> How do you figure out which one is the "base language" and which one 
> is the "dialect"?

Since we're getting into serious discussion here (nobody took me up on 
my joke :-(), the implied answer to this rhetorical question (that there 
is no linguistic way to decide which is a (purported) base language and 
which is a dialect) is of course the answer most linguists have given for 
a century.  A (purported) base language is generally just the prestige 
dialect, and it's usually the prestige dialect because its speakers are 
wealthier or have more political power (more ships and soldiers).

> Why not just have a measure of mutual intelligibility?  What do 
> labels like "language" and "dialect" add to it?

Measures of mutual intelligibility are of course what the 
better-documented distinctions in the Ethnologue (www.ethnologue.com), 
and hence those in ISO 639-3, are based on.

Actually, in addition to the languages of the Ethnologue, ISO 639-3 
contains extinct languages and artificial languages, for which mutual 
intelligibility tests are largely impossible.  And there are certainly 
parts of the world where mutual intelligibility tests--or dialect 
surveys, as SIL calls them--have not yet been done.  But I believe most 
of the Ethnologue's distinctions among language varieties in Mexico, for 
example, were based on testing, as opposed to someone eyeballing 
wordlists.

Which brings us back to the OP's original question: "Can corpora help to 
distinguish a dialect and a language?"  It would be nice if they could, 
since in at least some cases corpus data would be easier to collect. 
OTOH, you can't collect corpus data for most languages of the world, 
since most of them are unwritten.
-- 
    Mike Maxwell
    What good is a universe without somebody around to look at it?
    --Robert Dicke, Princeton physicist

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


