[Corpora-List] Phonetic corpora typology

Mike Maxwell maxwell at umiacs.umd.edu
Mon Mar 8 13:39:06 UTC 2010


I hate to drag this out, but...

Bryar Family wrote:
> Yuri: RE: Language vs. Dialect The question is a marvelous one. I'm
> no expert,  but as any of the linguists on this list can tell you,
> the terms are politically defined, 

In popular use, that's true.  That is not (supposed to be) the way 
linguists distinguish language from dialect.

> ...and that no objective set of
> metrics involving isoglosses or other set of linguistic distinctions
> are going to be very helpful. 

IMO, the problem is not that there is no metric, but rather that there 
are many borderline cases (as linguists recognize).  Given that, there 
can be no metric with a non-arbitrary cutoff.  One common metric is 
mutual intelligibility, but intelligibility is a relative thing (more or 
less), not to mention that in practice it is often obscured by things 
like familiarity of speakers of one variety with the other, greater or 
lesser bilingualism, schooling in whichever variety is politically, 
socially, or economically dominant, etc.

> The concepts of language vs. dialect
> need to be understood as localized social and political constructs

That's a different definition (not wrong, just different).

> None are based on anything but
> socio-political declarations. 

Not true; for example:

> For example, look at the
> ETHNOLOGUE. http://www.ethnologue.com/home.asp The Ethnologue and the
> accompanying SIL bibliography http://www.sil.org/ attempt to use
> objective linguistic metrics and have built an imposing academic
> citation index to buttress its decisions as to what is a language and
> what is a dialect. 

And in many cases these are based on actual testing of mutual 
intelligibility, on the ground, using stories and questions (with known 
answers) recorded in one area and tested in neighboring areas.
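For readers unfamiliar with this kind of testing, here is a minimal sketch of how answers from such a test might be tallied. It assumes every listener hears the same recording and answers the same fixed questions with known answers, and it simply averages percent correct over listeners. The function names and all data below are hypothetical, and real surveys involve considerably more care in test design and subject selection.

```python
from statistics import mean


def listener_score(answers, key):
    """Percent of the known-answer questions this listener got right.

    A question the listener did not answer counts as wrong.
    """
    correct = sum(1 for q, expected in key.items() if answers.get(q) == expected)
    return 100.0 * correct / len(key)


def comprehension_score(listeners, key):
    """Mean percent correct over all listeners who heard one recording."""
    return mean(listener_score(answers, key) for answers in listeners)


# Hypothetical answer keys for a story recorded in variety A and one in B.
story_a_key = {"q1": "river", "q2": "two", "q3": "yes", "q4": "dog"}
story_b_key = {"q1": "garden", "q2": "night", "q3": "no", "q4": "pig"}

# Hypothetical speakers of B listening to the story recorded in A.
b_on_a = [
    {"q1": "river", "q2": "two", "q3": "yes", "q4": "cat"},
    {"q1": "river", "q2": "three", "q3": "yes", "q4": "dog"},
]
# Hypothetical speakers of A listening to the story recorded in B.
a_on_b = [
    {"q1": "garden", "q2": "night", "q3": "no", "q4": "pig"},
    {"q1": "garden", "q2": "night", "q3": "no"},  # skipped q4
]

print(comprehension_score(b_on_a, story_a_key))  # 75.0
print(comprehension_score(a_on_b, story_b_key))  # 87.5
```

Each direction is a separate test, so the two scores need not agree; comprehension between two varieties can be asymmetric.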

> This and the ISO language list are widely used
> references, but they are loaded with arbitrary delineations. 

Unavoidable, given the gradations present in the world, and of course 
recognized by linguists.

> Based in
> part on the Ethnologue, Papua New Guinea is supposed to have
> literally hundreds of languages. However, a close examination of the
> Ethnologue reveals that "Gapapaiwa" and "Ghayavi" are held to be
> separate PNG languages, yet they have a "73% lexical similarity". 

And?

> This declaration begs all sorts of questions. First of all, how is
> this "similarity" measured with such precision given these languages
> vary from village to village? Who knows! 

The method is documented, and there are courses and books on doing this kind of 
testing.  So someone knows :-).  Several surveys measuring similarity 
are cited in Bryar's posting, answering his own question.

> On the other hand, "Galeya" and "Basima" [in PNG] 
> are supposed to be dialects based on a purported 80%
> lexical similarity.

A few points: yes, it is quite possible that 80% lexical similarity 
would allow mutual intelligibility, while 73% would break it, although 
one could also ask how one defines the border of "mutual 
intelligibility."  But of course varieties of languages differ over more 
than their lexicons (and more than their phonology/phonetics, which I 
believe is Yuri's method).  There's morphology, syntax,...  In this 
case, I doubt that the question of dialect vs. language was decided 
purely on the basis of lexical similarity, although that's a 
quick-and-dirty method when you haven't had time to try more refined 
methods.
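As a rough illustration of what a lexical similarity percentage is, here is a sketch that compares two word lists elicited with the same glosses and reports the percentage of shared glosses whose forms are judged similar. The similarity criterion here (normalized Levenshtein distance at or below a threshold) is a hypothetical stand-in; surveyors typically judge similarity segment by segment on phonetic grounds rather than with a plain string metric. The word forms are invented for illustration and are not data from any real language.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def judged_similar(w1: str, w2: str, threshold: float = 0.5) -> bool:
    """Hypothetical criterion: forms count as similar if at most
    `threshold` of the longer form's length must be edited."""
    return edit_distance(w1, w2) / max(len(w1), len(w2)) <= threshold


def lexical_similarity(list_a: dict, list_b: dict, threshold: float = 0.5) -> float:
    """Percent of glosses present in both lists whose forms are judged similar."""
    shared = [g for g in list_a if g in list_b]
    similar = sum(judged_similar(list_a[g], list_b[g], threshold) for g in shared)
    return 100.0 * similar / len(shared)


# Invented forms for illustration only.
variety_a = {"water": "wai", "fire": "kaiwa", "house": "numa", "dog": "kedewa"}
variety_b = {"water": "wai", "fire": "kaiva", "house": "ruma", "dog": "bogi"}

print(lexical_similarity(variety_a, variety_b))  # 75.0
```

A different threshold or a different word list would give a different percentage, which is one reason such a figure is best treated as a first approximation rather than a direct measure of intelligibility.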

> Affiliated linguists have conducted
> various local field studies...
> ...Here is another
> conducted in Ethiopia: Gutt, Ernst-August. 1980. "Intelligibility and
> interlingual comprehension among selected Gurage speech varieties."
> http://www.ethnologue.com/show_work.asp?id=50110 ...
> Here the researchers conclude, "The Dobi dialect comprehension of
> Soddo is 76%, and Soddo speakers’ of Dobi is 90%." Thus similar
> levels of mutual comprehension make you a language in New Guinea and
> a dialect in Ethiopia! 

No, you're comparing two entirely different measures: lexical similarity 
in New Guinea, and comprehension in Ethiopia.

One final comment:
> Why is Standard Arabic "standard" given that
> far more people speak the Egyptian variety? 

An interesting question, given that Modern Standard Arabic (MSA) is no one's first language; in 
some sense, it's more like an Arabic Esperanto, or the modern use of 
Latin in Rome.
-- 
    Mike Maxwell
    What good is a universe without somebody around to look at it?
    --Robert Dicke, Princeton physicist

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

