Q: linguistic distance

Bobby D. Bryant bdbryant at mail.utexas.edu
Sun Jun 28 17:20:11 UTC 1998


----------------------------Original message----------------------------
Jacob Baltuch wrote:
 
> Some problems in defining a global complexity measure of a language
> have been mentioned. One of the problems seems to be the
> "incommensurability" (to use Isidore Dyen's term) of measures of
> complexity for various sub-systems. I'm wondering if there are any
> thoughts out there on the problems involved in defining "linguistic
> distance"?
> (In the sense of a metric that would measure how different languages
> are from one another).
 
I have a specific proposal for this, and am in fact actively pursuing it.
But rather than discussing research that may not pan out, perhaps we should
discuss the broader issue your question brings up -- it may be beneficial
to have some general agreement about the meaning of such metrics *before*
people start using them to support specific claims.
 
In particular, what would such a metric tell us?  It is tempting to believe
that more closely related languages will be more similar under such a
metric, but there may be problems with this notion.
 
It is certain that measurements on an individual feature would be
unreliable indicators of relatedness, for instance if the chosen feature
happened to be "areal" rather than "ancestral".  Moreover, although it is
*tempting* to believe that a measurement across all the properties of a
language in aggregate, or at least across a sufficiently large subset of
such properties, would show smaller distances for more closely related
languages, it is not altogether *certain* that this is so.  (I would in
fact follow the temptation as my null hypothesis, but how would I validate
it if it led me to an outrageous conclusion and you challenged me on it?)
 
Even with measurements in hand and suitable assumptions about their
relevance for relatedness, problems would remain.  For example, I would
expect a valid metric to show English as closer to French than German is,
and likewise English as closer to German than French is.  But what exactly
does that mean?  In particular, if we worked exclusively with interlingual
distances for these three languages and tried to build a family tree by
blind numerical methods such as constructing a minimum spanning tree, the
probable result is that English would appear as the parent of French and
German (the two smallest distances both involve English, so it ends up in
the middle).  Adding further
languages would of course provide separate evidence for a more nearly
correct tree, but it would still be quite difficult, perhaps impossible, to
"iron out" the conflicting claims offered by the various measurements.
Furthermore, if you view language change in terms of waves rather than
cladistics, the picture becomes murky in ways that are difficult even to
visualize.
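 
To make the three-language case concrete, here is a minimal sketch in
Python.  The distance values are invented purely for illustration (they
are not measurements of anything), and the function names are my own;
the only assumption that matters is the ordering described above, with
French-German as the largest of the three distances.

```python
# Hypothetical pairwise distances: illustrative numbers only, chosen so that
# English is closer to each of French and German than they are to each other.
distances = {
    ("English", "French"): 0.4,
    ("English", "German"): 0.3,
    ("French", "German"): 0.6,
}


def minimum_spanning_tree(distances):
    """Kruskal's algorithm on an undirected graph given as {(a, b): weight}."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    # Consider edges from shortest to longest, keeping those that join
    # two previously unconnected components.
    for (a, b), weight in sorted(distances.items(), key=lambda item: item[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


def degrees(tree):
    """Count how many tree edges touch each language."""
    counts = {}
    for a, b, _ in tree:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


mst = minimum_spanning_tree(distances)
degree_of = degrees(mst)
hub = max(degree_of, key=degree_of.get)

for a, b, weight in mst:
    print(f"{a} -- {b}  ({weight})")
print("Language in the middle of the tree:", hub)
```

With any numbers that respect that ordering, the French-German edge is
the one left out, so the tree is French -- English -- German.  The
algorithm knows nothing about ancestry; reading the middle node as a
"parent" is an interpretation we impose on it afterwards.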
 
Clearly, we should make use of any tools that become available; but they will
always have to be applied with every bit as much caution and well-informed
judgment as have any of the traditional tools of the trade.  I anticipate
that we will have such metrics within about 20 years -- I hope to have some
very rudimentary results within 2-3 years -- but I also suspect that such
metrics will never count for more than the most tenuous circumstantial
evidence in cases where historical relations are not already well
understood by other means.
 
Bobby Bryant
Austin, Texas


