Q: the 'only six' argument

Bobby D. Bryant bdbryant at mail.utexas.edu
Thu Aug 31 16:17:07 UTC 2000


----------------------------Original message----------------------------
Larry Trask wrote:

> So, my question: does anybody believe that any version of this
> statement is valid?  More precisely, do we have a number N and
> a set of criteria C such that the existence between two languages
> of N matches satisfying criteria C is enough to guarantee that
> the languages must be related?

Here are three different (independent) takes on the subject:


1) Arguably, it only takes a single "good" example.  That is, if C is good
enough to rule out chance resemblances, borrowings, etc., so that the only
remaining explanation is a common ancestral form, then (so far as I can see)
the languages must in fact be related even if they share only a single
ancestral form.  So surely the definition of C is going to be more
problematic than the specification of N is.

However, the "goodness" of C would surely depend on a demonstration of
regular sound correspondences between the N forms, and such a list of
correspondences would provide a tool for analyzing the remaining morphemes in
the two languages.  In practice, failure to turn up a large number of
additional matches after such an analysis would leave other linguists
skeptical about the correctness or completeness of your specific list of
correspondences, however robust C might be in the abstract.

Thus the problem moves again, from the goodness of C to the goodness of
someone's claims of having satisfied C.  For instance, if C's checklist
included that "there must be at least 12 regular sound correspondences" and I
set forward a mere 5 pairs of words with correspondences that satisfied C,
but could not produce any more pairs on demand, linguists would surely
question whether my correspondences were correct and complete, even if
everyone agreed on the definition of C itself.

In short, I don't think such a formalization of the problem in terms of N and
C is going to work in practice.  At some level you are always going to have
to pile on enough examples to convince your peers, which is of course the way
things have always worked.


2) More abstractly, I am skeptical that such a formulation exists in any form
that will correctly separate related pairs of languages from non-related
pairs.

I suppose it would be tolerable to use a weaker formulation that fails to
validate some pairs of languages that are in fact related, but the
formulation must *never* falsely validate pairs that are not related.  We
could surely satisfy that requirement trivially by using a fairly stringent C
and then setting N to a very large value (see take 3, below), but what we
need, if it is to be of any use in practice, is a formulation with a fairly
small N.

However, I submit this conjecture:

"For any reasonable C, and for any N small enough to be useful, it will be
possible to obtain a provably false positive result by applying the test to
some single language, L, and showing that C spuriously maps words in L onto
other words also in L in an inappropriate manner, at least N times, falsely
'proving' that L is related to itself other than by the identity relation."
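
As a rough illustration of how such a self-comparison might be run (a sketch
under stated assumptions, not anything proposed in this message), the Python
below invents a deliberately loose criterion C: two distinct words "match"
if their first two consonants fall into the same coarse classes. The class
table, the function names, and the random CVCV lexicon are all hypothetical,
and the toy C ignores meaning entirely, which any serious criterion would
not. The point is only that a loose C finds many "matches" inside a single
word list.

```python
import random
from itertools import combinations

# Hypothetical coarse consonant classes, invented for this illustration.
CLASSES = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "m": "M", "n": "N",
    "t": "T", "d": "T",
    "s": "S", "z": "S",
    "k": "K", "g": "K", "h": "K",
    "r": "R", "l": "R",
}
VOWELS = "aeiou"


def skeleton(word):
    """Reduce a word to the classes of its first two consonants."""
    return "".join(CLASSES[ch] for ch in word if ch in CLASSES)[:2]


def matches(w1, w2):
    """Toy criterion C: distinct words whose two-consonant skeletons agree."""
    s1 = skeleton(w1)
    return w1 != w2 and len(s1) == 2 and s1 == skeleton(w2)


def self_matches(lexicon):
    """All pairs of distinct words in one lexicon that satisfy the toy C."""
    words = sorted(set(lexicon))
    return [(a, b) for a, b in combinations(words, 2) if matches(a, b)]


def random_lexicon(size, seed=0):
    """A made-up lexicon of distinct CVCV strings (no real language)."""
    rng = random.Random(seed)
    consonants = list(CLASSES)
    words = set()
    while len(words) < size:
        words.add("".join(rng.choice(consonants) + rng.choice(VOWELS)
                          for _ in range(2)))
    return sorted(words)


if __name__ == "__main__":
    lexicon = random_lexicon(200)
    print(len(self_matches(lexicon)), "spurious self-matches")
```

With seven consonant classes there are only 49 possible skeletons, so any
200-word list of this shape must contain far more than six internal
"matches". A stricter C shrinks that count, and the conjecture is about
whether any reasonable C can shrink it below a usefully small N.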


3) Somewhat tangentially:

For questions that try to limit "how many?" from above or below, we have the
mathematically robust concepts of "none", "a bounded number", "an unbounded
number", and "an infinite number".  It seems to me that we need another
category between "bounded" and "unbounded", which might be described as
"trivially bounded by some excessive number, but difficult or impossible to
bound precisely or even closely".[note 1]  So far as I know, no such
mathematical concept exists.

Your problem seems to be an example of this.  For instance, if I cited 5000
"good" examples of matches between two languages, almost everyone would be
convinced that the languages were in fact related. But if I tried to narrow
the required number down from 5000 to a precise/close bound, then that is
exactly the question you have raised.

Another example that will ring a bell among linguists is the question of how many
center embeddings you can use in a sentence and still expect a hearer to be
able to parse it correctly.  If I claimed that no one could parse embeddings
5000 deep, surely everyone would agree.  But if I tried to bound the number
precisely/closely, we would likely get into a heated debate over what the
actual number should be.[note 2]
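
For readers who want to try the experiment on themselves, here is a minimal
sketch that builds center-embedded sentences of any depth. The function name
and the word lists are hypothetical, and the sentence frame (bare "the N"
subjects with one intransitive or transitive verb each) is an assumption
chosen for simplicity.

```python
def center_embed(nouns, verbs):
    """Build 'the N1 the N2 ... Vk ... V1.' with len(nouns) - 1 embeddings.

    verbs[i] is the verb whose subject is nouns[i], so the verbs come out
    in reverse order, which is what makes the embedding 'center'.
    """
    if len(nouns) != len(verbs):
        raise ValueError("need exactly one verb per noun")
    subjects = " ".join(f"the {noun}" for noun in nouns)
    predicates = " ".join(reversed(verbs))
    return f"{subjects} {predicates}."


if __name__ == "__main__":
    nouns = ["rat", "cat", "dog"]
    verbs = ["died", "bit", "chased"]
    for depth in range(len(nouns)):
        print(depth, center_embed(nouns[:depth + 1], verbs[:depth + 1]))
```

Extending the two lists lets you generate sentences of arbitrary depth and
see how quickly they stop being parsable.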

It seems to me that this problem of "precise/close boundability" defines a
general class of problems, and may well be worthy of study as a mathematical
concept.

Notes:

[1]  It is tempting to call this "soft bounding", but I reject that because
the name would seem to claim something about the nature of the phenomenon
itself, and I think we (or the mathematicians) should determine something
about that nature before giving it any such name.

[2]  Admittedly, the second example is weak because the parsable depth of
embedding surely varies with the individual hearer and with the content of
the sentence in question.  However, even if you abstract those variations
away I think you would discover that the same boundability problem arises.
(This problem has exercised me for some time, because grammarians like to
think grammar allows an unbounded number of such embeddings, but experience
shows that such a grammar differs greatly from citable forms.  It would
therefore be nice to have a conceptual category tighter than "unbounded", but
looser than some specific number such as "four".  We could, IMO, improve our
grammatical theory by substituting that concept for the concept of
unboundedness in our grammars.)

Always a pleasure to hear from you on HISTLING,

Bobby Bryant
Austin, Texas


