Refining early Basque criteria

ECOLING at aol.com
Tue Oct 12 17:18:35 UTC 1999


Concerning Larry Trask's list of criteria for potential candidates
for early Basque vocabulary lists:

>1. Early attestation

>The word should be recorded early.  I have proposed a cut-off date of
>1600, since the first substantial literature appears in the 16th
>century.  Someone else (Jon Patrick?) suggested 1700 instead.  This is
>reasonable: the 16th-century texts are not numerous; they are all
>written by clerics; and they are overwhelmingly religious, with many of
>them being translations.  The 17th-century literature, in contrast, is
>much more voluminous, and it includes the first lay writers, notably the
>important Oihenart.  I'm happy with 1700, though I suspect it won't make
>a great deal of difference.  But nothing later.

The details in the paragraph above suggest to me that, OBVIOUSLY,
if you want early Basque, you use 1700 in preference to 1600,
because the 16th-century materials are so limited in content.
It is always possible to study any differences between 16th and 17th-century
equivalent grammatical morphemes, forms of the same words, etc.,
where those are attested in both centuries, but obviously much
non-religious vocabulary will be systematically disfavored by the earlier
cutoff date.

>2. Widespread distribution.
<snip>
>Now, I suggest counting a word as widespread if it is securely attested
>in at least four of these five groupings.
<snip>

As explained in a long and detailed message sent many days ago,
which focused on sound-symbolic vocabulary:
given the limited recording of sound-symbolic vocabulary,
an insistence on very wide distribution will have the effect of biasing
against this type of vocabulary,
and in this case will certainly bias against a variety of canonical forms,
in favor of canonical forms more uniform and more limited than they
actually were in very early Basque.
This is a systematic distortion, in other words:
in this case not merely a lack of particular lexical items, but
a distortion of the hypotheses about canonical forms themselves.

Thinking of subject matters attested or not, we have the following,
which ties this issue back to the specifics of subject matter noted
by Larry Trask for 16th vs. 17th centuries:
If only two dialect areas have documents in certain subject matters,
then vocabulary specific to those subject matters will be systematically
excluded by requiring their attestation from more than two dialect areas.
This is obviously undesirable.  It suggests that a moderate position might
be to categorize documentary attestations by subject matter,
and vary the number of dialect areas required according to the number
of areas attesting documents in each subject matter.
Of course in practice, this can be done in another way.
Record ALL vocabulary items for a particular concept,
and study the UNIFORMITY of etyma for that concept,
without much regard AT FIRST for whether it comes from two or from five areas.
If variants for a particular concept cannot be established as loans from
neighboring languages, then remaining variety of non-cognate terms
argues against immediately positing any of the conflicting forms
as candidates for very early Basque (even though one or more
of them MIGHT be a direct descendant of very early Basque).
Additional argumentation would then be necessary, either way.
Of course things are not this simple,
but Larry Trask is an expert at using all of these varied sorts of
information.
More dialect areas of course give additional security,
and perhaps additional phonological information.
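
The moderate position described above (vary the number of dialect areas
required according to how many areas have documents in a given subject
matter) can be sketched roughly as follows.  This is only an illustration
under stated assumptions: the function names are hypothetical, the area
labels are placeholders, and scaling the "four of five" ratio proportionally
is one possible choice, not a rule anyone in this discussion has proposed.

```python
def required_areas(areas_with_documents, total_areas=5, base_required=4):
    """Scale the 'four of five' requirement down to the number of dialect
    areas that actually have documents in a given subject matter.
    (Proportional scaling, rounded up, is an assumption of this sketch.)"""
    if areas_with_documents <= 0:
        return 0
    # Integer ceiling of base_required * areas_with_documents / total_areas.
    scaled = (base_required * areas_with_documents + total_areas - 1) // total_areas
    return max(1, min(scaled, areas_with_documents))


def is_widespread(attested_in, documented_in):
    """attested_in: areas where the word is recorded.
    documented_in: areas with any documents on the word's subject matter."""
    if not documented_in:
        return False  # no documents on this subject, so no basis for a verdict
    relevant = set(attested_in) & set(documented_in)
    return len(relevant) >= required_areas(len(documented_in))


# Placeholder labels for the five dialect groupings.
ALL_AREAS = {"area1", "area2", "area3", "area4", "area5"}

# A word in a subject matter documented in only two areas, found in both:
print(is_widespread({"area1", "area2"}, {"area1", "area2"}))
# The same two attestations when all five areas have relevant documents:
print(is_widespread({"area1", "area2"}, ALL_AREAS))
```

Under this sketch the fixed four-of-five rule is recovered whenever all five
areas have relevant documents, while vocabulary from thinly documented
subject matters is no longer excluded automatically.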

>3. Absence from neighboring languages
<snip>
> I suggest that, if
>Agud and Tovar's etymological dictionary shows a widespread belief or
>suspicion among specialists that a word is borrowed, then it should be
>excluded -- even if the loan origin is not certain.  Caution is vital
>here, in my view.

Some would apply "caution" in the other direction, by not throwing out words
for which loan origin is merely suspected and the argument is not a strong one.
"Strong" is not the same as "certain".
Moderation in all criteria, as in all things.

>A decision must be made about the very few shared words which are
>thought to be of Basque origin.  For example, everybody believes that
>the Castilian and Portuguese words for `left (hand)' are borrowed from
>Basque <ezker>.  A policy must be adopted here, but such words are
>vanishingly few anyway, and the decision is most unlikely to have any
>significant consequences.

Would such examples be those in which the Castilian and Portuguese words
have no cognates in other Romance languages?  In such a case,
would not the identical sort of criteria dictate that they be excluded from
studies of early Castilian and Portuguese?  Of course, there is no necessary
contradiction here, because items of this sort could in principle be
excluded from BOTH sides of any puzzling sharing, in the approach
Larry Trask is taking.  Or they can be included on BOTH sides.
My own position would be simply to include them on both sides,
but with a note that they might be from either side, and if they are
from the Romance side, but limited to Iberian Romance, then we
must have an additional hypothesis that there was some innovation within
Iberian Romance, or else a borrowing from some third language family
related neither to Romance nor to early Basque.  Is there some gap in
that reasoning?  Because it seems to me to suggest that words limited
to Basque and to Iberian Romance (not found in other Romance languages),
are better assigned to early Basque than to early Romance,
since by definition of the situation they are not reconstructible to early
Romance.  But this is not certain.  Occam's razor can suggest a route
to follow, but it cannot absolutely exclude the more complex case that
there was an extinct third language family from which a word was
borrowed both into Basque and into Iberian Romance.

***

>...it is well known that words like <ama>
>`mother' and <tu> `spit' occur in languages all over the planet.

That does not argue either for or against such words actually being
inherited from Proto-Basque.

It DOES make it difficult to use such words in trying to prove
a deep genetic relation between languages, because one must then
have sufficient knowledge of sound-symbolic forces to argue
that something more specific is shared between particular languages,
not merely a vague resemblance.
That is quite a separate issue.

***

<snip>

>When -- as so often -- a word exists in
>several regional variant forms, what form should go into the list?
>My answer is that we should simply appeal to the known phonological
>prehistory of Basque, and use the form which can be reconstructed as the
>common ancestor.

I have great confidence that Larry Trask will almost always draw the
correct conclusions in such cases, given his knowledge of the
phonological history of Basque.  But it nevertheless should be clear
that there is a potential circularity, of exactly the kind pointed out by
Steve Long, that a theory of the historical development of a language
is used to select which forms are considered to have been in a proto-
language.  That virtually guarantees that a different theory of
historical development of the language cannot easily be developed
from data thus selected.  Elementary common sense.

That does not make this procedure wrong,
because it is the totality of the COMBINATION of the attested data
and the hypothesized sound changes (etc.) which we evaluate,
in the long run.  But it does make this procedure less than absolutely
certain to give the correct results.  (Using terminology from other
fields, it is often possible to find a "local minimum" or solution
which is better than any nearby points (closely similar solutions),
yet which is not an absolute minimum, not the absolute best solution.)
In our field,
changing BOTH some of the hypotheses about sound changes and
other historical developments AND some of the hypothesized proto-forms,
changing both together, in a co-ordinated fashion, may
yield a better solution.  Such shifts of paradigm do occur.
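
The "local minimum" point can be made concrete with a toy example.  The
scores below are invented purely for illustration (lower means a better
overall analysis), and the two binary choices stand in for "which set of
sound-change hypotheses" and "which set of proto-forms"; nothing here
reflects actual Basque data.

```python
# Hypothetical "badness" scores for four combined analyses.
# First component: which set of sound-change hypotheses (0 or 1).
# Second component: which set of proto-forms (0 or 1).
COST = {(0, 0): 2, (0, 1): 3, (1, 0): 3, (1, 1): 1}


def one_at_a_time(start):
    """Improve the analysis by changing only one component per step."""
    current = start
    while True:
        a, b = current
        neighbours = [(1 - a, b), (a, 1 - b)]
        best = min(neighbours, key=COST.get)
        if COST[best] >= COST[current]:
            return current  # no single change helps: a local minimum
        current = best


def both_together():
    """Consider every combination, including co-ordinated changes."""
    return min(COST, key=COST.get)


stuck_at = one_at_a_time((0, 0))
overall_best = both_together()
print(stuck_at, COST[stuck_at])
print(overall_best, COST[overall_best])
```

Starting from (0, 0), changing either component alone makes the score worse,
so a search that revises only one component at a time stays where it is;
only the co-ordinated change of both reaches the better analysis at (1, 1).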

>Finally ... if anybody out there still believes that my primary criteria are
>somehow likely to skew the results in some phonological way, or if
>anybody thinks that there exist better criteria for the purpose of
>identifying the best candidates for native and ancient status in the
>language, let's hear about it.

I don't understand the word "still" here.
It should be evident that I do believe this, and that I have previously
explained the concrete reasons why.  There is no need to repeat the details here.
As far as I know, Larry Trask has not
argued against the reasons I gave.

I have repeatedly pointed to the problem of selection against
sound-symbolic vocabulary through accidents of limited recording,
having the effect of biasing our notions of canonical forms.
Using Larry's mention of the difference of subject matters
between 16th-century and 17th-century documentations,
it is easy to explain why using too early a cutoff in time,
or requiring too many or the wrong dialect attestations,
can systematically bias against vocabulary in certain semantic
fields, because these, like sound-symbolic items more generally,
were not within the subject matter favored by the documents.

***

Additionally, criteria for what are likely to be descendants of early
Basque forms are NOT THE SAME THING as criteria for
what are good items to use in any consideration of potential
external relationships of Basque.  I say this latter NOT because I hold
out any hopes for finding distant relatives of Basque in my lifetime,
but simply because mixing these two goals can distort the picture
of proto-Basque by excluding many items which were in fact
part of proto-Basque.

***

In addition to all of the criteria Larry Trask mentions, I think there
should be another criterion:  For each item on the basic 100-word
or 200-word Swadesh list, be sure to INCLUDE SOME vocabulary item
whose meaning matches that item,
simply on the grounds that every language will have vocabulary
for such meanings, so reconstructing an early Basque without any
term for such a meaning is contra-indicated.
This is not a criterion for evaluating any particular proposed vocabulary
item in Proto-Basque; it is rather a global criterion which can be
used to evaluate the sum total of the judgments on individual candidates
for inclusion.  It can tell us that we have excluded too much,
and in what semantic ranges we should probably seek additional
candidates for inclusion.
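
As a rough sketch of how such a global check might be run: the function
name, the three sample meanings, and the placeholder candidate labels below
are all hypothetical, and no actual Basque forms or real selection results
are implied.

```python
def uncovered_meanings(swadesh_meanings, accepted):
    """Return the list meanings for which no candidate survived selection.

    swadesh_meanings: meanings from the basic word list, in order.
    accepted: maps a meaning to the candidate words accepted for it.
    """
    return [m for m in swadesh_meanings if not accepted.get(m)]


# Placeholder data: three sample meanings and made-up candidate labels.
meanings = ["water", "stone", "fire"]
accepted = {"water": ["candidate-1"], "stone": []}

print(uncovered_meanings(meanings, accepted))
```

A long list of uncovered meanings would indicate that the individual criteria,
taken together, have excluded too much, and would show in which semantic
ranges to look again.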

I would bet there are many other criteria which might be added,
and balancing them all together to make decisions will yield
better results than using a simpler set of criteria and allowing
any one otherwise reasonable criterion to dictate inclusion or exclusion.
Larry Trask has shown his ability to use many criteria beyond the
simple set in discussing particular vocabulary items
(such as /sei/ or any other).

***

Best wishes,
Lloyd Anderson


