Revised: Refining early Basque criteria

Thu Oct 28 19:35:16 UTC 1999

[The following message has been re-written,
at the request of the list moderator.
In addition, in order to avoid duplication,
I have cut parts of it which were better treated in
a message sent later, titled
"9 specifics on Including and excluding data".
Please see that message in conjunction with this one.
LA]

We are gradually reaching greater clarity concerning Larry Trask's criteria.
When he has said that he only wants to include the "best" cases,
that is quite different from trying to include only cases which are
highly likely to have been in early or proto-Basque.

So in a sense, I could immediately agree with ALL of his criteria if
our goal were only the *absolutely* "best" cases,
because that means that every reasonable criterion we can think of
should be satisfied maximally, otherwise by definition
the proffered example could not be a "best" case.
But I think linguists very rarely use that goal in such an absolute sense,
and Larry doesn't quite either.

Taken literally and applied in a truly absolute sense,
that goal would have an unfortunate logical consequence.
Our data set would become hopelessly small,
because we could always discover that some members of our tentative
data set actually scored higher on some criterion than others,
so the "others" would have to be dropped as not absolutely "best".
This simply means we must use common sense.
But then "best" does not have an absolute meaning,
and we are really looking for "better" to some sufficient degree.
Thus, all practical considerations are relevant, not absolutes.

***

Trask believes the historical phonology of Basque is already *known*,
that there is only one viable such phonology, and he is comfortable using
it to exclude items.  This is not stated overtly in his criteria, but it did
emerge explicitly in some discussions with others not long ago.
But then the procedure which he proposes,
to select the "best" candidates for early Basque vocabulary,
*is indeed at least to some small degree circularly based
on a prior hypothesis about the historical phonology of Basque*.
Nothing wrong with pursuing that route,
because in the long run we do evaluate the totality of hypotheses
and data, and Trask's hypotheses about historical phonology are quite
likely correct in most respects ...,
but it DOES mean that the results almost *could not* lead to questioning
the hypothesis, since the data is selected by conformance to the hypothesis.
For those interested in the specifics, please go back to earlier
correspondence involving others;
the details are beyond my competence as a non-specialist in Basque.

***

However, for other exclusions which are not directly stated by
Trask's criteria but are indirect consequences, refer to the discussion
of expressives, and why a criterion of wide distribution improperly
excludes them (because of biases against recording).
In turn, their exclusion will lead to a misstatement of canonical forms
for the language as a whole. (Trask has stated that indeed canonical forms
for expressives are different from those for other vocabulary *in Basque*,
where I was myself able only to say that this situation is highly likely,
since it does occur in many languages.)

Trask intends to use the result of his selection of "best" cases to
determine canonical forms, which result he will then use to select
further candidates for vocabulary of early Basque.
But he simply fails to respond to the point that his
canonical formulas may be biased by his starting point.
He only answers "not a problem for me".

This kind of result snowballs: it has a domino effect on later stages
of investigation.  Sometimes that is good, sometimes bad (bad if some wrong
assumption slipped in anywhere in the process).

While his reply under this message title did mention my point
about the systematic bias in excluding sound-symbolic words,
he moved immediately to a discussion of attestation in only one
dialect in cases which were not sound-symbolic.  So he has still
not found any solution to this issue of systematic distortion of results.

An example of the kind of response:

[LA]
> I have repeatedly pointed to the problem of selection against
> sound-symbolic vocabulary through accidents of limited recording,
> having the effect of biasing our notions of canonical forms.
> Using Larry's mention of the difference of subject matters
> between 16th-century and 17th-century documentations,
> it is easy to explain why using too early a cutoff in time,
> or requiring too many or the wrong dialect attestations,
> can systematically bias against vocabulary in certain semantic
> fields, because these, like sound-symbolic items more generally,
> were not within the subject matter favored by the documents.

[LT]
>Possibly, but not a problem for me.

So it is not a problem if his results about canonical forms are wrong,
and he then uses those wrong results to select his data (wrongly in
some cases because of the initial error) from which he will draw further
conclusions?  I simply don't follow this failure to appreciate the
snowballing consequence of certain kinds of errors,
or perhaps rather the certainty that he already knows the answers.
Because that seems to me to imply that he is only seeking the "best"
*examples* to *illustrate* *conclusions he has mostly already drawn*,
not a reasonable selection of *very good* data from which to consider
drawing new conclusions.  He actually states the contrary,
but I believe he is not aware of the circularities.

***

Here is an example where I think Trask's emphasis on the absolutely "best"
vocabulary is somewhat out of the mainstream.

[Example was <uko> and <ukondo>, please see the other message
"9 specifics on Including and excluding data"
so it is all treated in one place, no duplication here.]

***

Trask has often missed the more subtle and sophisticated paragraphs
in previous messages, or simply answered:

[LT]
> This is becoming extremely abstract.

***

Notice the complete non-sequiturs here:

[LA, clarifying that including nursery words and expressives,
for the purposes of having a truly representative sample of
early Basque, is NOT the same thing as handling the difficulties
of reasoning about external comparisons, precisely because
expressives may not undergo all of the sound changes which
apply to other words.
Therefore, of course, arguments from the *difficulty*
of the latter task are not arguments to exclude such words.]

> It DOES make it difficult to use such words in trying to prove
> a deep genetic relation between languages, because one must then
> have sufficient knowledge of sound-symbolic forces to argue
> something more specific is shared between particular languages,
> not merely a vague resemblance.
> That is quite a separate issue.

Trask's reply:

[LT]
>Yes, but I don't think that <ama> 'mother' is
>"only vaguely" a nursery word, or
>that <tu> 'spit' is "only vaguely" an imitative word.

We were both assuming that these were very clearly
a nursery word and an expressive word; that was not at issue.

>[on words like Basque <ama> 'mother' and <tu> 'spit']
>
[LA]
>> That does not argue either for or against such words actually being
>> inherited from Proto-Basque.
>
[LT]
>Of course, but not the point.

If one is seeking words which are likely to be inherited from
Proto-Basque, then the observation that some feature of them
argues neither for nor against their actually being inherited
is, by definition, relevant to the
method of finding words which are likely to be inherited from
Proto-Basque.

Trask has several times been explicit that he does not want
to include nursery words: he does not exclude them explicitly,
but is glad when his other criteria manage to exclude them.
I have not understood why he should be glad of this.

***

I do understand from Trask's most recent message that it is widely
suspected that there are words from third languages borrowed into
both early Basque and early Ibero-Romance (but no other Romance),
and Trask wants to exclude such from his considerations.
This does skate on the edge of excluding not merely one or two,
but perhaps quite a number, of words which really were in early Basque,
but this kind of data DOES have a different status.  So tag it,
don't exclude it, would be my suggestion.

***

[on Swadesh list and borrowing of even basic vocabulary,
please see now instead the later message
"9 specifics on Including and excluding data".]

***

Trask had mentioned that the 16th-century texts were primarily religious.
That is a very strong bias of content, I would think, against quite a
range of vocabulary from ordinary life.

Trask's further comments emphasize his belief
that vocabulary in a number of topics is clearly borrowed:

[LA]
> Thinking of subject matters attested or not, we have the following,
> which ties this issue back to the specifics of subject matter noted
> by Larry Trask for 16th vs. 17th centuries:
> If only two dialect areas have documents in certain subject matters,
> then vocabulary specific to those subject matters will be systematically
> excluded by requiring their attestation from more than two dialect areas.
> This is obviously undesirable.  It suggests that a moderate position might
> be to categorize documentary attestations by subject matter,
> and vary the number of dialect areas required according to the number
> of areas attesting documents in each subject matter.
> Of course in practice, this can be done in another way.
> Record ALL vocabulary items for a particular concept,
> and study the UNIFORMITY of etyma for that concept,
> without much regard AT FIRST for whether it comes from two or from five
> areas.
> If variants for a particular concept cannot be established as loans from
> neighboring languages, then remaining variety of non-cognate terms
> argues against immediately positing any of the conflicting forms
> as candidates for very early Basque (even though one or more
> of them MIGHT be a direct descendant of very early Basque).
> Additional argumentation would then be necessary, either way.
> Of course things are not this simple,
> but Larry Trask is an expert at using all of these varied sorts of
> information.

[LT]
>Well, much of this is very reasonable,
>but only for a different task from the one I have in mind.

How would Larry characterize such a "different task"?
It seems highly reasonable to think about this when selecting
words which are likely to be inheritances from Proto-Basque.
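
To make the quoted suggestion concrete, here is a minimal sketch in Python
of varying the required number of dialect areas by subject matter.
Everything in it is hypothetical and only for illustration: the area
labels, the subject-matter names, the placeholder words, the baseline
of three areas, and the names AREAS_WITH_DOCUMENTS, Candidate and tag.
It is not Trask's procedure and does not use real Basque data. In the
spirit of "tag it, don't exclude it", the function labels candidates
rather than dropping them.

```python
from dataclasses import dataclass

# Hypothetical: dialect areas that have early documents on each subject matter.
AREAS_WITH_DOCUMENTS = {
    "religious": frozenset({"A", "B", "C", "D"}),
    "seafaring": frozenset({"A", "B"}),
}


@dataclass(frozen=True)
class Candidate:
    word: str               # placeholder label, not a real Basque word
    subject: str            # subject matter of the documents attesting it
    attested_in: frozenset  # dialect areas in which the word is recorded


def tag(candidate, baseline=3):
    """Label a candidate instead of excluding it.

    The number of areas required is the baseline, but never more than
    the number of areas that have documents on the candidate's subject.
    """
    available = AREAS_WITH_DOCUMENTS.get(candidate.subject, frozenset())
    if not available:
        # No documents on this subject anywhere: attestation counts say nothing.
        return "unassessable"
    needed = min(baseline, len(available))
    found = len(candidate.attested_in & available)
    return "meets threshold" if found >= needed else "below threshold"


w1 = Candidate("word-1", "seafaring", frozenset({"A", "B"}))
w2 = Candidate("word-2", "religious", frozenset({"A", "B"}))
w3 = Candidate("word-3", "farming", frozenset({"C"}))

for w in (w1, w2, w3):
    print(w.word, "->", tag(w))
```

Under a fixed requirement of three areas, word-1 would fail simply because
only two areas have documents on its subject; with the adjusted requirement
it passes, while word-2 (two areas out of four available) still falls short.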

***

Lloyd Anderson
Ecological Linguistics