Excluding Basque data

ECOLING at aol.com
Tue Oct 5 19:30:58 UTC 1999


In Larry Trask's recent reply on this subject,
I think most of the ground has been covered.
Larry does not feel that using a tagged database is superior to other methods,
though he does acknowledge that with a computer database one can
more quickly investigate alternatives. Some of us believe that is a very
substantial benefit indeed: it lets us search more easily for ways in which
we may be making biasing assumptions, and explore alternative hypotheses.

People's bottom-line judgements simply differ on these things.
None of these judgements are a priori wrong,
and none of us has the right to assume that the alternatives are
a priori wrong.

>> Ruling out the question of bias in selection of data, by including
>> in the data set only that data which fit the criteria, as opposed to
>> grouping the data so that different analysis can be performed, does
>> deny the ability to analyze any assumptions which may, wittingly or
>> not, be embodied in the criteria.

>Sorry, but this makes no sense to me.

And to me it is obvious that flexibility in handling the data
under alternative assumptions sometimes makes all the difference
between being able to question an assumption
and being unable to question it effectively,
because of practical limits on our thinking abilities,
or limitations of time, or whatever.
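
A minimal sketch of the difference between excluding data and tagging it, in Python. The entries, the field names, and the dialect threshold are invented placeholders for illustration only; they are not real Basque data and not Larry Trask's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    form: str             # placeholder form, not a real attestation
    dialects: int         # number of dialects recording the form (invented)
    sound_symbolic: bool  # analyst's tag, itself a judgement open to revision


# Hypothetical toy data: every entry stays in the database, whatever its tags.
LEXICON = [
    Entry("word_a", dialects=7, sound_symbolic=False),
    Entry("word_b", dialects=1, sound_symbolic=True),
    Entry("word_c", dialects=6, sound_symbolic=True),
    Entry("word_d", dialects=2, sound_symbolic=False),
]


def select(lexicon, min_dialects=0, allow_sound_symbolic=True):
    """Return the forms passing the given criteria; nothing is deleted."""
    return [
        e.form
        for e in lexicon
        if e.dialects >= min_dialects
        and (allow_sound_symbolic or not e.sound_symbolic)
    ]


# The same database analysed under alternative assumptions.
strict = select(LEXICON, min_dialects=5, allow_sound_symbolic=False)
relaxed = select(LEXICON, min_dialects=5, allow_sound_symbolic=True)
everything = select(LEXICON)

# Forms that only the sound-symbolic criterion removes.
dropped_by_tag = sorted(set(relaxed) - set(strict))
```

Because every entry stays in the database, changing one assumption is a change of one argument, and the difference between two selections can be inspected directly. Had the data been filtered before entry, that comparison could not be made at all.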

I think these kinds of issues have been adequately discussed,
and we are not likely to make further progress on them immediately.

***

Many of the remaining points seem to me to be matters of preference in
terminology, which may have flavors we prefer one way or another.

***

On matters still worth discussing, at least for clarification:

>But, so far, neither you nor Jon Patrick nor anyone else has made a
>single substantive suggestion as to how my criteria bias the data.
>Instead, you just keep hinting vaguely that it might do so, but without
>any specifics.

No, I have repeatedly focused on one specific area where I could not merely
support the general concerns of others not to exclude too much,
but could add something from my own typological knowledge.
Namely...

Sound-symbolic words are
BOTH different in canonical forms
AND underrepresented in vocabularies etc.
and the coincidence of these two properties
means it is dangerous to exclude them.

As to sound-symbolic words,
Larry says he does NOT insist on their having the same canonical
forms as other vocabulary, does not exclude them on that basis.
(I surely thought he was using canonical forms as one of his
criteria for setting up his initial lists?  Was that not true?
I do not intend to go back into the extensive correspondence
to check on this.  Others can do so if they wish.)

In any case, another criterion is explicit or implicit in the
following:

>But, in fact, the vast majority of sound-symbolic items in Basque do
>*not* satisfy my other criteria.  Hence my approach will immediately set
>them apart from the words which are the best candidates for native and
>ancient status.  Once that's established, *then* these distinctive words
>can be investigated to determine their own phonological characteristics.

Notice in the way the middle sentence follows the first sentence
that Larry treats his approach (his criteria) as if his criteria were
pretty much the same thing as selection for "the best candidates for
native and ancient status".  Others of us believe that must always be
kept on the surface of awareness as an empirical question,
not taken for granted;  it is precisely a core question!

If what Larry says above about Basque is true of other languages
to a similar degree, then should we conclude that sound-symbolic
items are NOT good candidates for native and ancient status,
in general?  That would seem to be implied by the paragraph
just quoted.  Yet to me that shows there must be something wrong
with the argument.

That is not to say we treat sound-symbolic
words exactly the same way as other words.
Just as English "pavilion" comes from French "pavillon",
which is the normal French development by the sound laws, while "papillon"
is a sound-symbolic form which has resisted a sound change,
so we might suspect Basque <pinpirin> may have not undergone all of the
sound changes which most of the vocabulary did in the history of
Basque, independent of the question whether it is a recent loan
or some primaeval vocabulary item inherited from 10,000 years ago.
So even the supposedly air-tight logic of sound laws cannot be
used unequivocally to include or exclude vocabulary from native
vs. borrowed categories.  A shocker, and not a wild card we want
to use without severe limits or controls, or else the entire enterprise
falls.  But a shocker we cannot escape by waving magic wands
or waving words.  It's a fact of reality.
(English "tiny", which went through the Great Vowel Shift,
and "teeny", which did not go through the Great Vowel Shift
but was retained or regenerated or reborrowed from a dialect,
would be a similar case,
unless the dialect-borrowing solution is adopted.)

(Trask says:
> I exclude sound-symbolic words like <pinpirin> `butterfly'
>from my initial list, not because I don't like their forms, but because
>they do not satisfy my primary criteria.  In the case of <pinpirin>, I
>exclude it because it is attested at all only in one small corner of the
>country.)

Among Larry Trask's other criteria was distribution across all
of the dialects, as opposed to occurrence in only a few.
As I took pains to point out in previous messages, that criterion is biasing.
However reasonable, even obvious, it may appear to a historical linguist
(including seeming obvious to me too, I may add)
it may still disproportionately bias against sound-symbolic words,
because those who record vocabularies
often fail to collect such words,
thereby reducing the number of dialects in which they are attested,
quite independently of whether they actually were used in those dialects.
The entire class of such words may go unrecorded, or very few of them
may be recorded; it is not simply that their recording is randomly
slightly less full.

And this enormous underrepresentation can then indirectly lead to
initial conclusions on canonical forms which are
too simple and neat, too consistent,
including canonical forms which are
underrepresentative of sound-symbolic forms.

Such initial conclusions on canonical forms can then have
cascading secondary effects on inclusion or exclusion of additional
vocabulary in the lists.  Even if /bat/ (from the recent discussion of
/bade/ /bedere/ etc.) were the only form included with a final
stop consonant, being exceptional would not prove it is not part of
pre-Basque.  All (?) languages have more common and rarer forms,
and have peripheral forms, especially when the ambiguities of
fast speech and slow speech borderlines are considered,
and when a few short high-frequency items are considered.
Yet we must also pay attention to differences of canonical forms,
as they sometimes DO clue us in to different strata of vocabulary
which may be relevant in historical-comparative studies.

>I have proposed that obvious and recognizable sound-symbolic items, like
><tu> `spit', might reasonably be excluded at the outset.  But I'm not
>wedded to this, and I don't mind if others want to include them when
>they satisfy the other criteria.

How about if they do not satisfy the other criteria, or some of them,
and if the inclusion of such exceptional forms then
enters into the determination of what are true canonical forms,
and even what those other criteria should be,
and cycles back to affect judgements of what forms are exceptional or not,
or to what degree (frequency or structural),
and EVEN to affect which forms are included in the analysis?
This is indeed circular, though not in a bad sense; it should simply be
recognized as circular.

Larry says his criteria do not have any biases (I think he believes
they cannot, as he thinks he has formulated them), yet here he himself
says he is excluding a form, mentioning in this paragraph only that it
is sound-symbolic (as if that were a sufficient reason?
I do not want to assume that, but at least here no more was given).
I do not remember whether he gave any other reasons for
excluding <tu>?

Trask refers to this:
>...criticizing me for selecting criteria appropriate only
>to the task I have in mind, and not to other conceivable tasks that
>someone else might like to pursue.

Trask clearly believes his criteria are obviously
appropriate to the task he has in mind.
Others are not quite so certain that that is all his criteria do,
and believe they may do some other things as well.

The matter of biasing assumptions is almost always very
difficult to analyze, because if it were easy, we would already
have solved it, by eliminating the unwarranted assumptions.

I have absolutely no doubts that Larry Trask's knowledge is in general
crucial to finding good candidates for proto-Basque forms,
but that is not the same thing as saying there may not be unknown biases
even in his work.  It's simply our status as fallible human beings
who are not omniscient.

>At the same time, no questions about the
>nature of Pre-Basque words can possibly be answered without first
>identifying the words that were present in Pre-Basque.

Seems self-evident, doesn't it?

Actually, this sort of statement is often not true of normal science.
We might answer some questions about the nature of pre-Basque words,
then make progress on identifying which words were present
in pre-Basque, then answer more questions about the nature of
pre-Basque words, including CHANGING some of the earlier
answers, then make more progress on identifying which words
were present in pre-Basque, excluding some we had previously
included and including some we had previously excluded.
It is the edifice AS A WHOLE which is ultimately evaluated.
There may not exist a step-by-step process
of getting there with anything like certainty along the way.
So we can use a step-by-step process WITHOUT assuming
certainty along the way.  Which is why it is so important to
be able to change assumptions EASILY.
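
The back-and-forth just described can be sketched as a loop, in Python. This is only an illustration under invented assumptions: the toy forms are not real data, the use of the final segment as the sole "canonical form" property is a deliberate oversimplification, and the count threshold is hypothetical. All of these stand in for judgements that in practice require a specialist.

```python
from collections import Counter

# Hypothetical toy forms, not real Basque or pre-Basque data.
CANDIDATES = ["kata", "mano", "selu", "pati", "tok", "dor", "nar"]
SEED = {"kata", "mano", "selu", "pati", "dor"}  # provisional initial list

VOWELS = set("aeiou")


def final_class(form):
    """Classify a form by its last segment: 'V' for any vowel, else the consonant."""
    return "V" if form[-1] in VOWELS else form[-1]


def allowed_finals(words, min_count):
    """Infer which final classes count as 'canonical' from the current list."""
    counts = Counter(final_class(w) for w in words)
    return {cls for cls, n in counts.items() if n >= min_count}


def refine(seed, candidates, min_count, max_rounds=10):
    """Alternate between inferring the pattern and revising the list until stable."""
    current = set(seed)
    for _ in range(max_rounds):
        allowed = allowed_finals(current, min_count)
        revised = {w for w in candidates if final_class(w) in allowed}
        if revised == current:
            break
        current = revised
    return current


# Same seed list, two different assumptions about how rare a pattern may be.
lenient = refine(SEED, CANDIDATES, min_count=1)
strict = refine(SEED, CANDIDATES, min_count=2)
```

With the lenient threshold, a form outside the seed list ("nar") is added; with the strict one, a form in the seed list ("dor") is dropped. Neither result is certain. The point is only that the revision is cheap to rerun when an assumption changes.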

>Adding to your
>database words that did not exist in Pre-Basque cannot offer the
>slightest assistance, and may well spoil the results.

Of course.  In one sense everyone agrees, if such words are mixed in
with words which did exist in pre-Basque.
But who is omniscient enough to know in every case
which words did not exist in pre-Basque?
Larry is not claiming he is, and explicitly says he is not,
yet the method he proposes is stated in such absolute terms
("first" identifying the words that were present in
Pre-Basque) that it does appear to rely on omniscience.

It is simply not so straightforward.

>Is this such a
>difficult point to follow?

The last part about not wanting to add words that did not exist
in pre-Basque is not at all hard to follow; everyone agrees with it.

That does not logically force all of the other decisions that
Larry wishes to make in advance, though he is of course
right to make his own best attempt.

Lloyd Anderson
Ecological Linguistics



More information about the Indo-european mailing list