Excluding Basque data

Larry Trask larryt at cogs.susx.ac.uk
Thu Sep 30 08:33:03 UTC 1999


On Sun, 26 Sep 1999 ECOLING at aol.com wrote:

[on my statement that databases do not free us from choosing criteria]

> It is very different,
> because no "exclusion" of data need be a permanent exclusion,

And who has ever suggested that any exclusions should be permanent?
Certainly not me.  Read what I've written.

> because different users can choose different criteria for proceeding,
> and because the same user can change his or her mind at different times
> and choose different criteria for proceeding.
> One does not have to be "right" on the first choice,
> there are no serious consequences for making an initial error.

But this is equally true regardless of the technology in use.
Look: databases do not save you from the consequences of your decisions.
They only allow you to investigate the consequences of making different
decisions more rapidly than working on paper does.  Convenient, of
course, but there is no point of principle here.

> With a paper method in which one cannot go back and change one's mind,

And who has ever suggested such an approach?

> there is a truly excessive focus on being "right" the first time round.

No, there isn't.  The emphasis is only on being *cautious* the first
time round.  But how does a database permit us to throw such caution to
the winds without fear of error?

> And disastrous consequences if one is not.

There are many ways of achieving disastrous consequences in scholarly
work.  And I cannot see that a policy of massive inclusion is a better
way of avoiding disaster than a policy of prudent exclusion.

> Only the omniscient would prefer to be unable to change one's mind?

And who has ever suggested that refusal to change one's mind is an
admirable characteristic?

Lloyd, where on earth are you getting this stuff from?

You appear to be trying to paint me as someone who makes up his mind in
advance what the truth is and then refuses to consider any evidence that
conflicts with this pre-selected truth.  And this is a travesty, as you
must see if you read my words.

> [LT]

>>       But, once more: I *never* exclude a word from my
>>       list because it doesn't match any generalizations about form which
>>       I may have in mind.

> [JP]

>>> I've never asserted that you did. However I do think that your
>>> criteria are designed to create an analysis that is more strongly
>>> consistent with the generalisations you "think you have a pretty
>>> good idea" about.

> [LT]

>> I flatly deny this, and I challenge you to back up your assertion.

> Like Jon Patrick, I believe that Larry Trask's criteria MAY IN EFFECT
> bias the results to favor hypotheses which he himself espouses.
> This DOES NOT MEAN that he is consciously aware of this,
> (nor that he is deliberately manipulating the data,
> as he seems to have inferred he was being charged with).
> Quite the contrary, it probably results from his being so convinced of
> certain hypotheses that he can scarcely conceive of them not being correct.
> Others may find it easier to conceive of that
> (as is so often true in research, nothing unusual here).

This is no answer to my challenge.

You assert that you personally believe that my criteria "may in effect"
bias the findings.  But you have signally failed to explain how this
result might come about.

So tell me: how do my principal criteria of early attestation,
widespread distribution, and absence from neighboring languages "have
the effect" of biasing my results on phonological form?

And how do the opposite policies have the effect of securely avoiding
any such bias?

> In normal science, we do not normally allow the proponent of a
> hypothesis to select the data which is to be included in evaluating
> the hypothesis (this statement just given subject to many
> clarifications, of course, but as a general statement it can stand
> this way).

Lloyd, forgive me, but this is a gross misrepresentation of my position.

I *am not testing any hypothesis* at this point.  In fact, *there is no
hypothesis* at this point.  The purpose of my criteria is *not* to
remove data from a pre-existing hypothesis, but rather to *find* the
data, in the first place, on which hypotheses or conclusions can be
based.

Remember, I'm interested in Pre-Basque words, and so I first have to
*find* these things.

> The ONE area in which I can from my own knowledge say that Trask's
> criteria, as I understand them from earlier messages, are indeed
> biasing, was in his aim at a consistent canonical form for all of
> the vocabulary he wanted to include in the data set of candidates
> for proto-Basque status, specifically that sound-symbolic vocabulary
> must have the same canonical forms as general vocabulary.

Lloyd, this is certainly not so.

I certainly do *not* expect sound-symbolic forms to have the same forms
as ordinary lexical items.  In fact, I am certain, from my own
investigations, that this is not so in Basque, and I have said so many
times in many fora.  Just ask the hopeful long-rangers whose comparisons
I have criticized.

I have proposed that obvious and recognizable sound-symbolic items, like
<tu> `spit', might reasonably be excluded at the outset.  But I'm not
wedded to this, and I don't mind if others want to include them when
they satisfy the other criteria.

But, in fact, the vast majority of sound-symbolic items in Basque do
*not* satisfy my other criteria.  Hence my approach will immediately set
them apart from the words which are the best candidates for native and
ancient status.  Once that's established, *then* these distinctive words
can be investigated to determine their own phonological characteristics.
No "permanent exclusion" here.

> It is simply a fact that there are languages which have different
> canonical forms for different strata of their own vocabulary, and
> especially this is known to be true for sound-symbolic vocabulary.

Yes, and Basque is one of them.  This is not a secret: read the relevant
passage in my book.

> Since proto-Basque is supposedly a real language,
> we should not make any assumptions IN ADVANCE which we know
> to be false when applied to languages in general.

Lloyd, I am becoming exasperated.  I am *not* making any such assumption
in advance.  I exclude sound-symbolic words like <pinpirin> `butterfly'
from my initial list, not because I don't like their forms, but because
they do not satisfy my primary criteria.  In the case of <pinpirin>, I
exclude it because it is attested at all only in one small corner of the
country.

Now, I happen to know that this small corner is a region distinguished
from the rest of the country by its fondness for expressive forms
beginning with <pin-> or <pan->.  But that's not the reason I exclude
it.  If it satisfied my criteria, it would be in the list.

Because of what I know about Basque expressive formations, I am certain
that <mutur> ~ <mustur> `muzzle, snout' is also an expressive formation.
But this word, almost uniquely among obvious expressive formations,
satisfies my criteria: it is attested throughout the country, or very
nearly so, and it is recorded as early as 1571 -- early enough, by my
criteria.  So, this word must go into my initial list.  I have no right
to exclude it merely because it looks funny to me, or because I have
already made up my mind that it is an expressive formation.  But I can
assure you that the word will stand out a mile from all the other words
in my initial list.

> The assumption that proto-Basque vocabulary fit a single consistent
> set of canonical forms, regardless whether it was general vocabulary
> or sound-symbolic vocabulary,
> is precisely that kind of an assumption which we should not make.

And I, for one, am certainly not making it.  Who is?

> Whether Trask's criteria bias the selection of data in any other
> ways I would not venture to say.  I have no knowledge of specifics
> which would be relevant.  The question is however a legitimate one,
> and should remain open to empirical scientific investigation. Trask
> should not feel that his honesty is being questioned, it is merely
> whether his assumptions are correct that may be in question.

And what assumptions would those be?

> Ruling out the question of bias in selection of data, by including
> in the data set only that data which fit the criteria, as opposed to
> grouping the data so that different analyses can be performed, does
> deny the ability to analyze any assumptions which may, wittingly or
> not, be embodied in the criteria.

Sorry, but this makes no sense to me.

You appear to be criticizing me for selecting criteria appropriate only
to the task I have in mind, and not to other conceivable tasks that
someone else might like to pursue.  And this makes no sense.

> Since we NEED NOT exclude potential data in a permanent way,
> because of the possibility of tagging data in databases,
> we will be better served by doing so.

We need not exclude any data "in a permanent way" *regardless* of
whether we are using a database or not.

> Not all of us need do so,
> and Larry Trask himself may choose not to do so,
> but he should gracefully acknowledge that normal science
> does treat bias in data selection as a normal question to be analyzed.

I am happy to acknowledge this.

But, so far, neither you nor Jon Patrick nor anyone else has made a
single substantive suggestion as to how my criteria bias the data.
Instead, you just keep hinting vaguely that they might do so, but without
any specifics.

Remember: I'm interested in looking for generalizations about the
phonological forms of words that were in the language 2000 years ago.
Now, tell me: in what *specific* respects do my criteria bias my
findings?

All I'm trying to do is to locate the words that most likely were *in
the language* 2000 years ago, on the perfectly reasonable ground that
words which were *not* in the language 2000 years ago can shed no light
on my task and may well constitute positive interference with it.

> Also, no one should be criticized for including additional data,
> appropriately tagged (= grouped, in different dimensions).

Nobody is to be criticized for making a reasonable stab at a question he
happens to be interested in.  At the same time, no questions about the
nature of Pre-Basque words can possibly be answered without first
identifying the words that were present in Pre-Basque.  Adding to your
database words that did not exist in Pre-Basque cannot offer the
slightest assistance, and may well spoil the results.  Is this such a
difficult point to follow?

[on the example of Icelandic]

> So again, Larry Trask's criteria are not fully neutral,
> NOT EVEN self-evident for the purpose of determining what are the
> likely candidates for proto-Basque.  They include some assumptions
> which are not necessary.

And which assumptions would those be?

Lloyd, you keep accusing me of making gratuitous assumptions, but you
have yet to name even one such assumption.  Except, of course, your
claim that I maintain that sound-symbolic words must have the same
canonical forms as ordinary words, which is simply false.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk


