Excluding Basque data

ECOLING at aol.com
Mon Sep 27 03:21:49 UTC 1999


In a message dated 9/23/99 6:12:54 PM, larryt at cogs.susx.ac.uk writes:

>Hence, while the compiling of the database may require no initial
>methodology, manipulating it certainly does require some initial
>decisions.  So I don't see how working with a database, instead of with
>paper, gets around the central issue we have been discussing: the choice
>of criteria for proceeding.

It is very different,
because no "exclusion" of data need be a permanent exclusion,
because different users can choose different criteria for proceeding,
and because the same user can change his or her mind at different times
and choose different criteria for proceeding.
One does not have to be "right" on the first choice;
there are no serious consequences for making an initial error.

With a paper method in which one cannot go back and change one's mind,
there is a truly excessive focus on being "right" the first time round.
And disastrous consequences if one is not.
Surely only the omniscient would prefer to be unable to change their minds.

From a discussion between Jon Patrick and Larry Trask:

[LT]
>       But, once more: I *never* exclude a word from my
>       list because it doesn't match any generalizations about form which
>       I may have in mind.

[JP]
>> I've never asserted that you did. However I do think that your
>> criteria are designed to create an analysis that is more strongly
>> consistent with the generalisations you "think you have a pretty
>> good idea" about.

[LT]
>I flatly deny this, and I challenge you to back up your assertion.

Like Jon Patrick, I believe that Larry Trask's criteria MAY IN EFFECT
bias the results to favor hypotheses which he himself espouses.
This DOES NOT MEAN that he is consciously aware of this
(nor that he is deliberately manipulating the data,
as he seems to have inferred he was being charged with).
Quite the contrary, it probably results from his being so convinced of
certain hypotheses that he can scarcely conceive of them not being correct.
Others may find it easier to conceive of that
(as is so often true in research, nothing unusual here).

In normal science, we do not allow the proponent of a hypothesis
to select the data which are to be included in evaluating the hypothesis
(this statement is of course subject to many clarifications,
but as a general statement it can stand this way).

The ONE area in which I can from my own knowledge say that Trask's criteria,
as I understand them from earlier messages, are indeed biasing,
is his aim at a consistent canonical form for all of the vocabulary
he wanted to include in the data set of candidates for proto-Basque status,
specifically the requirement that sound-symbolic vocabulary must have
the same canonical forms as general vocabulary.
It is simply a fact that there are languages which have different canonical
forms for different strata of their own vocabulary, and especially this is
known to be true for sound-symbolic vocabulary.
Since proto-Basque is supposedly a real language,
we should not make any assumptions IN ADVANCE which we know
to be false when applied to languages in general.
The assumption that proto-Basque vocabulary fit a single consistent
set of canonical forms, regardless of whether it was general vocabulary
or sound-symbolic vocabulary,
is precisely that kind of an assumption which we should not make.

Whether Trask's criteria bias the selection of data in any other
ways I would not venture to say.  I have no knowledge of specifics
which would be relevant.  The question is however a legitimate one,
and should remain open to empirical scientific investigation.
Trask should not feel that his honesty is being questioned;
it is merely whether his assumptions are correct that may be in question.
Ruling out the question of bias in selection of data,
by including in the data set only those data which fit the criteria,
as opposed to grouping the data so that different analyses can be performed,
does deny the ability to analyze any assumptions
which may, wittingly or not, be embodied in the criteria.

Since we NEED NOT exclude potential data in a permanent way,
because of the possibility of tagging data in databases,
we will be better served by tagging data rather than excluding them.
Not all of us need do so,
and Larry Trask himself may choose not to do so,
but he should gracefully acknowledge that normal science
does treat bias in data selection as a normal question to be analyzed.
Also, no one should be criticized for including additional data,
appropriately tagged (= grouped, in different dimensions).
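
To make the tagging idea concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the Entry class,
the select function, the tag names, and the placeholder forms
("word_a" and so on) are all hypothetical, and none of it is real
Basque data or anyone's actual database. The point it shows is that
every entry stays in the list, and each set of criteria is just a
different filter applied at analysis time.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    form: str                                # placeholder headword, not real data
    tags: set = field(default_factory=set)   # hypothetical grouping tags


def select(entries, exclude=frozenset(), require=frozenset()):
    """Return entries that carry every tag in `require` and no tag in `exclude`.

    Nothing is deleted: the full list stays available for other criteria.
    """
    exclude, require = set(exclude), set(require)
    return [e for e in entries if require <= e.tags and not (exclude & e.tags)]


# Hypothetical lexicon; the tag names are assumptions made for illustration.
lexicon = [
    Entry("word_a", {"general"}),
    Entry("word_b", {"sound-symbolic"}),
    Entry("word_c", {"general", "late-attestation"}),
    Entry("word_d", {"sound-symbolic", "late-attestation"}),
]

# One user's strict criteria: leave out both tagged groups.
strict = select(lexicon, exclude={"sound-symbolic", "late-attestation"})

# Another user (or the same user later) keeps sound-symbolic words in.
inclusive = select(lexicon, exclude={"late-attestation"})

print([e.form for e in strict])
print([e.form for e in inclusive])
print(len(lexicon))  # the underlying data set is unchanged
```

Running the two selections side by side is what allows the effect of
a criterion to be examined, rather than built in from the start.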

Sincerely,
Lloyd Anderson
Ecological Linguistics

PS:

Jens Rasmussen has just pointed out another example.
Considering what proportion of words in a dictionary
of one date existed in an earlier form of the same language,
Rasmussen points out that this is very different from language to language,
and he wrote:

>I guess English and Icelandic are both relatively
>extreme cases. Where Basque stands between the two poles must be looked
>into ...

So again, Larry Trask's criteria are not fully neutral,
NOT EVEN self-evident for the purpose of determining what are the
likely candidates for proto-Basque.  They include some assumptions
which are not necessary.



More information about the Indo-european mailing list