Basque Criteria 10-17 for inclusion
ECOLING at aol.com
Fri Mar 3 05:43:14 UTC 2000
The discussion of "criteria" when seeking candidate vocabulary
which descends from earlier stages of a language is clearly
relevant whenever we seek such vocabulary;
it is relevant to much more than Basque.
Most of the particular suggestions made also apply to more than just Basque.
This message contains additional material to support
a more sophisticated and nuanced set of techniques for selecting items
to include as potential monomorphemic Old Basque lexicon.
The core concept underlying all of these specific suggestions
for modifying Trask's criteria is that we are dealing with gradient
and fuzzy matters, matters of degree, ones not susceptible to
sharp yes-or-no decisions. Though each criterion or perspective
by itself may give a clear result in special cases, it need not,
and the combination of criteria can give a complex of information
which we can weight different ways. If we "tag" the items
with multiple scores in a computer database, we can then change
the weightings as seems appropriate to consider different hypotheses.
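As a minimal sketch of what such a tagged database might look like (the criterion names, the placeholder items and every number below are invented for illustration, not taken from Trask's lists or mine), each item carries one score per criterion, and a hypothesis is simply a set of weights:

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One candidate word, tagged with a score in [0, 1] per criterion."""
    form: str
    scores: dict = field(default_factory=dict)  # criterion name -> score


def weighted_score(item, weights):
    """Weighted mean of the item's scores over the criteria that have a weight.

    Criteria the item was never scored on are skipped rather than counted as 0.
    """
    used = [(weights[c], s) for c, s in item.scores.items() if c in weights]
    total = sum(w for w, _ in used)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in used) / total


def rank(items, weights):
    """Order candidates from best to worst under one set of weights."""
    return sorted(items, key=lambda it: weighted_score(it, weights), reverse=True)


# Hypothetical placeholder items and scores (assumptions, not real data).
word_a = Candidate("word_A", {"early_attestation": 0.9,
                              "dialect_spread": 0.3,
                              "regular_phonology": 0.8})
word_b = Candidate("word_B", {"early_attestation": 0.2,
                              "dialect_spread": 0.9,
                              "regular_phonology": 0.7})

# Two hypotheses about what matters most, expressed purely as weights.
attestation_heavy = {"early_attestation": 3.0, "dialect_spread": 1.0,
                     "regular_phonology": 1.0}
spread_heavy = {"early_attestation": 0.5, "dialect_spread": 2.0,
                "regular_phonology": 1.0}

for name, weights in [("attestation-heavy", attestation_heavy),
                      ("spread-heavy", spread_heavy)]:
    print(name, [it.form for it in rank([word_a, word_b], weights)])
```

The point of the design is that no item is ever thrown away: changing the hypothesis changes only the weights, and the same tagged data can be re-ranked as often as we like.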
This message adds criteria 10-17 to the earlier message titled
"9 specifics on Including and excluding data"
sent on 30th October, 1999.
Larry Trask recently again challenged that he had not received
specific alternative suggestions for criteria for inclusion
of words when seeking ancient native monomorphemic lexical items
for Basque. That statement is false on its face;
I simply refer to the message just mentioned from 30 October, 1999,
which should be in our archives, with the additions here.
And Trask's statement is also false on its face based on Trask's own
message of 9th December, 1999, in which he admitted that he had
received specific suggestions:
>OK. Lloyd Anderson has raised a number of specific points concerning
>my criteria for assembling a plausible list of Pre-Basque words.
>His posting is too long to address at one go, so I'll try to deal with it
>in a series of postings, one for each point raised.
No need to go over the same ground again.
Let's start with a point of agreement:
>So, Lloyd, are you now agreeing to the following?
> Expressive formations should be subject to no special treatment
> at all, but must be treated just like all other words, according
> to exactly the same criteria, whatever those are.
>Yes or no?
Of course I agree with this, and always did agree.
I have only protested against criteria which had biases against
expressives or other strata of vocabulary,
not advocated that we should have some a priori criteria
to include specifically expressives or any other words
even if they did not satisfy other *reasonable, legitimate, unbiased*
criteria. I emphasize the latter part deliberately,
because criteria cannot stand as valid judges of other matters
unless the criteria themselves are first judged.
***
I'll continue here with principled, fully general criteria
which differ from Trask's:
***
10. bias against longer words
Trask has stated the goals of his collection this way:
>native, ancient and monomorphemic lexical items. That's all.
>I think I've been pretty explicit about this.
Yet in the same message where he stated this, he also
replied as follows:
[LA]
>> The exclusion of expressives,
>> *or systematically of any other group of words*
>> (such as the longer words, as noted above),
>> through any aspect of the sampling procedure,
>> would of course tend to invalidate such general validity.
[LT]
>No; not at all. Polysyllabic words are excluded by definition: they are
>not relevant to my task.
I'll assume that was a typo, "polysyllabic" instead of "polymorphemic",
because otherwise polysyllabic words are *not* excluded by the definition
of goals just quoted. They certainly are *not* excluded automatically
by any principles of reliable historical reconstruction!
***
11. "polymorphemic" not intended synchronically
But it may not be a typo; it may rather reveal some interdependence of
criteria in Trask's thinking.
Assuredly, polysyllabic words are more likely to be polymorphemic,
and the skilled and knowledgeable analyst may be able to segment
many polysyllabic words into etymologizable parts. Some of these
results must certainly reflect the psychological reality of the morphemes
for the speakers.
But if we take the concept seriously in a synchronic sense,
then we must recognize that also in ancient languages,
even proto-languages, there may be many words
which demonstrably once were morphemically composite,
which yet for the speakers of the proto-language were single morphemes.
Trask does not seem to recognize this problem.
Most or all languages we know of do contain historically polymorphemic
vocabulary items which are synchronically monomorphemic.
So we must be willing to reconstruct such words for any proto-language
also. This is the error of over-analysis, over-segmentation.
Trask (in another message today) classes English "vixen" as
"Bimorphemic in English".
I do not understand a synchronic basis for his doing so.
It was polymorphemic at one time, but surely not in English now.
There is no other word in the American Heritage Dictionary beginning
with "vix-", and there is no English feminine ending "-en" sufficiently
salient that I can think of a word with it right off hand, though that
may be my personal mental limitation of the moment.
(I could only think of "oxen", "oven", "coven", "maven", "raven",
"maiden".)
So "vixen" is not even as decomposable as the
famous "cranberry" where at least "-berry" is obvious.
This has been the pattern of Trask's remarks on a host of other items,
where he classes them as polymorphemic if he can *etymologize*
them as multiple morphemes, not if they are polymorphemic
in a synchronic analysis of the language itself.
That tends to exclude words illegitimately by my understanding
of the goals Trask has stated for himself. As we use "polymorphemic"
more and more loosely, we make the restricted monomorphemic
set included by a set of criteria less and less representative of
the language as a whole. Representativeness of the language as a whole
is of course not Trask's aim when he states his goal explicitly
including the criterion "monomorphemic".
But it is a relevant way to evaluate how he states his conclusions,
and it is my distinct impression that he very often states his goals
without that limitation, as if his results could then have a wider
validity, as if he had not restricted himself to monomorphemic words only.
Here is one, from Trask's message of 9th December, 1999,
quoted more fully elsewhere in this message today:
>for assembling a plausible list of Pre-Basque words
Notice that this did *not* specify
"a plausible list of monomorphemic Pre-Basque words".
It might be inferred from context, and he has stated
the monomorphemic criterion elsewhere, but as I said,
I think he tends to drop that limitation, and therefore to
end up in effect claiming a wider validity of his eventual
conclusions than is warranted by the severe limitations
he imposes on his data.
***
12. Reduplications are not polymorphemic unless the
unreduplicated form also occurs.
This is elementary. The term "reduplication" is rather often
applied to words that are primary, merely because they have
the same consonant or even syllable as their first and second.
It is often applied to nursery words and expressive words.
But "dad", "mom", "mommy", "daddy", etc. are not polymorphemic,
by a careful use of the criteria for morpheme division.
Not even "mama", despite "ma" which seems synchronically
to be a shortening of "mama" not the reverse.
Trask has not explicitly said, so far as I know, that
reduplications are polymorphemic, but I suspect he has tended
to think of them that way. I'll be happy if this is not the case.
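The test I have in mind can be sketched roughly as follows. This is only an illustration under simplifying assumptions: it looks at spelling rather than phonology, it only recognizes exact doublings, and the function names and the made-up item "bobo" are hypothetical.

```python
from typing import Optional


def doubled_base(word: str) -> Optional[str]:
    """Return the base if the word is an exact doubling ('mama' -> 'ma'), else None."""
    half, remainder = divmod(len(word), 2)
    if remainder == 0 and half > 0 and word[:half] == word[half:]:
        return word[:half]
    return None


def reduplication_is_morphemic(word, lexicon, clippings=frozenset()):
    """True only if the doubled shape justifies a morpheme division.

    The undoubled base must occur independently in the lexicon, and it must
    not itself be a shortening of the longer word (listed in `clippings`).
    """
    base = doubled_base(word)
    if base is None:
        return False          # e.g. "dad": repeated consonant, no doubling
    if base not in lexicon:
        return False          # doubled shape, but the base never occurs alone
    if base in clippings:
        return False          # base is derived from the full form, not the reverse
    return True


english = {"dad", "mom", "mama", "ma"}
print(reduplication_is_morphemic("dad", english))                      # False
print(reduplication_is_morphemic("mama", english, clippings={"ma"}))   # False
# Invented example: a language with both "bo" and "bobo" as independent words.
print(reduplication_is_morphemic("bobo", {"bo", "bobo"}))              # True
```

The `clippings` argument is where the analyst's judgment enters: whether "ma" is a shortening of "mama" cannot be read off the forms themselves.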
***
13. Words in ancient Basque, vs. words descended from words
in ancient Basque? Identity requirements popping up when only historical
descent is relevant
>Obviously expressive words hardly ever satisfy
>my criteria, from which the most appropriate conclusion appears to be
>that *these particular words* are not ancient.
[That is not the most appropriate conclusion
if the criteria are biased against expressives, even indirectly,
as I believe I have shown Trask's are.]
>Of course, Pre-Basque
>doubtless possessed *some* expressive words, but there is no evidence to
>support a claim that these were identical to the modern ones.
Identity is not the requirement, surely.
Here again, a typo: Presumably Trask means that they were not
the ancestors of the modern ones? Because if identity is required,
we have yet another criterion introduced which was not implied by
his definition of goals, and which would further reduce the vocabulary
which he admits into his collections. He *has* used this kind of
wording at other points, as in the discussion of whether a range
of forms which show partial resemblances to each other,
as for 'butterfly', warrant the assumption of some proto-form.
For 'butterfly', he argued there was too much variation,
that none of the forms was ancient.
That is not the relevant question; the relevant question is
rather whether any of the forms *descend from* antecedents,
which were part of Basque at an earlier stage.
This may seem like nit-picking, but I think it is not,
or I would not mention it. I believe it is merely one of the
steps Trask takes which lead his sample to be rather unrepresentative,
not merely of ancient Basque, but even of ancient Basque
monomorphemic words, Trask's expressed goal.
Taking off from Trask's use of the word "modern" just above,
I do not accept any sharp temporal cutoff date
as a legitimate part of historical linguistic inquiries
in attempting to determine which words descend from
ancestors in their language, because of the demonstrable
occurrence of systematic biases in exclusion of some sorts
of items from written attestations, dependent on culture
and other factors...
On Basque words for 'badger', Trask today expressed
what I regard as a more inclusive attitude about historical descent
rather than identity of words,
though still excluding these words by their date of attestation.
After some considerable discussion of others' hypotheses, he says:
>Who knows? Not sure what to do with this, but it looks too fishy to go
>straight into the list. Anyway, not recorded before 1745, and therefore out,
>even though I agree at once that the numerous and peculiar regional variants
>point to a much older word.
The last clause is for me sufficient to justify study and inclusion
in any list of the best candidates. Date of attestation by itself is a
very minor influence. Even lack of regular sound correspondences
is very minor, given the knowledge that irregular historical descent
is not rare.
The same phrasing, applied to words for 'butterfly',
as I hope to show in future discussion of them,
after Trask has had his opportunity to comment on my first analysis,
would appropriately suggest that they "point to a much older word",
just as for words for 'badger'. That discussion will of course
be based on the facts of the words for 'butterfly' and of the
patterns of sound changes and irregularities in Basque, etc.
***
14. Use of patterns dominant in Basque to downgrade words
which do not fit the dominant patterns.
(This objection, pointing to an alternative to Trask's application
of his criteria, may have been part of the earlier list of nine,
though I do not at present recall that it was;
but because new concrete examples make it relevant,
I highlight it here. It has at least new application now.)
One of Trask's objections to inclusion of words in his lists,
which influences him to regard them as loanwords or
inventions, is that no native Basque words have two voiceless
stops, or voiceless stops beginning the first syllable.
(I may not have stated that exactly right, but the general
point is clear.)
On Basque 'chick', which he points out is the only word
for a small animal (from a list) which is *not* formed with
the suffix -(k)ume 'offspring' and thus polymorphemic,
he writes:
>'chick' is the obviously imitative
> <(t)xito> ~ <(t)xita> ~ <txitxa>.
>This last word will probably meet my
>criteria, but will stand out a mile.
Well, but if it is included, then at least that word
with two voiceless stops is presumptively part of ancient Basque
(if it meets other legitimate evaluations to a sufficient degree).
And if that one is included, then others with two voiceless stops
must not be downgraded on that basis, nor must it be argued
on that basis alone that they are non-native, borrowings or even
recent inventions.
But then one of the words for 'butterfly',
"pitxilota", is also a good candidate for inclusion.
Its four-syllable status does not by itself prove it to be
non-monomorphemic, though it may be.
Nor do the two voiceless stops prove it to be a loan.
***
15. The use of mere suspicion to exclude items,
or of occurrence in neighboring languages,
even when absent from the closest relatives of the
neighboring languages which are not proximal to the
language of focus (Basque).
Trask writes, concerning "bill" (of bird) that the form
<moko>, variant <mosko> may possibly qualify on his other
criteria, but continues:
>But the widely held belief in a
>Romance origin will probably disqualify it.
A belief, no matter how widely held, should not be
considered relevant at all. Evidence is relevant.
Perhaps Trask had some which he did not mention
because it did not seem germane or important at the moment.
But this may possibly go along with
Trask's exclusion of items which occur *only*
in Iberian Romance and in Basque.
For such a distribution of attestations, lacking any other evidence,
I think standard linguistic methodology dictates a conclusion
that the item was in early Basque and borrowed into Romance,
rather than the other way round;
or else perhaps in a "substrate", borrowed into both Ibero-Romance
and Basque, but with no reason to prefer this second explanation.
As has been pointed out by careful historical linguists,
supposed substrates should not be appealed to without direct
evidence of their existence; they are a wildcard.
***
16. In addition to all the other restrictions, there is an implicit one
against verbs, because
>1. No ancient Basque verb is monomorphemic.
>A native verbal root is a bound morpheme,
>and hence no ancient verb will make my list.
While there is certainly nothing wrong with Trask seeking
the canonical forms of native ancient monomorphemic
lexical items, it would be appropriate to join any conclusions
drawn with the point that of course no verbs are included at all.
So the validity of any conclusion becomes yet again more
narrowly limited.
Trask has been explicit about this fact of Basque verbs;
this is really a point about evaluating
whether the results of Trask's criteria can be representative
of an interesting portion of Basque. Every restriction of
course reduces the range of any conclusions. The exclusion
of verbs also does so, whether specified explicitly or not.
It is well known that verbs can have different canonical forms
from non-verbs, in some languages, so it is important to point
out very prominently this kind of exclusion of verbs,
if one is studying canonical forms.
I do not wish to claim more on this point than literally just that.
***
17.
This is in one sense not a new criterion,
but in another sense it is, and it is convenient to refer to it
with a new number. It is an example of one noted long ago.
Range of distribution among dialects should be *relative to* the
number of dialects which can be included in the sample.
One reason a dialect cannot be included is that no word was
recorded for the concept in question.
Another reason, almost the same one in effect,
is that a loanword has replaced whatever the dialect would have
had otherwise. The last point is what makes this item a new item.
Trask writes:
>For example, 'pine tree' is the Latino-Romance loan <pinu>
>almost everywhere, while the eastern dialect Roncalese has <ler>
>and its neighbor Zuberoan has <leher> in some varieties.
>It is highly possible that <leher> ~ <ler> represents an indigenous
>word displaced almost everywhere by the loan word,
>but I can't be sure of this, and the word does not qualify for
>inclusion.
Admittedly any item could be better if attested in more rather than
fewer dialects, but in this case, the form <leher> ~ <ler>
is attested in 100% of the dialects where there is no loanword <pinu>.
100% is a rather high number. Very different from a case in which
a non-loanword is attested in place of <pinu>.
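The arithmetic can be sketched as below. The dialect labels "dialect_1" to "dialect_3" are placeholders standing in for Trask's "almost everywhere" (he does not list the dialects), so the raw figure is illustrative only; the relative figure depends just on the two dialects he names.

```python
def raw_coverage(forms_by_dialect, candidate_forms):
    """Share of ALL listed dialects that show one of the candidate forms."""
    hits = sum(1 for f in forms_by_dialect.values() if f in candidate_forms)
    return hits / len(forms_by_dialect)


def relative_coverage(forms_by_dialect, candidate_forms, loanwords):
    """Share of ELIGIBLE dialects that show one of the candidate forms.

    A dialect is ineligible if nothing was recorded (None) or if a loanword
    has replaced whatever it would otherwise have had.
    Returns None when no dialect is eligible.
    """
    eligible = [f for f in forms_by_dialect.values()
                if f is not None and f not in loanwords]
    if not eligible:
        return None
    hits = sum(1 for f in eligible if f in candidate_forms)
    return hits / len(eligible)


pine = {
    "Roncalese": "ler",
    "Zuberoan": "leher",     # in some varieties, per Trask
    "dialect_1": "pinu",     # placeholder dialects with the loanword
    "dialect_2": "pinu",
    "dialect_3": None,       # placeholder: no word recorded
}
native = {"ler", "leher"}

print(raw_coverage(pine, native))                  # 2 of 5 dialects
print(relative_coverage(pine, native, {"pinu"}))   # 2 of 2 eligible dialects
```

Scoring by the second function rather than the first is what I mean by making the range of distribution relative to the dialects that can be included in the sample.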
The problem of criteria for the "best",
but not really meaning the "best", recurs here.
Lloyd Anderson
Ecological Linguistics