Excluding Basque data

Wed Oct 13 14:16:46 UTC 1999

We may be getting closer to some agreement on issues of fact
(as distinct from preferences).

Trask writes:

>On Tue, 5 Oct 1999 ECOLING at aol.com wrote:

>> Sound-symbolic words are
>> BOTH different in canonical forms
>> AND underrepresented in vocabularies etc.
>> and the coincidence of these two properties
>> means it is dangerous to exclude them.

>An interesting point, but I'll be taking issue with it below.

As far as I understand the rest of his message,
Trask does NOT take issue with it in what follows.
He does say it doesn't matter whether the sound-symbolic
words have different canonical forms from other vocabulary
or not.  I here quote from a later part of his message.

>> In any case, another criterion is explicit or implicit in the
>> following:

>[LT]
>>> But, in fact, the vast majority of sound-symbolic items in Basque do
>>> *not* satisfy my other criteria.  Hence my approach will immediately
>>> set
>>> them apart from the words which are the best candidates for native and
>>> ancient status.  Once that's established, *then* these distinctive words
>>> can be investigated to determine their own phonological characteristics.

>> Notice in the way the middle sentence follows the first sentence
>> that Larry treats his approach (his criteria) as if his criteria
>> were pretty much the same thing as selection for "the best
>> candidates for native and ancient status".  Others of us believe
>> that must always be kept on the surface of awareness as an empirical
>> question, not taken for granted;  it is precisely a core question!

>Something crucial has been omitted here.

I am not sure I can figure out what has been omitted.
My point remains.  Trask appears to be equating his criteria
with "best candidates for native and ancient status".

>> If what Larry says above about Basque is true of other languages
>> to a similar degree, then should we conclude that sound-symbolic
>> items are NOT good candidates for native and ancient status,
>> in general?  That would seem to be implied by the paragraph
>> just quoted.  Yet to me that shows there must be something wrong
>> with the argument.

>My criteria are devised with Basque in mind.  Other cases may call for
>a different approach, notably in respect of my third criterion.
>Whether sound-symbolic words are generally not ancient, I don't know,
>but I have no particular interest in this question anyway.  I'm only
>interested in the Basque case.

This does not respond to my point.  My point was that this appears to
be a reductio ad absurdum of the approach, because it seems to be
implied that sound-symbolic items are not good candidates for native
and ancient status.  That conclusion must I think be false,
UNLESS one means by it circularly that words which undergo
reformations not in accordance with the sound laws applicable to
the bulk of the vocabulary, reformations entirely internal to the
language in question, or even words which persist unchanged despite
sound changes which apply to other vocabulary, are not native or ancient.
To me, it is simply that these words are subject to a different set of
sound changes (or lack thereof).  They are certainly no less native,
and arguably no less ancient, since their antecedents in direct line of
descent existed in an earlier form of the language.

To doubt that last part seems to be to doubt that earlier forms of
various languages had sound-symbolic words, or if they did,
to doubt that those words are in any reasonable sense cognate
(parent) to any of the current sound-symbolic words, that is,
that sound-symbolic words are so unstable as to prevent any
reasonable sense of inherited vocabulary from being applicable.
I think most linguists would reject that conclusion.  Perhaps
there is some way of avoiding it, but it seems to me to follow
logically.

>But the point is not whether <pinpirin> is borrowed or not (I'm pretty
>sure it's not), but whether it's *ancient* or not.

>Now, I consider it most unlikely that the severely localized word
><pinpirin> has been in the language for millennia, all that time
>violating the ordinary phonological structure of the language and
>refusing to participate in otherwise categorical phonological changes.

I don't so quickly come to that judgment.
It appears to be rather common for sound-symbolic words.

One example is French "papillon", which gave rise by regular sound
change to "pavillon" in the meaning of English "pavilion"
(not sound-symbolic), but was retained unchanged as "papillon"
in its sound-symbolic (in an extended sense, movement-symbolic?)
meaning of "butterfly".

[Trask asked for the limits on when words are resistant to
sound changes which apply to other vocabulary, because
I had explicitly noted that we don't want that to be applied
loosely so as to reduce our rigor.  My reply is partly
that using the term "sound-symbolic" in its most general sense
does give the limits.  The other part is that we have to empirically
determine what those limits are, by discovering examples.
Therefore, the difference between "pavilion" and "butterfly"
seems to be a difference between two meanings, one not
in the sound-symbolic domain, the other in that domain.
As an aside, I will add that I have been interested in this
problem for a very long time, and have discovered it also in
the historical changes of deaf communities' "signed languages",
where sometimes in a pair of etymologically related signs,
the sign with the more concrete meaning retains its form, while the
sign with the more abstract meaning undergoes changes
of execution, what we would refer to as reductions and simplifications.]

>> It's a fact of reality.

The fact that this makes our task harder does not argue for or against
the validity of the statement that inherited "sound-symbolic" words
sometimes do not undergo sound changes.  They are nevertheless
inherited.

Trask suggests that the following example is wrong.
I should have said that I took it on the authority of
Dwight Bolinger, a linguist specializing in English linguistics
who was a president of the Linguistic Society of America,
who believed that "teeny" was regenerated (I think that was
his word).
(He was also labeled a "premature anti-fascist" for his volunteer
participation in the war against Franco in Spain -- that doesn't
prove he was right about "teeny" and "tiny", of course,
but I thought people might like to know that.)

[LA]
>> (English "tiny", which went through the great vowel shift,
>> and "teeny", which did not go through the great vowel shift,
>> was retained or regenerated or reborrowed from a dialect,
>> would be a similar case,
>> unless the dialect-borrowing solution is adopted.)

[LT]
>No.  The earlier `tine' went through the GVS normally and produced
>`tiny', as expected.  The form `teeny', in all likelihood, is a later
>re-formation, derived from sound-symbolic factors.  The OED tells me
>that `teeny' is nowhere recorded before 1825 -- long after the GVS --
>and suggests that it probably originated in nursery language.

We are not too far apart here, except that Trask should have said
"Yes" to the first sentence, which he was actually agreeing with.
In this next sentence he could have said "No" or "But not" or whatever.
His "in all likelihood" should be emphasized,
that is, we really don't know for sure.
I gave three possible scenarios.
But the outcome of each of them is the same.
What is now SPELLED "teeny"
is pronounced rather similarly to what was earlier SPELLED "tine",
when the final "e" was still pronounced and the "i" was pronounced
as in "machine".
So was it retained or re-formed much later?
We know that spelling changes lag behind speech.
And we know that first attestations which we happen to have evidence
for may be later than first usages, often by a large time span.
So the conclusion is not obviously the one Trask prefers.

Trask does not mention the case of French "pavillon / papillon".
Does he believe that "papillon" was lost and then regenerated,
and thus "not ancient" or even "not native"?
I assume he would not want to claim either of the latter two.
If not, then use that example instead of "tiny / teeny".

[LA]
>> Among Larry Trask's other criteria were the distribution across all
>> of the dialects, not the occurrence in only a few. As I took pains
>> to point out in previous messages, that criterion is biasing.
>> However reasonable, even obvious, it may appear to a historical
>> linguist (including seeming obvious to me too, I may add) it may
>> still disproportionately bias against sound-symbolic words, because
>> of the spotty record of those who record vocabularies in not
>> collecting such words, thereby reducing the number of dialects in
>> which they are attested, quite independently of whether they
>> actually were used in those dialects. The entire class of such words
>> may not be recorded, or very few of them, it is not simply that
>> their recording is randomly slightly less full.

>> And this enormous underrepresentation can then indirectly lead to
>> initial conclusions on canonical forms which are too simple and
>> neat, too consistent, including canonical forms which are
>> underrepresentative of sound-symbolic forms.

[LT]
>At last a point of substance!  I thought I was never going to see one.
>OK.

I thought I had made exactly this consequence clear many times,
even if not in so many words.
I had at least stated the conclusions of it.
But I'm glad if we are now understanding each other.

[LT]
>Let's assume this point is valid.  What are the consequences?

>Well, either sound-symbolic forms conform to the canonical forms of
>ordinary lexical items, or they do not.  If they do, there is no
>problem.

But Trask has said previously that the
expressive vocabulary in Basque DOES differ in canonical forms
from other vocabulary, so he believes the first alternative does not
apply.  Here is his second alternative:

>If they don't, then, assuming that many of them get into my
>list in the first place, I'm going to have two sharply distinct groups
>of words obeying different rules.  Also no problem.

But Trask himself argues AGAINST the latter case occurring.
He actively wants to prevent it "in the first place",
and only to include them later. He says that his preference is
to exclude nursery words etc.
He really does want to prevent the inclusion of nursery and expressive
forms.  He believes these forms do not follow what he regards as the
normal sound laws and that they violate
the normal canonical forms (his comments on "pinpirin").
Since lack of attestation may correlate with this, use of the criterion
of lack of sufficiently wide attestation DOES tend to exclude forms
of certain formal types.
He says that he wanted to exclude by an explicit criterion,
but that he is not too unhappy if others don't want that particular
exclusionary criterion, (? because he believes that ?)
his other criteria will exclude most nursery words anyhow.
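
The attestation argument above can be illustrated with a small
calculation.  This is only a sketch: the number of dialects, the
recording probabilities, and the threshold are all invented for
illustration and are not drawn from any Basque data.  It assumes
every word is actually used in every dialect, and that recorders
capture expressive words less reliably than ordinary ones.

```python
from math import comb


def prob_attested_in_at_least(k: int, n: int, p: float) -> float:
    """Probability that a word used in all n dialects is RECORDED in at
    least k of them, if each dialect's record captures it independently
    with probability p (a simple binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# All figures below are assumptions chosen only to illustrate the argument.
N_DIALECTS = 5      # hypothetical number of dialects
THRESHOLD = 4       # hypothetical "widely attested" cutoff
P_ORDINARY = 0.9    # assumed chance an ordinary word gets recorded
P_EXPRESSIVE = 0.4  # assumed chance an expressive word gets recorded

ordinary = prob_attested_in_at_least(THRESHOLD, N_DIALECTS, P_ORDINARY)
expressive = prob_attested_in_at_least(THRESHOLD, N_DIALECTS, P_EXPRESSIVE)

print(f"ordinary words passing the criterion:   {ordinary:.3f}")
print(f"expressive words passing the criterion: {expressive:.3f}")
```

Under these assumed figures, about 92% of ordinary words but only
about 9% of expressive words pass the distribution criterion, even
though by construction both kinds are equally widespread in actual
use.  The gap comes entirely from the recording, not from the language.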

The assumption that "many" of them will get into Trask's list in
the first place is exactly what much of this discussion has been about:
his criteria will tend to prevent many of them from getting into his list.
Notice again his strong antipathy towards the word "pinpirin":

>Now, I consider it most unlikely that the severely localized word
><pinpirin> has been in the language for millennia, all that time
>violating the ordinary phonological structure of the language and
>refusing to participate in otherwise categorical phonological changes.

(even aside from the fact that "millennia" is not required to reach
the level of the early Basque of the 16th century which Trask
otherwise prefers as his starting point for data, to project
further backwards)

>Perhaps I haven't made it clear that I am also very interested in
>characterizing the expressive formations.  But I first want to
>characterize the forms of ordinary lexical items, before I turn my
>attention to the expressive formations -- for one thing, because it's
>easier to see what's special about expressive formations if I already
>know what ordinary words look like.

In that case, the best possible way is to include expressive formations
in the data set from the very beginning, mark the ones we are
reasonably sure are "expressives" because of their semantics,
and notice what may be different about them,
eventually perhaps slightly revising our notions of which
items should be considered "expressives" (hopefully not in
a circular way, simply because they resist sound changes,
but even simply listing those which do resist sound changes
would be useful, if we can do that).
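
The procedure suggested above (include everything, mark the likely
expressives on semantic grounds, then contrast the two groups) might
be sketched as follows.  The forms are invented placeholders, not
Basque words, and the consonant/vowel skeleton is a deliberately
crude stand-in for a real phonological analysis (it treats every
letter as one segment and ignores digraphs).

```python
from collections import Counter
from dataclasses import dataclass

VOWELS = set("aeiou")


def skeleton(form: str) -> str:
    """Reduce a form to a C/V skeleton, one symbol per letter."""
    return "".join("V" if ch in VOWELS else "C" for ch in form.lower())


@dataclass
class Entry:
    form: str
    expressive: bool  # marked from the word's meaning, NOT from its shape


def shape_counts(entries):
    """Count skeletons separately for expressive and ordinary entries."""
    counts = {True: Counter(), False: Counter()}
    for entry in entries:
        counts[entry.expressive][skeleton(entry.form)] += 1
    return counts


# Invented placeholder forms, for illustration only.
lexicon = [
    Entry("lamo", expressive=False),
    Entry("seru", expressive=False),
    Entry("tiptip", expressive=True),
    Entry("kankan", expressive=True),
]

counts = shape_counts(lexicon)
# Shapes found among the marked expressives but not among ordinary words.
only_expressive = set(counts[True]) - set(counts[False])
print(only_expressive)
```

Because the marking is done on semantic grounds before the shapes
are compared, any difference in shape that emerges is a finding
rather than a consequence of how the groups were defined.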

>So: apply my primary criteria; get a list of candidate ancient words;
>determine their phonological properties; then look at expressive
>formations (mostly excluded from my list by my primary criteria) and
>identify the differences.  Now: what exactly is wrong with this?
>And what *different* procedure could give better results?

This explicitly states that the expressives would be mostly
excluded from his list by his primary criteria.  That comes very
close to contradicting the possibility that "many" of them could
be included "in the first place".  We can't have it both ways.

What is wrong with that, he asks?
Answered in my preceding paragraphs.
The human mind is known to be better at distinguishing
similar items when they are set side by side for comparison
than it is at properly categorizing
similar items put before it without overt contrast.

[LA]
>> All (?) languages have more common and rarer forms,
>> and have peripheral forms,

[LT]
>Sure.  But how can I tell that a particular form is rare unless I first
>determine what the common forms are?

By contrasting them explicitly, as just pointed out.

[LT]
>But what I'm trying to do is precisely to identify the damn
>strata in the first place.

Again, best done by including them in the data set,
and learning how to mark them as belonging to different strata,
gradually with increasing accuracy.

Larry Trask should use whatever sequence of investigations
he is most comfortable with.  But he should also be careful
that he does not allow the order of his investigating various
strata, an order of his own choosing, not a property inherent
to the language itself, to bias his conclusions.
That is, in part, what we have been discussing.

Trask wants to draw firm conclusions from his initial steps
with his initially selected strata of the vocabulary,
and it appears he would not be eager to change those conclusions
from the later results of investigating other strata.

>I have also explained that, in addition, I would prefer to exclude
>obvious nursery words and obvious imitative words at the outset, for
>excellent reasons.  But I don't mind if others prefer not to do this.
>It isn't going to make much difference anyway, since very few of these
>words will satisfy my primary criteria.

Why should the "primary" criteria be systematically selective of
one stratum of NATIVE vocabulary against another stratum of NATIVE
vocabulary?  Should that not be considered a defect in criteria
which are claimed to be ideal for identifying the best candidates
for native and ancient vocabulary?

Rather, the criteria should be advertised for what they then are,
criteria for identifying ONE stratum WITHIN the native and ancient
vocabulary, a stratum excluding nursery and expressive words
(and excluding vocabulary in those semantic domains and
in those subject matters not dealt with in earliest documents,
as discussed in another message).

If the criteria are stated fully explicitly for what they are,
then the conclusions drawn from them will have their
inherent limitations made more explicit.
That will be a courteous service to those who might want
to use the results.  It of course means the results are less
sweeping or definitive.  Such are the good consequences
of being clear and open about what one is doing.

I will be very glad if it turns out that we are getting somewhat closer.

Sincerely,
Lloyd Anderson
Ecological Linguistics