Excluding Basque data

Larry Trask larryt at cogs.susx.ac.uk
Thu Oct 7 10:59:53 UTC 1999


On Tue, 5 Oct 1999 ECOLING at aol.com wrote:

> Sound-symbolic words are
> BOTH different in canonical forms
> AND underrepresented in vocabularies etc.
> and the coincidence of these two properties
> means it is dangerous to exclude them.

An interesting point, but I'll be taking issue with it below.

> As to sound-symbolic words,
> Larry says he does NOT insist on their having the same canonical
> forms as other vocabulary, does not exclude them on that basis.

That's right.

> (I surely thought he was using canonical forms as one of his
> criteria for setting up his initial lists?  Was that not true?

Certainly not.  As I have made clear repeatedly, my primary criteria are
early attestation, widespread distribution, and absence from neighboring
languages.  Nothing remotely phonological there.

I have also explained that, in addition, I would prefer to exclude
obvious nursery words and obvious imitative words at the outset, for
excellent reasons.  But I don't mind if others prefer not to do this.
It isn't going to make much difference anyway, since very few of these
words will satisfy my primary criteria.

> In any case, another criterion is explicit or implicit in the
> following:

[LT]

>> But, in fact, the vast majority of sound-symbolic items in Basque do
>> *not* satisfy my other criteria.  Hence my approach will immediately set
>> them apart from the words which are the best candidates for native and
>> ancient status.  Once that's established, *then* these distinctive words
>> can be investigated to determine their own phonological characteristics.

> Notice in the way the middle sentence follows the first sentence
> that Larry treats his approach (his criteria) as if his criteria
> were pretty much the same thing as selection for "the best
> candidates for native and ancient status".  Others of us believe
> that must always be kept on the surface of awareness as an empirical
> question, not taken for granted;  it is precisely a core question!

Something crucial has been omitted here.

However, for about the fifth time: if you think my criteria are
imperfect, then *what other criteria* do you propose for the task of
identifying the best candidates for native and ancient status in Basque?
How about an answer to this question, at last?

> If what Larry says above about Basque is true of other languages
> to a similar degree, then should we conclude that sound-symbolic
> items are NOT good candidates for native and ancient status,
> in general?  That would seem to be implied by the paragraph
> just quoted.  Yet to me that shows there must be something wrong
> with the argument.

My criteria are devised with Basque in mind.  Other cases may call for a
different approach, notably in respect of my third criterion.

Whether sound-symbolic words are generally not ancient, I don't know,
but I have no particular interest in this question anyway.  I'm only
interested in the Basque case.

> That is not to say we treat sound-symbolic
> words exactly the same way as other words.
> Just as the English "pavilion" from French "pavillon"
> is the normal French development by the sound laws, and "papillon"
> is a sound-symbolic form which has resisted a sound change,
> so we might suspect Basque <pinpirin> may have not undergone all of the
> sound changes which most of the vocabulary did in the history of
> Basque, independent of the question whether it is a recent loan
> or some primaeval vocabulary item inherited from 10,000 years ago.
> So even the supposedly air-tight logic of sound laws cannot be
> used unequivocally to include or exclude vocabulary from native
> vs. borrowed categories.

In Basque, borrowed words develop phonologically just like native words,
modulo date of entry into the language.

But the point is not whether <pinpirin> is borrowed or not (I'm pretty
sure it's not), but whether it's *ancient* or not.

Now, I consider it most unlikely that the severely localized word
<pinpirin> has been in the language for millennia, all that time
violating the ordinary phonological structure of the language and
refusing to participate in otherwise categorical phonological changes.
But that's not the point.

The point, yet again, is that it is a waste of time to try to focus, at
the outset, on sweeping up everything that *might* be native and ancient
in Basque -- even highly implausible cases like <pinpirin>.  The point
is to identify those words which have the *strongest* claim to being
native and ancient -- hence my criteria.

> A shocker, and not a wild card we want to use without severe limits
> or controls, or else the entire enterprise falls.

OK.  And just what "severe limits and controls" would those be?
*How about an answer to this?*

I've already proposed my severe limits and controls.
What are yours?

> But a shocker we cannot escape by waving magic wands
> or waving words.

And just what "magic wand" do you think I'm waving?

> It's a fact of reality.
> (English "tiny", which went through the great vowel shift,
> and "teeny", which did not go through the great vowel shift,
> was retained or regenerated or reborrowed from a dialect,
> would be a similar case,
> unless the dialect-borrowing solution is adopted.)

No.  The earlier `tine' went through the GVS normally and produced
`tiny', as expected.  The form `teeny', in all likelihood, is a later
re-formation, derived from sound-symbolic factors.  The OED tells me
that `teeny' is nowhere recorded before 1825 -- long after the GVS --
and suggests that it probably originated in nursery language.

> Among Larry Trask's other criteria were the distribution across all
> of the dialects, not the occurrence in only a few. As I took pains
> to point out in previous messages, that criterion is biasing.
> However reasonable, even obvious, it may appear to a historical
> linguist (including seeming obvious to me too, I may add) it may
> still disproportionately bias against sound-symbolic words, because
> of the spotty record of those who record vocabularies in not
> collecting such words, thereby reducing the number of dialects in
> which they are attested, quite independently of whether they
> actually were used in those dialects. The entire class of such words
> may not be recorded, or very few of them, it is not simply that
> their recording is randomly slightly less full.

> And this enormous underrepresentation can then indirectly lead to
> initial conclusions on canonical forms which are too simple and
> neat, too consistent, including canonical forms which are
> underrepresentative of sound-symbolic forms.

At last a point of substance!  I thought I was never going to see one.
OK.  Let's assume this point is valid.  What are the consequences?

Well, either sound-symbolic forms conform to the canonical forms of
ordinary lexical items, or they do not.  If they do, there is no
problem.  If they don't, then, assuming that many of them get into my
list in the first place, I'm going to have two sharply distinct groups
of words obeying different rules.  Also no problem.

Perhaps I haven't made it clear that I am also very interested in
characterizing the expressive formations.  But I first want to
characterize the forms of ordinary lexical items, before I turn my
attention to the expressive formations -- for one thing, because it's
easier to see what's special about expressive formations if I already
know what ordinary words look like.

So: apply my primary criteria; get a list of candidate ancient words;
determine their phonological properties; then look at expressive
formations (mostly excluded from my list by my primary criteria) and
identify the differences.  Now: what exactly is wrong with this?
And what *different* procedure could give better results?
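
For readers who find a procedure easier to follow when written out, the
steps just listed can be sketched roughly as below.  This is only an
illustration under stated assumptions: the record fields, the cutoffs
(`max_year`, `min_dialects`), the crude consonant/vowel shape function,
and every word form in the sample data are invented placeholders, not
real Basque data and not thresholds proposed in this message.

```python
from collections import Counter
from dataclasses import dataclass

VOWELS = set("aeiou")

@dataclass
class Entry:
    form: str            # word form (toy placeholder, not real data)
    first_attested: int  # year of first record (placeholder value)
    dialects: int        # number of dialects in which the word is recorded
    in_neighbors: bool   # also found in neighboring languages?
    expressive: bool     # flagged as nursery / imitative / sound-symbolic

def cv_shape(form):
    """Crude canonical shape: one C or V per letter (ignores digraphs)."""
    return "".join("V" if ch in VOWELS else "C" for ch in form)

def meets_primary_criteria(e, max_year=1600, min_dialects=4):
    """Early attestation, wide distribution, absent from neighbors.
    Both cutoffs are assumptions chosen only for this sketch."""
    return (e.first_attested <= max_year
            and e.dialects >= min_dialects
            and not e.in_neighbors)

def compare(entries):
    # Step 1: apply the primary criteria to get the candidate list.
    candidates = [e for e in entries if meets_primary_criteria(e)]
    # Step 2: tabulate the shapes of the candidates.
    cand_shapes = Counter(cv_shape(e.form) for e in candidates)
    # Step 3: look at expressive items that the criteria left out.
    expressive = [e for e in entries
                  if e.expressive and not meets_primary_criteria(e)]
    expr_shapes = Counter(cv_shape(e.form) for e in expressive)
    # Step 4: shapes seen only among the expressive items.
    unusual = sorted(set(expr_shapes) - set(cand_shapes))
    return candidates, cand_shapes, unusual

# Invented sample records, for illustration only.
SAMPLE = [
    Entry("gala", 1550, 5, False, False),
    Entry("beto", 1570, 4, False, False),
    Entry("ari", 1500, 5, False, False),
    Entry("tiptap", 1850, 1, False, True),
    Entry("mola", 1560, 5, True, False),
]

candidates, cand_shapes, unusual = compare(SAMPLE)
```

The ordering is the point of the sketch: the shapes of the candidate
words are tabulated first, so that whatever is distinctive about the
expressive items shows up as a difference against that baseline.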

> Such initial conclusions on canonical forms can then have
> cascading secondary effects on inclusion or exclusion of additional
> vocabulary in the lists.  Even if /bat/ (from the recent discussion of
> /bade/ /bedere/ etc.) were the only form included with a final
> stop consonant, being exceptional would not prove it is not part of
> pre-Basque.

I agree with this.  But, remember: I am not excluding this word, because
it satisfies my primary criteria.

> All (?) languages have more common and rarer forms,
> and have peripheral forms,

Sure.  But how can I tell that a particular form is rare unless I first
determine what the common forms are?

> especially when the ambiguities of
> fast speech and slow speech borderlines are considered,

Not of central importance here, I'd say.

> and when a few short high-frequency items are considered.

Yep, but note particularly that wording: "a few".

> Yet we must also pay attention to differences of canonical forms,
> as they sometimes DO clue us in to different strata of vocabulary
> which may be relevant in historical-comparative studies.

No doubt.  But what I'm trying to do is precisely to identify the damn
strata in the first place.

That's enough for now.

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk



More information about the Indo-european mailing list