9 specifics on Including and excluding data (1)

Larry Trask larryt at cogs.susx.ac.uk
Mon Nov 1 15:27:15 UTC 1999


[ Moderator's note:
  Mr. Trask sent the 9 individual messages he mentions below, but in order to
  relieve some of the backlog, I have taken the liberty of combining them into
  a single digest-like message.
  --rma ]

--------------------------------------------------------
Date: Mon, 01 Nov 1999 15:27:15 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (1)
Message-Id: <E11iJMZ-00037v-00 at rsunx.crn.cogs.susx.ac.uk>

OK.  Lloyd Anderson has raised a number of specific points concerning my
criteria for assembling a plausible list of Pre-Basque words.  His posting is
too long to address at one go, so I'll try to deal with it in a series of
postings, one for each point raised.

But first recall what I'm trying to do.  I'm interested in determining the
morpheme-structure conditions for Pre-Basque lexical items.  Note: *not* the
phonotactics of word-forms generally, but the morpheme-structure conditions of
monomorphemic lexical items.

In order to undertake this, of course, I must first assemble the best possible
list of monomorphemic words which were most likely present in Pre-Basque, and I
must put them into the forms which they most likely had at the time.

>  I have repeatedly expressed my suggestions for improving criteria.
>  That *includes* dropping some.

>  This message is not a mere repeat listing of what has been posted
>  previously.
>  To make it more useful, I have restated some crucial parts
>  which Trask missed in referring to them, as well as adding further
>  *explanations* and *examples*, which most readers will see
>  as merely details implied by what was already stated.

>  Number one.
>  Counteracting biases of documentation by subject matter.
>  Previously stated, as Trask now agrees,
>  though his restatement makes it appear rather trivial,
>  losing its principled basis and therefore greatly reducing its reach.

>  Number one is not merely the 1700 rather than 1600 cutoff date,
>  but was based on a more principled suggestion
>  that we should avoid biasing by the sheer accident of the limited
>  nature of available documentary evidence for particular time periods.
>  In attempting to find the oldest native Basque vocabulary,
>  there will be semantic domains which are essentially excluded
>  by such sheer accidents, and for these we can take the earliest
>  documentary evidence available which covers those semantic
>  domains, not quite "whatever the date", but with considerable
>  leeway in accepting dates later than 1700 if necessary to get
>  documentation for a particular subject matter.
>  The point was NOT the date (1700 vs. 1600),
>  the point was to avoid the accidents of exclusion.
>  Its implications are both much broader and much more specific.

I have already explained that I am not wedded to a cut-off date of 1600, and
that I am prepared to consider 1700 instead, though certainly nothing later.

The material available before 1600 includes a sizeable body of medieval
fragments -- words, names, phrases -- plus a long personal letter, a volume of
poems (religious, secular, amatory), some religious works (mostly
translations), a dictionary of a western variety, and a volume of proverbs.
The 17th century adds a lot more religious works (including some which are
original, not translations), an unpublished dictionary, a textbook of one
dialect of the language, one or two practical handbooks, another collection of
proverbs, the histories and poems of Oihenart, and a few miscellaneous items.

Will it make any difference which date we choose?  Maybe it will, but I remain
to be persuaded of this.  Most of the strongest candidates for native and
ancient status appear to be recorded very early: for example, <behi> 'cow'
(1562), <buru> 'head' (1042), <gorri> 'red' (15th century), <sei> 'six' (1415),
<gizon> 'man' (15th century), <ate> 'door' (15th c.), <handi> 'big' (1262), and
so on.  It is far from obvious that adding another century to the database will
make any great difference.  And it's even less obvious that any resulting gain
will not be badly offset by the addition of a large number of words of late
origin in Basque.  We're already pretty far away from AD 1: I am not eager to
move even further away merely in order to collect a handful of overlooked
words.  As always, I want the *strongest* candidates, not every possible
candidate.
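
The effect of moving the cut-off can be checked mechanically. A minimal sketch in Python, using only the words cited in this discussion that carry an exact year (the table name and the helper function are hypothetical, and a real database would of course be far larger):

```python
# First-attestation years as cited in this discussion (exact years only).
FIRST_ATTESTED = {
    "behi": 1562,     # 'cow'
    "buru": 1042,     # 'head'
    "sei": 1415,      # 'six'
    "handi": 1262,    # 'big'
    "malmutz": 1802,  # 'fluffy, insubstantial'
}

def attested_by(cutoff_year, dates=FIRST_ATTESTED):
    """Return the words first recorded no later than cutoff_year, sorted."""
    return sorted(word for word, year in dates.items() if year <= cutoff_year)

by_1600 = attested_by(1600)
by_1700 = attested_by(1700)
print(by_1600)
# Words gained by choosing the later cut-off:
print(sorted(set(by_1700) - set(by_1600)))
```

On this toy sample the two cut-offs yield the same list. Whether that holds for the full vocabulary is exactly the open question.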

As for particular semantic domains, I have already commented in an earlier
posting.

First, most of the Basque words in specific domains like seafaring, law,
religion and technology are either obvious loan words or obviously
polymorphemic, and hence of no relevance to my task.  There is no point in
worrying about them.

Second, I find it impossible to believe that the native and ancient words
peculiar to any particular domain, insofar as there are any, should
*systematically* differ in form from other words.  If, as I presume, Pre-Basque
had morpheme-structure constraints (all languages do), then there is no reason
to suppose that these constraints varied according to the meanings of the
words.  Is there any language on earth in which, say, kinship terms or color
terms or agricultural terms are systematically constructed according to
different phonological rules from other words?  No?  Then why should I worry
about this in the case of Basque?

It is doubtless inevitable that my list of the strongest candidates will mostly
consist of what might reasonably be called 'basic vocabulary'.  It is hardly
likely that specialist terms from particular subject areas will feature
prominently in my list.

If Lloyd still wants to query this, then I suggest that he should identify some
particular semantic domains of the kind he has in mind, and we can take it from
there.

Larry Trask
--------------------------------------------------------
Date: Mon, 01 Nov 1999 15:40:24 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (2)
Message-Id: <E11iJZI-0003ku-00 at rsunx.crn.cogs.susx.ac.uk>

OK.  Part 2.

>  Number two.
>  Breadth of attestation required made proportionate to
>  breadth of documentation by subject matter.
>  Previously stated.
>  Not noted by Trask in the message to which I am replying.

I'm afraid I don't find it easy to follow this.

Most of the early Basque texts are translations of religious documents or works
of Christian apology.  The remaining few were listed in my last posting.  Many
other conceivable subject areas are not explicitly treated at all before the
late 19th or 20th century -- far too late for my purposes.  For example, Basque
traditions of household management and of inheritance turn up incidentally in
some early texts, but are not overtly treated as such before the late 19th
century -- at least in Basque.  Some earlier materials exist in Spanish, and
may provide some attestations of individual Basque words, but I don't expect a
lot here.  Anyway, once again, few of the relevant terms are both native and
monomorphemic.  I really don't see any ground for concern here.  And, as
before, I can see no reason to suppose that native words in different semantic
areas might be constructed according to different phonological rules.

What earthly *difference* does it make what semantic domain a word belongs to?
I'm interested in phonology, not in semantics.

>       I also proposed a still more refined approach in which the
>  number of dialects we wish to have represented would vary
>  precisely in order to counteract the accidents of preservation of
>  documents in particular subject matters in only some dialects.
>  If for example documents referring extensively to colors
>  were only attested in three dialects, then attestation in only two
>  dialects might count as sufficient to satisfy adequately
>  the criterion of breadth of attestation.

This is not the case with color terms.  In fact, it does not appear to be the
case with any semantic domain I can think of.

Anyway, this proposal strikes me as impossibly complex in practice.  Words
cannot be exhaustively assigned to semantic domains.  For example, to what
semantic domain should we assign <heriotza> 'death', or <bertze> 'other'?

Larry Trask
--------------------------------------------------------
Date: Mon, 01 Nov 1999 15:55:37 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (3)
Message-Id: <E11iJo1-0004Wr-00 at rsunx.crn.cogs.susx.ac.uk>

OK; part 3.

>  Number three.
>  Breadth of attestation.
>  Previously stated.
>  Not noted by Trask in the message to which I am replying.

> I suggested very early that attestation in all dialects was not required.

And I have never required any such thing.

In fact, I am tempted to insist on this requirement as a maximally strong
criterion.  But I've held back, if only because some dialects are less well
described than others.

>  Some intermediate would be appropriate, though I did not give
>  a particular number.

But *I* did: at least four out of the five dialect groupings I proposed.

>  Even without a particular number, this is still a specific suggestion.
>  Can it be made still more specific?  Of course.  Almost anything can be.
>  In the example just above, for example,
>  I took two out of three dialects as sufficient.

Er -- two out of *which* three?  This sounds to me like two out of the nine
recognized dialects, or perhaps two out of the five groupings I have proposed.

>  Three out of five would also be a reasonable criterion
>  (not as a cutoff, but as a sufficient *minimum* on a criterion
>  of measured degree of breadth of distribution).
>  If only two dialects are available (for the relevant subject matter),
>  I would personally take one as sufficient for a *minimum*.

But I know of *no* subject matter which is treated only in two dialects.  Bear
in mind that the early Basque literature does not offer us a wealth of topics.
This is not very surprising.  How many topics are overtly treated in the Old
English literature before 1066?

>  Remember that by suggestion number seven,
>  all of this information is kept, by tagging on the lexical item,
>  so we can still distinguish cases later if we wish.

But this is not a point of principle: it's only a procedure.  Even if I start
with a vast tagged corpus, I still have to choose the words which will go into
my initial list, and exclude all the others.  As a matter of principle, it
makes no difference whether the excluded words are sitting on a computer
database or merely sitting in the dictionary: all that matters is that they are
not in the initial list.

Larry Trask
--------------------------------------------------------
Date: Mon, 01 Nov 1999 17:23:37 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (4)
Message-Id: <E11iLBB-00075F-00 at rsunx.crn.cogs.susx.ac.uk>

OK; number 4.

>  Number four.
>  Morphemic composites as evidence for their parts.
>  This one is a recent refinement, in response to the example
>  of <uko> 'forearm' included in <ukondo> 'elbow'.

>  The mainstream would I think have included <uko>
>  on the basis of <ukondo> almost without question,
>  because the parts of the (compound?) are transparent,
>  and therefore the root from which it is formed must be
>  at least as ancient or more ancient than the compound.

I don't query the reasoning, but I don't know whether "the mainstream" would
recognize the sparsely recorded <uko> on the basis of its compound <ukondo>,
*for the purposes I have in mind*.  As I think I've made clear, I don't reject
out of hand the inclusion of <uko> in my list, even though it fails to meet my
primary criteria, since it is pretty clearly present in <ukondo>.  But I can
imagine that some other people might object to this -- in fact, I know of one
or two who definitely do ;-) -- and anyway there are potential pitfalls with
this.  See below.

>  I would not have dreamed it was necessary to state explicitly
>  that morphologically complex items can give evidence
>  for the earlier use of their morphemic parts,
>  since I assume linguists generally take it for granted
>  (except in a few special cases like back-formations).

"Can give evidence" -- sure.  But "license for the purposes I have in mind" --
maybe not.

For example, Basque <izotz> 'frost, ice' almost certainly contains <(h)otz>
'cold'.  But what on earth is the first element?  It might be <ihintz> 'dew',
but this is far from certain, and so I doubt that the universal existence of
<izotz> should be taken as licensing anything in particular.

>  In a case in which there is strong support from
>  inclusion of a root in a compound or derivative
>  in another dialect, it can even be possible to include
>  a form attested (as bare root or stem) only in one dialect.

Ah, but I'm not talking about roots or stems: I'm only talking about free
lexical items.

As it happens, Basque, like English, is a language in which roots and stems are
not commonly distinguished from each other or from free forms.  But there is
one complication: word-formation (both compounding and suffixation).  Basque
word-formation is subject to a number of phonological rules which apply *only*
in this domain, and not otherwise -- for example, not in inflection, and not
within monomorphemic words.  Consequently, the form exhibited by an item inside
a polymorphemic word is not, in general, identical to its form as a free word.
Therefore combining forms cannot be trusted as evidence for the forms of free
words in Pre-Basque.

For example, it is far from obvious that the eastern word <ekhi> 'sun' consists
of <egun> 'day' plus <-ki> noun-forming suffix.  We happen to have good
evidence in this instance that the analysis is correct, but we don't always
have such good evidence, and so it seems wise to me simply to exclude
polymorphemic words from initial consideration altogether, including their
parts.  Why should I give myself extra chances to go wrong when I don't have
to?

>  IF (note IF) we were using the criterion of three dialects
>  out of five, then we would merely need <uko> in one
>  dialect and <ukondo> in two other dialects to reach the
>  criterion of a minimum of three dialects for the root <uko>,
>  though of course that would be only two dialects for the
>  compound <ukondo> so the composite form itself
>  would not exceed *this* minimum if it were
>  attested in only two.

Clarification: <uko> is not a root, but a free noun.  Its regular combining
form should be <uka->, and precisely this is attested in the 16th-century
variant <ukaondo>, since replaced everywhere by the contracted <ukondo>, except
that a few western varieties have the variant <ukando>, and that the Gipuzkoan
dialect has the extraordinary variant <ukalondo>, which requires a good deal of
analysis and is probably a partly distinct formation.

Anyway, I doubt that it will make much difference whether we include <uko> or
not.  If it did make a difference, then my whole project would be in jeopardy,
since I can't place reliance on results which are highly sensitive to the
inclusion or exclusion of a single word.  But I'm not expecting that.

>  The exclusion of multimorphemic items is a very strong bias against
>  the result being a representative cross-section,
>  even of the *roots* of a normal language
>  (for those normal languages which do have multimorphemic items).

So what?  I'm not *interested* in roots: I'm interested in lexical items.

Furthermore, I'm not interested in polymorphemic words.  It is perfectly clear
that, in Basque, polymorphemic words are not constructed according to the same
rules as native monomorphemic words.  For example, <ukondo> is a legitimate
word, but it would *not*, I am now rather confident, be a possible form for a
monomorphemic word.

Compare English.  Monomorphemic native English words absolutely do not permit
certain consonant clusters, such as /ph/, /th/, /kh/, /nh/, /ts/, and /St/,
among others.  But polymorphemic words permit these clusters: 'uphill',
'hothouse', 'inkhorn', 'unharmed', 'cats', 'fished', and so on.  Hence an
account of morpheme-structure constraints for English would exclude these
clusters, even though they occur in words.  Basque is much the same here.
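
The English point can be made concrete with a rough sketch in Python. It assumes ad hoc ASCII transcriptions in which '+' marks a morpheme boundary and 'S' stands for the 'sh' sound; the transcriptions and the function name are illustrative assumptions, not a standard notation. The function reports whether each occurrence of a cluster lies inside a morpheme or straddles a boundary:

```python
# Ad hoc ASCII transcriptions; '+' marks a morpheme boundary (illustrative only).
WORDS = {
    "uphill":   ("Vp+hIl",    "ph"),
    "hothouse": ("hOt+haUs",  "th"),
    "inkhorn":  ("INk+hOrn",  "kh"),
    "unharmed": ("Vn+harm+d", "nh"),
    "cats":     ("kat+s",     "ts"),
    "fished":   ("fIS+t",     "St"),
}

def cluster_positions(segmented, cluster):
    """Classify each occurrence of `cluster` as 'internal' or 'boundary'."""
    morphemes = segmented.split("+")
    plain = "".join(morphemes)
    # Offsets in `plain` at which a new morpheme begins.
    boundaries = set()
    offset = 0
    for morpheme in morphemes[:-1]:
        offset += len(morpheme)
        boundaries.add(offset)
    results = []
    start = plain.find(cluster)
    while start != -1:
        # The cluster straddles a boundary if a morpheme begins inside it.
        inside = range(start + 1, start + len(cluster))
        straddles = any(i in boundaries for i in inside)
        results.append("boundary" if straddles else "internal")
        start = plain.find(cluster, start + 1)
    return results

for word, (segmented, cluster) in WORDS.items():
    print(word, cluster, cluster_positions(segmented, cluster))
```

Every cluster in this sample straddles a boundary, which is why a statement of morpheme-structure constraints can exclude these clusters even though they occur in words.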

>  While the *end goal* may be a list of morphemes or even root morphemes,

It is not.  My goal is to characterize the morpheme-structure constraints
applying to native, ancient and monomorphemic Basque lexical items.  Not roots,
and not morphemes in general: free monomorphemic lexical items.  So it makes
sense to me to choose such items as data, and to exclude items of other kinds.
If I want to characterize ducks, then I choose ducks to work on, and I exclude
even the most fascinating and significant chickens.

(An aside: am I the first person ever to write 'significant chickens'?) ;-)

>  the data used to obtain these should of course include multi-morphemic
>  items.  To do otherwise is an arbitrary, unjustified bias against the
>  normality of languages which do contain multimorphemic words,
>  and some morphemes including some roots occur only in such words.

No.  I absolutely disagree.  The existence of the English word 'bits' /bIts/ is
definitely not evidence that English permits monomorphemic words of the form
/bIts/.

And the doubtless true observation that certain morphemes are attested only
within polymorphemic words is neither here nor there.  Recall: my goal is to
find the *best* candidates for my purpose, not to find *all possible*
candidates.

Larry Trask
--------------------------------------------------------
Date: Mon, 01 Nov 1999 17:31:33 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (5)
Message-Id: <E11iLIr-0007D6-00 at rsunx.crn.cogs.susx.ac.uk>

OK; part 5.

>  Number five.
>  Balanced use of criteria, each alone not decisive.

>  This one has been made explicit only recently, as soon as I became
>  consciously aware of how near Trask comes to saying that each
>  criterion must be satisfied independently of the others,

"How near"?  I thought I'd said this explicitly.

My criteria are independent.  A word that fails to satisfy *any one* criterion
is excluded, even if it satisfies all the other ones.
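
In procedural terms this is a purely conjunctive filter. A minimal sketch in Python, assuming hypothetical field names and thresholds taken from this discussion (a cut-off of 1600, at least four of five dialect groupings); the dialect count given for <buru> is a placeholder for illustration, not a researched value:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    form: str
    first_attested: int   # year of first record
    dialect_groups: int   # how many of the five groupings have the word
    monomorphemic: bool
    loan_proposed: bool   # loan origin seriously defended by specialists

CUTOFF_YEAR = 1600  # 1700 is the alternative under discussion
MIN_GROUPS = 4      # at least four of the five dialect groupings

def criteria(c):
    """Evaluate each criterion independently of the others."""
    return {
        "early": c.first_attested <= CUTOFF_YEAR,
        "widespread": c.dialect_groups >= MIN_GROUPS,
        "monomorphemic": c.monomorphemic,
        "not_loan": not c.loan_proposed,
    }

def admitted(c):
    """A word is admitted only if it passes every criterion."""
    return all(criteria(c).values())

def failed(c):
    """Names of the criteria a word fails, in a fixed order."""
    return [name for name, ok in criteria(c).items() if not ok]

# buru: date from the discussion; dialect count is a placeholder assumption.
buru = Candidate("buru", 1042, 5, True, False)
# malmutz: a single dialect and first recorded 1802, as stated in the discussion.
malmutz = Candidate("malmutz", 1802, 1, True, False)

for c in (buru, malmutz):
    print(c.form, admitted(c), failed(c))
```

No scores are combined here: failing any single test is enough to exclude a word, however well it does on the others.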

>  of what he perhaps means by "best" examples, rather than merely
>  very good candidates for early Basque.

Well, insofar as we can distinguish "best" from merely "very good", I
definitely want to find the best ones.  Why would I want to do anything else?

>       Numbers two and four are examples of the
>  INTERACTION of criteria, that no criterion by itself should be
>  determining of inclusion or exclusion.  I took this for granted,
>  but now make it explicit.  Combine the "scores" from several
>  criteria, make a balanced decision.  That is specific, and can be
>  made more so.  It is fairly common practice in comparative linguistics
>  to have combined lists, those proposed cognates which seem perfect
>  both on sound correspondences and on semantics, those which
>  are perfect on sound correspondences but slightly odd on semantics,
>  and so on, with greater detail and elaboration.  No reason not to
>  do that here also.

But also no reason that I can see to *do* it here.

Of course, if I can't compile a list of reasonable length using my criteria,
then I might be forced to resort to something like this.  But I'd prefer to
avoid it if I can.  And I think I can.  I think my primary criteria will still
leave me a list of a few hundred words.  And I'll be very surprised if that's
not enough to identify morpheme-structure constraints with some confidence --
particularly since those constraints show every sign of having been pretty
restrictive in Pre-Basque.

Larry Trask
--------------------------------------------------------
Date: Tue, 02 Nov 1999 09:35:05 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (6)
Message-Id: <E11iaLJ-0001kx-00 at rsunx.crn.cogs.susx.ac.uk>

OK; part 6.

>  Number six.
>  Avoiding biases against expressives.
>  Previously stated, as Trask agrees,
>  though he very much misrepresents the content of this one.

[LT]

>> I have seen none, except for Lloyd's suggestion  ...[one above, and]
>> his insistence that sound-symbolic words
>> should be self-consciously added to the list according to no specified
>> criteria.

> This is most emphatically NOT what I suggested.
> I was explicit that I suggested dropping or modifying criteria which
> had the *effect* of biasing selection against any category of words,
> and that I happened to be qualified to talk about why a bias against
> sound-symbolic words might distort any conclusions about
> canonical forms.
> That is quite another matter from self-consciously insisting
> on adding expressives.

OK, Lloyd -- if I've misunderstood your position, then I apologize.  But it
certainly *looked* as though you were proposing to deliberately add expressive
formations which failed my primary criteria.  See further below.

But, if you're not going to add these things, then how precisely do you propose
to get them in without wrecking my criteria?  Recall that expressive formations
in Basque are rarely attested early and are usually confined to small areas.

[snip passing remark about an earlier point]

[LT]

>> but dismissed the second
>> as lacking in specifics and intrinsically circular.

> As Trask restated it, I would agree that self-consciously adding
> expressives to the list would be unprincipled,
> if that were done merely for the purpose of adding expressives.
> But as reiterated above, that was most emphatically NOT what
> I proposed.  I proposed rather eliminating artificial barriers to
> their inclusion, through accidents of more limited attestation
> and the interaction of supposed criteria for number of dialects
> required in attestations.

In what respect are my criteria "artificial"?

I have proposed no criteria which specifically target expressive formations for
exclusion.  Instead, I have merely applied the same criteria to all words.

Now, if you're going to relax the criteria sufficiently to pick up words like
<malmutz> 'fluffy, insubstantial', which is pretty much confined to a single
dialect (Bizkaian) and nowhere recorded before 1802, then how are you going to
exercise any reasonable control at all over the membership of the initial list?
Aren't you just going to open the floodgates to most of the words in the
language, thereby defeating the whole point of the exercise?

> If expressives are attested only in one dialect,
> then only one dialect would be sufficient as a bare minimum
> satisfaction on that criterion of distribution.
> (an instance of suggestion number two above,
> not at all specific to expressives).

The distribution of what I regard as expressive formations varies from the
whole language (for a very few) to a single small area (probably the large
majority).

But look.  Are you suggesting that expressive formations should be singled out
for special treatment?  That they should be deliberately added to my list even
though they fail my primary criteria badly?  Before you accuse me again of
misrepresentation, look at your words above: it certainly looks to *me* as
though that is exactly what you are suggesting.

If you are, then tell me: how do I decide in advance which words are expressive
formations and which are not, so that I can decide which ones to add to my list
in defiance of my criteria?  Isn't this utterly circular?  Remember, one of my
ultimate goals is to characterize explicitly the differences in formation
between expressives and ordinary words.  And I can't hope to do that if I pick
the expressive formations *a priori* -- now can I?

And, if you are not doing this, then what exactly *are* you suggesting?  That I
should include in my list *every* word attested in "only one dialect"?  That
means that I will wind up listing every word recorded in the language at all,
and so I won't even have a list.

Lloyd, what on *earth* is this about? I can't follow it.

>  In fact, I gather from some other remarks
>  by Trask quite recently, that there are numerous alternative
>  words for "butterfly".

There are several different words, attested at different times and in different
places.  All of them appear to me to be expressive formations of one kind or
another.  Not one of them is either recorded early or found throughout the
larger part of the language.

>  If we had a full set of these displayed
>  for us, who knows what we might learn about whether
>  any particular forms should be considered inherited from
>  early Basque?

Oh, I can list them, if you like.  But how will merely staring at them allow us
to learn anything at all?

Anyway, I can state with some confidence that not one of these words is old in
Basque.  We might as well stare at the names of different shapes of pasta, or
at the names of the animals in the African rain forest, and try to decide which
ones are ancient in English.

>  And about our own thinking about criteria
>  for inclusion and exclusion.  Good examples have a way of
>  revealing paradoxes of thinking, or otherwise sharpening
>  our thinking.

I have already put a good deal of thought into my criteria, thought which is
based on my 25 years or so of studying Basque.  I have yet to see on this list
any different criteria which strike me as superior, or even just as good.

Larry Trask
--------------------------------------------------------
Date: Tue, 02 Nov 1999 09:41:38 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (7)
Message-Id: <E11iaRd-0001wP-00 at rsunx.crn.cogs.susx.ac.uk>

OK; part 7.

>  Number seven.
>  Tagging of items, rather than inclusion and exclusion
>  Previously stated.
>  Not noted by Trask in the message to which I am replying.

As I have now said several times, a tagging approach is not a point of
principle but only a procedure.  You can tag words to your heart's content,
but, sooner or later, you are going to have to decide which words should be
counted as strong candidates for native and ancient status and which should
not.  Until you have done this, all you have is the entire known vocabulary of
the language with lots of colored flags attached.  You have no list, and you
can obtain no results.

>  In redefining where on the continuum to draw the line for
>  "best" examples (since to be meaningful we must recognize
>  that is what anyone does by choosing or adjusting their criteria),
>  we can gain the benefits of more information and lose nothing.
>  Any information that someone might have used in a criterion
>  dictating exclusion can be included in a computer database
>  as a tagging of the individual items.  Additional information
>  can also be added as tagging.  The benefits of being able to
>  consider alternative hypotheses so quickly and easily were
>  discussed, and the fact that some questions will simply not be
>  asked if it is too difficult to ask them.

Again, this is procedure, and not principle.  Choosing three or six or fifteen
different sets of criteria and listing the words produced by each set of
criteria may or may not be an admirable procedure.  But, in the end, you must
choose the criteria you are going to go with, and get on with the real work.
Right?

Larry Trask
--------------------------------------------------------
Date: Tue, 02 Nov 1999 10:02:56 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (8)
Message-Id: <E11iamF-0002nN-00 at rsunx.crn.cogs.susx.ac.uk>

OK; part 8.

>  Number eight.
>  Slight global preference to include basic vocabulary,
>       unless provably borrowed.
>  Previously stated.
>  Not noted by Trask in the message to which I am replying.

But I *did* reply to it in one of my postings.

>  The use of the Swadesh list or other list of *relatively* more
>  basic vocabulary could be used to give an extra point or fraction
>  of a point to items of basic vocabulary, perhaps causing some
>  of them to be included which otherwise would not rate highly
>  enough on the balanced combination of other criteria.
>  The principled basis for this is that languages do have basic
>  vocabulary, that basic vocabulary is, statistically only now,
>  relatively more resistant to replacement by loanwords,
>  and that the positing
>  of a set of vocabulary for an early form of a language should
>  probably include lexical items for most such basic vocabulary.

No doubt.  But how does this help?

The native English words for such basic senses as 'face', 'mountain', 'river'
and 'animal' have been entirely lost in favor of loan words.  How would a study
of early English benefit from including these words in the list, even if only
"fractionally", merely because they are basic?

The key word in the paragraph above is "statistically".  The English word
'face' is not "statistically" more than zero percent native English because
words for 'face' are not frequently borrowed -- now is it?  If we stick to
monomorphemic words -- and I do -- then a word is either 100% native or zero
percent native.  My criteria are designed to locate the ones which are 100%
native with as much reliability as possible.  How on *earth* would I benefit
from counting five or ten percent of items of basic vocabulary which are not
native?

I find this utterly mysterious.

>  This would not overrule clear cases of *known* borrowings;
>  in such a case we might indeed appropriately have a "trump"
>  criterion for exclusion, but it should be used as a "trump"
>  only when *known* is meant very strictly, not mere speculation.
>  Trask's example of "mountain" is probably such a case,
>  to be excluded as an obvious loan.

The majority of loans into Basque can be identified as such with 100%
certainty, and must of course be excluded.  But there remains a residue of
cases which are less certain: words which are probably loans, words which are
very possibly loans, words which have been regarded as loans by a few
specialists.

I have maintained, and I still maintain, that the only responsible policy to
adopt is to exclude any word for which a loan origin has been seriously
defended by knowledgeable specialists.  I know that this policy will exclude a
few genuinely native words, but the opposing policy would fail to exclude a
larger number of loan words -- an outcome which is clearly very much worse.

>  But that does not contradict using this criterion
>  to evaluate whether we may have exluded too much, overall.

What do you mean by "excluding too much"?  This makes no sense to me.

For the 87th time, I am *not*, at this stage, trying to locate every native and
ancient word recorded anywhere.  I am merely trying to identify a sizeable body
of words which have strong claims to being native and ancient, so that I can
examine their phonological forms.

>  In effect, this suggestion shifts the burden of proof slightly,
>  so that to exclude an item of basic vocabulary we need stronger
>  evidence than we would for non-basic items.

Really?  Why?

The observation that words for 'face' are borrowed less often than words for
'cauliflower' has *no bearing* on whether any *particular* word for 'face' or
'cauliflower' has been borrowed or not.

>  What exact proportion of a Swadesh list
>  might we want to be sure is included?
>  I do not presume to know,
>  and there certainly are differences among languages in the
>  proportion of basic vocabulary which is native.

Exactly so.

>  But even if
>  not precisely quantified, this criterion is specific and has a
>  principled basis.  That basis relies on the idea that we are
>  evaluating our criteria for their appropriateness, just as we
>  are using them to evaluate items for inclusion or exclusion
>  as within the bounds of "best" candidates for early Basque.

I'm sorry, but I must reject the proposal absolutely.  The meaning of a word is
*not* a reliable guide to its possible native and ancient status.  We cannot
extrapolate from universal statistical tendencies to specific conclusions about
individual words.  And it is definitely individual words that I am interested
in, not statistical tendencies.

Larry Trask
--------------------------------------------------------
Date: Tue, 02 Nov 1999 10:17:26 +0000
From: larryt at cogs.susx.ac.uk (Larry Trask)
Subject: Re: 9 specifics on Including and excluding data (9)
Message-Id: <E11ib0I-0004S6-00 at rsunx.crn.cogs.susx.ac.uk>

At last, bleary-eyed readers (if any remain): part 9 (the last).

>  Number nine.
>  Avoiding cascading errors,
>       not insulating steps in the reasoning.
>  Previously stated.
>  Not noted by Trask in the message to which I am replying.

Er -- "cascading errors"?  How so?  My primary criteria are all independent,
and I am picking out individual words.  Where is there any scope for "cascading
errors"?

>  It is important to avoid circularity, by not artificially insulating
>  steps in the reasoning process, by not allowing selection of data
>  to be dictated by the hypotheses one has, more than absolutely
>  necessary.

Fine.  But, tell me, Lloyd: how exactly do my criteria of early attestation,
wide distribution and absence from neighboring languages force me to choose
words (or not) that are "dictated" by any hypotheses that I may be publicly or
privately entertaining?  I think this question is rather important, but I
haven't seen you responding to it so far.  How about now?
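
To make the point about independence concrete, here is a minimal sketch of the three criteria as separate tests applied to one word at a time. Everything in it is an assumption made for illustration: the record fields, the cut-off year, and the dialect threshold are hypothetical, since no figures are given in this posting. Note that no test looks at the phonological form of the word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate word; all fields are hypothetical illustration data."""
    form: str
    first_attested: int            # year of the earliest record
    dialects_attested: int         # number of dialects recording the word
    in_neighboring_language: bool  # also found in a neighboring language?


# Hypothetical thresholds, chosen only so the sketch runs.
LATEST_YEAR = 1600
MIN_DIALECTS = 4


def early_attestation(c: Candidate) -> bool:
    return c.first_attested <= LATEST_YEAR


def wide_distribution(c: Candidate) -> bool:
    return c.dialects_attested >= MIN_DIALECTS


def absent_from_neighbors(c: Candidate) -> bool:
    return not c.in_neighboring_language


# Each criterion is evaluated on its own; none reads c.form,
# and none depends on the outcome of another.
CRITERIA = (early_attestation, wide_distribution, absent_from_neighbors)


def passes(c: Candidate) -> bool:
    """A word is kept only if it satisfies every criterion."""
    return all(test(c) for test in CRITERIA)


def select(candidates: list[Candidate]) -> list[str]:
    """Return the forms of the candidates that pass all criteria."""
    return [c.form for c in candidates if passes(c)]
```

Because the selection never inspects the shape of a word, two candidates with identical attestation records are treated identically whatever their forms, which is why the list cannot be steered by a hypothesis about canonical forms.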

>  This was stated first in regard to canonical forms,
>  because of the likelihood that the initial selection under Trask's
>  criteria would bias against expressives which (Trask indicated)
>  do indeed have some different canonical forms from other vocabulary.

If they do -- and I agree that, in Basque, they do -- then this fact will
emerge clearly from my investigation.  If I find -- as I expect to -- that a
word of the form *<mutur> was phonologically impossible in Pre-Basque, then the
modern <mutur> 'snout, muzzle' will be, to some extent, exposed as an
expressive formation.  Now, this is just the kind of result that I am hoping to
obtain.  So what's the problem?

>  In other words, we should avoid excluding these from the
>  beginning,

OK, Lloyd -- gotcha.  Your "these" clearly refers to "expressives", right?
Therefore you are now explicitly proposing that we should take steps from the
beginning to include a whole bunch of expressive formations which fail my
primary criteria, just so my initial list will include a bunch of expressive
formations.  Right?

And how does this square with your protest that I have misrepresented you?

>  so that the initial results will include a full range of
>  native canonical forms, and will not bias later work circularly to
>  incorrectly exclude items on the basis of a narrow set of formulas
>  for canonical forms, merely because almost no examples of such
>  canonical forms typical of expressives happened to be included
>  at stage one.

Lloyd, in spite of his protestations, is here *clearly* demanding that I should
find some way of (a) identifying expressive formations *a priori*, before I've
done anything else, and then (b) guaranteeing that lots of these go into my
initial list, *regardless* of how badly they may fail any or all criteria.
This is exactly the position I imputed to him before he accused me of
misrepresenting him.  Own up, Lloyd: I have not misrepresented you in the
slightest.  Rather, I have presented your position accurately, and dismissed it
as untenable.

I rest my case. ;-)

Larry Trask
COGS
University of Sussex
Brighton BN1 9QH
UK

larryt at cogs.susx.ac.uk

--------------------------------------------------------
End of Digest
*********
--------------------------------------------------------


