>> One day I hope to compile a list of the
>> Basque words that can reasonably be regarded as monomorphemic, as native
>> and as having been in the language for at least 2000 years.  I don't
>> expect that list to contain more than several hundred words.

> We are working on this problem here at the moment. Myself and two
> Doctoral Philology students from the Basque Country. We are working
> on monosyllables first and think that in total it is likely to be a
> few thousand.

Well, this sounds optimistically high to me, I confess.  My admittedly
unsystematic experiments lead me to believe that the number of
monomorphemic items which are good candidates for native and ancient
status is unlikely to be more than a few hundred.  Of course, I exclude
verbs, since native verbal roots are never free forms.  If we agree to
include verbal roots, that will add several hundred more.  I'll be
genuinely surprised if you can find even 1500 strong candidates.
But I'll wait and see.

[on Azkue's preface]

> So you would seem to agree then that the Azkue list contains all the
> words Azkue thought were native euskara (as well as others, of
> course).

Yes, though that is not the point we were discussing earlier.

> I assume we are using the term "native" to mean words not
> derived from any other language at least in the last 2000 years.

More or less.


>> Sorry; this does not follow.  The mere existence of a word in modern
>> Basque is no guarantee that it is either native or ancient.

> I'm quite confused by this response. If they can't be identified as
> loans (my comment) you say there is no guarantee they are native or
> ancient. I guess I have to say, true.

OK; we agree.

> But surely your statement kills all work dead in the water.

No; not at all.  There's a lot of room between `certainly borrowed' and
`certainly native', and nothing is guaranteed in advance to be native
and ancient.

> It seems you are saying we need "guarantees" that words are of some
> class/type/characteristic to pursue any analysis. Note my statement
> doesn't assert such words are "guaranteed" to be native or ancient,
> but that they do represent evidence worthy of pursuing.

No, I don't require advance guarantees: none are available in any case.

I propose the following.  Let's compile a list of the Basque words each
of which meets the following minimal requirements.

	(1) There is no reason to suppose it is polymorphemic.

	[The great bulk of the words in any Basque dictionary are
	*transparently* polymorphemic, and can be excluded at once.]

	(2) It is found throughout the language, or nearly so.

	[Since the better dictionaries assign words to the conventional
	dialects, it is easy to formalize this requirement as we
	see fit.]

	(3) It is attested early.

	[Let's say before 1600, which is `early' for Basque.]

	(4) There is no reason to believe that it is shared with
	languages known to have been in contact with Basque.

	[Subjective, and hard to formalize, but I believe that doubtful
	cases are few enough to constitute only a minor problem.]

To these I would like to add two more, though these are not essential:

	(5) It does not appear to be a nursery word.

	(6) It does not appear to be of imitative origin.

Now, (5) would exclude only a very few words not excluded by the other
criteria, notably <ama> `mother' and <aita> `father', while (6) would
exclude a much larger number of items which would be automatically
excluded in any serious comparative work, like <miau> `meow', <mu>
`moo', <be> `baa', <din-dan> `ding-dong', <tu> `spit', and probably also
<usin> `sneeze'.  This last sounds roughly like oo-SHEEN, and, in my
view, is too likely to be imitative to be included in any list.
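
[A minimal sketch of how criteria (1)-(6) might be applied mechanically,
assuming each dictionary entry has already been annotated by hand.  The
field names, the dialect labels D1-D6, the threshold of five dialects
standing in for `or nearly so', and the sample records are all
hypothetical placeholders, not data from any real dictionary.]

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical dialect labels; a real run would use whatever dialect
# divisions the chosen dictionary assigns to its entries.
DIALECTS = frozenset({"D1", "D2", "D3", "D4", "D5", "D6"})


@dataclass(frozen=True)
class Entry:
    form: str
    polymorphemic: bool            # (1) any reason to suppose so
    dialects: FrozenSet[str]       # (2) dialects in which it is recorded
    first_attested: Optional[int]  # (3) year of first attestation, if known
    contact_suspect: bool          # (4) possibly shared with a contact language
    nursery: bool = False          # (5) appears to be a nursery word
    imitative: bool = False        # (6) appears to be of imitative origin


def is_candidate(e: Entry, min_dialects: int = 5, cutoff: int = 1600,
                 strict: bool = True) -> bool:
    """Return True if the entry passes criteria (1)-(4), and also
    (5)-(6) when strict is True."""
    if e.polymorphemic:
        return False
    if len(e.dialects & DIALECTS) < min_dialects:
        return False
    # An unknown attestation date is treated as failing (3).
    if e.first_attested is None or e.first_attested >= cutoff:
        return False
    if e.contact_suspect:
        return False
    if strict and (e.nursery or e.imitative):
        return False
    return True


# Made-up records, for illustration only.
plain = Entry("word_a", False, DIALECTS, 1545, False)
nursery = Entry("word_b", False, DIALECTS, 1545, False, nursery=True)
late = Entry("word_c", False, DIALECTS, 1871, False)

candidates = [e.form for e in (plain, nursery, late) if is_candidate(e)]
```

[Calling is_candidate with strict=False drops the two optional criteria,
so an entry flagged only as a nursery word would then be kept.  The
judgements behind the flags, above all for (4), still have to be made by
a person; the code only records them.]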

Having compiled the list, let us then examine it and determine the
phonological characteristics of the words on the list.  I am fairly
confident that I can guess what the results will be, at least in their
main lines, though there will doubtless be a few surprises in the
details.

[on my English examples]

> I wonder if we are talking at cross purposes here. I'm not saying
> the known history of a word should be ignored. If we have such
> information then it should be used.


> But I am saying that words we have today that are not identified as
> having any alternative history could be fairly considered for
> informing about early euskara.

Here I can't agree.  I have the gravest reservations about including
words not recorded before 1871, or before 1935; about including words
found nowhere but in Larramendi's dictionary or in Hiribarren's
dictionary; about including words recorded only in one small area; about
including words reported only by the Dutch linguist van Eys or only by
the Spanish polymath Hervas y Panduro; about including all sorts of
things which, in my view, are deeply suspect for one reason or another.

I believe it is not enough merely to exclude obvious loan words: we must
be *far* more discriminating, or we are going to wind up with a list
containing more junk than genuine native and ancient words.

*Once* we have a list of the most plausible candidates for native and
ancient status, *then* we can consider judiciously whether further words
might plausibly be added to it.  But we have to start by being as
rigorous as possible, not by tossing in everything that isn't obviously
borrowed.

> Let's take the opposite scenario. In the Azkue list there are 9854
> words of which 1436 have modern orthography that can't be mapped
> into the orthography you use for describing early euskara.

Not sure I understand this.

> Of the 8318 words that do fit your orthography, 5022 can be found in
> one of the modern lexicons/dictionaries of Aulestia, Kintana, Morris
> or XUXEN. If we analyse these words and show that they strongly
> conform to the Michelena description of early euskara and/or your
> restating of it, will you consider that result irrelevant to
> determining the merit of that description, especially given the
> phonological conservatism of euskara?

Yes, I would.

Let me cite just a few Basque words from Aulestia's dictionary:

	<galgo> `greyhound'
	<baldin> `if'
	<berba> `word'
	<nabo> `turnip'
	<apo> `toad'
	<adore> `courage'
	<gona> `skirt'
	<gisu> `plaster'
	<bortitz> `strong'
	<idatzi> `write'
	<haizkora> `ax'
	<soro> `field'
	<ohore> `honor'
	<oilo> `hen'
	<zeru> `sky'

And so on, and so on, for god knows how many more.  Now, every single
one of these words conforms *perfectly* to our ideas of what native and
ancient Basque words can look like, without the slightest complication.
And yet, in every case, I can adduce overwhelming evidence that the word
is not native, or is not ancient, or is not monomorphemic.

Conformity to pre-existing ideas about possible phonological forms is
not nearly good enough: such an approach must inevitably sweep up huge
numbers of words which demonstrably should not be there, and, by
implication, very many more which really (if not demonstrably) should
not be there.

> I would also like to point out there could be a further difference
> in our perspectives due to different but unspoken methodologies and
> goals. I am interested in describing the stochastic or probabilistic
> characteristics of word formation in euskara.

By `word formation', I suppose you mean what I would call
`morpheme-structure constraints'?  Well, fine if you're only interested
in modern Basque, but I myself am interested in the morpheme-structure
constraints of Pre-Basque, and therefore I don't want to count as
evidence anything that doesn't appear to be a strong candidate for a
Pre-Basque word, in its earliest reconstructible form.

> Whilst getting a legitimate set of words to analyse is important, the
> presence of a few doubtful words does not necessarily destroy such a
> description as a legitimate probabilistic statement.

Agreed, but I worry about that phrasing `a few doubtful words'.  If we
are not maximally rigorous, I fear that what we'll get is a whole
mountain of words that shouldn't be there -- more improper words, in
fact, than proper ones, which will surely ruin any stochastic approach.
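
[A rough arithmetic sketch of this worry, under stated assumptions: if
some share of the list consists of words that don't belong, the observed
frequency of any phonological feature is a weighted mixture of its
frequency among genuine words and its frequency among the intruders.
The function name and all the rates below are invented for illustration;
they are not measurements of Basque.]

```python
def observed_rate(true_rate: float, junk_rate: float, junk_share: float) -> float:
    """Frequency of a feature in a word list in which a fraction
    junk_share of the entries do not belong."""
    if not 0.0 <= junk_share <= 1.0:
        raise ValueError("junk_share must lie between 0 and 1")
    return (1.0 - junk_share) * true_rate + junk_share * junk_rate


# Hypothetical feature: absent from genuine words (rate 0.0) but found
# in one in five of the words that should not be on the list (rate 0.2).
few_doubtful = observed_rate(0.0, 0.2, junk_share=0.05)
mostly_junk = observed_rate(0.0, 0.2, junk_share=0.60)
```

[With five per cent contamination the feature shows up in about one word
in a hundred, and could fairly be dismissed as noise; with sixty per
cent contamination it shows up in about twelve words in a hundred, and
would look like a real, if uncommon, property of the list.]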

> I do not wish to deny the importance of systematic study of each
> word but noisy data doesn't invalidate a probabilistic study nor
> necessarily pre-determine it to being unable to say something useful
> about the structure of the data.  Production of a putative core word
> list of early euskara will be a spin-off of this work.

Excellent, but I'd be very cautious about including modern words.


>> I don't see why.  A dictionary of modern English is of no direct
>> relevance to ascertaining the nature of Old English, and a dictionary of
>> modern Basque is of no direct relevance to ascertaining the nature of
>> Pre-Basque.  You might as well try to find out what Latin was like by
>> reading a dictionary of modern French.

> This of course assumes there is little relatedness between the two
> and there is not a systematic development from one to the other. The
> already phonological conservatism of euskara suggests that the words
> in Azkue have greater validity for studying early euskara than
> modern English for studying Middle English.

Not necessarily.  Phonological conservatism is only one factor among
several.  It matters little that Basque phonology has been conservative
if not much of the Pre-Basque lexicon still survives.

> and Michelena's work as meritorious as it is, is not the last of the
> story and the Azkue list can help us add to that, I'm sure.

Well, I'll look forward to seeing what you come up with.

[on my distinction between `native' and `ancient']

> I'm a little mystified with this in the context of euskara. As far
> as I have heard the dominant ancient external influence has come
> from Latin with suggestions of a small number of items from Celtic
> and a few others. Once these are identified then is not everything
> ancient also native because there is nothing else left but ancient
> native words, or am I missing something?

Once ancient loans are identified -- if they can be -- then everything
ancient remaining is also native, in some sense.  But there remains the
*big* problem of determining what is ancient to begin with.  The great
bulk of the Basque vocabulary is not ancient, just as the great bulk of
the modern English vocabulary is not ancient.

