Plosive-liquid clusters in euskara borrowed from IE?

Fri Aug 27 09:44:32 UTC 1999

[ moderator re-formatted ]

My apologies for my lateness in replying to this message but life has been
pandemonium here. This is my response to Larry Trask's criteria for the
admission of words in an analysis of early Basque. Some methodological issues
are relevant to all language analyses.

     Of course, I exclude
    verbs, since native verbal roots are never free forms.

This surprised me. Do you mean "sar zaitez" (Come in!) is not legal euskara
and that "sar" is not a word? While the dictionary form of the word is
"sartu", I will use "sar" in the analysis.

    If we agree to
    include verbal roots, that will add several hundred more.

Which I have always included in the work.

    I'll be
    genuinely surprised if you can find even 1500 strong candidates.
    But I'll wait and see.

We have finished collecting and classifying the monosyllables. I now just have
to pull the list together.

We have discussed your proposal on how to accept words and give the following
responses.

    I propose the following.  Let's compile a list of the Basque words each
    of which meets the following minimal requirements.

    	(1) There is no reason to suppose it is polymorphemic.
    	[The great bulk of the words in any Basque dictionary are
    	*transparently* polymorphemic, and can be excluded at once.]

    	(2) It is found throughout the language, or nearly so.

    	[Since the better dictionaries assign words to the conventional
    	dialects, it is easy to formalize this requirement as we
    	see fit.]

Don't agree - the forms of words in certain dialects are important to
understanding their history.

    	(3) It is attested early.

    	[Let's say before 1600, which is `early' for Basque.]

Don't agree - the corpus of materials at that time is tiny. For example, there
are words in corpora predating Larramendi's dictionary (1747) which he didn't
include. In line with what you say below, Larramendi's dictionary constructed
many neologisms, which we do not intend to use.

    	(4) There is no reason to believe that it is shared with
    	languages known to have been in contact with Basque.

    	[Subjective, and hard to formalize, but I believe that doubtful
    	cases are few enough to constitute only a minor problem.]

Agreed - but we recognise this may not be entirely clearcut.

    To these I would like to add two more, though these are not essential:

    	(5) It does not appear to be a nursery word.

The comment by a list member indicated nursery words can provide useful
information. I would propose to analyse them separately to understand what
they have in common with the main group.

    	(6) It does not appear to be of imitative origin.

Ditto, as for nursery words - with the extra comment that imitative words
sometimes do have a true word associated with them, e.g. taup : taupa.

    Now, (5) would exclude only a very few words not excluded by the other
    criteria, notably <ama> `mother' and <aita> `father', while (6) would
    exclude a much larger number of items which would be automatically
    excluded in any serious comparative work, like <miau> `meow', <mu>
    `moo', <be> `baa', <din-dan> `ding-dong', <tu> `spit', and probably also
    <usin> `sneeze'.  This last sounds roughly like oo-SHEEN, and, in my
    view, is too likely to be imitative to be included in any list.

Anything that is debatable is included in the analysis. One person's opinion
amongst a collection should not have the right of veto in word selection. The
default case is acceptance - there has to be a substantial reason for
withdrawing a word from the analysis list.

    Having compiled the list, let us then examine it and determine the
    phonological characteristics of the words on the list.  I am fairly
    confident that I can guess what the results will be, at least in their
    main lines, though there will doubtless be a few surprises in the

We shall go forward on this basis. I intend producing a list of ALL the Azkue
words with a commentary on whether they have been included in the analysis
and, where they have not, the reasons for their rejection. However,
monosyllables are first. It is most important to get them processed as they
are needed to assist in identifying potential compounds among the 2-, 3- and
4-syllable words.
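To make the role of the monosyllables concrete, here is a minimal sketch of
how a longer word could be tested against a monosyllable list. It is only an
illustration under stated assumptions, not the procedure being used on the
Azkue material: the function name and the toy word set are hypothetical, and
the matching is plain string comparison that ignores any sound changes at
morpheme boundaries. A hit is a candidate compound, nothing more.

```python
def segmentations(word, monosyllables):
    """Return every way of splitting `word` into two or more known monosyllables.

    Hypothetical helper: pure string matching, no phonological adjustments.
    """
    results = []

    def extend(start, parts):
        # Reached the end of the word: keep the split only if it has >= 2 parts.
        if start == len(word):
            if len(parts) >= 2:
                results.append(list(parts))
            return
        # Try every prefix of the remaining string that is a known monosyllable.
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in monosyllables:
                parts.append(piece)
                extend(end, parts)
                parts.pop()

    extend(0, [])
    return results


# Toy set for illustration only, not the real monosyllable list.
known = {"din", "dan", "sar"}
candidate_splits = segmentations("dindan", known)
no_splits = segmentations("sar", known)
```

A word that is itself a single listed monosyllable yields no split, so it is
not reported as a compound of itself.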

    [on my English examples]

    > I wonder if we are talking at crossed purposes here. I'm not saying
    > the known history of a word should be ignored. If we have such
    > information then it should be used.


    > But I am saying that words we have today that are not identified as
    > having any alternative history could be fairly considered for
    > informing about early euskara.

    Here I can't agree.  I have the gravest reservations about including
    words not recorded before 1871, or before 1935; about including words
    found nowhere but in Larramendi's dictionary or in Hiribarren's
    dictionary; about including words recorded only in one small area; about
    including words reported only by the Dutch linguist van Eys or only by
    the Spanish polymath Hervas y Panduro; about including all sorts of
    things which, in my view, are deeply suspect for one reason or another.

I think your position is extremely conservative. My response is as above.

    I believe it is not enough merely to exclude obvious loan words: we must
    be *far* more discriminating, or we are going to wind up with a list
    containing more junk than genuine native and ancient words.

If you can demonstrate they are junk then they can be removed; otherwise they
are observed data and should not be culled unjustifiably - that is just
managing the data to get the result one wants, and is poor statistical method.

    *Once* we have a list of the most plausible candidates for native and
    ancient status, *then* we can consider judiciously whether further words
    might plausibly be added to it.  But we have to start by being as
    rigorous as possible, not by tossing in everything that isn't obviously

So we will reach a position where we proffer a list and you may cull it - it
will be interesting to see the differences.

    > Let's take the opposite scenario. In the Azkue list there are 9854
    > words of which 1436 have modern orthography that can't be mapped
    > into the orthography you use for describing early euskara.

    Not sure I understand this.

In your book you use an orthography for describing early euskara. I have
extracted into a sub-list all the words that can't be early euskara on the
sole principle that they don't conform to that orthography, e.g. words with
<x>, <j> ... There are 1436 of them. Of course this is not to deny the words
can be traced back to an earlier form. When we have the reconstructed forms we
will be able to use them.
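As a rough illustration of this kind of orthographic filter (a sketch only,
not the program actually run on the Azkue list): the function name and the
sample strings are assumptions, and since only <x> and <j> are named above,
the set of excluded graphemes is deliberately left incomplete.

```python
# Hypothetical sketch: split a word list by orthography.
# Only <x> and <j> are named in the text; the rest of the excluded
# graphemes are not given there, so this set is an incomplete assumption.
EXCLUDED_GRAPHEMES = {"x", "j"}


def partition_by_orthography(words, excluded=EXCLUDED_GRAPHEMES):
    """Return (conforming, non_conforming) lists, preserving input order."""
    conforming, non_conforming = [], []
    for word in words:
        # A word is set aside if it contains any excluded grapheme.
        if any(g in word.lower() for g in excluded):
            non_conforming.append(word)
        else:
            conforming.append(word)
    return conforming, non_conforming


# Illustrative strings only, not counts or entries taken from Azkue.
sample = ["sar", "soro", "xuxen", "jan"]
fits, set_aside = partition_by_orthography(sample)
```

The words set aside are kept in their own list and not discarded, so they can
be brought back once reconstructed forms are available.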

    > Of the 8318 words that do fit your orthography, 5022 can be found in
    > one of the modern lexicons/dictionaries of Aulestia, Kintana, Morris
    > or XUXEN. If we analyse these words and show that they strongly
    > conform to the Michelena description of early euskara and/or your
    > restating of it, will you consider that result irrelevant to
    > determining the merit of that description, especially given the
    > phonological conservatism of euskara?

    Yes, I would.

    Let me cite just a few Basque words from Aulestia's dictionary:

    	<galgo> `greyhound'
    	<baldin> `if'
    	<berba> `word'
    	<nabo> `turnip'
    	<apo> `toad'
    	<adore> `courage'
    	<gona> `skirt'
    	<gisu> `plaster'
    	<bortitz> `strong'
    	<idatzi> `write'
    	<haizkora> `ax'
    	<soro> `field'
    	<ohore> `honor'
    	<oilo> `hen'
    	<zeru> `sky'

    And so on, and so on, for god knows how many more.  Now, every single
    one of these words conforms *perfectly* to our ideas of what native and
    ancient Basque words can look like, without the slightest complication.
    And yet, in every case, I can adduce overwhelming evidence that the word
    is not native, or is not ancient, or is not monomorphemic.

This is beside the point under discussion. I have already said known
non-native words need to be deleted. However, I also say that "a suspicion"
that a word is not native is not sufficient to reject it from analysis;
otherwise one is potentially loading the dice.

    Conformity to pre-existing ideas about possible phonological forms is
    not nearly good enough: such an approach must inevitably sweep up huge
    numbers of words which demonstrably should not be there, and, by
    implication, very many more which really (if not demonstrably) should
    not be there.

I'm using it in the other direction, that is, conformity to pre-existing
phonological forms is a means of separating the original list to give us a
smaller set of words to begin working with. This operation does not override
the criterion of removing known loans.

    > I would also like to point out there could be a further difference
    > in our perspectives due to different but unspoken methodologies and
    > goals. I am interested in describing the stochastic or probabilistic
    > characteristics of word formation in euskara.

    By `word formation', I suppose you mean what I would call
    `morpheme-structure constraints'?  Well, fine if you're only interested
    in modern Basque, but I myself am interested in the morpheme-structure
    constraints of Pre-Basque, and therefore I don't want to count as
    evidence anything that doesn't appear to be a strong candidate for a
    Pre-Basque word, in its earliest reconstructible form.

I understand your motivation and the demand for rigour you are placing here;
however, my concern is that such a method can very readily be arbitrary and
will be applied to favour the working hypotheses of the scholars, thus
producing a self-fulfilling prophecy. I prefer the alternative approach
founded in observational disciplines like psychology, land surveying, etc.,
which say you only reject observations when you have very good reasons to.

    > Whilst getting a legitimate set of words to analyse is important the
    > presence of a few doubtful words does not necessarily destroy such a
    > description as a legitimate probabilistic statement.

    Agreed, but I worry about that phrasing `a few words'.  If we are not
    maximally rigorous, I fear that what we'll get is a whole mountain of
    words that shouldn't be there -- more improper words, in fact, than
    proper ones, which will surely ruin any stochastic approach.

I think we agree in spirit on the need for rigour - it is just that we have
different concerns about one dimension of rigour.

    > I do not wish to deny the importance of systematic study of each
    > word but noisy data doesn't invalidate a probabilistic study nor
    > necessarily pre-determine it to being unable to say something useful
    > about the structure of the data.  Production of a putative core word
    > list of early euskara will be a spin-off of this work.

    Excellent, but I'd be very cautious about including modern words.

I am most concerned about not including modern words; hence my starting point
is Azkue and nothing more recent in terms of the source of words. However, I
have used modern wordlists to assist in filtering the initial Azkue list, in
an attempt to identify the Azkue words that have some doubt of veracity
attached to them: words in the Azkue list that are not found in modern
sources are partitioned off for separate analysis.
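The partitioning step can be sketched as below. This is a hedged
illustration, not the actual tooling: the function name, the source labels
and the placeholder tokens are all hypothetical, and no real dictionary
contents are implied.

```python
def partition_by_modern_attestation(azkue_words, modern_sources):
    """Split Azkue words into (attested, unattested).

    `modern_sources` maps a source label to an iterable of its headwords.
    A word counts as attested if it occurs in at least one source;
    unattested words are set aside for separate analysis, not deleted.
    """
    modern = set()
    for source_words in modern_sources.values():
        modern.update(w.lower() for w in source_words)
    attested = [w for w in azkue_words if w.lower() in modern]
    unattested = [w for w in azkue_words if w.lower() not in modern]
    return attested, unattested


# Placeholder tokens and labels only; not entries from any real dictionary.
toy_sources = {"source_A": ["w1", "w2"], "source_B": ["w2", "w3"]}
toy_azkue = ["w1", "w3", "w4"]
found, not_found = partition_by_modern_attestation(toy_azkue, toy_sources)
```

Note that the modern sources are used only to sort the Azkue words into two
groups; no word is added from them.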


    >> I don't see why.  A dictionary of modern English is of no direct
    >> relevance to ascertaining the nature of Old English, and a dictionary of
    >> modern Basque is of no direct relevance to ascertaining the nature of
    >> Pre-Basque.  You might as well try to find out what Latin was like by
    >> reading a dictionary of modern French.

    > This of course assumes there is little relatedness between the two
    > and there is not a systematic development from one to the other. The
    > already noted phonological conservatism of euskara suggests that the
    > words in Azkue have greater validity for studying early euskara than
    > modern English for studying Middle English.

    Not necessarily.  Phonological conservatism is only one factor among
    several.  It matters little that Basque phonology has been conservative
    if not much of the Pre-Basque lexicon still survives.

I guess that is one of the things we are trying to get to - a putative early
and by implication Proto-Basque lexicon.

    > I'm a little mystified with this in the context of euskara. As far
    > as I have heard the dominant ancient external influence has come
    > from Latin with suggestions of a small number of items from Celtic
    > and a few others. Once these are identified then is not everything
    > ancient also native because there is nothing else left but ancient
    > native words, or am I missing something?

    Once ancient loans are identified -- if they can be -- then everything
    ancient remaining is also native, in some sense.  But there remains the
    *big* problem of determining what is ancient to begin with.  The great
    bulk of the Basque vocabulary is not ancient, just as the great bulk of
    the modern English vocabulary is not ancient.

If you can't demonstrate a word is borrowed and it has the morphology of an
old word then it is an old word. To say it is possibly not old is a truism and

