minimal pairs (was: PIE e/o Ablaut)

Robert Whiting whiting at cc.helsinki.fi
Sun Nov 5 18:51:56 UTC 2000


On Sat, 21 Oct 2000 "Ross Clark (FOA LING)"
<r.clark at auckland.ac.nz> wrote:

<snip>

>RW> On Fri, 28 Apr 2000 Ross Clark <r.clark at auckland.ac.nz> wrote:

>RC>> I trust that we share the assumptions that (i) we are
>RC>> talking about the synchronic phonology of modern English, and
>RC>> (ii) the reality that we are trying to get at is what is in
>RC>> speakers' heads.

>RW> First, let me assure you that we are indeed talking about
>RW> the synchronic phonology of modern English.  About the second
>RW> point, I am much less sanguine.  I think it is possible to
>RW> describe how language works linguistically (some areas are easier
>RW> than others), but I don't think that we are in a position to say
>RW> what goes on in a speaker's head to produce language.  While it
>RW> would certainly be nice to know, I think that the cognitive
>RW> processes that produce language are beyond our reach at the
>RW> moment.  Speakers themselves don't know how they produce
>RW> language, so you can't find out by just asking them.  So the only
>RW> reality that we can get at is the language that speakers produce.

>RW> Historical grammar, when we have a written record, is based
>RW> on empirically verifiable facts.  Synchronic grammar is based on
>RW> a hypothesis about how native speakers produce their language.  A
>RW> hypothesis is not a fact.  It is an explanation put forward to
>RW> account for observable facts.  People tend to forget this and
>RW> consider synchronic grammar a fact.  But whatever reality may
>RW> exist in the speaker's head just can't be gotten at at the
>RW> present time with our present knowledge.  In general, I agree
>RW> that what we are trying to get at is the reality in speakers'
>RW> heads, but it is a roundabout road that we have to take and we
>RW> have to have a realistic picture of language before we are likely
>RW> to get there.

>I'm not sure what you mean by "historical grammar" here.

By "historical grammar" I mean the changes in grammar through
time, studied systematically.  By grammar I mean those features
of language that one will usually find in a standard grammar
book:  phonology, morphology, and syntax, but generally not
including pragmatics and lexicon.

>If it is an account of changes in the language, then it is
>surely an account of transitions from one synchronic grammar to
>another, and hence subject to the same epistemological
>vulnerability you attribute to the latter.

Hardly.  At any given period the speakers of a language will
produce a corpus of language.  For this to be useful for
historical purposes it must be recorded somehow -- either as
texts or by being recorded as a spoken corpus by a competent
field linguist.  The recorded corpus of language becomes a body
of data.  This data can be minutely compared with similar data
from other periods and the differences noted in detail.  Consider
the following diagram:

  speakers     +     rules      -->      Old English
                                               |
  speakers     +     rules      -->      Middle English
                                               |
  speakers     +     rules      -->      Modern English

Synchronic (generative) grammar is concerned with the horizontal
rows.  Historical grammar is concerned only with the final
column.  Since we don't know what goes on in speakers' minds,
everything to the left of the arrows is an assumption.  We assume
that speakers start from a certain point and apply a certain set
of rules to arrive at the corpus on the right of the arrows.  But
everything to the right of the arrows is data.  When we move from
one set of data to another we also formulate rules to account for
the changes.  The difference is that the historical rules
are not based on assumptions about how speakers produce language,
but on observed differences in the language that speakers have
produced.  Synchronic grammar is simply cut out of the loop in
historical grammar.

I am not trying to say by this that historical grammar is all cut
and dried and that synchronic grammar is all guesswork.  Far from
it.  Historical rules still have to be deduced, but the
difference is that the forms on either side of the rules are
known and thus there is a qualitative difference in the
derivation of historical rules and synchronic rules.  Now when we
are talking about phonetics (pronunciation) based solely on
written records (as opposed to a phonetic transcription in IPA
made by a linguist) there will always be uncertainties because
writing systems vary in their fit to the phonological system of
the language they express.  Still, there are enough strong clues
in the writing (spelling variants; puns; and, in poetry,
alliteration, rhyme, and meter) to narrow down pronunciations
with reasonable precision.

>So, in your view, we have, on the one hand, "the language that
>speakers produce" -- texts, I guess. Presumably the "empirically
>verifiable facts" on which you see historical grammar as being
>based are of the same sort.

Precisely.  This is what I have said.  The language that speakers
produce synchronically is exactly the same language that we
arrive at by tracing its historical development.  The outcomes of
both methods are the same.  But the historical rules and the
synchronic rules that produce it will not necessarily be the same.

>On the other hand we have "the cognitive processes that produce
>language", "whatever reality may exist in a speaker's head",
>which is at present unknowable. Yet apparently we are able to
>describe "how language works linguistically", "a realistic
>picture of language". And what criteria do we have as to what is
>"realistic" in this description? Is this what was once called the
>Hocus Pocus view -- that a linguistic description is simply an
>efficient and elegant description of a body of data, without
>reference to any possible mental reality?

Realistic means based on realia -- things that actually exist as
opposed to mental constructs.  When you start talking about
"mental reality" you are getting into a murky area of philosophy.
Is there such a thing as "mental reality"?  This is something
that can be discussed endlessly and inevitably inconclusively.
Language is essentially a mental construct, one that is shared
among its speakers.  Because of the transitory nature of sounds,
unless language is recorded somehow it exists only in the minds
of its speakers (and hearers).  When the last speaker of a
language dies, unless that language has been recorded somehow, it
is gone forever.  There is no hope of reconstructing it.

There are currently some 6000 to 10,000 or more languages in the
world (depending on who is doing the counting and what criteria
they are using for identifying different languages), many, if not
most, of which have not been recorded.  And who knows how many
thousands of other languages have existed since the use of human
language began and have disappeared without a trace.

So one of the urgent tasks of linguists is recording languages
that have no written records in the hope that the more languages
that are available for study, the greater the possibility of
obtaining a "realistic picture of language" and thus determining
"how language works linguistically."  So long as everything to
the left of the arrows in the diagram above is uncertain, the
only clue to it is in what is to the right of the arrows.  Thus
the constant (and usually frustrating) search for linguistic
universals in the hope that such universals will provide clues to
universals of human cognition and thus help explain what goes on
on the left side of the arrows.

>RC>> The rest of your post is entirely dependent on the further
>RC>> assumption that native speakers of modern English (in general,
>RC>> not just linguists) distinguish "foreign" from "native" words,
>RC>> and that the words I listed with /th/ in voiced environments are
>RC>> marked as "foreign". Since I don't share this assumption, I would
>RC>> like to know what evidence leads you to it. Do you have any such
>RC>> evidence, other than the fact that by excluding these hundreds of
>RC>> words you can arrive at a nice phonological generalization?

>RW> Let me answer this from back to front.

>RW> You ask what evidence I have that native speakers
>RW> distinguish by rule the [th] of loan words from the [dh] of
>RW> native words other than hundreds of examples and the fact that it
>RW> produces a nice phonological generalization.

>This is not quite what I said. I said that in order to arrive at
>your nice generalization you need to *exclude* hundreds of
>English words, and I asked what independent evidence you had that
>these words are somehow marginal to the structure of English?

I don't remember saying that these words were marginal to the
structure of English and I don't remember your saying it
previously.  I didn't say these words were marginal to the
structure of English.  I said that these words follow a different
rule from native English words and that foreign words don't
necessarily follow the same rules as native words.  So these
words aren't marginal to English -- there are too many of them --
they just follow different rules.

But I can see that I shouldn't expect connections to be made when
these connections have not been expressly stated.  So let's start
again.  Grimm's Law says that certain PIE consonants shift in a
certain way in Germanic.  But there are a large number of
exceptions.  One exception is that unvoiced stops don't shift
when the stop is the second part of a voiceless cluster (thus
/sp/, /st/, and /sk/ remain unshifted).  However, even with this,
there are still a large number of exceptions to Grimm's Law.
Verner's Law says that when the PIE stress fell after the
consonant then the consonant shifted in a different way.  So
Verner's Law in effect *excludes* hundreds of Germanic words from
the effects of Grimm's Law thereby producing a nice phonological
generalization.  What independent evidence is there that Verner's
Law correctly excludes these words?  Only the fact that when it
can be checked, the PIE stress does fall after the consonant in
those cases where the consonant does not shift according to
Grimm's Law.  Of course this might just be coincidence, and some
people might not accept Verner's Law and ask what independent
evidence there is for it other than that when you exclude these
words using this rule you get the nice phonological generalization
known as Grimm's Law.

>RW> This seems rather like asking what evidence I have for
>RW> Grimm's Law or Verner's Law other than hundreds of examples and
>RW> the fact that they provide a nice phonological generalization.

>No, because you are not presenting this as a historical law. As
>a historical explanation of why /th/ occurs in certain places and
>/dh/ in others, it is not in dispute.

Historical law -- historical explanation -- what's the difference
in your mind?  What happened happened.  If the explanation can be
stated as a rule and is not in dispute, what is the problem?  The
outcome of the historical processes that produced the modern
language and the synchronic (generative) processes that produce
the modern language are the same: the modern language.  The
generative rule that produces [th] in these words may be
(probably is) different from the historical rule (explanation),
but there must be a rule or else there wouldn't be a pattern.  It
is a basic premise of generative grammar that alternations must
be explained by rules.

<snip of evidence that speakers can keep track of foreign words
for several centuries>

>The general possibility is not in question; the question is
>about particular examples. Hock offers evidence in the Old Irish
>case, namely the failure of p-initial words "for some time" to
>participate in the lenition process. This would certainly mark
>them as exceptional in that respect. But note that he does not
>argue that the mere fact of having initial p- is evidence of
>their "foreign" status.

And as I have said, it doesn't matter whether it is considered
"foreign" or not.  It is sufficient that it is recognized as
following a different rule.

>RW> If not, how can one account for the un-Anglicized
>RW> pronunciation of 'chanson' in English at least 400 years after
>RW> it entered the language (first attested in 1601 according to my
>RW> dictionary).

>Actually, my American Heritage dictionary gives a fully
>anglicized pronunciation /S'æns@n/ (initial sh-, rhymes with
>"Manson"), but I admit I've never heard it. The word has probably
>been re-introduced once or twice since 1601, but the persistence
>of the nasal-vowel pronunciation is no mystery: the word refers
>to specifically French things (medieval epic poems or 19th-20th
>century popular songs), and it is used almost entirely by people
>who have some familiarity with French and could tell you (if you
>would allow this as data) that it's a French word.

I believe that I said exactly the same thing concerning 'chanson'
in another posting.  What I said there was:

   But 'chanson' has been in the language for about 400 years and
   it still has its French pronunciation.  It simply resists
   Anglicization because it is not known to most naive native
   speakers.  People who know this word are likely to know that
   it has a French pronunciation and to know why.

I would have to make an exception among naive native speakers to
include the AHD, but otherwise I think we agree that people who
know this word are likely to know French as well.  So long as
English and French are in contact (i.e., have bilingual speakers)
it is even possible that this word is reborrowed by every
generation.  And, of course, I would accept a speaker's
explanation that the pronunciation of this word as French is
because it is a French word as data.  But I would be skeptical if
a speaker told me that its pronunciation was because it was a
medieval term or because it referred to music or poetry.

>RW> But ultimately, it is unimportant whether speakers can still
>RW> recognize foreign words after several centuries or not. The
>RW> pronunciation rules are marked in the lexicon.  The native
>RW> speaker learns these rules and follows them.  It is these rules
>RW> that produce the pattern, not the speaker's perception of the
>RW> words as native or non-native.

>RW> For other evidence that native speakers distinguish by rule
>RW> the [th] of loan words from the [dh] of native words I offer the
>RW> pattern created by the presence of intervocalic [th] and [dh] in
>RW> English words.

>This is not "other evidence". This *is* your evidence.

Then my position is unassailable.  As with Verner's Law, the fact
that the rule by which the exclusions are made is valid and leads
to a valid generalization is sufficient.

<snip of validation of distribution pattern of intervocalic [th]
and [dh] in English>

>RW> One thing that can be noticed is that there are only 15
>RW> lemmata with intervocalic [th] that can be marked as non-native
>RW> (and 5 of these are clearly derivative).  You say that there are
>RW> hundreds of these words in the language (and so there are, if not
>RW> thousands).  This makes one point quite clear.  While there may
>RW> be quite a large number of loan words in English, the core
>RW> vocabulary -- the most commonly used words -- remains primarily
>RW> native.  Thus a search of 6318 lemmata with over 800 occurrences
>RW> in a corpus of over 6 million words turns up only 15 (10 if you
>RW> eliminate derivatives) of these lemmata.

>I'm not sure this elaborate demonstration was necessary. Nobody
>disputes that the pattern exists in words of OE origin. Nor is it
>any secret that non-OE words are less frequent in basic
>vocabulary than in the lexicon at large.

Perhaps, but it is nice to have data to back up one's opinions.

>RW> This further suggests that the knowledge of many of the
>RW> lemmata with intervocalic [th] is a function of vocabulary size.
>RW> The larger one's vocabulary the more such lemmata (not simply as
>RW> an absolute, but as a percentage of the total) one is likely to
>RW> know.  Vocabulary size is also correlated with educational level.
>RW> Hence by the time that one has acquired a large number of such
>RW> lemmata, one is likely to be sufficiently educated to realize
>RW> that such words are loans.  Poorly educated native speakers are
>RW> not likely to realize that these words are loans.  They may even
>RW> mispronounce them; but then, dictionaries are not written (nor
>RW> often consulted) by native speakers of this educational level.
>RW> Thus my test of the pronunciation of these words may not be
>RW> entirely accurate, since I am relying on my own pronunciation and
>RW> on the pronunciation given by the dictionary.  As a turnabout, if
>RW> you have evidence that native speakers regularly mispronounce
>RW> these words because they don't know that they are loans, that
>RW> would be germane.

>I'm puzzled by you asking me for this evidence.

And I'm puzzled by your being puzzled.

>As I understand your position, you are claiming that there is a
>single phoneme, say /th/, with a realization rule that says
>(among other things) that /th/ is realized as [dh]
>intervocalically, except in certain exceptional words.

Close.  I claim that there was once a single phoneme /th/ in
English.  Then that this phoneme became voiced to [dh]
intervocalically without creating any new contrasts (/th/ now
has allophones [th, dh]).  Then that any words that came into the
language with intervocalic [th] after this sound change operated
retain [th] intervocalically (this is not necessarily restricted
to loanwords; it could also include new coinings -- there just
don't seem to be any with intervocalic [th]).  There is nothing
particularly exceptional about words with intervocalic [th]
except that the vast majority of them came into the language
after the sound change.  There are a few native words with
intervocalic [th] (for reasons that can be accounted for), but
there are no loanwords that originally had intervocalic [th] that
now have [dh].  This is exceptional, at least to the extent that
there are no exceptions.
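
The rule described above can be sketched in code.  This is only a
minimal toy model, not a claim about what speakers actually do: the
lexicon entries, the rough transcriptions, and the `post_change`
flag are all assumptions made for illustration.  "TH" stands for
the single phoneme /th/, and "@" for schwa.

```python
# Toy model of a single phoneme /th/ with an intervocalic voicing
# rule plus lexical marking.  Everything here is illustrative.

VOWELS = {"a", "e", "i", "o", "u", "@"}  # "@" = schwa (toy notation)

# post_change=True is a hypothetical flag for words that entered the
# language after the voicing change operated.
LEXICON = {
    "brother": {"segments": ["b", "r", "u", "TH", "@", "r"], "post_change": False},
    "other":   {"segments": ["u", "TH", "@", "r"], "post_change": False},
    "thin":    {"segments": ["TH", "i", "n"], "post_change": False},
    "method":  {"segments": ["m", "e", "TH", "@", "d"], "post_change": True},
    "ether":   {"segments": ["i", "TH", "@", "r"], "post_change": True},
}


def realize(word):
    """Return the surface segments for a word in the toy lexicon."""
    entry = LEXICON[word]
    segs = entry["segments"]
    out = []
    for i, seg in enumerate(segs):
        if seg != "TH":
            out.append(seg)
            continue
        # /th/ counts as intervocalic when flanked by vowels on both sides.
        intervocalic = (
            0 < i < len(segs) - 1
            and segs[i - 1] in VOWELS
            and segs[i + 1] in VOWELS
        )
        # Voice to [dh] only in words that predate the sound change.
        if intervocalic and not entry["post_change"]:
            out.append("dh")
        else:
            out.append("th")
    return out


for w in LEXICON:
    print(w, "".join(realize(w)))
```

Whether the marking lives on the individual word, as here, or in
some more general rule is exactly what the sketch leaves open; it
only shows that one phoneme plus one marked class yields the
observed pattern.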

>Only the more educated and literate speakers may realize that
>these exceptional words are loans. (Surely, however, this is not
>a function of the number of such words in one's vocabulary, but
>of things one reads or is taught.)

What I said was "Vocabulary size is also correlated with
educational level."  What, then, is the difference between
"educational level" and "things one reads or is taught"?  The
more things one reads or is taught, the higher one's educational
level, and the more likely one is to have a larger vocabulary and
hence the more likely one's vocabulary is to contain a larger
percentage of loan words or neologisms.  By educational level I
do not refer to passing through required fixed stages of
education without having learned anything.  As Henry Fielding has
Tom Jones say:  "it is as possible for a man to know something
without having been at school, as it is to have been at school
and to know nothing."  It has been my experience that many
graduates of high school and some colleges and universities do
not have a particularly high educational level.

>Meanwhile, how do we know that children and the less educated
>actually formulate the rule this way? If they did, one would
>expect that errors consisting of pronouncing [dh]
>intervocalically in words which should have [th] would be common.
>I don't recall any such tendency from my experience as a native
>speaker of English, but surely it's you that should be looking
>for such evidence.

My point is that we don't know how any speaker actually
formulates the rule.  What goes on on the left side of the arrows
is a mystery.  All we know is that there is a pattern, the
historical reason for the pattern, and that, synchronically,
patterns must be accounted for by rules.  My further point is
that it doesn't matter that we can't formulate the synchronic
rule.  There must be one, or the pattern wouldn't be there.

And I must confess that I did not consider the effect of
mispronunciation in the same way that you do.  You said that
native speakers don't recognize these words with intervocalic
[th] as loans.  My assumption was that if they don't recognize
them as loans, then they would mispronounce them by treating them
as native words and pronouncing intervocalic <th> as [dh].  This
would support your contention.  But your assumption seems to be
that if they do recognize them as loans then they would see them
as words in need of nativization and would mispronounce them in
an attempt to nativize them.  This would support what you see as
my contention.

But it is not my contention that native speakers recognize these
words as loans.  That is, as you so readily agree, merely the
historical explanation for why these words have intervocalic
[th], and what you have assumed is my contention.  I will quite
as readily agree that I don't really know how native speakers
recognize the words as exceptional.  The more educated may
recognize them as loans, the less educated may simply memorize
the correct pronunciation as exceptions (as they memorize the
exceptions in plural forms such as 'foot' / 'feet').  Since
neither of us has noticed any tendency to mispronounce these
words the point about what such mispronunciation would prove is
moot.  But I do maintain that there is some synchronic rule that
maintains this pronunciation or else *all* loans with
intervocalic [th] wouldn't still have this pronunciation.

>By the way, 6318 words is a pretty basic vocabulary. And your
>list doesn't include some words (arithmetic, ether, various
>personal names) that were part of my vocabulary and that of all
>my contemporaries by the age of 10.

I apologize for the inadequacies of the list, but it is not
really "my" list.  All I did was extract words with intervocalic
<th> from somebody else's list.  As such 'arithmetic' would not
appear in "my" list because it does not have intervocalic <th>
(but I must report that the original list did not have
'arithmetic' either and therefore it had fewer than 800
occurrences in the six-million-plus words of the corpus).  To my
knowledge, personal names do not occur in the list because they
are not lemmata.
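
For concreteness, an extraction of this kind might look like the
following.  This is a minimal sketch of one plausible approach,
assuming a purely orthographic test (<th> flanked by vowel letters,
with "y" counted as a vowel letter); the function names and the
sample words are illustrative and are not the actual list or
procedure.

```python
import re

# Vowel letters for the orthographic test (an assumption: "y" included).
ORTHO_VOWELS = "aeiouy"

# <th> with a vowel letter immediately before and after it.
INTERVOCALIC_TH = re.compile(rf"[{ORTHO_VOWELS}]th[{ORTHO_VOWELS}]")


def has_intervocalic_th(word):
    """True if the spelling contains <th> between two vowel letters."""
    return INTERVOCALIC_TH.search(word.lower()) is not None


def extract(words):
    """Keep only the words whose spelling has intervocalic <th>."""
    return [w for w in words if has_intervocalic_th(w)]


sample = ["author", "ether", "method", "arithmetic", "brother", "thin"]
print(extract(sample))
```

On this test 'arithmetic' drops out because its <th> is followed by
<m>, and 'thin' drops out because its <th> is word-initial.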

>RW> The best evidence that the pattern is created by rule by the
>RW> speakers of the language is the fact that the pattern exists.
>RW> For if there is a pattern in a language, synchronically it must
>RW> be created by its speakers because that is the only place that
>RW> language comes from.  If the pattern is not created by rule, then
>RW> it is simple coincidence.

>No, this is where you go wrong.

I presume that "this is where you go wrong" is your way of saying
that your opinion is different from mine. :)

>The pattern has been created by historical changes in the
>language. That is what creates the distribution of [dh] and [th]
>that is the input to each speaker's language-learning task. There
>is no general principle that speakers must recognize (consciously
>or unconsciously) any such pattern, or make use of it in their
>rule-governed language behaviour.

No, the modern language is not created by the historical events
that brought it about.  The modern language is created by what
goes on on the left side of the arrows in the diagram above.
This is the assumption of generative grammar.  And it is a basic
premise of generative grammar that patterns must be accounted for
by rules.  If you have evidence, rather than just an intuition,
that linguistic patterns are not created by rules, then please
share it.  It will revolutionize synchronic grammar.

Children learning a language as a native speaker do not learn the
history of the language.  Children learning a language do not
memorize rules that lead to grammatically correct utterances
(this comes later, in school, after the child has essentially
already learned to speak the language; but if the child never
goes to school, it will still speak the language).  Children
learn the language by imitation of what they hear and by internal
analysis of what they hear through deduction, induction and
abduction to arrive at their own rules of grammar.  In doing
this, they make a lot of mistakes.  Sometimes these mistakes are
corrected, either externally, or internally, by better imitation
or analysis.  Sometimes they are not.  In essence the synchronic
grammar is recreated by each generation of native speakers (with
a little help from their ancestors and their peers).  This is one
reason why living languages constantly change and dead languages
(those with no native speakers) don't.  But to say that native
speakers learn their language by memorizing historically created
patterns flies in the face of all theories of generative grammar
and L1 acquisition.  L2 (and higher) acquisition may work this
way, but, unless accompanied by total immersion in the language,
it seldom leads to native fluency (and often not even then).

>RW> You said above that

>RC>>    The rest of your post is entirely dependent on the further
>RC>>    assumption that native speakers of modern English (in general,
>RC>>    not just linguists) distinguish "foreign" from "native" words,
>RC>>    and that the words I listed with /th/ in voiced environments
>RC>>    are marked as "foreign". Since I don't share this assumption,
>RC>>    ...

>RW> You are saying that this pattern can't be based on the fact
>RW> that words in English with intervocalic [th] are overwhelmingly
>RW> loan words because the speakers of the language are unaware that
>RW> these are loans.  As I mentioned above, this is not just a weak
>RW> argument, it is a spurious one.  It may be true that most
>RW> speakers do not know that these are loans, but that is irrelevant
>RW> to the phonemic analysis of the language.  Were it not, then most
>RW> linguistic analyses would have no validity because speakers of a
>RW> language are in general woefully ignorant of the linguistic
>RW> mechanisms of their language.

>Actually, I am arguing that English speakers do not consider
>words like "author", "Ethel", or "method" to be unusual in any
>way.

First, let us dispense with the idea that there is nothing in any
way unusual about 'Ethel'.  You can't say that 'Ethel' is not
unusual in any way because it is unusual from the moment it stops
being a word in the language and becomes a personal name.  It is
no longer a lemma of the language, it becomes part of a special
corpus, the corpus of personal names.  Generally, you won't find
personal names listed in the lexicon (dictionary), although they
may be collected in a separate section.  If personal names are
listed in the lexicon, they do not have a meaning, only a
functional label: "masc. PN", "fem. PN", or the like.

In addition to being lexically exceptional, personal names are
also phonologically exceptional.  Personal names are, well,
personal.  People identify with their names in more ways than
one.  When someone has been called something all his life (his
name), he will resist changing it; even if the sounds in his name
change in the language he will resist introducing these changes
into his name.  If others mispronounce his name, he will correct
them.  Personal names, then, have the potential to withstand
phonetic changes that take place elsewhere in the language with
the result that such changes often bypass personal names.

So linguistically, practically everything is unusual about
'Ethel'.  Historical linguists know that personal names are a
fruitful hunting ground for archaisms -- forms that have long
disappeared from the rest of the language.  Even small children
know that names are different.  When my daughter was about 3, she
got a new doll.  She soon began saying "sisibamakin".  When I
asked her what "sisibamakin" was she said it was her doll.  "But
what does it mean?", I asked.  "It doesn't mean anything, it's
her name," was her reply.  So speakers don't really treat
personal names as a normal part of language.  They are simply
labels without linguistic connotations.  They stand outside both
the normal lexicon and normal phonological rules.

So if someone tells you that his or her name is Abraham or Avram
or Ethel or Stanley or Ahmed or Rumpelstiltskin, you don't
question how it fits into the phonology of your language.  You
just accept it because that's his or her name.

>Your only basis for disagreement seems to be that your
>single-phoneme analysis requires them to be marked as exceptions.
>Other evidence one might imagine, such as deviant morphophonemic
>behaviour, acquisition difficulties, or the existence of more
>"nativized" variants, appears to be entirely lacking.

Yes, they are only marked by having intervocalic [th] where the
vast majority of native words have intervocalic [dh].  But the
very fact that all loanwords that came into the language with
intervocalic [th] still have this pronunciation is sufficient to
show that speakers must know some rule that preserves this
pronunciation.  Otherwise, words with intervocalic [th] and
intervocalic [dh] wouldn't fall into such neat piles.  There is
no need to nativize words with intervocalic [th] because there
are native words with intervocalic [th] (which are again
exceptions that can be accounted for).  Speakers can either
memorize the pronunciation of each of these words or they can
find some way of generalizing a rule so that they don't have to.
Uneducated speakers probably do the former, and those with more
education or experience probably do the latter.  But either way,
they don't get any help from the orthography.

>Excuse me. I'm tired, and I see there are at least a dozen
>paragraphs left. I'll snip them for now, and perhaps we can
>return to them another time, if this has not drifted too far
>off-topic for IE.

Well, at least we're talking about an IE language.  And I for one
find it interesting.

Bob Whiting
whiting at cc.helsinki.fi


