35.727, Review: The Typological Diversity of Morphomes: Herce (2023)

Sat Mar 2 20:05:04 UTC 2024

LINGUIST List: Vol-35-727. Sat Mar 02 2024. ISSN: 1069 - 4875.

Editor for this issue: Justin Fuller <justin at linguistlist.org>
================================================================

Date: 02-Mar-2024
From: Michael Maxwell [mmaxwell at umd.edu]
Subject: Morphology, Syntax, Typology: Herce (2023)

Book announced at https://linguistlist.org/issues/34.2138

AUTHOR: Borja Herce
TITLE: The Typological Diversity of Morphomes
SUBTITLE: A Cross-Linguistic Study of Unnatural Morphology
PUBLISHER: Oxford University Press
YEAR: 2023

REVIEWER: Michael Maxwell

SUMMARY

As the title suggests, this book is a discussion of morphomes.  In
particular, this work takes a typologically wider view than most
discussions of the topic, which have mostly emphasized morphomes in
Romance languages.  The author summarizes his goal (pg.263): "...to
advance our understanding of precisely which conditions and forces are
operating when unnatural morphosyntactic patterns do manage to get
established and successfully replicated in a language."

I will start with a diversion, by defining "morphome", lest some
readers consider my spell checker (along with me) to be defective.
Morphemes (with an 'e' in the second syllable) are usually considered
to be minimal sequences of phonemes that bear some meaning, to include
roots and affixes.  Immediately there are issues, such as affixes that
consist of suprasegmentals, discontinuous affixes (like circumfixes)
and roots (as known from Semitic languages), perhaps zero (null)
affixes, and so on.  Morphemes may also take the form of allomorphs,
which are often phonologically conditioned, although they may also be
lexically conditioned (as in inflection classes).  All this should be
familiar to most readers of this review.

Crucially for the present discussion, affixal morphemes are usually
considered to have a meaning which reflects natural classes of
morphosyntactic features.  Inflectional affixes which mark person and
number, for instance, typically indicate a contiguous part of the
inflectional paradigm: a language may have one suffix marking singular
and another marking plural for all persons; or a first person suffix,
a second person suffix, and a third person suffix; or affixes marking
a combination of features, such as first person singular, and so
forth.  Likewise, where stems have allomorphs, those allomorphs are
usually phonologically distributed, or else distributed (like affixes)
according to natural morphosyntactic classes.  (In the latter case,
the stem allomorphs are sometimes treated as a single allomorph
modified by an infix, suprafix, or some other kind of affix.)

One might expect that this is not only commonplace, but that it is the
only way languages work--that each affix (I will return to stems
momentarily) represents a single set of feature values, i.e. "this
feature value AND this feature value AND...", such as [+A -B].  One
would be wrong.  There are many ways this expectation is violated.
Perhaps most familiarly, there are affixes which mark the "elsewhere"
case; that is, a paradigm will encode some set of feature
combinations, but only one or a few of these combinations will be
marked by a single affix, and the remaining feature combinations will
be marked by another affix.  English present tense verbal morphology
comes close to this: there is one -s suffix that marks the third
person singular present tense, while all other present tense person/
number combinations are either unmarked or marked by a null affix,
depending on your theory.  Shuar (a Chicham language of Ecuador) is
even clearer; there are distinct possessive suffixes for first person
singular and second person singular, plus an additional suffix for
everything else.
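
The "elsewhere" logic just described can be sketched in a few lines of
Python.  This is only an illustration under simplifying assumptions:
each rule is a set of required feature values, the most specific
matching rule wins, and a rule with no requirements matches every
cell.  The names RULES and realize are hypothetical, and the unmarked
English forms are represented here as an empty suffix (one of the two
analyses mentioned above).

```python
# Hypothetical sketch: English present tense suffixes as an "elsewhere" system.
# Each rule pairs a set of required (feature, value) pairs with a suffix.
RULES = [
    ({("person", 3), ("number", "sg")}, "-s"),  # third person singular
    (set(), ""),                                 # elsewhere: matches any cell
]


def realize(cell):
    """Return the suffix of the most specific rule whose features all hold in `cell`."""
    features = set(cell.items())
    matching = [(req, suffix) for req, suffix in RULES if req <= features]
    # The rule requiring the most feature values is the most specific one.
    return max(matching, key=lambda pair: len(pair[0]))[1]


for person in (1, 2, 3):
    for number in ("sg", "pl"):
        print(person, number, repr(realize({"person": person, "number": number})))
```

The point of the sketch is that the second rule never has to list the
five cells it covers; they are simply whatever the first rule leaves
over.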

More startling are affixes which encode more than one set of
morphosyntactic features, like [[+A +B -C] OR [+A -B +C]].  This is a
form of syncretism ("cases in which the same phonological string is
used to express distinct combinations of morphological
features"--Albright and Fuß 2012), although sometimes analyzed as two
or more distinct but (accidentally) homophonous affixes.  An example
would be Latin neuter second declension nouns, where the singular
suffix for the nominative, vocative and accusative is -um, while the
remaining cases--genitive, dative and ablative--have other suffixes.
Assuming a simple feature system, this -um suffix would encode
     [Number singular
       [ [Case nominative] OR
         [Case vocative] OR
         [Case accusative]
       ]
     ]
Crucially, this disjunction of nominative or vocative or accusative
case appears to be a non-natural class, i.e. there is nothing that
these three cases have in common that the remaining cases do
not--hence the need for the "OR".  (One could of course argue that
some funky features make this a natural class, or that Latin had three
different but homophonous -um suffixes.)
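
The notion of "natural class" used here can be made concrete with a
small Python sketch.  It assumes the same simple feature system as
above (atomic case and number values, no "funky" sub-features), and
the names CELLS and is_natural_class are my own, purely for
illustration.  A set of cells counts as natural if some conjunction of
feature values picks out exactly that set; the most restrictive such
conjunction is the set of values shared by all the cells, so it is
enough to test that one.

```python
from itertools import product

CASES = ["nom", "voc", "acc", "gen", "dat", "abl"]
NUMBERS = ["sg", "pl"]
# Every cell of a simplified Latin noun paradigm, as a set of (feature, value) pairs.
CELLS = [frozenset({("case", c), ("number", n)}) for c, n in product(CASES, NUMBERS)]


def is_natural_class(target, cells=CELLS):
    """True if some conjunction of feature values picks out exactly `target`."""
    shared = frozenset.intersection(*target)  # values common to all target cells
    # Cells matched by the most restrictive conjunction that covers the target.
    smallest = {cell for cell in cells if shared <= cell}
    return smallest == set(target)


direct_cases = {("case", "nom"), ("case", "voc"), ("case", "acc")}
um_cells = [c for c in CELLS if ("number", "sg") in c and c & direct_cases]
singular = [c for c in CELLS if ("number", "sg") in c]

print(is_natural_class(um_cells))  # False
print(is_natural_class(singular))  # True
```

Under these assumptions the three -um cells share only the value
"singular", which also holds of the genitive, dative and ablative
singular, so no conjunction isolates them.  A richer feature system
could of course change the answer.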

Perhaps still more surprising are instances where stem forms--not
affixes--are used in paradigm cells which do not constitute natural
classes.  A commonly cited example is in Spanish, where for a subset
of verbs, all person/ number combinations of the present subjunctive,
plus the first person singular present indicative, contain a velar
consonant at the boundary between the stem and the suffix, which
consonant is not found in the rest of the paradigm: 'konoses', "you
(sg.) know (indicative)", but 'konosko' "I know (indicative)",
'konoskas' "you (sg.) know (subjunctive)".  (I use a phonemic rather
than an orthographic transcription, reflecting non-Castilian
dialects.)  Again, the portion of the paradigm where this happens is
clearly not a natural class; if it were only the present subjunctive
forms, it would be natural, but the inclusion of the first person
singular present indicative results in a non-natural class.  Under the
usual analysis, the velar consonant is
treated as part of the stem, so the stem allomorph containing the
velar consonant is a morphome.  An alternative analysis is that the
velar consonant is part of the suffix, but this would still be a
morphome because of its distribution.  This is only one of several
such morphomes in Spanish and in Romance languages more generally,
which seem surprisingly stable over centuries of language change.

So much for a brief explanation of the concept of morphome.  The
Romance language morphomes have been well studied, and their
historical evolution is reasonably well understood.  What the book
being reviewed here brings to the table is brief descriptions of
morphomes in other languages, indeed 79 languages across a wide
variety of language families.

The brief first chapter introduces the concept of morphome in rather
more detail than I have done, mentioning some of the issues that will
come up, such as what a natural class is.  The much longer second
chapter discusses problems in identifying morphomes.  Since the
concept of morphome is pre-theoretical, belonging more to typology
than to any particular theory of morphology or phonology, Herce casts
a wide net, highlighting some of the questions that will come up when
deciding whether a paradigm in some language does or does not contain
morphome(s).

The third chapter, "Morphomes in diachrony", discusses how morphomes
come about through language change.  A number of different origins are
described and illustrated by short case studies, with sound changes
being perhaps the most common cause.  The diachronic origins are
revisited for most of the 79 languages of the next chapter, being
omitted when there is not enough data (e.g. for language isolates).

The fourth chapter, "Morphomes in synchrony" (a title that plays off
that of the preceding chapter), is mostly a "database" (more on that
term in my evaluation, later) of morphomes in 79 different languages
of many different language families.  The chapter begins
with a brief description of Herce's criteria for inclusion
(recapitulating some of the discussion in the second chapter).  The
bulk of the chapter consists of descriptions of morphomic patterns of
individual languages, illustrated by slices of the paradigms, with
cells containing morphomes highlighted (more on this formatting
later).  The chapter ends with a discussion that quantifies various
properties of the languages' morphomes, including statistical
properties abstracted across the language sample.  One take-away here
is that some morphomic patterns, expressed as disjunctions of
morphosyntactic features, are much more common than others.

The fifth (very brief) chapter, "Implications," brings up some
theory-based considerations, such as the place of morphosyntactic
features and the resulting non-natural classes in morphology.
(Interestingly, similar questions about features and non-natural
classes have arisen in phonology, see for example Mielke 2008.)

The final chapter, "Conclusions", summarizes findings based on the
data of Chapter Four, such as the fact that morphomes are found in
many language families (not just Romance languages, where they had
been widely studied); and the fact that morphomes are diachronically
resilient, that is, they appear to last for generations of
speakers--perhaps militating against the notion that they are somehow
peripheral to language.  (Probably the last part of Chapter Four and
all of Chapter Five could have been combined with the contents of
Chapter Six.)

EVALUATION

There is already a substantial literature on morphomes, much of it
concerning the synchronic issues, dating back to before Mark Aronoff
coined the term around 1994; in the older literature, it often comes
under the rubrics of "irregularity", "exceptions", "rule features" and
"diacritic features" in phonology and/or morphology (see e.g.
Zonneveld 1978, and Harris 1978).  In terms of stem-based morphomes,
much of that discussion concerned Romance languages--to be sure, an
important topic.  The added value of this book is that it presents
morphomes in many other languages.  What the book does not attempt to
do is to devise a theoretical explanation for the synchronic analysis
of morphomes, although the synchronic place of morphosyntactic
features (and thus the theory of such features) is briefly touched on
in Chapter Five.  This focus on the data seems to me a perfectly
laudable goal.

That said, there is considerable discussion in this book of the
diachronic origins of morphomes in many of the 79 languages examined.
I am not a historical linguist, but most of the explanations appear at
least plausible to me.

One thing that I found odd--although I understand the motivation--is
that for the most part Herce does not distinguish lexical (root, stem,
whole word) morphomes from affixal morphomes.  To me, this distinction
is crucial, since in most cases of affixal morphomes, there will be
one or at most a couple of such morphomes, and the homophonies can
therefore often be argued to be unimportant accidents (syncretisms)
which need not be explicitly addressed in the grammar.  With lexical
morphomes, by contrast, there is generally a much larger number, and
some account must be given.

For example, in the Spanish case I brought up earlier, if the velar
consonant is part of the suffix, then there are a handful of such
morphomes: the first person singular present indicative '-ko', and the
present subjunctive affixes, which are at most five, and which could
possibly be reduced to one (-ka) under a more agglutinative analysis.
(There are also voiced and unvoiced allomorphs, but these are
phonologically predictable.)  By contrast, there are dozens of verbs
whose stems take different forms in different parts of the paradigm.
(Depending on your theory of phonology, the velar consonant might also
be epenthetic, belonging neither to the stem nor the affix.)  In fact
in many cases, the affix vs. stem distinction is quite clear; another
morphome in Spanish has to do with diphthongization of the
stem-internal vowel, and it would take quite a contortion to call this
monophthong--diphthong alternation an affix.

Moreover, in the case of affixal morphomes, the puzzle is why the same
phonological string is used in distinct paradigm cells; whereas in the
case of stem morphomes, the puzzle is the opposite: why the same
phonological string is not used in all paradigm cells.  Hence I would
have categorized the instances of morphomes as stem/ root vs. affix
vs. ambiguous, with possibly a separate category for suppletive whole
word morphomes.

Some languages' descriptions are less clear than others; for example,
I found the discussion of the Biak language confusing until I
consulted the original source (Heuvel 2006).  Herce refers to vowel
length, but it is not obvious in his table (4.78) that there is vowel
length--it turns out that the forms are cited in the Biak orthography,
which uses an acute accent mark to represent length, a fact mentioned
in the original source.  Herce's discussion also describes an
epenthetic vowel as unique to certain forms, but its occurrence is in
fact phonologically predictable (Heuvel 2006: 27).  (Herce's
discussion also refers to this paradigm's affixes as suffixes; they
are in fact prefixes.)

I have alluded above to the table formatting.  Given that most readers
(myself included) will know only a few of the languages, the choice of
how the paradigms are "sliced" (you can't easily show the entire
paradigm of the Spanish verb, for example, and it would only be
confusing) is crucial; in this, I believe Herce has been eminently
successful.  Less successful is the use of shading.  For tables where
there is more than one morphome, the shading is inconsistent between
tables and can be confusing.  Color is used in only a handful of
tables and a few figures, and would have been welcome in many more.
I realize that colored ink can be expensive, but the PDF (where color
would be free) simply mirrors the printed book.  A few tables show
morpheme breaks (or bold the stem, though only in one table), which is
also helpful; more of these (where the breaks are more or less
unambiguous) would have been even better.

There is a slightly misleading discussion of Zipf's Law on p. 88,
where it says "...more frequent words and meanings tend to be shorter.
This is known as Zipf's (1935) law."  I'm not sure what it means for a
"meaning" to be shorter, but in any case Zipf's 1935 law does not
refer to the length of words; rather, it is strictly about the
relation between a word's frequency rank and its token frequency
(specifically, the relative frequency of the Nth most frequent word is
approximately 0.1/N, although there are other mathematical
formulations).  Zipf did
discuss the length of words in his later (1945) work, claiming that
more frequent words tend to be shorter, as measured in phonemes
(English) or syllables (Latin); this principle has since been extended
to many other languages, and has come to be known as "Zipf's Law of
Abbreviation", or the "Brevity Law."  (To be fair, Herce is not the
only writer to collapse Zipf's Law based on rank with Zipf's Law of
Abbreviation.)
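
To keep the two laws apart, here is a minimal Python sketch.  The
function names are my own, the token list is a toy example invented
purely for illustration (not corpus data), and word length is counted
in characters, which is only a rough stand-in for the phoneme and
syllable counts mentioned above.  The rank-based law concerns the
third and fourth printed columns (observed relative frequency against
0.1/N); the Law of Abbreviation concerns the relation between
frequency and the last column (length).

```python
from collections import Counter


def zipf_expected(rank, constant=0.1):
    """Relative frequency predicted for the word of a given rank (0.1/N formulation)."""
    return constant / rank


def rank_frequency(tokens):
    """Return (rank, word, relative frequency, length in characters) per word type."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [
        (rank, word, count / total, len(word))
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Toy token list, invented for illustration only.
tokens = "the cat saw the dog and the bird saw the cat".split()
for rank, word, freq, length in rank_frequency(tokens):
    print(rank, word, round(freq, 3), round(zipf_expected(rank), 3), length)
```

A sample this small cannot confirm or refute either law; the sketch
only shows that they are statements about different columns.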

Perhaps my greatest criticism of this work is that the data on
individual languages (the bulk of Chapter 4) is referred to as a
"database", but it is not.  A database is contained in some clearly
laid out format, a format which is computationally processable by
sorting, filtering, extracting, adding and deleting entries, and
perhaps other computations, depending on the kind of data (numeric
data allows different sorts of processing than text data).  Examples
of such database formats include relational databases (of which SQL
databases are the most common), spreadsheets, tab- or comma-delimited
tables, and XML- and JSON-formatted data.  Print documents and even
PDFs are not databases.  I emphasize this because the data gathered
here is a goldmine, but because it is not a database, it is far less
easy to work with than it deserves to be.
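
As a rough sketch of what "computationally processable" could mean
here, the following Python fragment stores two patterns mentioned
earlier in this review as JSON records and then filters them.  The
field names, the cell labels, and the helper functions are
hypothetical choices of mine for illustration; they are not a schema
from the book or from any supplementary materials.

```python
import json

# Hypothetical records; the field names are illustrative assumptions only.
RECORDS = json.loads("""
[
  {"language": "Spanish",
   "locus": "stem or suffix (analysis-dependent)",
   "cells": ["1SG.PRS.IND", "PRS.SBJV"],
   "examples": {"1SG.PRS.IND": "konosko", "2SG.PRS.SBJV": "konoskas"}},
  {"language": "Latin",
   "locus": "affix",
   "cells": ["NOM.SG", "VOC.SG", "ACC.SG"],
   "examples": {"NOM.SG": "-um"}}
]
""")


def filter_by(records, **criteria):
    """Keep the records whose fields equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def with_cell(records, cell):
    """Keep the records whose pattern covers the given paradigm cell."""
    return [r for r in records if cell in r["cells"]]


print([r["language"] for r in filter_by(RECORDS, locus="affix")])
print([r["language"] for r in with_cell(RECORDS, "1SG.PRS.IND")])
print(json.dumps(sorted(r["language"] for r in RECORDS)))
```

With data in a form like this, sorting, filtering and extracting
become one-line operations, none of which is possible with a printed
table or a PDF.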

There are also inconsistencies in the information given for each
language in Chapter 4.  Some of this is to be expected; it is
difficult to surmise the diachronic origins of morphomes in language
isolates, for example.  But some omissions appear to be accidental:
for many languages, there is a summary of the morphomic distribution,
e.g. "Chinantec, L2: 1PL/2.Completive/3" (meaning the second
alternation for "L", where "L" appears to refer to the Lealao variety
of Chinantec, and the morphome exists in the three stated regions of
the paradigm); but for many other languages, including Palantla
Chinantec, there is no such summary.

There is at least one mention (pg.256) of "the supplementary materials
that accompany this book."  I did not find any indication of where
these supplementary materials might be.  The book in its entirety is
available as an open-access PDF from
https://academic.oup.com/book/45787, but there does not appear to be
any link from there (including the citation on pg.256 in the PDF) to
any supplementary materials.  Herce's Google Scholar page
(https://scholar.google.com/citations?user=FZ4EX7kAAAAJ) includes a
link to his 2023 open access article in the journal Morphology, and
this contains a few supplementary materials, but apparently not all
the material that was used in this book.  That journal article also
includes a link to the searchable 2010 Oxford Online Database of
Romance Verb Morphology, attributed to Martin Maiden and others,
although as the title suggests this is restricted to Romance
languages.

Typos appear to be minor, although obviously I could not check most of
the language data for accuracy.  There were a few errors in the
bibliography and citations.  The citation to "Harbour 2019" on pg. 257
does not appear in the bibliography, although there is a bibliographic
entry for Harbour 2008, which may be the intended reference.  There
are a few places where an author's name has been given differently in
different entries, resulting in misplaced entries.

Bottom line: Herce is exactly right to expand the discussion of
morphomes to more languages and language families.  Having read many
grammars myself, I am amazed that he has managed to read so many, and
astounded that he has condensed them in such a brief and insightful
manner.  No doubt there will be re-analyses of the data for some of
these languages (it would not be the first time that someone mis-read
or mis-transcribed a grammar), but in general Herce's descriptions
appear sound, and for the few languages that I am familiar with, I can
say that his descriptions are accurate.  I do hope that the
descriptions can be ported into a real database, probably in XML or
JSON (spreadsheets are probably a poor way to represent the data in a
searchable fashion, and a relational database would doubtless be a
mess, to use the technical term).

Now it's time for the theorists to take into account the fruits of
Herce's research.  In particular, these results should inform work on
natural classes in morphology (might one hope for spill-over into work
on natural classes in phonology?), and on morphomes specifically.  I
have already mentioned the question of morphomic affixes vs. morphomic
stems or other lexemes; while Herce's work does not immediately
separate those cases (as discussed above), it would not be difficult
to add that information, and the distinction will prove relevant to
many theories.  Another line of effort would be to examine the extent
to which the "elsewhere" principle can explain some of the patterns,
since "elsewhere" is almost by definition not a natural class.  I look
forward to other theoretical advances coming out of this work--or
equally, disconfirmations of previous theoretical proposals, perhaps
on the typology of person/ number features.

REFERENCES

Albright, Adam, and Eric Fuß. 2012. "Syncretism".  Pp. 236--288 in
Trommer, Jochen (ed.) The Morphology and Phonology of Exponence.
Oxford Studies in Theoretical Linguistics.  Oxford: Oxford University
Press.

Harris, James. 1978. "Two theories of non-automatic morphophonological
alternations."  Language 54(1): 41--60.

Herce, Borja. 2023. "Morphological autonomy and the long-term vitality
of morphomes: stem-final consonant loss in Romance verbs and
paradigmatic analogy."  Morphology 33(2): 153--187.
https://rdcu.be/dusPD.

Heuvel, Wilco Van den. 2006. Biak: Description of an Austronesian
Language of Papua. Lot, 138. Utrecht: LOT.
https://research.vu.nl/ws/portalfiles/portal/42174909/complete+dissertation.pdf
(https://hdl.handle.net/1871/10282)

Mielke, Jeff. 2008.  The Emergence of Distinctive Features.  Oxford:
Oxford University Press.

Zonneveld, Wim. 1978. A Formal Theory of Exceptions in Generative
Phonology.  Lisse: The Peter de Ridder Press.

ABOUT THE REVIEWER

Dr. Maxwell is a retired researcher in computational morphology and
other computational resources for low density languages, formerly at
the Center for Advanced Study of Language (later the Applied Research
Laboratory for Intelligence and Security) at the University of
Maryland.  Before that he did research at the Linguistic Data
Consortium at the University of Pennsylvania, and studied endangered
languages of Ecuador and Colombia with the Summer Institute of
Linguistics.
