[Lingtyp] Folk definition of “word”

Mon Nov 29 03:37:17 UTC 2021

This is the topic of the next lecture in my Morphology class (and my
students are currently reading your 2011 paper, Martin), so thank you
everyone for this timely and interesting discussion.

I would like to look at your conclusion from a different perspective,
though: I agree that spaces may not directly tell us about word boundaries
in languages, but for another reason.

Japanese is a very interesting example because there are three (sub)scripts
working together: kanji (from Chinese characters) for most lexical items,
hiragana for function morphemes, and katakana for borrowings and
onomatopoeia. The first response from most people (i.e. students) learning
about this for the first time is that Japanese sounds hard to write, with
the impression that the system may be redundant. But remember that Japanese
does not use spaces. And it simply does not need to: kanji and hiragana
very clearly mark the morphosyntactic structure of a sentence, so it is
easy to skim and identify word boundaries, or at least equivalent
information to what word boundaries do for us. This is an extremely
efficient and transparent system, reflecting how Japanese works
grammatically, not just orthographically.
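
To make this concrete, here is a minimal sketch in Python of how far script
alternation alone can get us. It is an illustration under stated
assumptions, not a real tokenizer: the names script_of and chunk_by_script
are made up for this example, characters are classified by their Unicode
names, and a new chunk starts whenever a kanji or katakana character
follows a character of a different script, so that trailing hiragana
(particles, inflectional endings) stay attached to the preceding lexical
item.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Classify one character by script, using its Unicode name (heuristic)."""
    name = unicodedata.name(ch, "")
    # U+3005 (the kanji iteration mark) is treated as kanji for this sketch.
    if name.startswith("CJK UNIFIED IDEOGRAPH") or ch == "\u3005":
        return "kanji"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):  # also covers the long-vowel mark U+30FC
        return "katakana"
    return "other"


def chunk_by_script(text: str) -> list[str]:
    """Split text at script changes, except that hiragana never opens a new
    chunk after another script (assumption: it marks function material that
    belongs with what precedes it)."""
    chunks: list[str] = []
    current = ""
    prev = None
    for ch in text:
        script = script_of(ch)
        if current and script != prev and script != "hiragana":
            chunks.append(current)
            current = ""
        current += ch
        prev = script
    if current:
        chunks.append(current)
    return chunks


sentence = "私は東京でコーヒーを飲みました"  # "I drank coffee in Tokyo"
print(chunk_by_script(sentence))
# ['私は', '東京で', 'コーヒーを', '飲みました']
```

The resulting chunks are phrase-sized units (a lexical item plus its
function morphemes) rather than dictionary words, and the heuristic breaks
down wherever lexical items are themselves written in hiragana. That is
in keeping with the point above: the script mix gives readers information
equivalent to word boundaries without marking "words" as such.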

My suggestion then is to not look at when spaces are used in an
orthography, but to look at what different orthographies do instead of
spaces, or otherwise in a way that reflects specific morphosyntactic
properties of languages.

Looking back at the history of our familiar alphabet, Greek vowel letters
were basically an accident of adapting the Phoenician writing system to
Greek.
Phoenician follows the well-known triliteral root system of Semitic
languages, and those languages typically use abjads (consonant-only writing
systems), and I have to assume that this is due to the central importance
of consonants over vowels in their morphology. This can be traced back to
the origins of this writing in Ancient Egyptian hieroglyphs, where a direct
iconic representation of a meaning shifted to take on a specific consonant
value, and this was codified for only consonants, with vowels unwritten.
That is still the case today in Arabic, Hebrew, etc. (Aside: I prefer to
think of the so-called "long vowels" as vowel-holding consonants, i.e.
semivowels, etc., similar to how /i/ and /j/ or /u/ and /w/ may be
represented with the same letter in alphabets, such as the letter "V" in
Latin.) Consonant-only writing isn't such an obvious fit for another kind
of language where the vowels are equally important morphologically. The
Arabic script has been adapted for a number of other languages, so I'm not
suggesting it is impossible or that it won't work, but that it probably
wouldn't arise naturally, and due to this borrowing, it probably doesn't
tell us much about the structure of those languages. (On the other hand,
the Arabic script might have been a good fit for Turkish given that vowel
harmony means there are few contrasts to represent within vowels, so that's
another topic to look at.)

We might also ask what the introduction of spaces can tell us about the
structure of European languages. Perhaps the highly fusional inflection of
Latin and Greek was in itself enough to signal word boundaries and
morphological structure in general given certain typical orthotactic(?)
forms were much more frequent than others (similar to Japanese hiragana
marking functional morphemes).

But I don't think that adapting the existing traditions of English/European
orthography to a new, previously unwritten language necessarily tells us
very much about the morphological structure of that language, because it
will more likely be heavily influenced by the norms of English. This might
be less likely in cases where the speakers of the language are illiterate
before writing their own language, rather than biliterate(?) with English
or another European language (or Indonesian, etc., following similar
conventions). Where they write spaces might give us some suggestions about
word boundaries, of course. But I think it is even more interesting to see
what non-alphabetic scripts can tell us about the languages that they
represent.

Unfortunately we don't have a substantial number of truly independent
writing systems around the world to really test these ideas, but it's
certainly interesting to think about. There are a few more relevant
examples, like how Chinese simply has no need for a word "word" because it
has almost exclusively monosyllabic morphemes and characters, as well as
some idiomatic combinations of them (i.e. compounds). That tells us
something about the morphological structure of Chinese too, I think.

Whether any of this is really about "wordhood" is not yet clear to me, but
I do think that different orthographic traditions can give insights into
morphological structure in general. One way of looking at it is that
orthographies are a kind of formal analysis for morphological structure,
and as we all know, analyses are informed by but do not determine
linguistic organization. So if we think about writers as linguists, that
may be helpful in this discussion. In fact, just like there are different
grammatical theories, it may be that different orthographies are different
theories of wordhood or similar levels of structure. If so, it may be that
"words" (in the European sense) are just one way of looking at languages,
and that they are an analysis, but not necessarily a fundamental part of
linguistic structure. Or more interestingly, it may be that different
languages have different units on par with "words", often reflected by
orthographic systems. This is also why it's so interesting to look at
proposals for writing signed languages, which introduce other kinds of
challenges. I think this is generally in line with the conclusions of your
2011 paper, Martin.

Last week I assigned my students a paper about questions of polysynthetic
wordhood in Cree and Dakota (https://doi.org/10.1075/cilt.174.08rus), and
the paper emphasized that speakers of these languages would often write
much shorter words (with spaces between them) than expected according to
the traditional polysynthetic analysis of linguists. But I am suspicious
that they may be constrained by what they expect written "words" to look
like due to familiarity with English, and I was left wondering, most
importantly, what an original, indigenous script for Cree or Dakota would
look like: what is the ideal way to write these languages, not how English
writing can be borrowed for them. (I should add that Cree is often written
in Canadian syllabics, but I think that is a general writing system, and
according to Wikipedia designed by a linguist, so it may have other biases.
But perhaps a syllabary has other advantages suitable for "polysynthetic"
languages, however their structure is best analyzed: one option is that
there are multiple word-like levels in their structure, rather than a
unique level, and in that case a syllabary seems like a nice compromise to
divide it into iterated units.)

Daniel

On Sun, Nov 28, 2021 at 8:29 AM Martin Haspelmath <
martin_haspelmath at eva.mpg.de> wrote:

> This is a really interesting thread! It still seems to me that the term
> "word" has a well-understood orthographic sense, but no well-understood
> general phonological or morphosyntactic sense. Writing is now almost
> universal, but it does appear that most unwritten languages did not have a
> word for 'word' (as opposed to 'speech' or 'what someone said').
>
> I agree with Ian that "the emergence of spaces is sufficient evidence of
> wordhood", in the sense of orthographic wordhood – because spaces define
> orthographic words.
>
> As the fascinating discussion of the history of reading has made clear,
> reading is by no means a straightforward or natural activity. It's more
> like riding a bike – extremely useful, but dependent on highly specific
> cultural traditions and practices.
>
> It may well be that orthographic spaces are primarily an autonomous device
> to facilitate reading, like punctuation, paragraphs, section headings, and
> typographical ascenders/descenders in Latin script – but with no direct
> relationship to anything in the spoken language. As our grammatical
> investigations began with written language (*gram-matica* originally
> means 'study of writing', cf. *graph-* 'write'), it is natural that it
> was based on the study of written language. *Scriptio continua* may simply
> be a bit harder to read than spaced writing (just as I find Cyrillic a bit
> harder to read than Latin, because there are fewer ascenders/descenders).
>
> So I'm not sure if we can presuppose that spaces between words tell us
> anything about non-written language structure.
>
> Best,
> Martin
>
> On 26.11.21 at 11:54, JOO, Ian [Student] wrote:
>
> Dear David,
>
> thank you for introducing your interesting paper, which I’ll look into
> soon.
> But, I don’t think speakers not employing spaces necessarily indicates the
> absence of wordhood.
> In many traditional orthographies, there are no spaces at all: Thai,
> Tibetan, Khmer, Japanese, pre-modern Korean, etc.
> But that wouldn’t necessarily mean that Thai speakers don’t perceive words.
> Many orthographies only transcribe consonants - but that wouldn’t mean
> that the speakers don’t perceive vowels as phonological units.
> So I think the emergence of spaces is sufficient, but not necessary,
> evidence of wordhood.
>
> Regards,
> Ian
> On 26 Nov 2021, 6:45 PM +0800, David Gil <gil at shh.mpg.de> wrote:
>
> Following on Nikolaus' comment, it is also an experiment that is performed
> whenever speakers of an unwritten language decide to introduce an
> orthography for the first time:  Do they insert spaces, and if so where?
>
> I wrote about this in Gil (2020), with reference to a naturalistic
> corpus of SMS messages in Riau Indonesian, produced in 2003, which was the
> year everybody in the village I was staying in got their first mobile
> phones and suddenly had to figure out how to write their language.  In the
> 2020 article, my focus was more on the presence or absence of evidence for
> bound morphology, and less on whether they introduce spaces in the first
> place. What I did not mention there, but which is most germane to Ian's
> query, is the latter question, whether they use spaces at all.  In fact, my
> corpus contains lots of messages that were written without spaces at all.
> Within a couple of years the orthography became more conventionalized, and
> everybody started using spaces, but to begin with, at least, it seemed like
> many speakers were not entertaining any (meta-)linguistic notion of 'word'
> whatsoever.
>
> (BTW, in Riau and many other dialects of Indonesian, the word for 'word',
> *kata*, also means 'say'.)
>
> David
>
> Gil, David (2020) "What Does It Mean to Be an Isolating Language? The Case
> of Riau Indonesian", in D. Gil and A. Schapper eds., *Austronesian
> Undressed: How and Why Languages Become Isolating*, John Benjamins,
> Amsterdam, 9-96.
>
>
> On 26/11/2021 12:11, Nikolaus P Himmelmann wrote:
>
> Hi
> On 26/11/2021 10:17, JOO, Ian [Student] wrote:
>
>
> The question would be, when one asks a speaker of a given language to
> divide a sentence into words, would the number of words be consistent
> across different speakers?
> It would be an interesting experiment. I’d be happy to be informed of any
> previous study that conducted such an experiment.
>
> Yes, indeed. And it is an experiment, though largely uncontrolled, that is
> carried out whenever someone carries out fieldwork on an undocumented lect.
> In this context, speakers provide evidence for word units in two ways: a)
> in elicitation when prompted by pointing or with a word from a contact
> language; b) when chunking a recording into chunks that can be written down
> by the researcher.
>
> In my experience, speakers across a given community are pretty consistent
> in both activities though one may distinguish two basic types of speakers. One
> group provides word-like units, so when you ask for "stone" you get a
> minimal form for stone. The other primarily provides utterance-like units.
> So you do not get "stone" but rather "look at this stone", "how big the
> stone is", "stones for building ovens" or the like.
>
> Depending on the language, there is some variation in the units provided
> in both activities but this is typically restricted to the kind of
> phenomena that later on cause the main problems in the analytical
> reconstruction of a word unit, i.e. mostly phenomena that come under the
> broad term of "clitics". In my view, one should clearly distinguish between
> these analytical reconstructions, which are basic building blocks of
> grammatical descriptions, and the "natural" units provided by speakers,
> which are primary data providing the basis for the description.
>
> Best
>
> Nikolaus
>
>
>
> --
> David Gil
>
> Senior Scientist (Associate)
> Department of Linguistic and Cultural Evolution
> Max Planck Institute for Evolutionary Anthropology
> Deutscher Platz 6, Leipzig, 04103, Germany
>
> Email: gil at shh.mpg.de
> Mobile Phone (Israel): +972-526117713
> Mobile Phone (Indonesia): +62-81344082091
>
>
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> http://listserv.linguistlist.org/mailman/listinfo/lingtyp
>
>
> --
> Martin Haspelmath
> Max Planck Institute for Evolutionary Anthropology
> Deutscher Platz 6
> D-04103 Leipzig
> https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/
>