[Lingtyp] Folk definition of “word”

Mon Nov 29 03:39:00 UTC 2021

By the way, a simple exercise, and a fun one for students, is to look at
how dictionaries can be organized for languages with different
orthographies and different types of morphological structure.

On Sun, Nov 28, 2021 at 7:37 PM Daniel Ross <djross3 at gmail.com> wrote:

> This is the topic of the next lecture in my Morphology class (and my
> students are currently reading your 2011 paper, Martin), so thank you
> everyone for this timely and interesting discussion.
>
> I would like to look at your conclusion from a different perspective,
> though: I agree that spaces may not directly tell us about word boundaries
> in languages, but for another reason.
>
> Japanese is a very interesting example because there are three
> (sub)scripts working together: kanji (from Chinese characters) for most
> lexical items, hiragana for function morphemes, and katakana for borrowings
> and onomatopoeia. The first response from most people (i.e. students)
> learning about this for the first time is that Japanese sounds hard to
> write, with the impression that the system may be redundant. But remember
> that Japanese does not use spaces. And it simply does not need to: kanji
> and hiragana very clearly mark the morphosyntactic structure of a sentence,
> so it is easy to skim and identify word boundaries, or at least equivalent
> information to what word boundaries do for us. This is an extremely
> efficient and transparent system, reflecting how Japanese works
> grammatically, not just orthographically.
>
> My suggestion then is to not look at when spaces are used in an
> orthography, but to look at what different orthographies do instead of
> spaces, or otherwise in a way that reflects specific morphosyntactic
> properties of languages.
>
> Looking back at the history of our familiar alphabet, Greek vowels were
> basically an accident when adapting the Phoenician writing system to Greek.
> Phoenician follows the well-known triliteral root system of Semitic
> languages, and those languages typically use abjads (consonant-only writing
> systems), and I have to assume that this is due to the central importance
> of consonants over vowels in their morphology. This can be traced back to
> the origins of this writing in Ancient Egyptian hieroglyphs, where a direct
> iconic representation of a meaning shifted to take on a specific consonant
> value, and this was codified for only consonants, with vowels unwritten.
> That is still the case today in Arabic, Hebrew, etc. (Aside: I prefer to
> think of the so-called "long vowels" as vowel-holding consonants, i.e.
> semivowels, etc., similar to how /i/ and /j/ or /u/ and /w/ may be
> represented with the same letter in alphabets, such as the letter "V" in
> Latin.) Consonant-only writing isn't such an obvious fit for another kind
> of language where the vowels are equally important morphologically. The
> Arabic script has been adapted for a number of other languages, so I'm not
> suggesting it is impossible or that it won't work, but that it probably
> wouldn't arise naturally, and due to this borrowing, it probably doesn't
> tell us much about the structure of those languages. (On the other hand,
> the Arabic script might have been a good fit for Turkish given that vowel
> harmony means there are few contrasts to represent within vowels, so that's
> another topic to look at.)
>
> We might also ask what the introduction of spaces can tell us about the
> structure of European languages. Perhaps the highly fusional inflection of
> Latin and Greek was in itself enough to signal word boundaries and
> morphological structure in general given certain typical orthotactic(?)
> forms were much more frequent than others (similar to Japanese hiragana
> marking functional morphemes).
>
> But I don't think that adapting the existing traditions of
> English/European orthography to a new, previously unwritten language
> necessarily tells us very much about the morphological structure of that
> language, because it will more likely be heavily influenced by the norms of
> English. This might be less likely in cases where the speakers of the
> language are illiterate before writing their own language, rather than
> biliterate(?) with English or another European language (or Indonesian,
> etc., following similar conventions). Where they write spaces might give us
> some suggestions about word boundaries, of course. But I think it is even
> more interesting to see what non-alphabetic scripts can tell us about the
> languages that they represent.
>
> Unfortunately we don't have a substantial number of truly independent
> writing systems around the world to really test these ideas, but it's
> certainly interesting to think about. There are a few more relevant
> examples, like how Chinese simply has no need for a word "word" because it
> has almost exclusively monosyllabic morphemes and characters, as well as
> some idiomatic combinations of them (i.e. compounds). That tells us
> something about the morphological structure of Chinese too, I think.
>
> Whether any of this is really about "wordhood" is not yet clear to me, but
> I do think that different orthographic traditions can give insights into
> morphological structure in general. One way of looking at it is that
> orthographies are a kind of formal analysis for morphological structure,
> and as we all know, analyses are informed by but do not determine
> linguistic organization. So if we think about writers as linguists, that
> may be helpful in this discussion. In fact, just like there are different
> grammatical theories, it may be that different orthographies are different
> theories of wordhood or similar levels of structure. If so, it may be that
> "words" (in the European sense) are just one way of looking at languages,
> and that they are an analysis, but not necessarily a fundamental part of
> linguistic structure. Or more interestingly, it may be that different
> languages have different units on par with "words", often reflected by
> orthographic systems. This is also why it's so interesting to look at
> proposals for writing signed languages, which introduce other kinds of
> challenges. I think this is generally in line with the conclusions of your
> 2011 paper, Martin.
>
> Last week I assigned my students a paper about questions of polysynthetic
> wordhood in Cree and Dakota (https://doi.org/10.1075/cilt.174.08rus), and
> the paper emphasized that speakers of these languages would often write
> much shorter words (with spaces between them) than expected according to
> the traditional polysynthetic analysis of linguists. But I am suspicious
> that they may be constrained by what they expect written "words" to look
> like due to familiarity with English, and I was left wondering, most
> importantly, what an original, indigenous script for Cree or Dakota would
> look like: what is the ideal way to write these languages, not how English
> writing can be borrowed for them. (I should add that Cree is often written
> in Canadian syllabics, but I think that is a general writing system, and
> according to Wikipedia designed by a linguist, so it may have other biases.
> But perhaps a syllabary has other advantages suitable for "polysynthetic"
> languages, however their structure is best analyzed: one option is that
> there are multiple word-like levels in their structure, rather than a
> unique level, and in that case a syllabary seems like a nice compromise to
> divide it into iterated units.)
>
> Daniel
>
> On Sun, Nov 28, 2021 at 8:29 AM Martin Haspelmath <
> martin_haspelmath at eva.mpg.de> wrote:
>
>> This is a really interesting thread! It still seems to me that the term
>> "word" has a well-understood orthographic sense, but no well-understood
>> general phonological or morphosyntactic sense. Writing is now almost
>> universal, but it does appear that most unwritten languages did not have a
>> word for 'word' (as opposed to 'speech' or 'what someone said').
>>
>> I agree with Ian that "the emergence of spaces is sufficient evidence of
>> wordhood", in the sense of orthographic wordhood – because spaces define
>> orthographic words.
>>
>> As the fascinating discussion of the history of reading has made clear,
>> reading is by no means a straightforward or natural activity. It's more
>> like riding a bike – extremely useful, but dependent on highly specific
>> cultural traditions and practices.
>>
>> It may well be that orthographic spaces are primarily an autonomous
>> device to facilitate reading, like punctuation, paragraphs, section
>> headings, and typographical ascenders/descenders in Latin script – but with
>> no direct relationship to anything in the spoken language. As our
>> grammatical investigations began with written language (*gram-matica*
>> originally means 'study of writing', cf. *graph-* 'write'), it is
>> natural that it was based on the study of written language. *Scriptio
>> continua* may simply be a bit harder to read than spaced writing (just
>> as I find Cyrillic a bit harder to read than Latin, because there are fewer
>> ascenders/descenders).
>>
>> So I'm not sure if we can presuppose that spaces between words tell us
>> anything about non-written language structure.
>>
>> Best,
>> Martin
>>
>> Am 26.11.21 um 11:54 schrieb JOO, Ian [Student]:
>>
>> Dear David,
>>
>> thank you for introducing your interesting paper, which I’ll have a look
>> at soon.
>> But, I don’t think speakers not employing spaces necessarily indicates
>> the absence of wordhood.
>> In many traditional orthographies, there are no spaces at all: Thai,
>> Tibetan, Khmer, Japanese, pre-modern Korean, etc.
>> But that wouldn’t necessarily mean that Thai speakers don’t perceive
>> words.
>> Many orthographies only transcribe consonants - but that wouldn’t mean
>> that the speakers don’t perceive vowels as phonological units.
>> So I think the emergence of spaces is sufficient, but not necessary,
>> evidence of wordhood.
>>
>> Regards,
>> Ian
>> On 26 Nov 2021, 6:45 PM +0800, David Gil <gil at shh.mpg.de> wrote:
>>
>> Following on Nikolaus' comment, it is also an experiment that is
>> performed whenever speakers of an unwritten language decide to introduce an
>> orthography for the first time:  Do they insert spaces, and if so where?
>>
>> I wrote about this in Gil (2020), with reference to a naturalistic
>> corpus of SMS messages in Riau Indonesian, produced in 2003, which was the
>> year everybody in the village I was staying in got their first mobile
>> phones and suddenly had to figure out how to write their language.  In the
>> 2020 article, my focus was more on the presence or absence of evidence for
>> bound morphology, and less on whether they introduce spaces in the first
>> place. What I did not mention there, but which is most germane to Ian's
>> query, is the latter question, whether they use spaces at all.  In fact, my
>> corpus contains lots of messages that were written without spaces at all.
>> Within a couple of years the orthography became more conventionalized, and
>> everybody started using spaces, but to begin with, at least, it seemed like
>> many speakers were not entertaining any (meta-)linguistic notion of 'word'
>> whatsoever.
>>
>> (BTW, in Riau and many other dialects of Indonesian, the word for 'word',
>> *kata*, also means 'say'.)
>>
>> David
>>
>> Gil, David (2020) "What Does It Mean to Be an Isolating Language? The
>> Case of Riau Indonesian", in D. Gil and A. Schapper eds., *Austronesian
>> Undressed: How and Why Languages Become Isolating*, John Benjamins,
>> Amsterdam, 9-96.
>>
>>
>> On 26/11/2021 12:11, Nikolaus P Himmelmann wrote:
>>
>> Hi
>> On 26/11/2021 10:17, JOO, Ian [Student] wrote:
>>
>>
>> The question would be, when one asks a speaker of a given language to
>> divide a sentence into words, would the number of words be consistent
>> throughout different speakers?
>> It would be an interesting experiment. I’d be happy to be informed of any
>> previous study that conducted such an experiment.
>>
>> Yes, indeed. And it is an experiment, though largely uncontrolled, that
>> is carried out whenever someone carries out fieldwork on an undocumented
>> lect. In this context, speakers provide evidence for word units in two
>> ways: a) in elicitation when prompted by pointing or with a word from a
>> contact language; b) when chunking a recording into chunks that can be
>> written down by the researcher.
>>
>> In my experience, speakers across a given community are pretty consistent
>> in both activities, though one may distinguish two basic types of speakers. One
>> group provides word-like units, so when you ask for "stone" you get a
>> minimal form for stone. The other primarily provides utterance-like units.
>> So you do not get "stone" but rather "look at this stone", "how big the
>> stone is", "stones for building ovens" or the like.
>>
>> Depending on the language, there is some variation in the units provided
>> in both activities but this is typically restricted to the kind of
>> phenomena that later on cause the main problems in the analytical
>> reconstruction of a word unit, i.e. mostly phenomena that come under the
>> broad term of "clitics". In my view, one should clearly distinguish between
>> these analytical reconstructions, which are basic building blocks of
>> grammatical descriptions, and the "natural" units provided by speakers,
>> which are primary data providing the basis for the description.
>>
>> Best
>>
>> Nikolaus
>>
>>
>>
>> --
>> David Gil
>>
>> Senior Scientist (Associate)
>> Department of Linguistic and Cultural Evolution
>> Max Planck Institute for Evolutionary Anthropology
>> Deutscher Platz 6, Leipzig, 04103, Germany
>>
>> Email: gil at shh.mpg.de
>> Mobile Phone (Israel): +972-526117713
>> Mobile Phone (Indonesia): +62-81344082091
>>
>> _______________________________________________
>> Lingtyp mailing list
>> Lingtyp at listserv.linguistlist.org
>> http://listserv.linguistlist.org/mailman/listinfo/lingtyp
>>
>>
>> --
>> Martin Haspelmath
>> Max Planck Institute for Evolutionary Anthropology
>> Deutscher Platz 6
>> D-04103 Leipzig
>> https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/martin-haspelmath/
>>
>