<div dir="auto">I agree with David that "utterance" is far from a trivial unit to reliably identify from continuous discourse.<div dir="auto"><br></div><div dir="auto">Far more reliable is the intonation unit. Intonation unit boundaries are universally and reliably recognizable based on prosodic cues alone, even in a language you don't know. (See Himmelmann, and Troiani.) Intonation units thus have a well-defined beginning, middle, and end (unlike the other units).</div><div dir="auto"><br></div><div dir="auto">New work by Giorgia Troiani and myself on Kazakh presents a solid methodology for confirming inter-transcriber reliability for intonation units. </div><div dir="auto"><br></div><div dir="auto">Further, new work by Ryan Ka Yau Lai and myself on English shows in detail what it means for a word to be initial, medial or final in the intonation unit. (We'll present this next month at LSA.) This has consequences for typological concepts like the so-called "sentence-final particle", some of which are probably actually intonation-unit-final. </div><div dir="auto"><br></div><div dir="auto">We are also developing a prosodic operationalization of the utterance, based on a sequence of one or more intonation units.</div><div dir="auto"><div dir="auto"><br></div><div dir="auto">In addition to the very interesting work by Kibrik and colleagues on basic discourse units, there is equally interesting work along the same lines by Liesbeth Degand and her students. These two initiatives are closer to operationalizing something like "utterance" in a meaningful way. Still, it's not clear whether the reliability of these units can reach the level of the intonation unit, or that of the prosodically defined utterance.</div><div dir="auto">Best,</div><div dir="auto">John<br><br><div data-smartmail="gmail_signature" dir="auto">==============================<br>John W. 
Du Bois<br>Professor of Linguistics <br>University of California, Santa Barbara<br>Santa Barbara, California 93106<br>USA<br><a href="mailto:dubois@ucsb.edu">dubois@ucsb.edu</a></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 15, 2022, 1:36 AM David Gil &lt;<a href="mailto:gil@shh.mpg.de">gil@shh.mpg.de</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Ian, and everybody,<br>

<br>

My impression is that the notion of "utterance" is every bit as <br>

problematical as that of "word", though it seems like there has not been <br>

as much discussion about utterances as there has been about words.<br>

<br>

I was particularly struck by the lack of clarity of the notion of <br>

utterance when developing our Max Planck Institute naturalistic corpora <br>

in Jakarta.  When transcribing our naturalistic data, our goal was to <br>

enter each utterance into a separate field in our database; however, we <br>

had no clear set of principles for how to parse a continuous, say, hour-long <br>

text into such utterances.  While for many purposes it didn't really <br>

matter, for some it most clearly did.  Ian's proposed generalizations <br>

might be a case in point, but the case that struck me as most cogent was <br>

in the field of 1st language acquisition, for which we compiled a large <br>

corpus.  In child language studies, a central role is played by the <br>

notion of MLU, or Mean Length of Utterance, so obviously we wanted to <br>

examine our data in terms of MLU.  But it was patently clear that our <br>

parsings into utterances were arbitrary and problematical in many ways, <br>

which got me to wondering whether this was due to our own ignorance, or <br>

alternatively a more general problem that should perhaps be addressed.  <br>

I must confess I haven't thought much about this recently, but I'm now <br>

wondering:  Are there any go-to references on how to parse a text into <br>

utterances, or is this indeed a lacuna that still needs to be filled?<br>

<br>

David<br>

<br>

On 15/12/2022 07:31, Ian Joo wrote:<br>

> Dear typologists,<br>

><br>

> many grammars employ the terms “word-initial”, “word-final”, and “word-medial”, without specifying what a “word” is.<br>

> And, as we have discussed earlier, there is no consensus on what a “word” is, or whether it is a cross-linguistically valid concept.<br>

> But can we at least agree that the following concepts are universal: “utterance-initial”, “utterance-final”, and “utterance-medial”?<br>

> As all human utterances are finite (signed or spoken), the corollary is that there is a beginning, an ending, and phases in between.<br>

> For example, instead of saying that “a lect does not allow /r/ word-initially”, can we say that it does not allow /r/ utterance-initially?<br>

> Would it save us from the conceptual ambiguity of wordhood?<br>

><br>

>  From Hong Kong,<br>

> Ian<br>

> _______________________________________________<br>

> Lingtyp mailing list<br>

> <a href="mailto:Lingtyp@listserv.linguistlist.org" target="_blank" rel="noreferrer">Lingtyp@listserv.linguistlist.org</a><br>

> <a href="https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp" rel="noreferrer noreferrer" target="_blank">https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp</a><br>

<br>

-- <br>

David Gil<br>

<br>

Senior Scientist (Associate)<br>

Department of Linguistic and Cultural Evolution<br>

Max Planck Institute for Evolutionary Anthropology<br>

Deutscher Platz 6, Leipzig, 04103, Germany<br>

<br>

Email: <a href="mailto:gil@shh.mpg.de" target="_blank" rel="noreferrer">gil@shh.mpg.de</a><br>

Mobile Phone (Israel): +972-526117713<br>

Mobile Phone (Indonesia): +62-082113720302<br>

<br>


</blockquote></div>