[Lingtyp] orthography in formatted examples
Daniel W. Hieber
dwhieb at gmail.com
Wed Mar 25 17:54:15 UTC 2020
Hi all,
The representational issue raised by Christian arises because, historically and presently, authors use the first line of an interlinear gloss for different purposes, depending on the focus of that piece of writing or its audience.
Sometimes the first line is intended to be a (surface level) phonemic transcription. In this case, it makes sense that the transcription should include only graphemes that represent phonemes in that language, and no punctuation, capitals, etc. The orthography could be IPA, but it could also be any other phonemic orthography used for that language. As an example, the stylesheet for the International Journal of American Linguistics<http://www.americanlinguistics.org/wp-content/uploads/IJAL-interlinear.pdf> asks authors to use a phonemic transcription for the first line of interlinear examples.
Sometimes the first line of the interlinear gloss is intended to be a transcript. In this case, the transcription would include extra-phonemic characters such as <,.?!>, whether those characters are meant to indicate written conventions that group linguistic units logically, or prosodic conventions (like those used in Discourse Functional Transcription or Conversation Analysis) that group words by intonation unit or conversational turn, etc. When Christian and others refer to an orthographic representation, I believe they have in mind this notion of a transcript. For publications aimed at speakers, learners, or general audiences, it is most common for the first line to be a transcript, following the standard writing conventions of that language. Sometimes a purely phonemic transcription is then included as a second line. Dianne Friesen’s A grammar of Moloko<https://langsci-press.org/catalog/book/118> (Language Science Press, 2017) is an example of a publication that uses a transcript for the first line of most interlinear examples.
More rarely, the first line is used for a phonetic transcription, though more commonly the phonetic transcription is given as a second line beneath the transcript or phonemic transcription. In this case, because the transcription is phonetic, it seems to me this transcription should always be in IPA, with no capitals or punctuation.
Frequently authors omit any initial transcription and instead present a morphemic analysis as the first line. While the morphemic analysis is obviously a kind of transcription, it is a transcription of morphemes rather than the entire utterance, and therefore may not represent all morphophonemic sound changes, allomorphy, etc. Writers differ as to whether the morphemic analysis line should include only underlying forms or allow allomorphs. (Christian’s guidance<https://christianlehmann.eu/ling/ling_meth/ling_description/grammaticography/gloss/index.php?open=allomorphy> is that allomorphy should not be represented in the morphemic analysis.) Interlinear examples that start with a morphemic analysis rather than a transcript are a particularly linguist-oriented representation. They are dispreferred by most speakers and learners because of their illegibility, and this practice is sometimes viewed as disrespectful (a point emphasized by Megan Lukaniec in her 2019 LSA talk as part of the Natives4Linguistics symposium<https://natives4linguistics.wordpress.com/natives4linguistics-and-lsa2019/>).
The resolution to the dilemma raised by Christian is to be explicit about the type of information you are conveying in each line, and follow representational principles most suited to that type, some suggestions for which I just briefly outlined above. Language Science Press’s style guide encourages this approach: As Sebastian mentioned, the LSP style guide gives authors the flexibility to choose which type of representation they are using for any particular example, and asks them to be consistent within that line. Authors may essentially choose between a transcript and a phonemic transcription for any given example.
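To make the idea of per-line types concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the labels (transcript, phonemic, phonetic, morphemic), the Line class, and the problems function are hypothetical names, not taken from any style guide, and the punctuation set is just the <,.?!> mentioned above. A real orthography may well use some of those characters as graphemes, so the set would need adjusting per language.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical labels for the kinds of line discussed in this message.
LINE_TYPES = {"transcript", "phonemic", "phonetic", "morphemic"}

# Line types for which capitals and punctuation are flagged (an assumption
# based on the suggestions above; adjust for a given language or journal).
STRICT_TYPES = {"phonemic", "phonetic"}

# Extra-phonemic characters, taken from the <,.?!> example above.
PUNCTUATION = ",.?!"


@dataclass
class Line:
    kind: str  # one of LINE_TYPES
    text: str


def problems(line: Line) -> list[str]:
    """Return a list of complaints about a line, given its declared type."""
    if line.kind not in LINE_TYPES:
        return [f"unknown line type: {line.kind}"]
    found = []
    if line.kind in STRICT_TYPES:
        for ch in line.text:
            if unicodedata.category(ch) == "Lu":  # uppercase letter
                found.append(f"capital letter {ch!r}")
            elif ch in PUNCTUATION:
                found.append(f"punctuation {ch!r}")
    return found


# A transcript may carry punctuation and capitals; a phonemic line may not.
ok = Line("transcript", "Quo usque tandem abutere, Catilina, patientia nostra?")
bad = Line("phonemic", "abutere, Catilina")
print(problems(ok))
print(problems(bad))
```

The point of the design is that the check depends on the declared type of the line, not on its position in the example, so the same first line can be acceptable as a transcript and flagged as a phonemic transcription.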
Inconsistencies and conflicts arise when authors try to conflate these different purposes in a single line, or assume that this first line always has the same purpose, creating the illusion of a forced choice between a transcript, a phonemic transcription, or something else. But the interlinear gloss is a wonderfully flexible convention in that authors can adjust it to their purpose and audience. Imposing a standard that requires authors to include or exclude punctuation would restrict their ability to tailor their representations as appropriate.
Allowing for this flexibility does not entail that we should be satisfied with any random way of structuring interlinear examples. It’s entirely reasonable that editors impose formatting requirements; however, those requirements should be sensitive to the type of information being conveyed. For example, I suggested above that any line intended to be a phonetic representation must use IPA. Additionally, editors could require some types of lines and disprefer others, similar to the way that IJAL requires a phonemic transcription line.
So I agree with Christian, Jack, and Martin in that the “ban” on punctuation should be lifted, in the sense that editors should allow authors to use a transcript as the first line of interlinear examples where appropriate. In fact, I think it would be great if this were the default policy for most publications, so as to make our research maximally readable and accessible. If needed, that transcript line could then be supplemented with other representations.
For those interested, I created a specification for how to represent linguistic examples in a way that follows traditional conventions but is nonetheless easily parseable by a script. In this specification I define the various types of lines that appear in interlinear glosses, some of which I've mentioned above. You can view the complete specification at the link below.
https://scription.digitallinguistics.io/
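As a rough illustration of why punctuation in a transcript line need not get in the way of a script, here is a minimal Python sketch that aligns a punctuated transcript with a gloss line word by word. It does not implement the specification linked above; the function name align, the EDGE_PUNCTUATION set, and the whitespace-based tokenization are all assumptions made for this example. The data are Christian's Latin example with its punctuation restored.

```python
# Characters stripped from the edges of each word before alignment.
# This set is an assumption; the hyphen is deliberately left out because
# it marks morpheme boundaries in interlinear glosses.
EDGE_PUNCTUATION = ",.?!;:\"'“”"


def align(transcript: str, glosses: str) -> list[tuple[str, str]]:
    """Pair each word of a punctuated transcript with its gloss."""
    words = [w.strip(EDGE_PUNCTUATION) for w in transcript.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    gloss_words = glosses.split()
    if len(words) != len(gloss_words):
        raise ValueError(
            f"{len(words)} words but {len(gloss_words)} glosses"
        )
    return list(zip(words, gloss_words))


pairs = align(
    "Quo usque tandem abutere, Catilina, patientia nostra?",
    "whither continually finally abuse:FUT:MID.2.SG Catilina:VOC.SG "
    "patience(F):ABL.SG our:F.ABL.SG",
)
for word, gloss in pairs:
    print(f"{word}\t{gloss}")
```

Because punctuation is removed only for the purpose of alignment, the transcript itself can stay faithful to the written source while the word-to-gloss mapping remains one-to-one.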
best,
Danny
Daniel W. Hieber
Ph.D. Candidate in Linguistics
University of California, Santa Barbara
danielhieber.com<http://www.danielhieber.com>
________________________________
From: Lingtyp <lingtyp-bounces at listserv.linguistlist.org> on behalf of Nikolaus Himmelmann <n.himmelmann at uni-koeln.de>
Sent: Wednesday, March 25, 2020 12:43 PM
To: lingtyp at listserv.linguistlist.org <lingtyp at listserv.linguistlist.org>
Subject: Re: [Lingtyp] orthography in formatted examples
Dear colleagues
I believe that different punctuation conventions should be used in
accordance with the type of language a given example represents
(Christian is alluding to this possibility). Essentially, linguistic
examples in linguistics come from three different sources and ideally,
punctuation would be part of marking the distinction.
- originally written data (from a written source)
- transcribed data (from a recording of unscripted speech)
- invented data (including elicited data)
Representing originally spoken data in the conventions of (European)
written languages ignores the fundamental differences that exist between
the two modalities and leads to many of the putatively intractable
problems such as defining "word".
Therefore, I make a point of representing spoken language differently from
written and elicited/invented language: I do not use punctuation or
capitalization, but I do represent intonation units. Whether it is
worthwhile to invent a convention to distinguish
originally written data from invented data is an issue that may merit
further debate.
Of course there are many additional special case problems such as
whether one should use punctuation in representing originally written
language where the original does not use punctuation.
Best regards
Nikolaus
On 25.03.2020 16:15, Peter Austin wrote:
> Dear colleagues
>
> It seems to me that part of what Christian is alluding to is a failure
> on the part of descriptive and documentary linguists to take
> transcription (and orthographic representation) seriously and to come to
> some agreements about how it should be handled in our various
> representational schemas. The "Leipzig glossing rules" don't discuss it,
> and this lacuna gives rise to the conflicting practices that Christian
> observes.
>
> Epigraphers have thought long and hard about this matter and I would
> recommend looking at their XML schema expressed in EpiDoc
> (https://sourceforge.net/p/epidoc/wiki/Home/) to get some idea of how a
> whole field can approach the matter of transcription -- they don't only
> deal with punctuation, but also with things like "missing" elements,
> corrections, spacing etc. The Discourse Functional Transcription (DFT)
> mentioned by Jack Du Bois could be one basis for getting a proper
> discussion going and putting some basic agreements among researchers in
> place. They are sorely needed.
>
> Best wishes,
> Peter
>
>
> On Wed, 25 Mar 2020 at 14:59, Françoise Rose
> <francoise.rose at univ-lyon2.fr> wrote:
>
> Dear all,
>
> It seems most grammars of languages without a written tradition do
> use punctuation (although minimal) in the examples, if those are
> full sentences. Capitalization maybe less systematically, probably
> for the reason that Katharina has mentioned. Commas (“,”) are sometimes
> important for getting an idea of the prosody and the syntactic
> structure, and I use “…” a lot to mark errors and hesitations.
>
> I don’t see the problem of punctuation symbols being also used in
> the gloss line: in different lines, the same symbols have different
> meanings (and a different distribution anyway: “.” is always used
> after a word (i.e. before a space) in the example line and within
> the gloss in a gloss line). The only problem my students and I have
> been confronted with is when “-” is used in the orthography. If
> in the gloss line, I usually replace it with “_”, as in
> “grand_père”. If in the example line, I don’t have an ideal
> solution.
>
> Nice to have this discussion!
>
> Keep safe,
>
> Françoise
>
> From: Lingtyp <lingtyp-bounces at listserv.linguistlist.org> on behalf of
> Christian Lehmann
> Sent: Wednesday, 25 March 2020 12:15
> To: LINGTYP LINGTYP <LINGTYP at listserv.linguistlist.org>
> Subject: [Lingtyp] orthography in formatted examples
>
> Dear colleagues,
>
> here is a little methodological problem which some may dismiss as
> trivial but which needs to be solved if we care about standardizing
> linguistic methodology. It concerns the orthographic representation
> of linguistic data, esp. such as are provided with an interlinear
> gloss.
>
> In the past decades, it has become customary in linguistic
> publications to omit punctuation in data which are formatted as
> examples and provided with a gloss, like this:
>
> quo      usque        tandem   abutere             Catilina         patientia           nostra
> whither  continually  finally  abuse:FUT:MID.2.SG  Catilina:VOC.SG  patience(F):ABL.SG  our:F.ABL.SG
>
> “How far will you continue to abuse our patience, Catiline?” (Cic.
> /Cat/. I, 1)
>
> The example is actually taken from a text; and there it is, of
> course, provided with initial capitalization, with commas in between
> and with a final question mark. Many of us have gotten accustomed to
> omitting these things in formatted examples. My own guidelines for
> interlinear glosses
> (http://christianlehmann.eu/ling/ling_meth/ling_description/grammaticography/gloss/)
> also recommend the omission. The practice seems inevitable for a
> representation of a piece of text which is not in orthography but in
> some more formal representation, say phonetic or morphophonemic.
> Here I am talking about *orthographic representations*.
>
> There are some reasons for the practice of omitting punctuation and
> sentence-initial capitalization in glossed examples:
>
> 1. These orthographic marks may not figure in the original
> source:
>
>    a. There is no published orthographic version which would need to
>    be cited literally; it is just a transcription of a recording.
>    Omission of punctuation signals this.
>
>    b. The quoted stretch of text is not (necessarily) a sentence, be
>    it in its original context, be it in the language system.
>
> 2. These orthographic marks would confuse the mapping of symbols
> structuring the interlinear gloss onto the original text line:
>
>    a. Punctuation symbols like ‘.’, ‘:’ have a special function in
>    glosses which they do not have in a fully orthographic text line.
>    Others like ‘,’ and ‘!’ are inadmissible in the gloss. If such
>    symbols appeared in the original text line, they would map on
>    nothing in the gloss line.
>
>    b. Punctuation symbols like ‘-’ should have the same function in
>    the original text and in the gloss.
>
> (Ad (1b): We are not talking about examples which are just syntagmas
> below clause level. In some linguistic publications, such examples
> are provided with a final full stop, too. This is plainly
> unthinking.)
>
> Here are some reasons for abandoning the ban on punctuation and
> initial capitalization:
>
> 1. It makes the language exemplified appear as one which lacks an
> orthography, thus dangerously evoking the attitude towards “an idiom
> which does not even have a grammar”.
>
> 2. Punctuation, of course, fulfills a sensible function in
> established orthographies: it reflects the syntactic or prosodic
> structure of a piece of text. Omitting it from an example renders
> the example less easily intelligible.
>
> 3. Whenever a linguistic example is, in fact, quoted from a text
> noted in established orthography, the quotation should be faithful,
> including the punctuation.
>
> 4. Current practice allows for exceptions to the principle of
> suppression of punctuation: at least question marks are commonly
> set.
>
> You may know of more reasons for or against the practice of
> suppression of punctuation and of initial capitalization in
> linguistic examples, or you may be able to invalidate some of the
> above. I would be grateful for some discussion which helps to bring
> this closer to a recommendation that most of us could share and that
> would have a chance to find its way into style sheets.
>
> Christian
>
> --
>
> Prof. em. Dr. Christian Lehmann
> Rudolfstr. 4
> 99092 Erfurt
> Deutschland
>
> Tel.: +49/361/2113417
> E-mail: christianw_lehmann at arcor.de
> Web: https://www.christianlehmann.eu
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> http://listserv.linguistlist.org/mailman/listinfo/lingtyp
>
>
>
> --
> Prof Peter K. Austin
> Humboldt Researcher, Frankfurt University (Nov 2019, Jan-March 2020)
> Emeritus Professor in Field Linguistics, SOAS
> Visiting Researcher, Oxford University
> Foundation Editor, EL Publishing
> Honorary Treasurer, Philological Society
>
> Department of Linguistics, SOAS
> Thornhaugh Street, Russell Square
> London WC1H 0XG
> United Kingdom
>
>