[Lingtyp] (no subject)

Mattis List mattis.list at lingpy.org
Sat May 30 15:55:50 UTC 2020

Dear Christian,

you are completely right, and I expressed myself badly: an
additional annotation line is even better than mixing information into the
glosses. What I still find problematic is that the information on the
direction of the boundary symbols is only implicitly encoded in another
level of the annotation, as you correctly say. Even if this works in
most cases, it gives me an unpleasant feeling, since it means that we
deliberately choose an annotation system that we know may create
ambiguity, although we could easily do better.
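
To make the concern concrete, here is a minimal Python sketch of the
heuristic from your point 4, as I read it. It is not part of pyigt; the
function names, the label set, and the placeholder forms ("aa-bb=cc") are
made up for illustration, and it assumes that an all-upper-case gloss marks
a grammatical element and that only '-' and '=' occur as boundaries.

```python
import re

# Assumption: only '-' (affix boundary) and '=' (clitic boundary) are used.
BOUNDARY = re.compile(r"([-=])")


def split_morphs(text):
    """Split 'aa-bb=cc' into (['aa', 'bb', 'cc'], ['-', '='])."""
    parts = BOUNDARY.split(text)
    return parts[0::2], parts[1::2]


def is_grammatical(gloss):
    """Assumption: a gloss whose letters are all upper case is grammatical."""
    letters = [c for c in gloss if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def classify(word, gloss):
    """Label each morph as 'lexical', 'affix', 'clitic' or 'ambiguous'."""
    morphs, bounds = split_morphs(word)
    glosses, _ = split_morphs(gloss)
    if len(morphs) != len(glosses):
        raise ValueError("word and gloss have different numbers of segments")
    grammatical = [is_grammatical(g) for g in glosses]
    # All glosses in the same case: the gloss line cannot tell us which
    # element the boundary symbol refers to.
    if len(morphs) > 1 and len(set(grammatical)) == 1:
        return [(m, "ambiguous") for m in morphs]
    labels = []
    for i, morph in enumerate(morphs):
        if not grammatical[i]:
            labels.append((morph, "lexical"))
            continue
        # Boundary symbols directly to the left and right of this morph.
        adjacent = set(bounds[max(i - 1, 0):i + 1])
        if adjacent == {"-"}:
            labels.append((morph, "affix"))
        elif adjacent == {"="}:
            labels.append((morph, "clitic"))
        else:
            labels.append((morph, "ambiguous"))
    return labels


print(classify("aa-bb", "house-PL"))
# [('aa', 'lexical'), ('bb', 'affix')]
print(classify("aa-bb=cc", "house-PL=DEF"))
# [('aa', 'lexical'), ('bb', 'ambiguous'), ('cc', 'clitic')]
```

In the second call the element "bb" sits between a '-' and a '=', and
nothing in the symbols themselves says which of the two belongs to it. A
human reader resolves this easily, but a program has to guess or flag the
case, which is the kind of implicit encoding I find unsatisfying.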

Thanks for the clarification and all the best,


On 30.05.20 16:55, Christian Lehmann wrote:
> Dear Mattis,
> I am afraid the requirement to use only one segmentation symbol and to
> accommodate information on the category of the elements so segmented in
> the gloss line is based on a multiple misunderstanding:
>  1. Information of the kind "x is a prefix", "y is a proclitic", "z is a
>     stem" concerns the structural category of items x - z. It has
>     nothing to do with the semantics. To the extent that there is not
>     1:1 mapping of meaning onto structure in language, the same gloss
>     (indicating the meaning of the item) is compatible with different
>     kinds of structural units  and, thus, with different kinds of
>     grammatical boundary.
>  2. The interlinear morphological gloss is not meant to categorize the
>     morphological elements being glossed. It is meant to identify each
>     morph by a proper name which, in principle (and fortunately in most
>     cases) indicates its meaning or function.
>  3. If you want a categorization of the units of the line being
>     annotated, you need more annotation layers. See Lieb & Drude 2000
>     (and
>     https://www.christianlehmann.eu/ling/ling_meth/ling_description/representations/gloss/index.php?open=class_member).
>     Packing different categories of linguistic information into one
>     gloss is theoretically inconsistent and not computationally practical.
>  4. It is true that the boundary symbols are not asymmetric, so you
>     cannot read off them which element of a pair is the affix, the
>     clitic and so on. However, this information is contained in the
>     gloss line in most cases: If the gloss of an element is in upper
>     case or small caps, it is a grammatical element. Otherwise, it is a
>     lexical element (a root or stem). If it is a grammatical element,
>     then the '-' vs. '=' symbol tells you whether it is an affix or a
>     clitic. (We can work out the details for configurations where both
>     components of a pair thus linked/separated are written in the same
>     case.)
> Best,
> Christian
> Lieb, Hans-Heinrich & Drude, Sebastian 2000, /Advanced glossing: A
> language documentation format./ Berlin: Technische Universität (Working
> Papers).
>> Dear all,
>> when working on pyigt, a Python package that handles interlinear-glossed
>> text so that it can be represented consistently in our machine- and
>> human-readable CLDF format (https://cldf.clld.org) (draft here:
>> http://doi.org/10.17613/nppg-x393), we realized at some point that the
>> current practice of segmenting words into morphemes with symbols that
>> ALSO carry an inherent semantics, defining the function of one of the
>> elements separated by that very symbol, can be quite problematic, since
>> it is often ambiguous which element the semantics attach to.
>> The = vs. - symbols are a good example here, as a-b does not tell me
>> which is the affix, the a or the b. The plus, which we now use as the
>> standard separator in our version of IGT in CLDF, is unproblematic, as
>> it does not assign different semantics to the elements it splits.
>> However, given these inconsistencies, it is now impossible to
>> consistently investigate larger collections of IGT that have been
>> published, since it is often not clear which element is what, unless the
>> relevant information is given in the gloss layer.
>> My recommendation is therefore to use one segmentation symbol only and
>> to mark the information of whether something is a clitic, a prefix, an
>> infix, etc., on the element itself, in the gloss. My argument is: if you
>> define the semantics in the glosses (and do this in a consistent way)
>> you don't need to think about whether it is a clitic, an affix, or
>> something else. So one can get away with a single symbol for the
>> segmentation and still be much more explicit than current practice
>> often is.
>> For those interested in the arguments, we had a discussion on GitHub
>> with Florian Matter, which we now consider resolved:
>> https://github.com/cldf/pyigt/issues/6
>> Best,
>> Mattis
>> On 30.05.20 15:45, Christian Lehmann wrote:
>>> Dear Sergey,
>>> I understand there are at least two distinct problems there:
>>>  1. How is the process which produces your Akzentkomposita to be
>>>     categorized?
>>>  2. Once we know the type of grammatical boundary separating/joining the
>>>     two components in a unit, what is the standard boundary symbol for it?
>>> Ad 1: From the examples that you provide, it does not appear that it is
>>> a kind of compounding. (Consequently, I would not call the products
>>> X-komposita, no matter what X is.) Still from your examples, it would
>>> appear that the process is (some kind of) clisis. I take the liberty of
>>> sending the link to my most recent article (just accepted for
>>> publication), devoted to exactly this kind of problem:
>>> https://www.christianlehmann.eu/publ/lehmann_univerbation.pdf
>>> Ad 2: If it is clisis, the = symbol you are using is standard. If it
>>> were compounding, you would use the + symbol. If it is a new kind of
>>> process, with a new kind of grammatical boundary, we would have to
>>> deploy another symbol. There are plenty of as yet unused symbols around;
>>> how about ⧧ (Unicode 29e7) ?
>>> Cheers,
>>> Christian
>>> -- 
>>> Prof. em. Dr. Christian Lehmann
>>> Rudolfstr. 4
>>> 99092 Erfurt
>>> Deutschland
>>> Tel.: 	+49/361/2113417
>>> E-Post: 	christianw_lehmann at arcor.de
>>> Web: 	https://www.christianlehmann.eu
>>> _______________________________________________
>>> Lingtyp mailing list
>>> Lingtyp at listserv.linguistlist.org
>>> http://listserv.linguistlist.org/mailman/listinfo/lingtyp
>> ---
>> Dr. Johann-Mattis List
>> Research Group Leader "Computer-Assisted Language Comparison"
>> Department of Linguistic and Cultural Evolution
>> Max Planck Institute for the Science of Human History
>> 07745 Jena
>> https://lingulist.de
