[Lingtyp] (no subject)

Sat May 30 14:55:13 UTC 2020

Dear Mattis,

I am afraid the requirement to use only one segmentation symbol and to 
accommodate information on the category of the elements so segmented in 
the gloss line rests on several misunderstandings:

 1. Information of the kind "x is a prefix", "y is a proclitic", "z is a
    stem" concerns the structural category of the items x, y and z. It
    has nothing to do with their semantics. To the extent that there is
    no 1:1 mapping of meaning onto structure in language, the same gloss
    (indicating the meaning of the item) is compatible with different
    kinds of structural units and, thus, with different kinds of
    grammatical boundary.
 2. The interlinear morphological gloss is not meant to categorize the
    morphological elements being glossed. It is meant to identify each
    morph by a proper name which, in principle (and fortunately in most
    cases), indicates its meaning or function.
 3. If you want a categorization of the units of the line being
    annotated, you need more annotation layers. See Lieb & Drude 2000
    (and
    https://www.christianlehmann.eu/ling/ling_meth/ling_description/representations/gloss/index.php?open=class_member).
    Packing different categories of linguistic information into one
    gloss is theoretically inconsistent and not computationally practical.
 4. It is true that the boundary symbols are symmetric, so you cannot
    read off from them which element of a pair is the affix, the
    clitic and so on. However, this information is contained in the
    gloss line in most cases: If the gloss of an element is in upper
    case or small caps, it is a grammatical element. Otherwise, it is a
    lexical element (a root or stem). If it is a grammatical element,
    then the '-' vs. '=' symbol tells you whether it is an affix or a
    clitic. (We can work out the details for configurations where both
    components of a pair thus linked/separated are written in the same
    case.)
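
The heuristic in point 4 can be sketched in Python as follows. This is only
an illustration under stated assumptions, not part of pyigt or of any
glossing standard: the function names, the example glosses and the label
"undetermined" are all hypothetical. It also assumes that grammatical
glosses are written in upper case (small caps cannot be detected in plain
text) and that only '-' and '=' occur as boundary symbols.

```python
import re


def split_gloss(word_gloss):
    """Split a word-level gloss into its elements and the boundaries between them."""
    parts = re.split(r"([-=])", word_gloss)
    return parts[0::2], parts[1::2]


def is_grammatical(gloss):
    """Assumption: a gloss whose letters are all upper case (PL, 1SG) is grammatical."""
    letters = [c for c in gloss if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def classify(word_gloss):
    """Label each element as 'lexical', 'affix', 'clitic' or 'undetermined'."""
    elements, boundaries = split_gloss(word_gloss)
    lexical = [i for i, g in enumerate(elements) if not is_grammatical(g)]
    result = []
    for i, gloss in enumerate(elements):
        if i in lexical:
            result.append((gloss, "lexical"))
        elif lexical and i < lexical[0]:
            # Grammatical element before the stem: use the boundary to its right.
            kind = "clitic" if boundaries[i] == "=" else "affix"
            result.append((gloss, kind))
        elif lexical and i > lexical[-1]:
            # Grammatical element after the stem: use the boundary to its left.
            kind = "clitic" if boundaries[i - 1] == "=" else "affix"
            result.append((gloss, kind))
        else:
            # No lexical element, or the element sits between two lexical ones:
            # these are the configurations whose details remain to be worked out.
            result.append((gloss, "undetermined"))
    return result


# Hypothetical glosses, made up for illustration only.
print(classify("house-PL=DEF"))
print(classify("NEG=go-PST"))
```

For the first hypothetical gloss, the sketch labels "house" as lexical, "PL"
as an affix and "DEF" as a clitic. A pair written entirely in upper case,
such as "PL-DEF", comes out as "undetermined", which corresponds to the
case left open in the parenthesis of point 4.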

Best,

Christian

Lieb, Hans-Heinrich & Drude, Sebastian 2000, /Advanced glossing: A 
language documentation format./ Berlin: Technische Universität (Working 
Papers).

> Dear all,
>
> when working on pyigt, a Python package that handles interlinear-glossed
> text so that it can be represented consistently in our machine- and
> human-readable CLDF format (https://cldf.clld.org) (draft here:
> http://doi.org/10.17613/nppg-x393), we realized at some point that the
> current practice of using symbols for the segmentation of words into
> morphemes which ALSO have an inherent semantics that defines the
> function of one of the elements that are separated with this very symbol
> can be quite problematic, since it is often ambiguous to which element
> the semantics are attached.
>
> The = vs. - symbols are a good example here, as a - b does not tell me
> which is the affix, the a or the b. The plus, which we now use as a
> standard separator in our version of IGT in CLDF, is unproblematic, as
> it does not assign different semantics to the elements it splits.
>
> However, given these inconsistencies, it is now impossible to
> consistently investigate larger collections of IGT that have been
> published, since it is often not clear which element is what, unless the
> relevant information is given in the gloss layer.
>
> My recommendation is therefore to use one segmentation symbol only and
> to mark the information of whether something is a clitic, a prefix, an
> infix, etc., on the element itself, in the gloss. My argument is: if you
> define the semantics in the glosses (and do this in a consistent way)
> you don't need to think of whether it is a clitic, an affix, or
> something else. So one can just get away with one symbol for the
> segmentation, and still be much more explicit than we can often observe
> in the current practice.
>
> For those interested in the arguments, we had a discussion on GitHub
> with Florian Matter, which we now consider resolved:
> https://github.com/cldf/pyigt/issues/6
>
> Best,
>
> Mattis
>
>
> On 30.05.20 15:45, Christian Lehmann wrote:
>> Dear Sergey,
>>
>> I understand there are at least two distinct problems there:
>>
>>   1. How is the process which produces your Akzentkomposita to be
>>      categorized?
>>   2. Once we know the type of grammatical boundary separating/joining the
>>      two components in a unit, what is the standard boundary symbol for it?
>>
>> Ad 1: From the examples that you provide, it does not appear that it is
>> a kind of compounding. (Consequently, I would not call the products
>> X-komposita, no matter what X is.) Still from your examples, it would
>> appear that the process is (some kind of) clisis. I take the liberty of
>> sending the link to my most recent article (just accepted for
>> publication), devoted to exactly this kind of problem:
>>
>> https://www.christianlehmann.eu/publ/lehmann_univerbation.pdf
>>
>> Ad 2: If it is clisis, the = symbol you are using is standard. If it
>> were compounding, you would use the + symbol. If it is a new kind of
>> process, with a new kind of grammatical boundary, we would have to
>> deploy another symbol. There are plenty of as yet unused symbols around;
>> how about ⧧ (Unicode 29e7) ?
>>
>> Cheers,
>>
>> Christian
>>
>> -- 
>>
>> Prof. em. Dr. Christian Lehmann
>> Rudolfstr. 4
>> 99092 Erfurt
>> Deutschland
>>
>> Tel.: 	+49/361/2113417
>> E-Post: 	christianw_lehmann at arcor.de
>> Web: 	https://www.christianlehmann.eu
>>
>>
>> _______________________________________________
>> Lingtyp mailing list
>> Lingtyp at listserv.linguistlist.org
>> http://listserv.linguistlist.org/mailman/listinfo/lingtyp
>>
> ---
> Dr. Johann-Mattis List
> Research Group Leader "Computer-Assisted Language Comparison"
> Department of Linguistic and Cultural Evolution
> Max Planck Institute for the Science of Human History
> 07745 Jena
> https://lingulist.de

-- 

Prof. em. Dr. Christian Lehmann
Rudolfstr. 4
99092 Erfurt
Deutschland

Tel.: 	+49/361/2113417
E-Post: 	christianw_lehmann at arcor.de
Web: 	https://www.christianlehmann.eu
