<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">Dear Mattis,</div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">I am afraid the requirement to use only

      one segmentation symbol and to accommodate information on the

      category of the elements so segmented in the gloss line rests

      on several misunderstandings:</div>

    <div class="moz-cite-prefix">

      <ol>

        <li>Information of the kind "x is a prefix", "y is a proclitic",

          "z is a stem" concerns the structural category of the items x,

          y and z. It has nothing to do with their semantics. To the

          extent that there is no 1:1 mapping of meaning onto structure

          in language, the same gloss (indicating the meaning of the

          item) is compatible with different kinds of structural units

          and, thus, with different kinds of grammatical boundary.</li>

        <li>The interlinear morphological gloss is not meant to

          categorize the morphological elements being glossed. It is

          meant to identify each morph by a proper name which, in

          principle (and fortunately in most cases), indicates its

          meaning or function.</li>

        <li>If you want a categorization of the units of the line being

          annotated, you need more annotation layers. See Lieb &

          Drude 2000 (and

<a class="moz-txt-link-freetext" href="https://www.christianlehmann.eu/ling/ling_meth/ling_description/representations/gloss/index.php?open=class_member">https://www.christianlehmann.eu/ling/ling_meth/ling_description/representations/gloss/index.php?open=class_member</a>).

          Packing different categories of linguistic information into

          one gloss is theoretically inconsistent and computationally

          impractical.</li>

        <li>It is true that the boundary symbols are not asymmetric, so

          you cannot read off from them which element of a pair is the

          affix, the clitic and so on. However, this information is

          contained in the gloss line in most cases: if the gloss of an

          element is in upper case or small caps, it is a grammatical

          element. Otherwise, it is a lexical element (a root or stem).

          If it is a grammatical element, then the '-' vs. '=' symbol

          tells you whether it is an affix or a clitic. (We can work out

          the details for configurations where both components of a pair

          thus linked/separated are written in the same case.)<br>

        </li>

      </ol>
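      <p>The rule in point 4, together with the separate category layer
        recommended in point 3, can be sketched in Python roughly as
        follows. This is only a minimal illustration under stated
        assumptions: the function names (<code>split_keeping_boundaries</code>,
        <code>is_grammatical</code>, <code>category_layer</code>) and the
        example words are invented here and are not part of pyigt or of
        any glossing standard; a gloss counts as grammatical when all of
        its letters are upper case; and a grammatical element is
        classified by the boundary on the side that faces a lexical
        element. Where no lexical element is present, or where the two
        facing boundaries disagree, the category is left undetermined.</p>

      <pre>
```python
import re

# Assumed mapping of boundary symbols to categories (only '-' and '=').
BOUNDARY_TYPES = {"-": "affix", "=": "clitic"}


def split_keeping_boundaries(line):
    """Split a word on '-' and '=' and return (units, boundary symbols)."""
    parts = re.split(r"([-=])", line)
    return parts[0::2], parts[1::2]


def is_grammatical(gloss):
    """Treat a gloss as grammatical if all of its letters are upper case."""
    letters = [ch for ch in gloss if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)


def category_layer(word, gloss):
    """Derive a separate category layer from the morph and gloss layers."""
    morphs, bounds = split_keeping_boundaries(word)
    glosses, gloss_bounds = split_keeping_boundaries(gloss)
    if bounds != gloss_bounds:
        raise ValueError("morph line and gloss line are segmented differently")
    lexical = [i for i, g in enumerate(glosses) if not is_grammatical(g)]
    layer = []
    for i, g in enumerate(glosses):
        if i in lexical:
            layer.append("lexical")
            continue
        # Look only at the boundary on the side that faces a lexical unit.
        facing = set()
        if any(i > j for j in lexical):
            facing.add(bounds[i - 1])
        if any(j > i for j in lexical):
            facing.add(bounds[i])
        if len(facing) == 1:
            layer.append(BOUNDARY_TYPES[facing.pop()])
        else:
            # Same-case configurations and conflicting boundaries stay open.
            layer.append("undetermined")
    return layer


# Invented placeholder words, not data from any language.
print(category_layer("stem-x=y", "house-PL=LOC"))  # ['lexical', 'affix', 'clitic']
print(category_layer("x=y", "NEG=PST"))  # ['undetermined', 'undetermined']
```
      </pre>

      <p>Keeping the result as a list of its own, aligned with the morphs
        and the glosses, leaves the gloss line free of category
        information.</p>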

      Best,<br>

      <p>Christian</p>

      <p>Lieb, Hans-Heinrich & Drude, Sebastian 2000, <i>Advanced

          glossing: A language documentation format.</i> Berlin:

        Technische Universität (Working Papers).</p>

    </div>

    <blockquote type="cite"

      cite="mid:23c7c378-2f5e-1a77-3b26-9976111476e7@lingpy.org">

      <pre class="moz-quote-pre" wrap="">Dear all,

when working on pyigt, a Python package that handles interlinear-glossed

text so that it can be represented consistently in our machine- and

human-readable CLDF format (<a class="moz-txt-link-freetext" href="https://cldf.clld.org">https://cldf.clld.org</a>) (draft here:

<a class="moz-txt-link-freetext" href="http://doi.org/10.17613/nppg-x393">http://doi.org/10.17613/nppg-x393</a>), we realized at some point that the

current practice of using symbols for the segmentation of words into

morphemes which ALSO have an inherent semantics that defines the

function of one of the elements that are separated with this very symbol

can be quite problematic, since it is often ambiguous to which element

the semantics are attached.

The = vs. - symbols are a good example here, as a - b does not tell me

which is the affix, the a or the b. The plus, which we now use as a

standard separator in our version of IGT in CLDF, is unproblematic, as it

does not assign different semantics to the elements it splits.

However, given these inconsistencies, it is now impossible to

consistently investigate larger collections of IGT that have been

published, since it is often not clear which element is what, unless the

relevant information is given in the gloss layer.

My recommendation is therefore to use one segmentation symbol only and

to mark the information of whether something is a clitic, a prefix, an

infix, etc., on the element itself, in the gloss. My argument is: if you

define the semantics in the glosses (and do this in a consistent way)

you don't need to think of whether it is a clitic, an affix, or

something else. So one can just get away with one symbol for the

segmentation, and still be much more explicit than we can often observe

in the current practice.

For those interested in the arguments, we had a discussion on GitHub

with Florian Matter, which we now consider resolved:

<a class="moz-txt-link-freetext" href="https://github.com/cldf/pyigt/issues/6">https://github.com/cldf/pyigt/issues/6</a>

Best,

Mattis

On 30.05.20 15:45, Christian Lehmann wrote:

</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">Dear Sergey,

I understand there are at least two distinct problems there:

 1. How is the process which produces your Akzentkomposita to be

    categorized?

 2. Once we know the type of grammatical boundary separating/joining the

    two components in a unit, what is the standard boundary symbol for it?

Ad 1: From the examples that you provide, it does not appear that it is

a kind of compounding. (Consequently, I would not call the products

X-komposita, no matter what X is.) Still from your examples, it would

appear that the process is (some kind of) clisis. I take the liberty of

sending the link to my most recent article (just accepted for

publication), devoted to exactly this kind of problem:

<a class="moz-txt-link-freetext" href="https://www.christianlehmann.eu/publ/lehmann_univerbation.pdf">https://www.christianlehmann.eu/publ/lehmann_univerbation.pdf</a>

Ad 2: If it is clisis, the = symbol you are using is standard. If it

were compounding, you would use the + symbol. If it is a new kind of

process, with a new kind of grammatical boundary, we would have to

deploy another symbol. There are plenty of as yet unused symbols around;

how about ⧧ (Unicode U+29E7)?

Cheers,

Christian

-- 

Prof. em. Dr. Christian Lehmann

Rudolfstr. 4

99092 Erfurt

Deutschland

Tel.:   +49/361/2113417

E-mail:         <a class="moz-txt-link-abbreviated" href="mailto:christianw_lehmann@arcor.de">christianw_lehmann@arcor.de</a>

Web:    <a class="moz-txt-link-freetext" href="https://www.christianlehmann.eu">https://www.christianlehmann.eu</a>

_______________________________________________

Lingtyp mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Lingtyp@listserv.linguistlist.org">Lingtyp@listserv.linguistlist.org</a>

<a class="moz-txt-link-freetext" href="http://listserv.linguistlist.org/mailman/listinfo/lingtyp">http://listserv.linguistlist.org/mailman/listinfo/lingtyp</a>

</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">

---

Dr. Johann-Mattis List

Research Group Leader "Computer-Assisted Language Comparison"

Department of Linguistic and Cultural Evolution

Max Planck Institute for the Science of Human History

07745 Jena

<a class="moz-txt-link-freetext" href="https://lingulist.de">https://lingulist.de</a>

</pre>

    </blockquote>

    <p><br>

    </p>

    <div class="moz-signature">-- <br>

      <p style="font-size:90%">Prof. em. Dr. Christian Lehmann<br>

        Rudolfstr. 4<br>

        99092 Erfurt<br>

        <span style="font-variant:small-caps">Deutschland</span></p>

      <table style="font-size:80%">

        <tbody>

          <tr>

            <td>Tel.:</td>

            <td>+49/361/2113417</td>

          </tr>

          <tr>

            <td>E-mail:</td>

            <td><a class="moz-txt-link-abbreviated" href="mailto:christianw_lehmann@arcor.de">christianw_lehmann@arcor.de</a></td>

          </tr>

          <tr>

            <td>Web:</td>

            <td><a class="moz-txt-link-freetext" href="https://www.christianlehmann.eu">https://www.christianlehmann.eu</a></td>

          </tr>

        </tbody>

      </table>

    </div>

  </body>

</html>