[Lingtyp] (no subject)

Sat May 30 14:11:00 UTC 2020

Dear all,

when working on pyigt, a Python package that handles interlinear glossed
text so that it can be represented consistently in our machine- and
human-readable CLDF format (https://cldf.clld.org) (draft here:
http://doi.org/10.17613/nppg-x393), we realized that the current
practice of segmenting words into morphemes with symbols that ALSO carry
an inherent semantics, defining the function of one of the elements they
separate, can be quite problematic: it is often ambiguous to which
element these semantics are attached.

The = vs. - symbols are a good example: a-b does not tell me which
element is the affix, a or b. The plus, which we now use as the standard
separator in our version of IGT in CLDF, is unproblematic, as it does
not assign different semantics to the elements it separates.
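To make the ambiguity concrete, here is a minimal Python sketch. `split_leipzig` is a hypothetical helper for illustration, not part of pyigt's API. It splits a Leipzig-style segmented word on - and =, and the boundary symbols it returns sit *between* two morphemes, so nothing in the output says which of the two neighbours is the affix or the clitic:

```python
import re

# Leipzig-style boundary symbols: "-" marks an affix boundary, "=" a clitic
# boundary. (Other conventions, such as <...> for infixes, are ignored here.)
BOUNDARY = re.compile(r"([-=])")


def split_leipzig(word):
    """Split a segmented word into morphemes and the symbols between them.

    Returns (morphemes, boundaries), where boundaries[i] is the symbol
    separating morphemes[i] and morphemes[i + 1]. Each symbol belongs to a
    position between two morphemes, not to either morpheme.
    """
    # A capturing group makes re.split keep the separators in the result.
    parts = BOUNDARY.split(word)
    return parts[0::2], parts[1::2]


print(split_leipzig("a-b"))    # (['a', 'b'], ['-'])
print(split_leipzig("a-b=c"))  # (['a', 'b', 'c'], ['-', '='])
```

The "-" in the first result records only that there is an affix boundary between a and b. Whether a or b is the affix has to come from somewhere else, typically the gloss line.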

Given these inconsistencies, it is currently impossible to investigate
larger collections of published IGT consistently, since it is often
unclear which element is which, unless the relevant information is
given in the gloss layer.

My recommendation is therefore to use only one segmentation symbol and
to mark whether something is a clitic, a prefix, an infix, etc., on the
element itself, in the gloss. My argument is: if you define the
semantics in the glosses (and do so consistently), you don't need to
decide at the segmentation level whether an element is a clitic, an
affix, or something else. One can thus get away with a single
segmentation symbol and still be much more explicit than is often the
case in current practice.
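The recommendation can be sketched roughly as follows. This is a hypothetical illustration, not how pyigt or CLDF actually encode it. Words and glosses are split on a single neutral separator (+), and each element's role is read from markers on its gloss. The marker convention ("-X" suffix, "X-" prefix, "=X" enclitic, "X=" proclitic, "<X>" infix, bare "X" stem) is an assumption made for this example:

```python
from dataclasses import dataclass

# Hypothetical convention (an assumption, not pyigt's or CLDF's format):
# the role of each element is marked on its gloss, never on the separator.
ROLES = {
    ("", ""): "stem",
    ("-", ""): "suffix",     # -X
    ("", "-"): "prefix",     # X-
    ("=", ""): "enclitic",   # =X
    ("", "="): "proclitic",  # X=
    ("<", ">"): "infix",     # <X>
}


@dataclass
class Morpheme:
    form: str
    gloss: str
    role: str


def parse_word(word, gloss, sep="+"):
    """Split word and gloss on one neutral separator; read roles from glosses."""
    forms, glosses = word.split(sep), gloss.split(sep)
    if len(forms) != len(glosses):
        raise ValueError(f"{len(forms)} forms but {len(glosses)} glosses")
    morphemes = []
    for form, g in zip(forms, glosses):
        # Strip at most one role marker from each end of the gloss.
        lead = g[:1] if g[:1] in ("-", "=", "<") else ""
        trail = g[-1:] if g[-1:] in ("-", "=", ">") else ""
        role = ROLES.get((lead, trail))
        if role is None:
            raise ValueError(f"unrecognized role markers on gloss {g!r}")
        label = g[len(lead):len(g) - len(trail)]
        morphemes.append(Morpheme(form, label, role))
    return morphemes


for m in parse_word("dog+s", "dog+-PL"):
    print(m.form, m.gloss, m.role)
# dog dog stem
# s PL suffix
```

Because the role lives on the gloss, the + separator never has to carry semantics of its own. The same segmentation works unchanged for a proclitic, e.g. `parse_word("l+homme", "DEF=+man")`.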

For those interested in the arguments: we discussed this on GitHub with
Florian Matter, and we now consider the issue resolved:
https://github.com/cldf/pyigt/issues/6

Best,

Mattis

On 30.05.20 15:45, Christian Lehmann wrote:
> Dear Sergey,
> 
> I understand there are at least two distinct problems there:
> 
>  1. How is the process which produces your Akzentkomposita to be
>     categorized?
>  2. Once we know the type of grammatical boundary separating/joining the
>     two components in a unit, what is the standard boundary symbol for it?
> 
> Ad 1: From the examples that you provide, it does not appear that it is
> a kind of compounding. (Consequently, I would not call the products
> X-komposita, no matter what X is.) Again judging from your examples, it
> would appear that the process is (some kind of) clisis. I take the liberty of
> sending the link to my most recent article (just accepted for
> publication), devoted to exactly this kind of problem:
> 
> https://www.christianlehmann.eu/publ/lehmann_univerbation.pdf
> 
> Ad 2: If it is clisis, the = symbol you are using is standard. If it
> were compounding, you would use the + symbol. If it is a new kind of
> process, with a new kind of grammatical boundary, we would have to
> deploy another symbol. There are plenty of as yet unused symbols around;
> how about ⧧ (U+29E7)?
> 
> Cheers,
> 
> Christian
> 
> -- 
> 
> Prof. em. Dr. Christian Lehmann
> Rudolfstr. 4
> 99092 Erfurt
> Germany
> 
> Tel.: 	+49/361/2113417
> E-mail: 	christianw_lehmann at arcor.de
> Web: 	https://www.christianlehmann.eu
> 
> 
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> http://listserv.linguistlist.org/mailman/listinfo/lingtyp
> 

---
Dr. Johann-Mattis List
Research Group Leader "Computer-Assisted Language Comparison"
Department of Linguistic and Cultural Evolution
Max Planck Institute for the Science of Human History
07745 Jena
https://lingulist.de