[Lingtyp] A list of 50 basic sentences

Sun May 9 09:55:17 UTC 2021

Dear Ian,
There are a few other existing projects of that sort, for instance in 
dialectology (e.g. the 'Wenker questionnaires').

I think such a project, if carried out today, should be based on a solid 
theoretical as well as methodological foundation. How do you represent 
sentence meanings and linguistic items expressing these meanings without 
creating a translation bias? By using multimodal stimuli perhaps? I do 
not think that glosses are appropriate representations of form-meaning 
mappings: they amount to good old item-and-arrangement morphology, which 
is known to be inappropriate for many languages. And edit distance is 
probably not a good way of measuring similarities between glosses, as 
has been pointed out.

I sympathize with the idea of your project, and some of us have in fact 
been involved in projects of this type, as pointed out by Martin. My 
advice would be to think this through before you start gathering data, 
and to make sure that it meets state-of-the-art standards in theoretical 
and methodological terms. Linguistic structure is better represented in 
network models, not as linear sequences of morph(eme)s. Interestingly, 
this insight has been arrived at from two different angles 
independently, from a methodological one (e.g. in annotation practice) 
and from a theoretical point of view (see Holger Diessel's [2019] book 
'The Grammar Network'). Note also that there have been recent advances 
in what we might call 'comparative NLP', with the Universal Dependencies 
(UD) treebanks as a prominent representative. You could get some 
inspiration from that 
angle, too (for instance, languages may exhibit similar types of 
dependency structures with different types of ordering relations).
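
To make the last point concrete, here is a minimal sketch in Python. It 
assumes a hand-made toy representation (not the actual UD file format) 
in which each token carries its position, a form, the position of its 
head, and a relation label; the labels nsubj, obj and root follow UD 
naming, while the two "sentences" and the function names are invented 
purely for illustration. The sketch compares the two analyses once 
without regard to linear order and once for head-dependent ordering only.

```python
from collections import Counter

# Each token: (position, form, head_position, relation); head 0 marks the root.
# Both analyses are schematic, hand-made examples, not real corpus data.
svo = [(1, "dog", 2, "nsubj"), (2, "chase", 0, "root"), (3, "cat", 2, "obj")]
sov = [(1, "dog", 3, "nsubj"), (2, "cat", 3, "obj"), (3, "chase", 0, "root")]


def unordered_edges(sentence):
    """Multiset of (head form, relation, dependent form), ignoring linear order."""
    forms = {pos: form for pos, form, _, _ in sentence}
    forms[0] = "ROOT"
    return Counter((forms[head], rel, form) for _, form, head, rel in sentence)


def head_directions(sentence):
    """For each non-root relation, record whether the dependent precedes its head."""
    return {
        rel: ("dep-before-head" if pos < head else "dep-after-head")
        for pos, _, head, rel in sentence
        if head != 0
    }


# Same dependency structure once order is set aside ...
same_structure = unordered_edges(svo) == unordered_edges(sov)
# ... but different ordering relations between heads and dependents.
dirs_svo = head_directions(svo)
dirs_sov = head_directions(sov)

print(same_structure)
print(dirs_svo)
print(dirs_sov)
```

In this toy case the two analyses come out as structurally identical, 
and they differ only in where the object stands relative to the verb. 
A string-based measure such as edit distance over the token sequences 
would conflate these two kinds of information.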

Best,
Volker

On 09/05/2021 11:24, Christian Lehmann wrote:
> Dear Ian,
>
> as Martin says, this can be a valuable project. Just a few 
> observations on methodology:
>
> The method you envisage seems valid to the extent that the 50 
> sentences that you choose are representative of what you want to 
> compare - grammatical systems of languages, I assume.
>
> A list of sentences taken to be representative of a language system 
> has been used in the Archivo de lenguas indígenas de México. A 
> preliminary survey of what can be expected from this approach is 
> provided in:
>
> Lastra, Yolanda 1993f, "El archivo de lenguas indígenas de México." 
> /Boletín de Filología/ 34:463-476.
>
> The list of sentences itself appears in each of the contributions to 
> the series:
>
> https://cell.colmex.mx/es/proyecto/archivo-de-lenguas-indigenas-de-mexico
>
> As some correspondents have observed, this method can be reliable only 
> if you guarantee communicative equivalence of the sentences to the 
> extent possible. This would be true a fortiori if a great weight were 
> attributed to differences in constituent order. However, just as 
> others have suggested, it would seem wise not to exaggerate this 
> weight. Constituent order at the higher levels of syntax is among the 
> most variable features of a grammar.
>
> A basic contribution to the requirement of guaranteeing communicative 
> equivalence was made by Östen Dahl with his contextualized translation 
> questionnaires. A sample of them is published in:
>
> Dahl, Östen (ed.) 2000, /Tense and aspect in the languages of Europe./ 
> Berlin & New York: Mouton de Gruyter (Empirical Approaches to Language 
> Typology, EUROTYP, 20-6).
>
> Good luck,
> Christian
>
> -- 
>
> Prof. em. Dr. Christian Lehmann
> Rudolfstr. 4
> 99092 Erfurt
> Deutschland
>
> Tel.: 	+49/361/2113417
> E-Post: 	christianw_lehmann at arcor.de
> Web: 	https://www.christianlehmann.eu
>
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> http://listserv.linguistlist.org/mailman/listinfo/lingtyp