<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear Ian,<br>
a few projects of that sort already exist, for instance in
dialectology (e.g. the 'Wenker questionnaires').</p>
<p> I think such a project, if carried out today, should be based on
a solid theoretical as well as methodological foundation. How do
you represent sentence meanings and linguistic items expressing
these meanings without creating a translation bias? By using
multimodal stimuli perhaps? I do not think that glosses are
appropriate representations of form-meaning mappings; they amount to
good old item-and-arrangement morphology, which is known to be
inappropriate for many languages; and edit distance is probably
not a good way of measuring similarities between glosses, as has
been pointed out.</p>
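<p>To make the worry about edit distance concrete, here is a minimal
sketch in Python. The gloss strings are invented for illustration and
are not taken from any project data, and <code>edit_distance</code> is
just a plain token-level Levenshtein distance, which I assume is
roughly what was meant; the actual proposal may differ.</p>

```python
def edit_distance(a, b):
    """Token-level Levenshtein distance between two gloss sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical glosses for 'the dog sees the cat' (assumed, not attested data).
svo = "DEF dog see.PRS DEF cat".split()        # free article, verb-medial
sov = "DEF dog DEF cat see.PRS".split()        # free article, verb-final
fused = "dog.DEF cat.DEF.ACC see.PRS".split()  # suffixed article, verb-final

print(edit_distance(svo, sov))    # word order alone changes the score
print(edit_distance(svo, fused))  # segmentation choices change it even more
```

<p>In this toy case the score reacts to how the glosser segments and
labels the material as much as to anything grammatical, which is the
problem with treating it as a similarity measure.</p>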
<p>I sympathize with the idea of your project, and some of us have
in fact been involved in projects of this type, as pointed out by
Martin. My advice would be to think this through before you start
gathering data, and to make sure that it meets state-of-the-art
standards in theoretical and methodological terms. Linguistic
structure is better represented in network models, not as linear
sequences of morph(eme)s. Interestingly, this insight has been
arrived at from two different angles independently, from a
methodological one (e.g. in annotation practice) and from a
theoretical point of view (see Holger Diessel's [2019] book 'The
Grammar Network'). Note also that there have been recent advances
in what we might call 'comparative NLP', with the Universal
Dependencies (UD) treebanks as a
prominent representative. You could get some inspiration from that
angle, too (for instance, languages may exhibit similar types of
dependency structures with different types of ordering relations).</p>
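<p>A minimal sketch of that last point, in Python. The two toy
sentences, the concept labels (DOG, SEE, CAT) and the helper names
<code>edges</code> and <code>ordering</code> are my own assumptions for
illustration; only the relation labels <code>nsubj</code>,
<code>obj</code> and <code>root</code> follow UD conventions.</p>

```python
# Each token: (position, concept label, head position, UD relation).
# Head position 0 marks the root. Both sentences are invented examples.
svo = [(1, "DOG", 2, "nsubj"), (2, "SEE", 0, "root"), (3, "CAT", 2, "obj")]
sov = [(1, "DOG", 3, "nsubj"), (2, "CAT", 3, "obj"), (3, "SEE", 0, "root")]


def edges(tokens):
    """Unordered dependency structure: a set of (head, relation, dependent)."""
    concept = {pos: label for pos, label, _, _ in tokens}
    return {(concept[head], rel, label)
            for _, label, head, rel in tokens if head != 0}


def ordering(tokens):
    """For each relation, record whether the dependent precedes its head."""
    return {rel: ("dep-first" if pos < head else "head-first")
            for pos, _, head, rel in tokens if head != 0}


print(edges(svo) == edges(sov))  # same dependency structure
print(ordering(svo))             # object follows the verb
print(ordering(sov))             # object precedes the verb
```

<p>Keeping the two representations apart lets you compare structure
and linearization separately, instead of conflating them in a single
string comparison.</p>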
<p>Best,<br>
Volker</p>
On 09/05/2021 11:24, Christian Lehmann wrote:<br>
<blockquote type="cite"
cite="mid:66990048-eb71-df8b-cf44-b8aa2fba6053@Uni-Erfurt.De">
Dear Ian,<br>
<br>
as Martin says, this can be a valuable project. Just a few
observations on methodology:<br>
<br>
The method you envisage seems valid to the extent that the 50
sentences that you choose are representative of what you want to
compare - grammatical systems of languages, I assume.<br>
<br>
A list of sentences taken to be representative of a language
system has been used in the Archivo de lenguas indígenas de
México. A preliminary survey of what can be expected from this
approach is provided in: <br>
<br>
Lastra, Yolanda 1993f, "El archivo de lenguas indígenas de
México." <i>Boletín de Filología</i> 34:463-476.<br>
<br>
The list of sentences itself appears in each of the contributions
to the series:<br>
<br>
<a class="moz-txt-link-freetext"
href="https://cell.colmex.mx/es/proyecto/archivo-de-lenguas-indigenas-de-mexico"
moz-do-not-send="true">https://cell.colmex.mx/es/proyecto/archivo-de-lenguas-indigenas-de-mexico</a><br>
<br>
As some correspondents have observed, this method can be reliable
only if you guarantee communicative equivalence of the sentences
to the extent possible. This would be true a fortiori if a great
weight were attributed to differences in constituent order.
However, just as others have suggested, it would seem wise not to
exaggerate this weight. Constituent order at the higher levels of
syntax is among the most variable features of a grammar.<br>
<br>
A basic contribution to the requirement of guaranteeing
communicative equivalence was made by Östen Dahl with his
contextualized translation questionnaires. A sample of them is
published in:<br>
<br>
Dahl, Östen (ed.) 2000, <i>Tense and aspect in the languages of
Europe.</i> Berlin &amp; New York: Mouton de Gruyter (Empirical
Approaches to Language Typology, EUROTYP, 20-6).<br>
<br>
Good luck,<br>
Christian<br>
<br>
<div class="moz-signature">-- <br>
<p style="font-size:90%">Prof. em. Dr. Christian Lehmann<br>
Rudolfstr. 4<br>
99092 Erfurt<br>
<span style="font-variant:small-caps">Deutschland</span></p>
<table style="font-size:80%">
<tbody>
<tr>
<td>Tel.:</td>
<td>+49/361/2113417</td>
</tr>
<tr>
<td>E-mail:</td>
<td><a class="moz-txt-link-abbreviated"
href="mailto:christianw_lehmann@arcor.de"
moz-do-not-send="true">christianw_lehmann@arcor.de</a></td>
</tr>
<tr>
<td>Web:</td>
<td><a class="moz-txt-link-freetext"
href="https://www.christianlehmann.eu"
moz-do-not-send="true">https://www.christianlehmann.eu</a></td>
</tr>
</tbody>
</table>
</div>
<br>
<pre class="moz-quote-pre" wrap="">_______________________________________________
Lingtyp mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Lingtyp@listserv.linguistlist.org">Lingtyp@listserv.linguistlist.org</a>
<a class="moz-txt-link-freetext" href="http://listserv.linguistlist.org/mailman/listinfo/lingtyp">http://listserv.linguistlist.org/mailman/listinfo/lingtyp</a>
</pre>
</blockquote>
</body>
</html>