<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Dear Ian,<br>

      there are a few other existing projects of that sort, for instance in

      dialectology (e.g. the 'Wenker questionnaires').</p>

    <p> I think such a project, if carried out today, should be based on

      a solid theoretical as well as methodological foundation. How do

      you represent sentence meanings and linguistic items expressing

      these meanings without creating a translation bias? By using

      multimodal stimuli perhaps? I do not think that glosses are

      appropriate representations of form-meaning mappings; they amount to

      good old item-and-arrangement morphology, which is known to be

      inappropriate for many languages. Edit distance is also probably

      not a good way of measuring similarity between glosses, as has

      been pointed out.</p>

    <p>I sympathize with the idea of your project, and some of us have

      in fact been involved in projects of this type, as pointed out by

      Martin. My advice would be to think this through before you start

      gathering data, and to make sure that it meets state-of-the-art

      standards in theoretical and methodological terms. Linguistic

      structure is better represented by network models than as linear

      sequences of morph(eme)s. Interestingly, this insight has been

      arrived at from two different angles independently, from a

      methodological one (e.g. in annotation practice) and from a

      theoretical point of view (see Holger Diessel's [2019] book 'The

      Grammar Network'). Note also that there have been recent advances

      in what we might call 'comparative NLP', with the Universal Dependencies (UD) treebanks as a

      prominent representative. You could get some inspiration from that

      angle, too (for instance, languages may exhibit similar types of

      dependency structures with different types of ordering relations).</p>

    <p>Best,<br>

      Volker</p>

    On 09/05/2021 11:24, Christian Lehmann wrote:<br>

    <blockquote type="cite"

      cite="mid:66990048-eb71-df8b-cf44-b8aa2fba6053@Uni-Erfurt.De">


      Dear Ian,<br>

      <br>

      as Martin says, this can be a valuable project. Just a few

      observations on methodology:<br>

      <br>

      The method you envisage seems valid to the extent that the 50

      sentences that you choose are representative of what you want to

      compare - grammatical systems of languages, I assume.<br>

      <br>

      A list of sentences taken to be representative of a language

      system has been used in the Archivo de lenguas indígenas de

      México. A preliminary survey of what can be expected from this

      approach is provided in: <br>

      <br>

      Lastra, Yolanda 1993f, "El archivo de lenguas indígenas de

      México." <i>Boletín de Filología</i> 34:463-476.<br>

      <br>

      The list of sentences itself appears in each of the contributions

      to the series:<br>

      <br>

      <a class="moz-txt-link-freetext"

href="https://cell.colmex.mx/es/proyecto/archivo-de-lenguas-indigenas-de-mexico"

        moz-do-not-send="true">https://cell.colmex.mx/es/proyecto/archivo-de-lenguas-indigenas-de-mexico</a><br>

      <br>

      As some correspondents have observed, this method can be reliable

      only if you guarantee communicative equivalence of the sentences

      to the extent possible. This would be true a fortiori if a great

      weight were attributed to differences in constituent order.

      However, just as others have suggested, it would seem wise not to

      exaggerate this weight. Constituent order at the higher levels of

      syntax is among the most variable features of a grammar.<br>

      <br>

      A basic contribution towards guaranteeing

      communicative equivalence was made by Östen Dahl with his

      contextualized translation questionnaires. A sample of them is

      published in:<br>

      <br>

      Dahl, Östen (ed.) 2000, <i>Tense and aspect in the languages of

        Europe.</i> Berlin &amp; New York: Mouton de Gruyter (Empirical

      Approaches to Language Typology, EUROTYP, 20-6).<br>

      <br>

      Good luck,<br>

      Christian<br>

      <br>

      <div class="moz-signature">-- <br>

        <p style="font-size:90%">Prof. em. Dr. Christian Lehmann<br>

          Rudolfstr. 4<br>

          99092 Erfurt<br>

          <span style="font-variant:small-caps">Deutschland</span></p>

        <table style="font-size:80%">

          <tbody>

            <tr>

              <td>Tel.:</td>

              <td>+49/361/2113417</td>

            </tr>

            <tr>

              <td>E-Post:</td>

              <td><a class="moz-txt-link-abbreviated"

                  href="mailto:christianw_lehmann@arcor.de"

                  moz-do-not-send="true">christianw_lehmann@arcor.de</a></td>

            </tr>

            <tr>

              <td>Web:</td>

              <td><a class="moz-txt-link-freetext"

                  href="https://www.christianlehmann.eu"

                  moz-do-not-send="true">https://www.christianlehmann.eu</a></td>

            </tr>

          </tbody>

        </table>

      </div>

      <br>


      <pre class="moz-quote-pre" wrap="">_______________________________________________

Lingtyp mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Lingtyp@listserv.linguistlist.org">Lingtyp@listserv.linguistlist.org</a>

<a class="moz-txt-link-freetext" href="http://listserv.linguistlist.org/mailman/listinfo/lingtyp">http://listserv.linguistlist.org/mailman/listinfo/lingtyp</a>

</pre>

    </blockquote>

  </body>

</html>