<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Dear all,</p>

    <p>I have followed the discussion on this thread with interest. Let
      me ask: would any of what you discuss and suggest here also
      apply to interlinear glossed data?<br>

    </p>

    <p>Sebastian talked about making "typological research more
      replicable". A related issue is reproducible research in
      linguistics. I guess a good starting point for whatever we do as
      linguists is to keep things transparent and to give public access
      to data collections. Especially for languages with few or no public
      resources (apart from what one finds in articles), this seems
      essential.<br>
    </p>

    <div class="moz-forward-container">

      <p>Here is an example of what I have in mind: we have just released
        41 interlinear glossed texts in Akan. The data can be downloaded as
        XML from:</p>

      <p><a class="moz-txt-link-freetext" href="https://typecraft.org/tc2wiki/The_TypeCraft_Akan_Corpus">https://typecraft.org/tc2wiki/The_TypeCraft_Akan_Corpus</a><br>

      </p>

      The corpus is described on the download page and in the notes
      included in the download. (We can also offer the material in
      several other formats.) <br>

      <br>

      <br>

      Dorothee <br>

      <br>

      <font color="#999999" size="-1">Professor Dorothee Beermann, PhD<br>

        Norwegian University of Science and Technology (NTNU)<br>

        Dept. of Language and Literature<br>

        Surface mail to: NO-7491 Trondheim, Norway/Norge<br>

        <br>

        Visit: Building 4, level 5, room 4512, Dragvoll,<br>

        E-mail:  <a class="moz-txt-link-abbreviated" href="mailto:dorothee.beermann@ntnu.no">dorothee.beermann@ntnu.no</a><br>

        <br>

        Homepage: <a class="moz-txt-link-freetext" href="http://www.ntnu.no/ansatte/dorothee.beermann">http://www.ntnu.no/ansatte/dorothee.beermann</a><br>

        TypeCraft: <a class="moz-txt-link-freetext" href="http://typecraft.org/tc2wiki/User:Dorothee_Beermann">http://typecraft.org/tc2wiki/User:Dorothee_Beermann</a><br>

      </font><br>

      <br>

      <br>

      <br>

      <br>

      -------- Forwarded Message --------

      <table class="moz-email-headers-table" border="0" cellspacing="0"

        cellpadding="0">

        <tbody>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">Subject:

            </th>

            <td>Re: [Lingtyp] Empirical standards in typology:

              incentives</td>

          </tr>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">Date: </th>

            <td>Fri, 23 Mar 2018 11:59:18 +1100</td>

          </tr>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">From: </th>

            <td>Hedvig Skirgård <a class="moz-txt-link-rfc2396E" href="mailto:hedvig.skirgard@gmail.com"><hedvig.skirgard@gmail.com></a></td>

          </tr>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">To: </th>

            <td>Johanna NICHOLS <a class="moz-txt-link-rfc2396E" href="mailto:johanna@berkeley.edu"><johanna@berkeley.edu></a></td>

          </tr>

          <tr>

            <th valign="BASELINE" align="RIGHT" nowrap="nowrap">CC: </th>

            <td>Linguistic Typology

              <a class="moz-txt-link-rfc2396E" href="mailto:lingtyp@listserv.linguistlist.org"><lingtyp@listserv.linguistlist.org></a></td>

          </tr>

        </tbody>

      </table>

      <br>

      <br>


      <div dir="ltr">Dear all, 

        <div><br>

        </div>

        <div>I think Sebastian's suggestion is very good. </div>

        <div><br>

        </div>

        <div>Is this something LT would consider, Masja?</div>

        <div><br>

        </div>

        <div>Johanna's point is good as well, but as I understand it, it
          shouldn't matter for Sebastian's suggestion. We're not being
          asked to submit the coding criteria before the survey is
          completed, only at the time of publication. There are
          initiatives in STEM that encourage research teams to submit
          what they're planning to do before doing it (pre-registration,
          to avoid biases), but from what I can tell, that's not baked
          into what Sebastian is suggesting.</div>

        <div><br>

        </div>

        <div>I would also add a 4-star category that includes
          inter-coder reliability tests, i.e. the original author(s) have
          given different people the same instructions and tested how
          often they make the same coding decisions for the same grammar.</div>
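        <div><br>
        </div>
        <div>One common way to quantify such a test is Cohen's kappa, which
          corrects the raw agreement between two coders for the agreement
          expected by chance. The sketch below assumes exactly two coders who
          each assign one categorical value per grammar; the function name
          and the word-order codings are hypothetical placeholders, not data
          from any survey.</div>
<pre>
```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one category per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where both coders chose the same value.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over the categories.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        # Both coders used one identical category throughout: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Hypothetical example: two coders code the same six grammars for word order.
coder_1 = ["SOV", "SVO", "SOV", "VSO", "SVO", "SOV"]
coder_2 = ["SOV", "SVO", "SVO", "VSO", "SVO", "SOV"]
print(round(cohens_kappa(coder_1, coder_2), 3))  # prints 0.739
```
</pre>
        <div>With more than two coders, a multi-rater statistic such as
          Fleiss' kappa or Krippendorff's alpha would be the usual choice.</div>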

        <div><br>

        </div>

        <div>/Hedvig</div>

      </div>

      <div class="gmail_extra"><br clear="all">

        <div>

          <div class="gmail_signature" data-smartmail="gmail_signature">

            <div dir="ltr">

              <div>

                <div dir="ltr">

                  <div dir="ltr">

                    <div dir="ltr">

                      <div dir="ltr">

                        <div dir="ltr">

                          <div dir="ltr">

                            <div dir="ltr">

                              <div dir="ltr">

                                <div dir="ltr">

                                  <div dir="ltr">

                                    <p style="margin:0cm 0cm

                                      0.0001pt;font-size:11pt;font-family:Calibri,sans-serif"><span

                                        style="font-size:9pt"><b><br>

                                        </b></span></p>

                                    <p style="margin:0cm 0cm 0.0001pt"><font
                                        size="2" face="arial, helvetica,
                                        sans-serif"><b>Kind regards,</b><br>
                                      </font></p>

                                    <p style="margin:0cm 0cm 0.0001pt"><b><font

                                          size="2" face="arial,

                                          helvetica, sans-serif">Hedvig

                                          Skirgård</font></b></p>

                                    <p style="margin:0cm 0cm 0.0001pt"><br>

                                    </p>

                                    <p style="margin:0cm 0cm 0.0001pt"><font

                                        size="1"><span

                                          style="font-family:verdana,sans-serif;color:rgb(0,0,0)">PhD

                                          Candidate</span><br>

                                      </font></p>

                                    <p

style="color:rgb(0,0,0);font-family:Verdana,Helvetica,Arial,sans-serif;margin:0cm

                                      0cm 0.0001pt"><span

                                        style="font-family:verdana,sans-serif"><font

                                          size="1">The Wellsprings of

                                          Linguistic Diversity</font></span></p>

                                    <p

style="color:rgb(0,0,0);font-family:Verdana,Helvetica,Arial,sans-serif;margin:0cm

                                      0cm 0.0001pt"><font size="1"

                                        face="verdana, sans-serif">ARC

                                        Centre of Excellence for the

                                        Dynamics of Language</font></p>

                                    <p

style="color:rgb(0,0,0);font-family:Verdana,Helvetica,Arial,sans-serif;margin:0cm

                                      0cm 0.0001pt"><font size="1"

                                        face="verdana, sans-serif">School

                                        of Culture, History and Language<br>

                                        College of Asia and the Pacific</font></p>

                                    <p

style="color:rgb(0,0,0);font-family:Verdana,Helvetica,Arial,sans-serif;margin:0cm

                                      0cm 0.0001pt"><font size="1"

                                        face="verdana, sans-serif">The

                                        Australian National University</font></p>

                                    <p style="margin:0cm 0cm 0.0001pt"><font

                                        color="#666666" size="1"

                                        face="arial, helvetica,

                                        sans-serif"><a

                                          href="https://sites.google.com/site/hedvigskirgard/"

                                          target="_blank"

                                          moz-do-not-send="true">Website</a><br>

                                      </font></p>

                                    <div><br>

                                    </div>

                                    <p style="margin:0cm 0cm 0.0001pt"><br>

                                    </p>

                                  </div>

                                </div>

                              </div>

                            </div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

        <br>

        <div class="gmail_quote">2018-03-23 0:49 GMT+11:00 Johanna

          NICHOLS <span dir="ltr"><<a

              href="mailto:johanna@berkeley.edu" target="_blank"

              moz-do-not-send="true">johanna@berkeley.edu</a>></span>:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">

              <div>What's in the codebook -- the coding categories and

                the criteria?  That much is usually in the body of the

                paper.<br>

                <br>

              </div>

              <div>Also, a minor but I think important point: 

                Ordinarily the codebook doesn't in fact chronologically

                precede the spreadsheet.  A draft or early version of it

                does, and that gets revised many times as you run into

                new and unexpected things.  (And every previous entry in

                the spreadsheet gets checked and edited too.)  By the

                time you've finished your survey, the categories and

                typology can look different from what you started with. 

                You publish when you're comfortably past the point of

                diminishing returns.  In most sciences this is bad

                method, but in linguistics it's common and I'd say

                normal.  The capacity to handle it needs to be built

                into the method in advance.  <br>

              </div>

              <span class="HOEnZb"><font color="#888888">

                  <div><br>

                  </div>

                  Johanna<br>

                </font></span></div>

            <div class="HOEnZb">

              <div class="h5">

                <div class="gmail_extra"><br>

                  <div class="gmail_quote">On Thu, Mar 22, 2018 at 2:10

                    PM, Sebastian Nordhoff <span dir="ltr"><<a

                        href="mailto:sebastian.nordhoff@glottotopia.de"

                        target="_blank" moz-do-not-send="true">sebastian.nordhoff@<wbr>glottotopia.de</a>></span>

                    wrote:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear

                      all,<br>

                      Taking up a thread from last November, I would

                      like to start a<br>

                      discussion about how to make typological research

                      more replicable, where<br>

                      replicable means "less dependent on the original

                      researcher". This<br>

                      includes coding decisions, tabular data,

                      quantitative analyses etc.<br>

                      <br>

                      Volker Gast wrote (full quote at bottom of mail):<br>

                      > Let's assume that self-annotation cannot be

                      avoided for financial<br>

                      > reasons. What about establishing a standard

                      saying that, for instance,<br>

                      > when you submit a quantitative-typological

                      paper to LT you have to<br>

                      > provide the data in such a way that the

                      coding decisions are made<br>

                      > sufficiently transparent for readers to see

                      if they can go along with<br>

                      > the argument?<br>

                      <br>

                      I see two possibilities. Option 1:
                      editors refuse papers<br>
                      that do not adhere to this standard. In my view, that
                      will not work.<br>

                      What might work (Option 2) is a star/badge system.

                      I could imagine the<br>

                      following:<br>

                      <br>

                      - no stars: only standard bibliographical

                      references<br>

                      - *         raw tabular data (spreadsheet)

                      available as a supplement<br>

                      - **        as above, + codebook available as a

                      supplement<br>

                      - ***       as above, + computer code in R or

                      similar available<br>

                      <br>
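                      <p>A minimal sketch of how the proposed levels combine,
                      assuming they are strictly cumulative ("as above, +"),
                      so that each star requires every supplement of the
                      levels below it. The function and flag names are
                      hypothetical.</p>
<pre>
```python
def star_rating(has_spreadsheet, has_codebook, has_code):
    """Return the number of stars (0-3) under the cumulative scheme.

    Levels are checked in order: spreadsheet, codebook, computer code.
    The first missing supplement stops the count.
    """
    stars = 0
    for available in (has_spreadsheet, has_codebook, has_code):
        if not available:
            break  # levels are cumulative, so stop at the first gap
        stars += 1
    return stars


print(star_rating(True, True, False))  # spreadsheet + codebook: prints 2
print(star_rating(False, True, True))  # no spreadsheet: prints 0
```
</pre>
                      <p>Under this reading, a paper that shares its R code
                      but not its spreadsheet would still receive no stars.</p>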

                      For a three-star article, an independent researcher
                      could then take the<br>
                      original grammars and the codebook, recode the data,
                      and check whether the result<br>
                      matches the original spreadsheet. They could then run
                      the computer code to see if they<br>
                      arrive at the same results.<br>

                      <br>
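                      <p>A minimal sketch of the matching step, assuming each
                      spreadsheet has been reduced to a mapping from language
                      to the value coded for one feature. The function and the
                      placeholder languages are hypothetical, and languages
                      present in only one of the two spreadsheets are simply
                      ignored here.</p>
<pre>
```python
def compare_codings(original, replication):
    """Compare two codings of one feature, each a dict mapping language to value.

    Returns (agreement, mismatches): the share of languages present in both
    spreadsheets that were coded identically, and a dict mapping each
    disagreeing language to its (original, replication) values.
    """
    shared = sorted(set(original).intersection(replication))
    if not shared:
        raise ValueError("the two spreadsheets have no languages in common")
    mismatches = {
        lang: (original[lang], replication[lang])
        for lang in shared
        if original[lang] != replication[lang]
    }
    agreement = 1 - len(mismatches) / len(shared)
    return agreement, mismatches


# Placeholder data: four languages coded for basic word order twice.
original = {"Language A": "SVO", "Language B": "SOV",
            "Language C": "VSO", "Language D": "SOV"}
replication = {"Language A": "SVO", "Language B": "SOV",
               "Language C": "SVO", "Language D": "SOV"}
agreement, mismatches = compare_codings(original, replication)
print(agreement)   # prints 0.75
print(mismatches)  # prints {'Language C': ('VSO', 'SVO')}
```
</pre>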

                      This will not be practical for every research

                      project, but some might<br>

                      find it easier than others, and, in the long run,

                      it will require good<br>

                      arguments to submit a 0-star (i.e. non-replicable)

                      quantitative article.<br>

                      <br>

                      Any thoughts?<br>

                      Sebastian<br>

                      <br>

                      PS: Note that the codebook would actually

                      chronologically precede the<br>

                      spreadsheet, but I feel that spreadsheets are more

                      easily available than<br>

                      codebooks, so in order to keep the entry barrier

                      low, this order is<br>

                      reversed for the stars.<br>

                      <br>

                    </blockquote>

                  </div>

                </div>

              </div>

            </div>

            <br>

          </blockquote>

        </div>

        <br>

      </div>

    </div>

  </body>

</html>