<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Dear Dorothee,<br>

    thanks for the pointers. I tried to validate the Akan corpus against

    the schema, but only succeeded after tweaking the PhraseType

    specification in the schema a bit (basically making "globaltags" and

    "word" elements optional).<br>

    <br>

    Then I took a stab at converting it to CLDF, which was fairly easy

    (using your typecraft_python package). The details of this

    conversion are here:

    <a class="moz-txt-link-freetext" href="https://github.com/cldf/cookbook/tree/master/recipes/igt">https://github.com/cldf/cookbook/tree/master/recipes/igt</a><br>

    I think different formats for rather loosely defined things like IGT

    make sense. The idea of CLDF in this respect is to specify only the

    better understood aspects of such datatypes (basically anything that

    can be used automatically) - whereas projects like TypeCraft (or

    XIGT) presumably aim at being able to model and store as much of IGT

    (whatever that means) as possible.<br>
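    <br>
    As a minimal sketch of that "better understood" core (an illustration, not the cookbook recipe itself): one three-line example could be stored as a single row of a CSV examples table. The column names are my reading of the CLDF examples component; the helper names <code>igt_row</code> and <code>to_csv</code> and all data values are made-up placeholders, not taken from the Akan corpus.<br>
    <pre>
```python
import csv
import io

# Column names assumed from the CLDF examples component; words and
# glosses are kept tab-separated inside a single cell each.
COLUMNS = ["ID", "Language_ID", "Primary_Text", "Analyzed_Word", "Gloss", "Translated_Text"]


def igt_row(example_id, language_id, words, glosses, translation):
    """Return one table row for a three-line interlinear example."""
    if len(words) != len(glosses):
        raise ValueError("each word needs exactly one gloss")
    return {
        "ID": example_id,
        "Language_ID": language_id,
        # Simplification: the primary text is just the words joined by spaces.
        "Primary_Text": " ".join(words),
        "Analyzed_Word": "\t".join(words),
        "Gloss": "\t".join(glosses),
        "Translated_Text": translation,
    }


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data, invented for illustration only.
row = igt_row("ex1", "lang1", ["wordA", "wordB"], ["GLOSS.A", "GLOSS.B"], "free translation")
csv_text = to_csv([row])
print(csv_text)
```
</pre>
    Keeping words and glosses in parallel tab-separated cells is what lets a tool align them without having to parse nested XML.<br>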

    <br>

    I should note that CLDF also makes it easier to encode metadata in a

    machine-readable way, by piggy-backing on Linked Data: E.g. the

    license information you give for the Akan corpus could be specified

    via<br>

    "dc:license": <a class="moz-txt-link-rfc2396E" href="https://creativecommons.org/licenses/by-nc/4.0/">"https://creativecommons.org/licenses/by-nc/4.0/"</a><br>

    <br>
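    A small sketch of how such a license statement could sit in a JSON metadata file and be read back by a script. Only the <code>dc:license</code> key and its value come from the example above; the title, the context value and the helper <code>license_url</code> are hypothetical and only there to make the snippet runnable.<br>
    <pre>
```python
import json

# Hypothetical metadata fragment; only the "dc:license" entry is taken
# from the message, the other keys are illustrative assumptions.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "dc:title": "Akan corpus (placeholder title)",
    "dc:license": "https://creativecommons.org/licenses/by-nc/4.0/",
}


def license_url(md):
    """Return the license URL from a metadata mapping, or None if absent."""
    return md.get("dc:license")


# Round-trip through JSON to show that the statement survives serialization.
serialized = json.dumps(metadata, indent=2)
restored = json.loads(serialized)
print(license_url(restored))
```
</pre>
    <br>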

    Btw.: The DOI (10.13140/RG.2.2.14614.86088) you give for the Akan

    corpus doesn't resolve anymore, but leads here<br>

    <a class="moz-txt-link-freetext" href="https://www.researchgate.net/doi/removed">https://www.researchgate.net/doi/removed</a><br>

    <br>

    best,<br>

    robert<br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 04.04.2018 20:36, Dorothee Beermann

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:76b213f6-854b-ebce-1e9f-f2c3dee0858c@ntnu.no">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <p>Dear Robert,</p>

      <p>Thanks for the feedback. You can find our <span class="Y0NH2b CLPzrc">XML

          schema definition here<b>:</b></span> <a

          class="moz-txt-link-freetext"

          href="https://typecraft.org/typecraft.xsd"

          moz-do-not-send="true">https://typecraft.org/typecraft.xsd <span

            class="Y0NH2b CLPzrc"><br>

          </span></a></p>

      <p><span class="Y0NH2b CLPzrc">We started the development of our

          IGT-XML (TC-XML) in 2006/7; at that time XIGT was not around

          yet. XIGT was first presented in 2014, as far as I recall. <br>

        </span></p>

      <p>The most common IGT type is the basic three-line interlinear

        format, a format that can also be exported from TypeCraft. Our

        Akan data is additionally part-of-speech tagged. The TypeCraft

        editor allows for annotations on several tiers, which is also

        reflected in our XML.  <br>

      </p>

      <p>I agree with you; it is a good idea to also offer a CSV

        format. We do not do that at the moment, although it is an

        option, since we work with a PostgreSQL database.</p>

      <p>Best,</p>

      <p>Dorothee<br>

      </p>

      <p><br>

      </p>

      <div class="moz-cite-prefix">On 04. april 2018 11:19, Robert

        Forkel wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:1afc924d-5b7f-2bef-ccf1-4dc917b349a1@shh.mpg.de"> Dear

        Dorothee,<br>

        I just had a brief look at the Akan corpus. I'd be curious what

        guided your decision to come up with a custom XML based export

        format. The namespace URL <br>

        <pre id="line1"><span><a class="attribute-value" moz-do-not-send="true">http://typecraft.org/typecraft</a></span></pre>

        doesn't seem to resolve, so I guess there is no schema defining

        the XML, right? We included (very basic) support for IGT in CLDF

        (see <a class="moz-txt-link-freetext"

          href="https://github.com/cldf/cldf/tree/master/components/examples"

          moz-do-not-send="true">https://github.com/cldf/cldf/tree/master/components/examples</a>),

        because<br>

        - the examples we found in databases like WALS could be modeled

        in this simplistic form and<br>

        - CSV is better suited for tools like version control than XML<br>

        - we wanted to have IGT data available in the same format

        framework as other linguistic data to make links between data

        homogeneous.<br>

        <br>

        We also discussed other IGT formats (see <a

          class="moz-txt-link-freetext"

          href="https://github.com/cldf/cldf/issues/10"

          moz-do-not-send="true">https://github.com/cldf/cldf/issues/10</a>),

        among them XIGT (<a class="moz-txt-link-freetext"

          href="https://github.com/xigt/xigt" moz-do-not-send="true">https://github.com/xigt/xigt</a>),

        which is also an XML format. Did you look at XIGT, and if so,

        why was it not suitable as export format for TypeCraft?<br>

        <br>

        best<br>

        robert<br>

        <br>

        <br>

        <div class="moz-cite-prefix">On 25.03.2018 16:51, Dorothee

          Beermann wrote:<br>

        </div>

        <blockquote type="cite"

          cite="mid:e92280a6-770e-0bb6-c4cd-000f8a36cb7e@ntnu.no">

          <p>Dear all,</p>

          <p>I have followed the discussion on this thread with

            interest. Let me ask you, would any of what you discuss and

            suggest here also apply to Interlinear Glossed Data?<br>

          </p>

          <p>Sebastian talked about making  "typological research more

            replicable". A related issue is reproducible research in

            linguistics. I guess a good starting point for whatever we do

            as linguists is to keep things<br>

          </p>

          <div class="moz-forward-container">

            <p>transparent, and to give public access to data

              collections. Especially for languages with little to no

              public resources (except for what one finds in articles),

              this seems essential.<br>

            </p>

            <p>Here is an example of what I have in mind:  We just

              released 41 Interlinear Glossed Texts in Akan. The data

              can be downloaded as XML from:</p>

            <p><a class="moz-txt-link-freetext"

                href="https://typecraft.org/tc2wiki/The_TypeCraft_Akan_Corpus"

                moz-do-not-send="true">https://typecraft.org/tc2wiki/The_TypeCraft_Akan_Corpus</a><br>

            </p>

            The corpus is described on the download page, and also in

            the notes contained in the download. (Note that we can offer

            the material in several other formats.) <br>

            <br>

            <br>

            Dorothee <br>

            <br>

            <font color="#999999" size="-1">Professor Dorothee Beermann,

              PhD<br>

              Norwegian University of Science and Technology (NTNU)<br>

              Dept. of Language and Literature<br>

              Surface mail to: NO-7491 Trondheim, Norway/Norge<br>

              <br>

              Visit: Building 4, level 5, room 4512, Dragvoll,<br>

              E-mail:  <a class="moz-txt-link-abbreviated"

                href="mailto:dorothee.beermann@ntnu.no"

                moz-do-not-send="true">dorothee.beermann@ntnu.no</a><br>

              <br>

              Homepage:<a class="moz-txt-link-freetext"

                href="http://www.ntnu.no/ansatte/dorothee.beermann"

                moz-do-not-send="true">http://www.ntnu.no/ansatte/dorothee.beermann</a><br>

              TypeCraft:<a class="moz-txt-link-freetext"

                href="http://typecraft.org/tc2wiki/User:Dorothee_Beermann"

                moz-do-not-send="true">http://typecraft.org/tc2wiki/User:Dorothee_Beermann</a><br>

            </font><br>

            <br>

            <br>

            <br>

            <br>

            -------- Forwarded Message --------

            <table class="moz-email-headers-table" border="0"

              cellspacing="0" cellpadding="0">

              <tbody>

                <tr>

                  <th nowrap="nowrap" valign="BASELINE" align="RIGHT">Subject:

                  </th>

                  <td>Re: [Lingtyp] Empirical standards in typology:

                    incentives</td>

                </tr>

                <tr>

                  <th nowrap="nowrap" valign="BASELINE" align="RIGHT">Date:

                  </th>

                  <td>Fri, 23 Mar 2018 11:59:18 +1100</td>

                </tr>

                <tr>

                  <th nowrap="nowrap" valign="BASELINE" align="RIGHT">From:

                  </th>

                  <td>Hedvig Skirgård <a class="moz-txt-link-rfc2396E"

                      href="mailto:hedvig.skirgard@gmail.com"

                      moz-do-not-send="true">&lt;hedvig.skirgard@gmail.com&gt;</a></td>

                </tr>

                <tr>

                  <th nowrap="nowrap" valign="BASELINE" align="RIGHT">To:

                  </th>

                  <td>Johanna NICHOLS <a class="moz-txt-link-rfc2396E"

                      href="mailto:johanna@berkeley.edu"

                      moz-do-not-send="true">&lt;johanna@berkeley.edu&gt;</a></td>

                </tr>

                <tr>

                  <th nowrap="nowrap" valign="BASELINE" align="RIGHT">CC:

                  </th>

                  <td>Linguistic Typology <a

                      class="moz-txt-link-rfc2396E"

                      href="mailto:lingtyp@listserv.linguistlist.org"

                      moz-do-not-send="true">&lt;lingtyp@listserv.linguistlist.org&gt;</a></td>

                </tr>

              </tbody>

            </table>

            <br>

            <br>

            <div dir="ltr">Dear all, 

              <div><br>

              </div>

              <div>I think Sebastian's suggestion is very good. </div>

              <div><br>

              </div>

              <div>Is this something LT would consider, Masja?</div>

              <div><br>

              </div>

              <div>Johanna's point is good as well, but it shouldn't

                matter for Sebastian's suggestion as I understand it.

                We're not being asked to submit the coding criteria

                prior to the survey being completed, but only at the

                time of publication. There are initiatives in STEM that

                encourage research teams to submit what they're

                planning to do prior to doing it (to avoid biases), but

                that's not baked into what Sebastian is suggesting, from

                what I can tell.</div>

              <div><br>

              </div>

              <div>I would also add a 4 star category which includes

                inter-coder reliability tests, i.e. the original author(s)

                have given different people the same instructions and

                tested how often they do the same thing with the same

                grammar.</div>
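              <div><br>
              </div>
              <div>One common way to quantify such a test is Cohen's kappa, which corrects the raw agreement between two coders for agreement expected by chance. The following is only a sketch under that assumption (the thread does not name a particular statistic); the function name <code>cohens_kappa</code> and the codings are invented for illustration.</div>
              <pre>
```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(coder_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Invented codings of one feature for six grammars.
coder_1 = ["yes", "yes", "no", "no", "yes", "no"]
coder_2 = ["yes", "no", "no", "no", "yes", "yes"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```
</pre>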

              <div><br>

              </div>

              <div>/Hedvig</div>

            </div>

            <div class="gmail_extra"><br clear="all">

              <div>

                <div class="gmail_signature"

                  data-smartmail="gmail_signature">

                  <div dir="ltr">

                    <div>

                      <div dir="ltr">

                        <div dir="ltr">

                          <div dir="ltr">

                            <div dir="ltr">

                              <div dir="ltr">

                                <div dir="ltr">

                                  <div dir="ltr">

                                    <div dir="ltr">

                                      <div dir="ltr">

                                        <div dir="ltr">

                                          <p style="margin:0cm 0cm

                                            0.0001pt;font-size:11pt;font-family:Calibri,sans-serif"><span

                                              style="font-size:9pt"><b><br>

                                              </b></span></p>

                                          <p style="margin:0cm 0cm

                                            0.0001pt"><font face="arial,

                                              helvetica, sans-serif"

                                              size="2"><b>Med vänliga

                                                hälsningar</b><b>,</b><br>

                                            </font></p>

                                          <p style="margin:0cm 0cm

                                            0.0001pt"><b><font

                                                face="arial, helvetica,

                                                sans-serif" size="2">Hedvig

                                                Skirgård</font></b></p>

                                          <p style="margin:0cm 0cm

                                            0.0001pt"><br>

                                          </p>

                                          <p style="margin:0cm 0cm

                                            0.0001pt"><font size="1"><span

style="font-family:verdana,sans-serif;color:rgb(0,0,0)">PhD Candidate</span><br>

                                            </font></p>

                                          <p

style="color:rgb(0,0,0);font-family:Verdana,Helvetica,Arial,sans-serif;margin:0cm

                                            0cm 0.0001pt"><span

                                              style="font-family:verdana,sans-serif"><font

                                                size="1">The Wellsprings

                                                of Linguistic Diversity</font></span></p>

                                          <p

style="color:rgb(0,0,0);font-family:Verdana,Helvetica,Arial,sans-serif;margin:0cm

                                            0cm 0.0001pt"><font

                                              face="verdana, sans-serif"

                                              size="1">ARC Centre of

                                              Excellence for the

                                              Dynamics of Language</font></p>

                                          <p

style="color:rgb(0,0,0);font-family:Verdana,Helvetica,Arial,sans-serif;margin:0cm

                                            0cm 0.0001pt"><font

                                              face="verdana, sans-serif"

                                              size="1">School of

                                              Culture, History and

                                              Language<br>

                                              College of Asia and the

                                              Pacific</font></p>

                                          <p

style="color:rgb(0,0,0);font-family:Verdana,Helvetica,Arial,sans-serif;margin:0cm

                                            0cm 0.0001pt"><font

                                              face="verdana, sans-serif"

                                              size="1">The Australian

                                              National University</font></p>

                                          <p style="margin:0cm 0cm

                                            0.0001pt"><font

                                              color="#666666"

                                              face="arial, helvetica,

                                              sans-serif" size="1"><a

                                                href="https://sites.google.com/site/hedvigskirgard/"

                                                target="_blank"

                                                moz-do-not-send="true">Website</a><br>

                                            </font></p>

                                          <div><br>

                                          </div>

                                          <p style="margin:0cm 0cm

                                            0.0001pt"><br>

                                          </p>

                                        </div>

                                      </div>

                                    </div>

                                  </div>

                                </div>

                              </div>

                            </div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </div>

                </div>

              </div>

              <br>

              <div class="gmail_quote">2018-03-23 0:49 GMT+11:00 Johanna

                NICHOLS <span dir="ltr">&lt;<a

                    href="mailto:johanna@berkeley.edu" target="_blank"

                    moz-do-not-send="true">johanna@berkeley.edu</a>&gt;</span>:<br>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  <div dir="ltr">

                    <div>What's in the codebook -- the coding categories

                      and the criteria?  That much is usually in the

                      body of the paper.<br>

                      <br>

                    </div>

                    <div>Also, a minor but I think important point: 

                      Ordinarily the codebook doesn't in fact

                      chronologically precede the spreadsheet.  A draft

                      or early version of it does, and that gets revised

                      many times as you run into new and unexpected

                      things.  (And every previous entry in the

                      spreadsheet gets checked and edited too.)  By the

                      time you've finished your survey the categories

                      and typology can look different from what you

                      started with.  You publish when you're comfortably

                      past the point of diminishing returns.  In most

                      sciences this is bad method, but in linguistics

                      it's common and I'd say normal.  The capacity to

                      handle it needs to be built into the method in

                      advance.  <br>

                    </div>

                    <span class="HOEnZb"><font color="#888888">

                        <div><br>

                        </div>

                        Johanna<br>

                      </font></span></div>

                  <div class="HOEnZb">

                    <div class="h5">

                      <div class="gmail_extra"><br>

                        <div class="gmail_quote">On Thu, Mar 22, 2018 at

                          2:10 PM, Sebastian Nordhoff <span dir="ltr">&lt;<a

href="mailto:sebastian.nordhoff@glottotopia.de" target="_blank"

                              moz-do-not-send="true">sebastian.nordhoff@<wbr>glottotopia.de</a>&gt;</span>

                          wrote:<br>

                          <blockquote class="gmail_quote"

                            style="margin:0 0 0 .8ex;border-left:1px

                            #ccc solid;padding-left:1ex">Dear all,<br>

                            taking up a thread from last November, I

                            would like to start a<br>

                            discussion about how to make typological

                            research more replicable, where<br>

                            replicable means "less dependent on the

                            original researcher". This<br>

                            includes coding decisions, tabular data,

                            quantitative analyses etc.<br>

                            <br>

                            Volker Gast wrote (full quote at bottom of

                            mail):<br>

                            > Let's assume that self-annotation

                            cannot be avoided for financial<br>

                            > reasons. What about establishing a

                            standard saying that, for instance,<br>

                            > when you submit a

                            quantitative-typological paper to LT you

                            have to<br>

                            > provide the data in such a way that the

                            coding decisions are made<br>

                            > sufficiently transparent for readers to

                            see if they can go along with<br>

                            > the argument?<br>

                            <br>

                            I see two possibilities for that: Option 1:

                            editors will refuse papers<br>

                            which do not adhere to this standard. That

                            will not work in my view.<br>

                            What might work (Option 2) is a star/badge

                            system. I could imagine the<br>

                            following:<br>

                            <br>

                            - no stars: only standard bibliographical

                            references<br>

                            - *         raw tabular data (spreadsheet)

                            available as a supplement<br>

                            - **        as above, + code book available

                            as a supplement<br>

                            - ***       as above, + computer code in R

                            or similar available<br>

                            <br>

                            For a three-star article, an unrelated

                            researcher could then take the<br>

                            original grammars and the code book and

                            replicate the spreadsheet to see<br>

                            if it matches. They could then run the

                            computer code to see if they<br>

                            arrive at the same results.<br>

                            <br>

                            This will not be practical for every

                            research project, but some might<br>

                            find it easier than others, and, in the long

                            run, it will require good<br>

                            arguments to submit a 0-star (i.e.

                            non-replicable) quantitative article.<br>

                            <br>

                            Any thoughts?<br>

                            Sebastian<br>

                            <br>

                            PS: Note that the codebook would actually

                            chronologically precede the<br>

                            spreadsheet, but I feel that spreadsheets

                            are more easily available than<br>

                            codebooks, so in order to keep the entry

                            barrier low, this order is<br>

                            reversed for the stars.<br>

                            <br>

                          </blockquote>

                        </div>

                      </div>

                    </div>

                  </div>

                  <br>

                </blockquote>

              </div>

              <br>

            </div>

          </div>

          <br>

          <fieldset class="mimeAttachmentHeader"></fieldset>

          <br>

          <pre wrap="">_______________________________________________

Lingtyp mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Lingtyp@listserv.linguistlist.org" moz-do-not-send="true">Lingtyp@listserv.linguistlist.org</a>

<a class="moz-txt-link-freetext" href="http://listserv.linguistlist.org/mailman/listinfo/lingtyp" moz-do-not-send="true">http://listserv.linguistlist.org/mailman/listinfo/lingtyp</a>

</pre>

        </blockquote>

        <br>

        <br>


      </blockquote>

      <br>

      <pre class="moz-signature" cols="72">-- 

Professor Dorothee Beermann, PhD

Norwegian University of Science and Technology (NTNU)

Dept. of Language and Literature

Surface mail to: NO-7491 Trondheim, Norway/Norge

Visit: Building 4, level 5, room 4512, Dragvoll,

Tel.:    +47 73 596525

E-mail:  <a class="moz-txt-link-abbreviated" href="mailto:dorothee.beermann@ntnu.no" moz-do-not-send="true">dorothee.beermann@ntnu.no</a>

Homepage:<a class="moz-txt-link-freetext" href="http://www.ntnu.no/ansatte/dorothee.beermann" moz-do-not-send="true">http://www.ntnu.no/ansatte/dorothee.beermann</a>

TypeCraft:<a class="moz-txt-link-freetext" href="http://typecraft.org/tc2wiki/User:Dorothee_Beermann" moz-do-not-send="true">http://typecraft.org/tc2wiki/User:Dorothee_Beermann</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>