[Lingtyp] Fwd: Re: Empirical standards in typology: incentives
Dorothee Beermann
dorothee.beermann at ntnu.no
Sun Mar 25 14:51:26 UTC 2018
Dear all,
I have followed the discussion on this thread with interest. Let me ask
you: would any of what you discuss and suggest here also apply to
Interlinear Glossed Data?
Sebastian talked about making "typological research more replicable". A
related issue is reproducible research in linguistics. I guess a good
starting point for whatever we do as linguists is to keep things
transparent, and to give public access to data collections. Especially
for languages with little to no public resources (except for what one
finds in articles), this seems essential.
Here is an example of what I have in mind: We just released 41
Interlinear Glossed Texts in Akan. The data can be downloaded as XML from:
https://typecraft.org/tc2wiki/The_TypeCraft_Akan_Corpus
The corpus is described on the download page, and also in the notes
contained in the download. (Note that we can offer the material in
several other formats.)
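For anyone who would like to look at the material programmatically, here
is a minimal sketch of inspecting such a download with Python's standard
library. The file name and the element name "gloss" are placeholders for
illustration, not the actual TypeCraft schema, so please check the
downloaded XML and adapt them.

import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical local file name for the downloaded corpus.
tree = ET.parse("typecraft_akan_corpus.xml")
root = tree.getroot()

# Count how often each gloss label occurs across the interlinear texts,
# assuming glosses are stored as the text of <gloss> elements.
gloss_counts = Counter(
    el.text.strip()
    for el in root.iter("gloss")
    if el.text and el.text.strip()
)
for gloss, n in gloss_counts.most_common(10):
    print(gloss, n)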
Dorothee
Professor Dorothee Beermann, PhD
Norwegian University of Science and Technology (NTNU)
Dept. of Language and Literature
Surface mail to: NO-7491 Trondheim, Norway/Norge
Visit: Building 4, level 5, room 4512, Dragvoll,
E-mail: dorothee.beermann at ntnu.no
Homepage: http://www.ntnu.no/ansatte/dorothee.beermann
TypeCraft: http://typecraft.org/tc2wiki/User:Dorothee_Beermann
-------- Forwarded Message --------
Subject: Re: [Lingtyp] Empirical standards in typology: incentives
Date: Fri, 23 Mar 2018 11:59:18 +1100
From: Hedvig Skirgård <hedvig.skirgard at gmail.com>
To: Johanna NICHOLS <johanna at berkeley.edu>
CC: Linguistic Typology <lingtyp at listserv.linguistlist.org>
Dear all,
I think Sebastian's suggestion is very good.
Is this something LT would consider, Masja?
Johanna's point is good as well, but it shouldn't matter for Sebastian's
suggestion as I understand it. We're not being asked to submit the
coding criteria prior to the survey being completed, but only at the
time of publication. There are initiatives in STEM that encourage
research teams to submit what they're planning to do prior to doing it
(to avoid biases), but that's not baked into what Sebastian is
suggesting, from what I can tell.
I would also add a four-star category which includes inter-coder
reliability tests, i.e. the original author(s) have given different
people the same instructions and tested how often they make the same
coding decisions with the same grammar.
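To make concrete what such a test could look like: below is a minimal
sketch in Python that scores agreement between two coders with Cohen's
kappa. The two lists of coding decisions are invented for illustration.

from collections import Counter

def cohens_kappa(coder_a, coder_b):
    # Observed agreement vs. agreement expected by chance from each
    # coder's label frequencies.
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] / n * freq_b[c] / n for c in freq_a)
    return (observed - expected) / (1 - expected)

# Two coders applying the same instructions to the same ten grammars.
coder_a = ["SVO", "SOV", "SVO", "VSO", "SOV", "SVO", "SOV", "SVO", "SVO", "SOV"]
coder_b = ["SVO", "SOV", "SVO", "SOV", "SOV", "SVO", "SOV", "SVO", "VSO", "SOV"]
print(cohens_kappa(coder_a, coder_b))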
/Hedvig
With kind regards,
Hedvig Skirgård
PhD Candidate
The Wellsprings of Linguistic Diversity
ARC Centre of Excellence for the Dynamics of Language
School of Culture, History and Language
College of Asia and the Pacific
The Australian National University
Website <https://sites.google.com/site/hedvigskirgard/>
2018-03-23 0:49 GMT+11:00 Johanna NICHOLS <johanna at berkeley.edu>:
What's in the codebook -- the coding categories and the criteria?
That much is usually in the body of the paper.
Also, a minor but I think important point: Ordinarily the codebook
doesn't in fact chronologically precede the spreadsheet. A draft or
early version of it does, and that gets revised many times as you
run into new and unexpected things. (And every previous entry in
the spreadsheet gets checked and edited too.) By the time you've
finished your survey the categories and typology can look different
from what you started with. You publish when you're comfortably past
the point of diminishing returns. In most sciences this is bad
method, but in linguistics it's common and I'd say normal. The
capacity to handle it needs to be built into the method in advance.
Johanna
On Thu, Mar 22, 2018 at 2:10 PM, Sebastian Nordhoff
<sebastian.nordhoff at glottotopia.de> wrote:
Dear all,
taking up a thread from last November, I would like to start a discussion
about how to make typological research more replicable, where replicable
means "less dependent on the original researcher". This includes coding
decisions, tabular data, quantitative analyses etc.
Volker Gast wrote (full quote at bottom of mail):
> Let's assume that self-annotation cannot be avoided for financial
> reasons. What about establishing a standard saying that, for instance,
> when you submit a quantitative-typological paper to LT you have to
> provide the data in such a way that the coding decisions are made
> sufficiently transparent for readers to see if they can go along with
> the argument?
I see two possibilities for that: Option 1: editors will refuse papers
which do not adhere to this standard. That will not work in my view.
What might work (Option 2) is a star/badge system. I could imagine the
following:
- no stars: only standard bibliographical references
- * raw tabular data (spreadsheet) available as a supplement
- ** as above, + code book available as a supplement
- *** as above, + computer code in R or similar available
For a three-star article, an unrelated researcher could then take the
original grammars and the code book and replicate the spreadsheet to see
if it matches. They could then run the computer code to see if they
arrive at the same results.
This will not be practical for every research project, but some might
find it easier than others, and, in the long run, it will require good
arguments to submit a 0-star (i.e. non-replicable) quantitative article.
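To illustrate the replication step for a three-star article, here is a
minimal sketch (in Python rather than R, and with invented file and
column names) that compares an independently re-coded spreadsheet
against the published one, cell by cell:

import csv

def load(path, key="language"):
    # Index each row of the coding spreadsheet by its language name.
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key]: row for row in csv.DictReader(f)}

published = load("published_coding.csv")
replicated = load("replicated_coding.csv")

mismatches = []
for lang, row in published.items():
    other = replicated.get(lang)
    if other is None:
        mismatches.append((lang, "missing in replication"))
        continue
    for col, value in row.items():
        if other.get(col) != value:
            mismatches.append((lang, f"{col}: {value!r} vs {other.get(col)!r}"))

print(f"{len(mismatches)} disagreements")
for lang, detail in mismatches[:20]:
    print(lang, detail)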
Any thoughts?
Sebastian
PS: Note that the codebook would actually chronologically precede the
spreadsheet, but I feel that spreadsheets are more easily available than
codebooks, so in order to keep the entry barrier low, this order is
reversed for the stars.