[Lingtyp] Fwd: Re: Empirical standards in typology: incentives
Dorothee Beermann
dorothee.beermann at ntnu.no
Sun Mar 25 14:51:26 UTC 2018
Dear all,
I have followed the discussion on this thread with interest. Let me ask
you: would any of what you discuss and suggest here also apply to
Interlinear Glossed Data?
Sebastian talked about making "typological research more replicable". A
related issue is reproducible research in linguistics. I guess a good
starting point for whatever we do as linguists is to keep things
transparent, and to give public access to data collections. Especially
for languages with little to no public resources (except for what one
finds in articles), this seems essential.
Here is an example of what I have in mind: We just released 41
Interlinear Glossed Texts in Akan. The data can be downloaded as XML from:
https://typecraft.org/tc2wiki/The_TypeCraft_Akan_Corpus
The corpus is described on the download page, and also in the notes
contained in the download. (Note that we can offer the material in
several other formats.)
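For anyone who would like to look at the material programmatically, here
is a minimal sketch of inspecting such a download with Python's standard
library. The file name and the element name "gloss" are placeholders for
illustration, not the actual TypeCraft schema, so please check the
downloaded XML and adapt them.

import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical local file name for the downloaded corpus.
tree = ET.parse("typecraft_akan_corpus.xml")
root = tree.getroot()

# Count how often each gloss label occurs across the interlinear texts,
# assuming glosses are stored as the text of <gloss> elements.
gloss_counts = Counter(
    el.text.strip()
    for el in root.iter("gloss")
    if el.text and el.text.strip()
)
for gloss, n in gloss_counts.most_common(10):
    print(gloss, n)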
Dorothee
Professor Dorothee Beermann, PhD
Norwegian University of Science and Technology (NTNU)
Dept. of Language and Literature
Surface mail to: NO-7491 Trondheim, Norway/Norge
Visit: Building 4, level 5, room 4512, Dragvoll,
E-mail: dorothee.beermann at ntnu.no
Homepage: http://www.ntnu.no/ansatte/dorothee.beermann
TypeCraft: http://typecraft.org/tc2wiki/User:Dorothee_Beermann
-------- Forwarded Message --------
Subject: Re: [Lingtyp] Empirical standards in typology: incentives
Date: Fri, 23 Mar 2018 11:59:18 +1100
From: Hedvig Skirgård <hedvig.skirgard at gmail.com>
To: Johanna NICHOLS <johanna at berkeley.edu>
CC: Linguistic Typology <lingtyp at listserv.linguistlist.org>
Dear all,
I think Sebastian's suggestion is very good.
Is this something LT would consider, Masja?
Johanna's point is good as well, but it shouldn't matter for Sebastian's
suggestion as I understand it. We're not being asked to submit the
coding criteria prior to the survey being completed, but only at the
time of publication. There are initiatives in STEM that encourage
research teams to submit what they're planning to do prior to doing it
(to avoid biases), but that's not baked into what Sebastian is
suggesting, from what I can tell.
I would also add a four-star category which includes inter-coder
reliability tests, i.e. the original author(s) have given different
people the same instructions and tested how often they make the same
coding decisions with the same grammar.
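To make concrete what such a test could look like: below is a minimal
sketch in Python that scores agreement between two coders with Cohen's
kappa. The two lists of coding decisions are invented for illustration.

from collections import Counter

def cohens_kappa(coder_a, coder_b):
    # Observed agreement vs. agreement expected by chance from each
    # coder's label frequencies.
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] / n * freq_b[c] / n for c in freq_a)
    return (observed - expected) / (1 - expected)

# Two coders applying the same instructions to the same ten grammars.
coder_a = ["SVO", "SOV", "SVO", "VSO", "SOV", "SVO", "SOV", "SVO", "SVO", "SOV"]
coder_b = ["SVO", "SOV", "SVO", "SOV", "SOV", "SVO", "SOV", "SVO", "VSO", "SOV"]
print(cohens_kappa(coder_a, coder_b))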
/Hedvig
With kind regards,
Hedvig Skirgård
PhD Candidate
The Wellsprings of Linguistic Diversity
ARC Centre of Excellence for the Dynamics of Language
School of Culture, History and Language
College of Asia and the Pacific
The Australian National University
Website <https://sites.google.com/site/hedvigskirgard/>
2018-03-23 0:49 GMT+11:00 Johanna NICHOLS <johanna at berkeley.edu>:
What's in the codebook -- the coding categories and the criteria?
That much is usually in the body of the paper.
Also, a minor but I think important point: Ordinarily the codebook
doesn't in fact chronologically precede the spreadsheet. A draft or
early version of it does, and that gets revised many times as you
run into new and unexpected things. (And every previous entry in
the spreadsheet gets checked and edited too.) By the time you've
finished your survey the categories and typology can look different
from what you started with. You publish when you're comfortably past
the point of diminishing returns. In most sciences this is bad
method, but in linguistics it's common and I'd say normal. The
capacity to handle it needs to be built into the method in advance.
Johanna
On Thu, Mar 22, 2018 at 2:10 PM, Sebastian Nordhoff
<sebastian.nordhoff at glottotopia.de> wrote:
Dear all,
taking up a thread from last November, I would like to start a discussion
about how to make typological research more replicable, where replicable
means "less dependent on the original researcher". This includes coding
decisions, tabular data, quantitative analyses etc.
Volker Gast wrote (full quote at bottom of mail):
> Let's assume that self-annotation cannot be avoided for financial
> reasons. What about establishing a standard saying that, for instance,
> when you submit a quantitative-typological paper to LT you have to
> provide the data in such a way that the coding decisions are made
> sufficiently transparent for readers to see if they can go along with
> the argument?
I see two possibilities for that: Option 1: editors will refuse papers
which do not adhere to this standard. That will not work in my view.
What might work (Option 2) is a star/badge system. I could imagine the
following:
- no stars: only standard bibliographical references
- * raw tabular data (spreadsheet) available as a supplement
- ** as above, + code book available as a supplement
- *** as above, + computer code in R or similar available
For a three-star article, an unrelated researcher could then take the
original grammars and the code book and replicate the spreadsheet to see
if it matches. They could then run the computer code to see if they
arrive at the same results.
This will not be practical for every research project, but some might
find it easier than others, and, in the long run, it will require good
arguments to submit a 0-star (i.e. non-replicable) quantitative article.
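To illustrate the replication step for a three-star article, here is a
minimal sketch (in Python rather than R, and with invented file and
column names) that compares an independently re-coded spreadsheet
against the published one, cell by cell:

import csv

def load(path, key="language"):
    # Index each row of the coding spreadsheet by its language name.
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key]: row for row in csv.DictReader(f)}

published = load("published_coding.csv")
replicated = load("replicated_coding.csv")

mismatches = []
for lang, row in published.items():
    other = replicated.get(lang)
    if other is None:
        mismatches.append((lang, "missing in replication"))
        continue
    for col, value in row.items():
        if other.get(col) != value:
            mismatches.append((lang, f"{col}: {value!r} vs {other.get(col)!r}"))

print(f"{len(mismatches)} disagreements")
for lang, detail in mismatches[:20]:
    print(lang, detail)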
Any thoughts?
Sebastian
PS: Note that the codebook would actually chronologically precede the
spreadsheet, but I feel that spreadsheets are more easily available than
codebooks, so in order to keep the entry barrier low, this order is
reversed for the stars.