[Lingtyp] Fwd: Re: Empirical standards in typology: incentives

Robert Forkel forkel at shh.mpg.de
Wed Apr 4 09:19:32 UTC 2018


Dear Dorothee,
I just had a brief look at the Akan corpus. I'd be curious what guided 
your decision to come up with a custom XML-based export format. The 
namespace URL

http://typecraft.org/typecraft

doesn't seem to resolve, so I guess there is no schema defining the XML, 
right? We included (very basic) support for IGT in CLDF (see 
https://github.com/cldf/cldf/tree/master/components/examples), because
- the examples we found in databases like WALS could be modeled in this 
simplistic form,
- CSV is better suited than XML to tools like version control, and
- we wanted IGT data available in the same format framework as other 
linguistic data, to keep links between datasets homogeneous.
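To give a sense of what this simplistic form looks like, here is a minimal sketch. The column names follow the CLDF examples component, where the analyzed words and glosses are list-valued columns; the Akan sentence, its morpheme segmentation, and the glosses are invented for illustration, and the tab separator is an assumption of this sketch:

```python
import csv
import io

# A hypothetical one-row IGT table in CLDF ExampleTable style.
# Analyzed_Word and Gloss hold tab-separated lists of equal length.
raw = (
    "ID,Language_ID,Primary_Text,Analyzed_Word,Gloss,Translated_Text\n"
    "igt-1,aka,Kofi redidi,Kofi\tre-didi,Kofi\tPROG-eat,Kofi is eating\n"
)

for row in csv.DictReader(io.StringIO(raw)):
    words = row["Analyzed_Word"].split("\t")
    glosses = row["Gloss"].split("\t")
    # The basic well-formedness check in this model:
    # one gloss per analyzed word.
    assert len(words) == len(glosses)
    print(list(zip(words, glosses)))
```

Because each row is plain CSV, a diff between two versions of the corpus shows exactly which examples changed, which is the version-control advantage mentioned above.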

We also discussed other IGT formats (see 
https://github.com/cldf/cldf/issues/10), among them XIGT 
(https://github.com/xigt/xigt), which is also an XML format. Did you 
look at XIGT, and if so, why was it not suitable as export format for 
TypeCraft?

best
robert


On 25.03.2018 16:51, Dorothee Beermann wrote:
>
> Dear all,
>
> I have followed the discussion on this thread with interest. Let me 
> ask you, would any of what you discuss and suggest here also apply to 
> Interlinear Glossed Data?
>
> Sebastian talked about making "typological research more replicable". 
> A related issue is reproducible research in linguistics. I guess a good 
> starting point for whatever we do as linguists is to keep things 
> transparent, and to give public access to data collections. Especially 
> for languages with little to no public resources (except for what one 
> finds in articles), this seems essential.
>
> Here is an example of what I have in mind: We just released 41 
> Interlinear Glossed Texts in Akan. The data can be downloaded as XML from:
>
> https://typecraft.org/tc2wiki/The_TypeCraft_Akan_Corpus
>
> The corpus is described on the download page, and also in the notes 
> contained in the download. (Note that we can offer the material in 
> several other formats.)
>
>
> Dorothee
>
> Professor Dorothee Beermann, PhD
> Norwegian University of Science and Technology (NTNU)
> Dept. of Language and Literature
> Surface mail to: NO-7491 Trondheim, Norway/Norge
>
> Visit: Building 4, level 5, room 4512, Dragvoll,
> E-mail: dorothee.beermann at ntnu.no
>
> Homepage:http://www.ntnu.no/ansatte/dorothee.beermann
> TypeCraft:http://typecraft.org/tc2wiki/User:Dorothee_Beermann
>
>
>
>
>
> -------- Forwarded Message --------
> Subject: 	Re: [Lingtyp] Empirical standards in typology: incentives
> Date: 	Fri, 23 Mar 2018 11:59:18 +1100
> From: 	Hedvig Skirgård <hedvig.skirgard at gmail.com>
> To: 	Johanna NICHOLS <johanna at berkeley.edu>
> CC: 	Linguistic Typology <lingtyp at listserv.linguistlist.org>
>
>
>
> Dear all,
>
> I think Sebastian's suggestion is very good.
>
> Is this something LT would consider, Masja?
>
> Johanna's point is good as well, but it shouldn't matter for 
> Sebastian's suggestion as I understand it. We're not being asked to 
> submit the coding criteria prior to the survey being completed, but 
> only at the time of publication. There are initiatives in STEM that 
> encourage research teams to submit what they're planning to do before 
> doing it (to avoid biases), but that's not baked into what Sebastian 
> is suggesting, from what I can tell.
>
> I would also add a 4-star category which includes 
> inter-coder reliability tests, i.e. the original author(s) have given 
> different people the same instructions and tested how often they do 
> the same thing with the same grammar.
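Such an inter-coder reliability test can be sketched in a few lines. In this minimal example the two coders' judgements (ten grammars, one binary feature) are invented for illustration; it reports raw agreement alongside Cohen's kappa, which corrects for chance agreement:

```python
from collections import Counter

# Hypothetical codings of the same ten grammars by two coders
# working from the same codebook (values invented for illustration).
coder_a = ["y", "y", "n", "y", "n", "n", "y", "n", "y", "y"]
coder_b = ["y", "n", "n", "y", "n", "y", "y", "n", "y", "y"]

n = len(coder_a)
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Chance agreement from each coder's marginal label frequencies.
pa, pb = Counter(coder_a), Counter(coder_b)
expected = sum(pa[k] * pb[k] for k in set(coder_a) | set(coder_b)) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"agreement={observed:.2f}, kappa={kappa:.2f}")
```

Here the coders agree on 8 of 10 grammars, but kappa is noticeably lower than raw agreement because both coders use "y" often, so some agreement is expected by chance.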
>
> /Hedvig
>
>
> With kind regards,
>
> Hedvig Skirgård
>
>
> PhD Candidate
>
> The Wellsprings of Linguistic Diversity
>
> ARC Centre of Excellence for the Dynamics of Language
>
> School of Culture, History and Language
> College of Asia and the Pacific
>
> The Australian National University
>
> Website <https://sites.google.com/site/hedvigskirgard/>
>
>
>
>
> 2018-03-23 0:49 GMT+11:00 Johanna NICHOLS <johanna at berkeley.edu 
> <mailto:johanna at berkeley.edu>>:
>
>     What's in the codebook -- the coding categories and the criteria? 
>     That much is usually in the body of the paper.
>
>     Also, a minor but I think important point: Ordinarily the codebook
>     doesn't in fact chronologically precede the spreadsheet.  A draft
>     or early version of it does, and that gets revised many times as
>     you run into new and unexpected things.  (And every previous entry
>     in the spreadsheet gets checked and edited too.)  By the time
>     you've finished your survey the categories and typology can look
>     different from what you started with.  You publish when you're
>     comfortably past the point of diminishing returns.  In most
>     sciences this is bad method, but in linguistics it's common and
>     I'd say normal.  The capacity to handle it needs to be built into
>     the method in advance.
>
>     Johanna
>
>     On Thu, Mar 22, 2018 at 2:10 PM, Sebastian Nordhoff
>     <sebastian.nordhoff at glottotopia.de
>     <mailto:sebastian.nordhoff at glottotopia.de>> wrote:
>
>         Dear all,
>         taking up a thread from last November, I would like to start a
>         discussion about how to make typological research more
>         replicable, where
>         replicable means "less dependent on the original researcher". This
>         includes coding decisions, tabular data, quantitative analyses
>         etc.
>
>         Volker Gast wrote (full quote at bottom of mail):
>         > Let's assume that self-annotation cannot be avoided for
>         financial
>         > reasons. What about establishing a standard saying that, for
>         instance,
>         > when you submit a quantitative-typological paper to LT you
>         have to
>         > provide the data in such a way that the coding decisions are
>         made
>         > sufficiently transparent for readers to see if they can go
>         along with
>         > the argument?
>
>         I see two possibilities for that: Option 1: editors will
>         refuse papers
>         which do not adhere to this standard. That will not work in my
>         view.
>         What might work (Option 2) is a star/badge system. I could
>         imagine the
>         following:
>
>         - no stars: only standard bibliographical references
>         - *         raw tabular data (spreadsheet) available as a
>         supplement
>         - **        as above, + code book available as a supplement
>         - ***       as above, + computer code in R or similar available
>
>         For a three-star article, an unrelated researcher could then
>         take the
>         original grammars and the code book and replicate the
>         spreadsheet to see
>         if it matches. They could then run the computer code to see if
>         they
>         arrive at the same results.
>
>         This will not be practical for every research project, but
>         some might
>         find it easier than others, and, in the long run, it will
>         require good
>         arguments to submit a 0-star (i.e. non-replicable)
>         quantitative article.
>
>         Any thoughts?
>         Sebastian
>
>         PS: Note that the codebook would actually chronologically
>         precede the spreadsheet, but I feel that spreadsheets are
>         more easily available than codebooks, so, to keep the entry
>         barrier low, this order is reversed for the stars.
>
>
>
>
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> http://listserv.linguistlist.org/mailman/listinfo/lingtyp

