[Lingtyp] Fwd: Re: Empirical standards in typology: incentives

Dorothee Beermann dorothee.beermann at ntnu.no
Wed Apr 4 18:36:57 UTC 2018


Dear Robert,

Thanks for the feedback. Our XML schema definition you find here*:* 
https://typecraft.org/typecraft.xsd

We started the development of our IGT-XML (TC-XML) in 2006/7,  at that 
time XIGT was not around yet. It was first presented in 2014, as far as 
I recall.

The most common IGT type is the basic three-line interlinear format, a 
format that can also be exported from TypeCraft.  Our Akan data is  part 
of speech tagged in addition.  The TypeCraft editor allows for 
annotations on several tiers which is also reflected in our XML.

I agree with you; its is a good idea to also offer a CSV format. We do 
not do that at the moment, although it is an option, since we work with 
a PostgreSQL database.

Best,

Dorothee


On 04. april 2018 11:19, Robert Forkel wrote:
> Dear Dorothee,
> I just had a brief look at the Akan corpus. I'd be curious what guided 
> your decision to come up with a custom XML based export format. The 
> namespace URL
> http://typecraft.org/typecraft
> doesn't seem to resolve, so I guess there is no schema defining the 
> XML, right? We included (very basic) support for IGT in CLDF (see 
> https://github.com/cldf/cldf/tree/master/components/examples), because
> - the examples we found in databases like WALS could be modeled in 
> this simplistic form and
> - CSV is better suited for tools like version control than XML
> - we wanted to have IGT data available in the same format framework as 
> other linguistic data to make links between data homogenous.
>
> We also discussed other IGT formats (see 
> https://github.com/cldf/cldf/issues/10), among them XIGT 
> (https://github.com/xigt/xigt), which is also an XML format. Did you 
> look at XIGT, and if so, why was it not suitable as export format for 
> TypeCraft?
>
> best
> robert
>
>
> On 25.03.2018 16:51, Dorothee Beermann wrote:
>>
>> Dear all,
>>
>> I have followed the discussion on this thread with interest. Let me 
>> ask you, would any of what you discuss and suggest here also apply to 
>> Interlinear Glossed Data?
>>
>> Sebastian talked about making  "typological research more 
>> replicable". A related issue is reproducible research in linguists. I 
>> guess a good starting point for whatever we do as linguists is to 
>> keep things
>>
>> transparent, and to give public access to data collections. 
>> Especially for languages with little to no public resources (except 
>> for what one finds in articles), this seems essential.
>>
>> Here is an example of what I have in mind:  We just released 41 
>> Interlinear Glossed Texts in Akan. The data can be downloaded as XML 
>> from:
>>
>> https://typecraft.org/tc2wiki/The_TypeCraft_Akan_Corpus
>>
>> The corpus is described on the download page, and also in the notes 
>> contained in the download. (Note that we can offer the material in 
>> several other formats.)
>>
>>
>> Dorothee
>>
>> Professor Dorothee Beermann, PhD
>> Norwegian University of Science and Technology (NTNU)
>> Dept. of Language and Literature
>> Surface mail to: NO-7491 Trondheim, Norway/Norge
>>
>> Visit: Building 4, level 5, room 4512, Dragvoll,
>> E-mail: dorothee.beermann at ntnu.no
>>
>> Homepage:http://www.ntnu.no/ansatte/dorothee.beermann
>> TypeCraft:http://typecraft.org/tc2wiki/User:Dorothee_Beermann
>>
>>
>>
>>
>>
>> -------- Forwarded Message --------
>> Subject: 	Re: [Lingtyp] Empirical standards in typology: incentives
>> Date: 	Fri, 23 Mar 2018 11:59:18 +1100
>> From: 	Hedvig Skirgård <hedvig.skirgard at gmail.com>
>> To: 	Johanna NICHOLS <johanna at berkeley.edu>
>> CC: 	Linguistic Typology <lingtyp at listserv.linguistlist.org>
>>
>>
>>
>> Dear all,
>>
>> I think Sebastian's suggestion is very good.
>>
>> Is this something LT would consider, Masja?
>>
>> Johanna's point is good as well, but it shouldn't matter for 
>> Sebastian's suggestion as I understand it. We're not being asked to 
>> submit the coding criteria prior to the survey being completed, but 
>> only at the time of publication. There are initiatives in STEM that 
>> encourages research teams to submit what they're planning to do prior 
>> to doing if (to avoid biases), but that's not baked into what 
>> Sebastian is suggestion, from what I can tell.
>>
>> I would also add a 4 star category which includes 
>> inter-coderreliabiity tests, i.e. the original author(s) have given 
>> different people the same instructions and tested how often they do 
>> the same thing with the same grammar.
>>
>> /Hedvig
>>
>> *
>> *
>>
>> *Med vänliga hälsningar**,*
>>
>> *Hedvig Skirgård*
>>
>>
>> PhD Candidate
>>
>> The Wellsprings of Linguistic Diversity
>>
>> ARC Centre of Excellence for the Dynamics of Language
>>
>> School of Culture, History and Language
>> College of Asia and the Pacific
>>
>> The Australian National University
>>
>> Website <https://sites.google.com/site/hedvigskirgard/>
>>
>>
>>
>>
>> 2018-03-23 0:49 GMT+11:00 Johanna NICHOLS <johanna at berkeley.edu 
>> <mailto:johanna at berkeley.edu>>:
>>
>>     What's in the codebook -- the coding categories and the
>>     criteria?  That much is usually in the body of the paper.
>>
>>     Also, a minor but I think important point: Ordinarily the
>>     codebook doesn't in fact chronologically precede the
>>     spreadsheet.  A draft or early version of it does, and that gets
>>     revised many times as you run into new and unexpected things.
>>     (And every previous entry in the spreadsheet gets checked and
>>     edited too.)  By the time you've finished your survey the
>>     categories and typology can look different from what you started
>>     with.  You publish when you're comfortably past the point of
>>     diminishing returns.  In most sciences this is bad method, but in
>>     linguistics it's common and I'd say normal.  The capacity to
>>     handle it needs to be built into the method in advance.
>>
>>     Johanna
>>
>>     On Thu, Mar 22, 2018 at 2:10 PM, Sebastian Nordhoff
>>     <sebastian.nordhoff at glottotopia.de
>>     <mailto:sebastian.nordhoff at glottotopia.de>> wrote:
>>
>>         Dear all,
>>         taking up a thread from last November, I would like to start a
>>         discussion about how to make typological research more
>>         replicable, where
>>         replicable means "less dependent on the original researcher".
>>         This
>>         includes coding decisions, tabular data, quantitative
>>         analyses etc.
>>
>>         Volker Gast wrote (full quote at bottom of mail):
>>         > Let's assume that self-annotation cannot be avoided for
>>         financial
>>         > reasons. What about establishing a standard saying that,
>>         for instance,
>>         > when you submit a quantitative-typological paper to LT you
>>         have to
>>         > provide the data in such a way that the coding decisions
>>         are made
>>         > sufficiently transparent for readers to see if they can go
>>         along with
>>         > the argument?
>>
>>         I see two possibilities for that: Option 1: editors will
>>         refuse papers
>>         which do not adhere to this standard. That will not work in
>>         my view.
>>         What might work (Option 2) is a star/badge system. I could
>>         imagine the
>>         following:
>>
>>         - no stars: only standard bibliographical references
>>         - *         raw tabular data (spreadsheet) available as a
>>         supplement
>>         - **        as above, + code book available as a supplement
>>         - ***       as above, + computer code in R or similar available
>>
>>         For a three-star article, an unrelated researcher could then
>>         take the
>>         original grammars and the code book and replicate the
>>         spreadsheet to see
>>         if it matches. They could then run the computer code to see
>>         if they
>>         arrive at the same results.
>>
>>         This will not be practical for every research project, but
>>         some might
>>         find it easier than others, and, in the long run, it will
>>         require good
>>         arguments to submit a 0-star (i.e. non-replicable)
>>         quantitative article.
>>
>>         Any thoughts?
>>         Sebastian
>>
>>         PS: Note that the codebook would actually chronologically
>>         precede the
>>         spreadsheet, but I fill that spreadsheets are more easily
>>         available than
>>         codebooks, so in order to keep the entry barrier low, this
>>         order is
>>         reversed for the stars.
>>
>>
>>
>>
>>
>> _______________________________________________
>> Lingtyp mailing list
>> Lingtyp at listserv.linguistlist.org
>> http://listserv.linguistlist.org/mailman/listinfo/lingtyp
>
>
>
> _______________________________________________
> Lingtyp mailing list
> Lingtyp at listserv.linguistlist.org
> http://listserv.linguistlist.org/mailman/listinfo/lingtyp

-- 
Professor Dorothee Beermann, PhD
Norwegian University of Science and Technology (NTNU)
Dept. of Language and Literature
Surface mail to: NO-7491 Trondheim, Norway/Norge

Visit: Building 4, level 5, room 4512, Dragvoll,
Tel.:    +47 73 596525
E-mail:  dorothee.beermann at ntnu.no

Homepage:http://www.ntnu.no/ansatte/dorothee.beermann
TypeCraft:http://typecraft.org/tc2wiki/User:Dorothee_Beermann


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listserv.linguistlist.org/pipermail/lingtyp/attachments/20180404/bd6e3526/attachment.htm>


More information about the Lingtyp mailing list