[RNLD] Best practice wrt authentic data

Andrea L. Berez-Kroeker andrea.berez at gmail.com
Thu Jun 8 15:39:43 EDT 2017


Hi Joe and RNLD,

Jorge is right: our paper on methods and data citation in grammars just
came out in LD&C. That paper is part of a larger initiative on encouraging
data citation and attribution more widely in linguistics, in which some 40+
linguists and data scientists have participated (see our website
<https://protect-au.mimecast.com/s/krZOBliOmkEGSD?domain=sites.google.com>, which contains
materials from three workshops held since 2015 and a session at the LSA in
January).

Other products of this project include a survey of 270 journal articles
from 9 journals across the field, and a general position paper on data
citation in linguistics. Unfortunately, uptake of these last two items
has been difficult so far, and we have not yet found a venue willing to
publish them -- it seems that documentary linguists are way ahead of the
rest of the field in terms of wanting to "speak to the authenticity of the
data", as you say.

In short, we found that we linguist-authors are very good at citing data
that comes from a traditional paper publication, but fairly lousy at citing
data from any other source, or even describing how the data was obtained,
whether through elicitation, or introspection, or narratives, etc. Our
hunch is that there are multiple reasons for this, including:

-lack of standards of *how* to format citations of data
-lack of policies by journals and other publishers on data
-lack of knowledge about *how,* *if* or *where* to store data, and when
it's appropriate/ethical or inappropriate/unethical to *share* data
-uneasiness about accountability
-a general de-valuing of all of the work that goes into properly caring for
data in hiring, tenure and promotion, such that analysis is prioritized
over data work

Again, the documentation community, and probably some areas of
computational linguistics, seem to be at the forefront of encouraging and
establishing practices for making linguistics more reproducible, but
sociological change is slow and there's a clear need for education. By the
way, we've also formed a group in the Research Data Alliance called the
Linguistics Data Interest Group
<https://protect-au.mimecast.com/s/pLGZBoi2YqE0FW?domain=rd-alliance.org>
that is open for anyone to join and contribute to the discussion.

Andrea
--
Andrea L. Berez-Kroeker
Associate Professor and Chair of Graduate Studies
Department of Linguistics, University of Hawai'i at Mānoa
Director, Kaipuleohone UH Digital Language Archive
Senior co-Chair, LSA Committee for Endangered Languages and Their
Preservation
https://protect-au.mimecast.com/s/Ld1ZBmSp80Zxtn?domain=orcid.org
https://protect-au.mimecast.com/s/GN1VBvta67VzH1?domain=www2.hawaii.edu

Office hour appointments: https://protect-au.mimecast.com/s/YZ52BmiXz5xqHA?domain=bit.ly

On Thu, Jun 8, 2017 at 3:42 AM, Jorge Emilio Rosés Labrada <jrosesla at uwo.ca>
wrote:

> Dear Joe,
>
> There's a new paper out (as of a couple of days ago) in *Language
> Documentation & Conservation* by Lauren Gawne, Barbara F. Kelly, Andrea
> L. Berez-Kroeker & Tyler Heston that examines this issue: "Putting
> practice into words: The state of data and methods transparency in
> grammatical descriptions"
> <https://protect-au.mimecast.com/s/K41rBvS4ENr9sl?domain=scholarspace.manoa.hawaii.edu>
>
> Lauren, Andrea, Barbara and Tyler will be able to say more about this than
> I can, but my understanding is that there is definitely a move in the
> direction of including this type of data with examples in newer work.
>
> Best,
> Jorge
>
> ____________
> Jorge Emilio Rosés Labrada
> Banting-Killam Postdoctoral Fellow
> First Nations and Endangered Languages Program
> University of British Columbia
>
> On Thu, Jun 8, 2017 at 6:44 AM, Brenda Boerger <brenda_boerger at sil.org>
> wrote:
>
>> Hi Joe,
>>
>> Yes, it is or should be best practice to include the source of any data
>> we use in our papers.
>>
>> For some of my work on Natqgu [ntu] I no longer have the recordings of
>> 30+ years ago, but only the transcribed text. In these instances I cite the
>> text number and line number for the examples, even though the examples are
>> also numbered sequentially in the paper. That makes for lots of numbers
>> running around.
>>
>> The ideal is that every file is named uniquely and prepared for archiving
>> so that someone 50 years in the future can find your corpus and the
>> recordings you cited based on what you put in your paper.
>>
>> I don’t think we’re there yet. I have not archived what I’m working on.
>> And I’m guessing that this will be a cyclic process—collect and name,
>> analyze and write, archive SOMETHING, repeat process.
>>
>> I’m not aware of particular references stipulating this and will be
>> interested to hear what others have to say. I’m thinking we need to add
>> this to our e-book:
>> Boerger, Brenda H., Stephen N. Self, Sarah Ruth Moeller, and D. Will
>> Reiman. 2016. Language and Culture Documentation Manual. LeanPub.
>> https://protect-au.mimecast.com/s/e4MdB8S6GVEruY?domain=leanpub.com
>>
>> That would at least give you (and others) something to cite as best
>> practice. Thanks for bringing this up.
>>
>> ~Brenda
>>
>> _______________________________
>> Brenda H. Boerger, PhD
>> Special Consultant for Language and Culture Documentation
>> SIL International Language Program Services
>> 972-273-9356
>> https://protect-au.mimecast.com/s/W91wBai9wm8Acp?domain=sil.org
>> https://protect-au.mimecast.com/s/0RmEBkHakMQ6HX?domain=gial.edu
>> Skype ID: brenda_boerger1
>>
>> *From:* Joe Blythe [mailto:joe.blythe at mq.edu.au]
>> *Sent:* Thursday, June 08, 2017 2:02 AM
>> *To:* r-n-l-d
>> *Subject:* [RNLD] Best practice wrt authentic data
>>
>> Dear RNLDers
>>
>> In just about every linguistics paper I’ve written, I’ve always mentioned
>> which recording an example comes from (with a recording reference and
>> time-codes), or which field notebook an example comes from, if elicited. I
>> always thought that this practice speaks to the authenticity of the data. I
>> assumed that if such a trail is trackable then you are unlikely to be
>> accused of making stuff up!
>>
>> I know that there are many other researchers that do this, so I’m
>> wondering whether there are references to this being best practice, or at least
>> being advisable.
>>
>> Also, turning this around, is it reasonable to expect (in 2017) that
>> researchers writing about an endangered language follow such a protocol, if
>> it is in fact a protocol?
>>
>> Best
>> Joe
>>
>> Dr Joe Blythe
>> Department of Linguistics
>> Macquarie University
>> Room 566, Building C5A
>> Balaclava Rd, North Ryde, NSW 2109, Australia
>> *Ph*: +61-2-9850-8089  |  *Mob*: +61-409-88-1153
>> *E*: joe.blythe at mq.edu.au  |  *Web*
>> <https://protect-au.mimecast.com/s/wxnKB8FO387XSk?domain=mq.edu.au>
>
>