Participant consent and metadata, analyses

Mon Sep 27 17:53:47 UTC 2010

On Sun, Sep 26, 2010 at 21:21, Steven Bird <stevenbird1 at gmail.com> wrote:
> On 26 September 2010 12:54, Claire Bowern <clairebowern at gmail.com> wrote:
>> Permission to release metadata should definitely be sought from
>> participants before any of it is put on the web or shared.
>
> I don't think it's as black-and-white as this.  Posting personal
> information on the web is very different to telling a colleague you
> recorded a frog story from some language.  It would be unfortunate if
> the legitimate need to protect "personal identifying information"
> prevented widespread dissemination of generic information about the
> resources that have been collected for a language.  (I think most IRBs
> require PII to be stored separately to other project records.)

Many working linguists in my experience have a very vague idea about
metadata, though I don’t intend to accuse anyone here of this.
Metadata are not a monolithic thing, to which access can be open or
closed. Metadata are just like data: there are some bits here and some
bits there, and they’re not necessarily in the same categories of
value, quality, importance, or any other measure. What I’m getting at
is that any approach which says “metadata release requires permission”
is far too broad to be of any practical use.

An IRB’s idea of personal identifying information is based on the
medical experiment model, where subject number 2342 has an entry in
the personal ID database which gives age, sex, race, and perhaps name
and birthplace. Our metadata are completely different from those of
the health sciences, and even from those of laboratory linguistics,
although we might collect some of the same information. If I have a
recording from a village of only 30 people, the very name of the
village could be construed as personally identificatory, to say
nothing of the name of the speaker. The subject matter (the actual
data) of the recording could also be PII, but since that’s embedded in
the data it’s impossible to eliminate without careful data-surgery.

I don’t feel that there can be a principled distinction between
identificatory and non-identificatory metadata in any sort of general
setting. Each culture and field situation requires different
evaluation by the linguist. In one context, an individual’s clan
membership should be public, but in another context it should not.
That particular metadatum might have different public accessibility
criteria than, say, the person’s dialect or age. We can’t simply say
“metadata release requires permission”, but instead have to specify
which metadata have such a requirement, what scope of release is
imagined, and under what conditions access will be available. All of
these are contextualized both by the collection situations *and* the
use situations. Sometimes it’s perfectly reasonable to share a
relatively private recording with a fellow researcher with the
understanding that it’s not going to become public. Other times it’s
essential to prevent anyone except a few designated individuals from
accessing a recording for many decades, since the recording and the
metadata both include very sensitive information.
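
As a rough sketch of what such per-metadatum decisions might look like
if written down for software, here is a small Python example. The
scope names, field names, placeholder values, and the embargo rule are
all invented for illustration; they are not taken from any archive's
actual system, and a real field situation would need its own
categories.

```python
from dataclasses import dataclass
from enum import IntEnum


class Scope(IntEnum):
    """Hypothetical release scopes, ordered from narrowest to widest."""
    DESIGNATED = 0   # a few designated individuals only
    RESEARCHERS = 1  # fellow researchers, not for publication
    PUBLIC = 2       # anyone, e.g. on the web


@dataclass
class FieldPolicy:
    scope: Scope            # widest audience allowed to see this field
    embargo_until: int = 0  # year before which only DESIGNATED may see it


def visible_fields(record, policies, audience, year):
    """Return the metadata fields that `audience` may see in `year`.

    Fields with no policy are withheld, so nothing is released by default.
    """
    visible = {}
    for name, value in record.items():
        policy = policies.get(name)
        if policy is None:
            continue
        embargoed = year < policy.embargo_until and audience != Scope.DESIGNATED
        if audience <= policy.scope and not embargoed:
            visible[name] = value
    return visible


# Placeholder record; every value here is made up.
record = {
    "language": "Language X",
    "genre": "frog story",
    "village": "Village V",
    "clan": "Clan C",
    "speaker_name": "Speaker N",
}

# One policy per metadatum, decided by the linguist for this context.
policies = {
    "language": FieldPolicy(Scope.PUBLIC),
    "genre": FieldPolicy(Scope.PUBLIC),
    "village": FieldPolicy(Scope.RESEARCHERS),
    "clan": FieldPolicy(Scope.PUBLIC, embargo_until=2060),
    "speaker_name": FieldPolicy(Scope.DESIGNATED),
}

print(visible_fields(record, policies, Scope.PUBLIC, 2010))
```

With these made-up policies, a public request in 2010 would get only
the language and genre, a fellow researcher would also get the village
name, and the clan membership would stay restricted to the designated
individuals until the embargo year. A different community might
reverse any of these choices.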

What needs to be emphasized here is that those of us with practical
experience in dealing with data and metadata access control – both
collectors and archivists – need to publicize our experiences. We need
to clarify to other working linguists and students what sort of
situations we have encountered so that they can make more informed
decisions about their unique contexts. In addition, we need to make
summaries of our issues available to people confronting recalcitrant
IRBs and human subjects review boards so that other field linguists in
the academy need not enter the lion’s den unprotected.

Feel free to poke holes in what I’ve said; I wrote it with only the
aid of a single cup of coffee.

Cheers,
James