[Corpora-List] Ethical review of spoken corpus collection

Geoffrey Sampson grs2 at sussex.ac.uk
Mon Apr 11 19:31:04 UTC 2011


The contributions to this thread which I have seen so far take the line
that the ethics rules being applied to speech corpora are unreasonably
tight.  In some respects they may be, but there is another side to the
question.  If one works with the "demographically sampled speech" section
of the British National Corpus, for instance, even though some measures of
anonymization were applied during corpus compilation one not infrequently
comes across material that is rather damaging to identifiable individuals
(often third parties who were not participants in the recorded
conversations).  I discuss this to some extent in, for instance, section 4.1
of the documentation of my CHRISTINE Corpus
(www.grsampson.net/ChrisDoc.html), and others have picked the point up and
discussed it further.  Last time I was actively involved with the corpus
linguistics scene, this kind of problem did not seem to be adequately
recognized and addressed by the research community; yet, apart from the
fact that we researchers should surely want to act like decent people, the
law imposes constraints that linguists don't always seem to understand.  (I
am fairly sure, for instance, that some of the material I found in the BNC
violates the European Data Protection Directive, though possibly not the
deliberately weak implementation of it within UK law.) I don't doubt that
many university bureaucrats are stupidly heavy-handed in the manner they go
about controlling research, but there is a real issue here -- laissez-faire
would not be a good idea.

Geoffrey Sampson

