[Corpora-List] (no subject)
Eric Atwell
E.S.Atwell at leeds.ac.uk
Sun Aug 17 17:30:39 UTC 2014
The British National Corpus Reference Guide
http://www.natcorp.ox.ac.uk/docs/URG/index.html
states that speaker identities were anonymized:
" ... guarantee of confidentiality and complete anonymity (all references
to full names and addresses have been removed from the corpus and the log)"
I assume name and address removal was done by hand-editing the text,
but were any tests done to double-check that anonymization was complete?
What instructions were the manual editors given on exactly how to identify
and process names, addresses, etc.?
I am interested in the possibility of using the BNC as a training corpus
for automated anonymization of other text sources, e.g. narrative text
in medical patient records. Does this sound feasible? What pitfalls should I
watch out for?
Thanks for any expert advice,
Eric Atwell, School of Computing, Leeds University