[Corpora-List] : Anonymization or de-identification guidelines or software
Mehmet Kayaalp
Mehmet.Kayaalp at nih.gov
Fri Nov 4 16:08:29 UTC 2011
Anonymization of narrative text (unless trivial) is generally not attainable
due to the nature of natural languages. We will soon publish our work on the
clinical text de-identification (CTD) system, which we have been developing
at the US National Library of Medicine, NIH. Here is an excerpt from that
work and the associated references, which may be of interest.
The Code of Federal Regulations defines protected health information
(PHI) as individually identifiable health information, where "individual"
means the person who is the subject of the protected health information [3].
The regulation states that a covered entity may determine that health
information is not individually identifiable health information if names,
along with the other 17 specified identifiers of the individual or of
relatives, employers, or household members of the individual, are removed
[2]. A closely related term, personally identifiable information (PII), is
defined as any information about an individual maintained by an agency,
including (1) any information that can be used to distinguish or trace an
individual's identity, such as name, social security number, date and place
of birth, mother's maiden name, or biometric records; and (2) any other
information that is linked or linkable to an individual, such as medical,
educational, financial, and employment information [4].
2. U.S. Department of Health and Human Services (2002) Public Welfare;
Administrative Data Standards and Related Requirements; Security and
Privacy; Privacy of Individually Identifiable Health Information; Other
Requirements Relating to Uses and Disclosures of Protected Health
Information. 45 CFR § 164.514. Available:
http://edocket.access.gpo.gov/cfr_2002/octqtr/pdf/45cfr164.514.pdf. Accessed
11 October 2011.
3. U.S. Department of Health and Human Services (2006) Public Welfare;
Administrative Data Standards and Related Requirements; General
Administrative Requirements; General Provisions; Definitions. 45 CFR §
160.103. Available:
http://edocket.access.gpo.gov/cfr_2007/octqtr/pdf/45cfr160.103.pdf. Accessed
11 October 2011.
4. McCallister E, Grance T, Scarfone K (2010) Guide to Protecting the
Confidentiality of Personally Identifiable Information (PII).
Recommendations of the National Institute of Standards and Technology. NIST,
U.S. Department of Commerce: Special Publication 800-122. Available:
http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf. Accessed
12 October 2011.
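As a rough illustration of the identifier-removal idea in the excerpt above,
here is a minimal Python sketch that replaces a few identifier types with
category tags. It is not the CTD system or any published method. The
pattern table (PATTERNS) and the function name (redact) are hypothetical,
and the regular expressions are simplified assumptions that cover only
identifiers with a regular surface form.

```python
import re

# Hypothetical, simplified patterns for a few identifier types.
# Assumption: US-style formats; real clinical text is far more varied.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def redact(text):
    """Replace every match of each pattern with a tag such as [SSN]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Mrs. Smith (SSN 123-45-6789, tel. 301-555-0100) was seen on 11/04/2011."
    print(redact(note))
```

Note that the name in the sample sentence passes through untouched: names,
addresses, and other free-form identifiers have no fixed pattern, which is
one reason why narrative text is hard to de-identify by simple rules.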
There is also a fairly thorough recent review of the existing narrative text
de-identification systems, which is freely available and may also be of
interest.
5. Meystre S, Friedlin F, South B, Shen S, Samore M (2010) Automatic
de-identification of textual documents in the electronic health record: a
review of recent research. BMC Medical Research Methodology 10: 70.
Best,
--mehmet kayaalp
Lister Hill National Center for Biomedical Communications
Building 38A
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894-3828
On 2 November 2011 10:11, Eric Atwell <csc6ea at leeds.ac.uk> wrote:
> We are exploring a corpus of Verbal Autopsies: semi-formal interviews
> about deaths, mainly mothers describing how their baby died.
> Before this corpus can be used more widely, we need to anonymize
> or de-identify all references to people.
> Can CORPORA experts please direct me to Guidelines or Protocols for
> anonymization or de-personalisation of texts?
> (eg to the standard exemplified in the BNC) And recommend software to
> automate this process?
>
> thanks in advance for help
>
>
> Eric Atwell, Senior Lecturer, Language research group,
> I-AIBS Institute for Artificial Intelligence and Biological Systems
> School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
> Leeds LS2 9JT, England. TEL: 0113-3435430 FAX: 0113-3435468
> WWW: http://www.comp.leeds.ac.uk/eric
> http://www.comp.leeds.ac.uk/nlp
>