[Corpora-List] : Anonymization or de-identification guidelines or software

Mehmet Kayaalp Mehmet.Kayaalp at nih.gov
Fri Nov 4 16:08:29 UTC 2011


Anonymization of narrative text (unless trivial) is generally not attainable
due to the nature of natural languages. We will soon publish our work on the
clinical text de-identification (CTD) system, which we have been developing
at the US National Library of Medicine, NIH. Here is an excerpt from that
work and the associated references, which may be of interest.

The Code of Federal Regulations defines protected health information
(PHI) as individually identifiable health information, where “individual
means the person who is the subject of protected health information” [3].
The regulation states that “a covered entity may determine that health
information is not individually identifiable health information” if names
along with the other 17 specified “identifiers of the individual or of
relatives, employers, or household members of the individual, are removed”
[2]. A closely related term, personally identifiable information (PII), is
defined as “any information about an individual maintained by an agency,
including (1) any information that can be used to distinguish or trace an
individual’s identity, such as name, social security number, date and place
of birth, mother’s maiden name, or biometric records; and (2) any other
information that is linked or linkable to an individual, such as medical,
educational, financial, and employment information” [4].
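
To make the distinction concrete, here is a minimal Python sketch of
pattern-based masking for two of the identifier types quoted above (social
security numbers and dates). It is an illustration only, not the CTD system:
the patterns, the labels, the function name mask_identifiers, and the sample
note are all assumptions made up for this example.

```python
import re

# Hypothetical patterns for illustration only; not the CTD system's rules.
# They cover just two surface formats: US-style SSNs and M/D/YYYY dates.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def mask_identifiers(text: str) -> str:
    """Replace every match of each pattern with a bracketed category label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Invented sample sentence, not taken from any real record.
note = "Mrs. Jones, born 4/12/1950, SSN 123-45-6789, was seen with her son."
masked = mask_identifiers(note)
print(masked)
```

In this sketch the date and the number are masked, but the name and the
reference to a relative pass through untouched, since they have no fixed
surface form. That is the kind of gap that makes narrative text hard to
anonymize with simple rules.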

 

2. U.S. Department of Health and Human Services (2002) Public Welfare;
Administrative Data Standards and Related Requirements; Security and
Privacy; Privacy of Individually Identifiable Health Information; Other
Requirements Relating to Uses and Disclosures of Protected Health
Information. 45 CFR § 164.514. Available:
http://edocket.access.gpo.gov/cfr_2002/octqtr/pdf/45cfr164.514.pdf. Accessed
11 October 2011.

3. U.S. Department of Health and Human Services (2006) Public Welfare;
Administrative Data Standards and Related Requirements; General
Administrative Requirements; General Provisions; Definitions. 45 CFR §
160.103. Available:
http://edocket.access.gpo.gov/cfr_2007/octqtr/pdf/45cfr160.103.pdf. Accessed
11 October 2011.

4. McCallister E, Grance T, Scarfone K (2010) Guide to Protecting the
Confidentiality of Personally Identifiable Information (PII).
Recommendations of the National Institute of Standards and Technology.
U.S. Department of Commerce, NIST Special Publication 800-122. Available:
http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf. Accessed
12 October 2011.

There is also a fairly thorough recent review of the existing narrative text
de-identification systems, which is freely available and may also be of
interest.

5. Meystre S, Friedlin F, South B, Shen S, Samore M (2010) Automatic
de-identification of textual documents in the electronic health record: a
review of recent research. BMC Medical Research Methodology 10: 70.

 

Best,

 

--mehmet kayaalp

Lister Hill National Center for Biomedical Communications
Building 38A
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894-3828

 

On 2 November 2011 10:11, Eric Atwell <csc6ea at leeds.ac.uk> wrote:

 

> We are exploring a corpus of Verbal Autopsies: semi-formal interviews
> about deaths, mainly mothers describing how their baby died.
> Before this corpus can be used more widely, we need to anonymize
> or de-identify all references to people.
> Can CORPORA experts please direct me to Guidelines or Protocols for
> anonymization or de-personalisation of texts?
> (eg to the standard exemplified in the BNC) And recommend software to
> automate this process?
>
> thanks in advance for help
>
> Eric Atwell, Senior Lecturer, Language research group,
>  I-AIBS Institute for Artificial Intelligence and Biological Systems
>  School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
>  Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
>  WWW: http://www.comp.leeds.ac.uk/eric
>       http://www.comp.leeds.ac.uk/nlp
