<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.5pt;
font-family:Consolas;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:Consolas;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
p.fqsintrotitle, li.fqsintrotitle, div.fqsintrotitle
{mso-style-name:fqsintrotitle;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
p.fqsintroauthors, li.fqsintroauthors, div.fqsintroauthors
{mso-style-name:fqsintroauthors;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle23
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.fqsauthorfullname
{mso-style-name:fqsauthorfullname;}
span.t2
{mso-style-name:t2;}
span.fqsvolumeno
{mso-style-name:fqsvolumeno;}
span.fqsissueno
{mso-style-name:fqsissueno;}
span.fqsarticleno
{mso-style-name:fqsarticleno;}
span.EmailStyle29
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.EmailStyle30
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Anonymization of narrative text (unless the text is trivial) is generally not attainable because of the nature of natural language. We will soon publish our work on the clinical text de-identification (CTD) system, which we have been developing at the US National Library of Medicine, NIH. Here is an excerpt from that work and the associated references, which may be of interest.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal>The Code of Federal Regulations defines protected health information (PHI) as individually identifiable health information, where “individual means the person who is the subject of protected health information” [<a href="#_ENREF_3" title=", 2006 #866"><span style='color:windowtext;text-decoration:none'>3</span></a>]. The regulation states that “a covered entity may determine that health information is not individually identifiable health information” if names, along with 17 other specified “identifiers of the individual or of relatives, employers, or household members of the individual, are removed” [<a href="#_ENREF_2" title=", 2002 #865"><span style='color:windowtext;text-decoration:none'>2</span></a>]. 
A closely related term, personally identifiable information (PII), is defined as “any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information” [<a href="#_ENREF_4" title="McCallister, 2010 #907"><span style='color:windowtext;text-decoration:none'>4</span></a>].<o:p></o:p></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='margin-left:14.0pt;text-indent:-14.0pt'><a name="_ENREF_2"><span style='font-size:9.0pt'>2. U.S. Department of Health and Human Services (2002) Public Welfare; Administrative Data Standards and Related Requirements; Security and Privacy; Privacy of Individually Identifiable Health Information; Other Requirements Relating to Uses and Disclosures of Protected Health Information. 45 CFR § 164.514. Available: </span></a><a href="http://edocket.access.gpo.gov/cfr_2002/octqtr/pdf/45cfr164.514.pdf"><span style='font-size:9.0pt'>http://edocket.access.gpo.gov/cfr_2002/octqtr/pdf/45cfr164.514.pdf</span></a><span style='font-size:9.0pt'>. Accessed 11 October 2011.<o:p></o:p></span></p><p class=MsoNormal style='margin-left:14.0pt;text-indent:-14.0pt'><a name="_ENREF_3"><span style='font-size:9.0pt'>3. U.S. Department of Health and Human Services (2006) Public Welfare; Administrative Data Standards and Related Requirements; General Administrative Requirements; General Provisions; Definitions. 45 CFR § 160.103. Available: </span></a><a href="http://edocket.access.gpo.gov/cfr_2007/octqtr/pdf/45cfr160.103.pdf"><span style='font-size:9.0pt'>http://edocket.access.gpo.gov/cfr_2007/octqtr/pdf/45cfr160.103.pdf</span></a><span style='font-size:9.0pt'>. 
Accessed 11 October 2011.<o:p></o:p></span></p><p class=MsoNormal style='margin-left:14.0pt;text-indent:-14.0pt'><a name="_ENREF_4"><span style='font-size:9.0pt'>4. McCallister E, Grance T, Scarfone K (2010) Guide to Protecting the Confidentiality of Personally Identifiable Information (PII). Recommendations of the National Institute of Standards and Technology. NIST, U.S. Department of Commerce: Special Publication 800-122. Available: </span></a><a href="http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf"><span style='font-size:9.0pt'>http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf</span></a><span style='font-size:9.0pt'>. Accessed 12 October 2011.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>There is also a fairly thorough recent review of existing de-identification systems for narrative text, which is freely available and may also be of interest.<o:p></o:p></span></p><p class=MsoNormal style='margin-left:14.0pt;text-indent:-14.0pt'><a name="_ENREF_5"><span style='font-size:9.0pt'>5. Meystre S, Friedlin F, South B, Shen S, Samore M (2010) Automatic de-identification of textual documents in the electronic health record: a review of recent research. 
BMC Medical Research Methodology 10: 70.</span></a><span style='font-size:9.0pt'><o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Best,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>--mehmet kayaalp<o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:8.0pt;font-family:"Verdana","sans-serif";color:navy'>Lister Hill National Center for Biomedical Communications<br>Building 38A<br>National Institutes of Health<br>8600 Rockville Pike<br>Bethesda, MD 20894-3828</span><span style='color:#1F497D'><o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB><o:p> </o:p></span></p><p class=MsoPlainText><span lang=EN-GB>On 2 November 2011 10:11, Eric Atwell &lt;<a href="mailto:csc6ea@leeds.ac.uk">csc6ea@leeds.ac.uk</a>&gt; wrote:<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB><o:p> </o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> We are exploring a corpus of Verbal Autopsies: semi-formal interviews<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> about deaths, mainly mothers describing how their baby died.<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> Before this corpus can be used more widely, we need to anonymize<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> or de-identify all references to people.<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> Can CORPORA experts please direct me to Guidelines or Protocols for<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> anonymization or de-personalisation of texts?<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> (eg to the standard exemplified in the BNC) And recommend software to<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> automate this 
process?<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>><o:p> </o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> thanks in advance for help<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>><o:p> </o:p></span></p><p class=MsoPlainText><span lang=EN-GB>><o:p> </o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> Eric Atwell, Senior Lecturer, Language research group,<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> I-AIBS Institute for Artificial Intelligence and Biological Systems<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> Leeds LS2 9JT, England. TEL: 0113-3435430 FAX: 0113-3435468<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> WWW: <a href="http://www.comp.leeds.ac.uk/eric">http://www.comp.leeds.ac.uk/eric</a><o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-GB>> <a href="http://www.comp.leeds.ac.uk/nlp">http://www.comp.leeds.ac.uk/nlp</a><o:p></o:p></span></p><p class=MsoNormal><span lang=EN-GB><o:p> </o:p></span></p></div></body></html>