<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=SL link=blue vlink=purple><div class=WordSection1><p class=MsoNormal>Dear all,<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I would like to ask for your help with a legal problem that cropped up in Slovenia this summer and may also be of interest to others. <o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>In July, the Slovenian Information Commissioner (<a href="https://www.ip-rs.si/?id=195">https://www.ip-rs.si/?id=195</a>) issued a decision on the "Nova beseda" corpus (<a href="http://bos.zrc-sazu.si/a_beseda.html">http://bos.zrc-sazu.si/a_beseda.html</a>), which contains 318 million words from newspapers, magazines, books etc. and is available through a web concordancer that is accessible without authentication. The decision requires that all personal names in the corpus be either anonymised or excluded from the results in the online concordancer in order to protect personal data (mainly in newspaper articles). After some negotiation, it is now possible to search for a single name but not for a combination of names (and/or surnames). The list of prohibited combinations is based on the first name and family name database of the Statistical Office of the Republic of Slovenia. For instance, if you search for the combination of my first name and surname, you get the following result: <o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><a href="http://bos.zrc-sazu.si/c/ada.exe?hits_shown=100&clm=22&crm=22&expression=simon+krek&clm=22&crm=22&wth=0&hits_shown=100&sel=%28all%29&name=a">http://bos.zrc-sazu.si/c/ada.exe?hits_shown=100&clm=22&crm=22&expression=simon+krek&clm=22&crm=22&wth=0&hits_shown=100&sel=%28all%29&name=a</a>. 
<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>In our corpora community, we view this solution as unacceptable: it severely limits the use of corpora on the web while bringing no additional protection of privacy, since the same information is available through search engines, which are outside the jurisdiction of the Slovenian Information Commissioner.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>My question is whether anybody involved in corpus creation has encountered or considered this kind of problem before. I am interested in any experience that involves **protecting personal privacy in corpus material that has already been published** and is simultaneously accessible in (digital) libraries, most of it also elsewhere on the web in the archives of individual newspapers etc. Perhaps it should be emphasized that this is NOT in any way a question of copyright or of the status of web-crawled data in WaCs; it concerns only the laws on the protection of personal data.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I Google-translated the decision and put it on my page: <a href="http://www.simonkrek.si/blog/decision/index.html">http://www.simonkrek.si/blog/decision/index.html</a> (the original is linked on the same page).<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>The main ideas in the decision are the following:<o:p></o:p></p><p class=MsoNormal>- although all the material in the corpus had already been published and can be found in libraries and in the archives of individual newspapers/magazines, the corpus represents a NEW STRUCTURED collection which contains personal data, and as such it cannot be compared with the original publication in a newspaper/magazine, which had a different PURPOSE <o:p></o:p></p><p class=MsoNormal>- a very important issue in this decision is "EASE OF ACCESS", as it takes only a few seconds to find personal data in the corpus whereas more effort is needed 
to access or collect the same data in newspaper articles in libraries or other places.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I would be very grateful for pointers to any comparable legal considerations or decisions elsewhere, particularly in EU countries. <o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Best regards,<o:p></o:p></p><p class=MsoNormal>Simon Krek<o:p></o:p></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>-----------------------</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>Amebis, d.o.o., Kamnik</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>Bakovnik 3</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>SI-1241 Kamnik</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>Slovenia</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>Jozef Stefan Institute</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>Artificial Intelligence 
Laboratory</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>Jamova 39</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>SI-1000 Ljubljana</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>Slovenia</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>skype: simon.krek.jsi<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>twitter: @SimonKrek</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'>-----------------------</span><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><a href="http://www.simonkrek.si/"><span style='font-size:10.0pt;font-family:"Courier New";color:blue;mso-fareast-language:SL'>http://www.simonkrek.si/</span></a><span style='font-size:10.0pt;font-family:"Courier New";mso-fareast-language:SL'><br></span><a href="http://www.slovenscina.eu/"><span style='font-size:10.0pt;font-family:"Courier New";color:blue;mso-fareast-language:SL'>http://www.slovenscina.eu/</span></a><span style='mso-fareast-language:SL'><o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><o:p> </o:p></p></div></body></html>