<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">Dear Simon,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">I have not encountered that approach to privacy before and find it somewhat perverse, as the privacy has clearly been breached already.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">The only analogies I can think of relate to non-corpus cases, notably the discussion in the UK over the summer of whether a picture of a (naked) member of the royal family should be printed in the UK press. It was noted
at the time that the pictures were available elsewhere on the web, but this did not stop a discussion of whether a legislative bubble should isolate the .uk domain, so to speak. That case did not lead to a legal ruling, but I guess there are similar examples
which did: cases where something is illegal in country A but legal in country B, so it cannot be viewed legally on the web in country A, yet it is visible from country A on sites in country B.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Copyright might be an area where legislative bubbles could give rise to an issue directly similar to that which you have encountered – there are different jurisdictions in operation which permit different behaviours.
Might be worth looking at more closely. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Prior to the internet, cases like yours were more common: ‘Spycatcher’ was a book banned in the UK which was freely available elsewhere (or at least in Australia, from memory). That led to legislative fun
and games. So legislative bubbles like this were known in the pre-internet age and crop up in the internet age as well. If you are in the bubble, I guess there is little you can do but comply. I daresay there may be exciting and imaginative ways of trying
to sidestep the bubble, but I would take very careful advice before trying any of those ideas, if I were you. Sorry to be unhelpful (and at some length!). Best wishes,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Tony<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:'Tahoma','sans-serif'">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:'Tahoma','sans-serif'"> corpora-bounces@uib.no [mailto:corpora-bounces@uib.no]
<b>On Behalf Of </b>Simon Krek<br>
<b>Sent:</b> 03 October 2012 20:47<br>
<b>To:</b> Corpora@uib.no<br>
<b>Subject:</b> [Corpora-List] Legal issue - privacy protection<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="SL">Dear all,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">I would like to ask for your help with a legal problem that cropped up in Slovenia this summer and that might also be of interest to others.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">In July, the Slovenian Information Commissioner (<a href="https://www.ip-rs.si/?id=195">https://www.ip-rs.si/?id=195</a>) issued a decision on the "Nova beseda" corpus (<a href="http://bos.zrc-sazu.si/a_beseda.html">http://bos.zrc-sazu.si/a_beseda.html</a>),
which contains 318 million words from newspapers, magazines, books, etc. and is available through a web concordancer that is accessible without authentication. The decision imposes the obligation that all personal names in the corpus be either anonymised or excluded
from the results in the online concordancer, on grounds of personal data protection (mainly in newspaper articles). After some negotiation, it is now possible to search for a single name but not for a combination of names (and/or surnames). The list of prohibited
combinations is based on the first name and family name database of the Statistical Office of the Republic of Slovenia. For instance, if you search for the combination of my first name and surname, you get the following result:
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL"><a href="http://bos.zrc-sazu.si/c/ada.exe?hits_shown=100&clm=22&crm=22&expression=simon+krek&clm=22&crm=22&wth=0&hits_shown=100&sel=%28all%29&name=a">http://bos.zrc-sazu.si/c/ada.exe?hits_shown=100&clm=22&crm=22&expression=simon+krek&clm=22&crm=22&wth=0&hits_shown=100&sel=%28all%29&name=a</a>.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">In our corpora community, we view this solution as unacceptable: it severely limits the use of corpora on the web and, at the same time, brings no additional protection of privacy, as the same information is available
through search engines which are outside the jurisdiction of the Slovenian Information Commissioner.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">My question is whether anybody involved in corpus creation has encountered or considered this kind of problem before. I am interested in any experience that involves **protecting personal privacy in previously published corpus
material** which is simultaneously accessible in (digital) libraries and, for the most part, also elsewhere on the web in the archives of particular newspapers etc. Perhaps it should be emphasized that this is NOT in any way a question of copyright or of the status of web-crawled
data in WaCs; it concerns only the laws on the protection of personal data.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">I Google-translated the decision and put it on my page:
<a href="http://www.simonkrek.si/blog/decision/index.html">http://www.simonkrek.si/blog/decision/index.html</a> (the original is linked on the same page).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">The main ideas in the decision are the following:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL">- although all the material in the corpus had already been published and can be found in libraries and in the archives of particular newspapers/magazines, the corpus represents a NEW STRUCTURED collection which contains
personal data, and as such it cannot be compared with the original publication in a newspaper/magazine, which had a different PURPOSE
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL">- a very important issue in this decision is "EASE OF ACCESS", as it takes only a few seconds to find personal data in the corpus, whereas more effort is needed to access or collect the same data from newspaper articles in
libraries or other places.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">I would be very grateful for hints about any comparable legal considerations or decisions elsewhere, particularly in EU countries.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">Best regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL">Simon Krek<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL" style="font-size:10.0pt;font-family:'Courier New'"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL" style="font-size:10.0pt;font-family:'Courier New'"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL" style="font-size:10.0pt;font-family:'Courier New'">-----------------------<br>
Amebis, d.o.o., Kamnik<br>
Bakovnik 3<br>
SI-1241 Kamnik<br>
Slovenia<br>
<br>
Jozef Stefan Institute<br>
Artificial Intelligence Laboratory<br>
Jamova 39<br>
SI-1000 Ljubljana<br>
Slovenia<br>
<br>
skype: simon.krek.jsi<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL" style="font-size:10.0pt;font-family:'Courier New'">twitter: @SimonKrek<br>
-----------------------<br>
</span><span lang="SL"><a href="http://www.simonkrek.si/"><span style="font-size:10.0pt;font-family:'Courier New'">http://www.simonkrek.si/</span></a></span><span lang="SL" style="font-size:10.0pt;font-family:'Courier New'"><br>
</span><span lang="SL"><a href="http://www.slovenscina.eu/"><span style="font-size:10.0pt;font-family:'Courier New'">http://www.slovenscina.eu/</span></a></span><span lang="SL"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
</div>
</body>
</html>