<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#663300">
<font face="Cambria">Dear Simon<br>
<br>
I went through your email and the translations you provided, and I
must confess that this is the first time in over 15 years that
I have seen such a case (apart, perhaps, from a German case that I can
discuss with you offline).<br>
<br>
The Commissioner's decision is unfair, as it relies on the EU
Privacy Directive <small><i>("</i></small></font><small><i>Commissioner
on this point makes it clear that the definition of the term
"personal information" taken from the second Article of the
Directive 95/46/EC and means that all data that is in any way
related to the individual, the rules of protection of personal
data...."). </i><big>This is unfair because the Privacy Directive
regulates the collection and compilation of "personal data"; here
we are talking about Slovene texts, "real" texts that have already
been published, compiled to illustrate the use of the Slovene
language.<br>
<br>
Do you know whether similar action has been taken against the
newspapers/blogs that store the archived data from which you took
your texts? <br>
What about the official journal of the Slovene Republic? It
should contain a lot of names (all nominations of
ministers, ambassadors, etc.). They are "public" personalities,
but the EU Directive applies to them as well. Can we find it on
the web and search it via Google or the like?<br>
<br>
If you do not mind, I would like to share this with our lawyers
and see if someone can come up with a proposal.<br>
<br>
I guess hosting the site outside Slovenia is an obvious option,
but under many laws and regulations it is not the "geographic"
location that matters but the "editor/publisher", so this
should be considered a last resort.<br>
</big></small><font face="Cambria"><small><i><br>
</i></small>Best regards<br>
Khalid Choukri<br>
European Language Resource Association<br>
<br>
<br>
<br>
</font>Mcenery, Tony wrote, On 04/10/2012 10:49:
<blockquote
cite="mid:FDEEA3E0F6E956449B7526D166CC7C7D11E7F7@EX-1-MB2.lancs.local"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="Generator" content="Microsoft Word 12 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">Dear Simon,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">I have not
encountered that approach to privacy before and find it
somewhat perverse as the privacy is clearly breached
already.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">The only
analogies I can think of relate to non-corpus cases, notably
the discussion in the UK over the summer of whether a
picture of a member of royalty (naked) should be printed in
the UK Press. It was noted at the time that the pictures
were available elsewhere on the web, but this did not stop a
discussion regarding whether a legislative bubble should
isolate the .uk domain, so to speak. That case did not lead
to a legal ruling, but there are similar examples which did,
I guess – where something is not legal in country A, but is
in country B, so cannot be viewed legally on the web in
country A, but is visible from country A on sites in country
B.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Copyright might
be an area where legislative bubbles could give rise to an
issue directly similar to that which you have encountered –
there are different jurisdictions in operation which permit
different behaviours. Might be worth looking at more
closely. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Prior to the
internet, cases like yours were more common – ‘Spycatcher’
was a book banned in the UK which was freely available
elsewhere (or at least in Australia, from memory). That led
to legislative fun and games. So – legislative bubbles like
this were known in the pre-internet age and do crop up in
the internet age also. If you are in the bubble I guess
there is little you can do but comply. I daresay there may
be exciting and imaginative ways of trying to sidestep the
bubble, but I would take very careful advice before you
tried any of those ideas, if I were you. Sorry to be
unhelpful (and at some length!). Best wishes,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Tony<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:'Tahoma','sans-serif'"
lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:'Tahoma','sans-serif'"
lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:corpora-bounces@uib.no">corpora-bounces@uib.no</a>
[<a class="moz-txt-link-freetext" href="mailto:corpora-bounces@uib.no">mailto:corpora-bounces@uib.no</a>]
<b>On Behalf Of </b>Simon Krek<br>
<b>Sent:</b> 03 October 2012 20:47<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
<b>Subject:</b> [Corpora-List] Legal issue - privacy
protection<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="SL">Dear all,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">I would like to ask for
your help with a legal problem that cropped up in Slovenia
this summer and might also be of interest to others.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">In July, the Slovenian
Information Commissioner (<a moz-do-not-send="true"
href="https://www.ip-rs.si/?id=195">https://www.ip-rs.si/?id=195</a>)
issued a decision on the "Nova beseda" corpus (<a
moz-do-not-send="true"
href="http://bos.zrc-sazu.si/a_beseda.html">http://bos.zrc-sazu.si/a_beseda.html</a>)
which contains 318 million words from newspapers, magazines,
books, etc., and is available in a web concordancer, accessible
without authentication. The decision contains the obligation
that all personal names in the corpus should be either
anonymised or excluded from the results in the online
concordancer because of the protection of personal data
(mainly in newspaper articles). After some negotiation it is
now possible to search for a name but not for a combination
of names (and/or surnames). The list of prohibited
combinations is based on the first name and family name
database of the Statistical Office of the Republic of
Slovenia. For instance, if you search for a combination of
my name and surname, you get the following result:
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL"><a moz-do-not-send="true"
href="http://bos.zrc-sazu.si/c/ada.exe?hits_shown=100&clm=22&crm=22&expression=simon+krek&clm=22&crm=22&wth=0&hits_shown=100&sel=%28all%29&name=a">http://bos.zrc-sazu.si/c/ada.exe?hits_shown=100&clm=22&crm=22&expression=simon+krek&clm=22&crm=22&wth=0&hits_shown=100&sel=%28all%29&name=a</a>.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">In our corpora community,
we view this solution as unacceptable as it severely limits
the use of corpora on the web and, on the other hand, brings
no additional protection of privacy, as the same information
is available through search engines which are outside the
jurisdiction of the Slovenian Information Commissioner.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">My question is whether anybody
involved in corpus creation has encountered or considered this
kind of problem before us. I am interested in any experience
that involves **protecting personal privacy in corpus
material already published before** which is simultaneously
accessible in (digital) libraries and most of it also
elsewhere on the web in archives of particular newspapers
etc. Perhaps it should be emphasized that this is NOT in any
way a question of copyright or the status of web crawled
data in WaCs, it concerns only the laws on protection of
personal data.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">I Google-translated the
decision and put it on my page:
<a moz-do-not-send="true"
href="http://www.simonkrek.si/blog/decision/index.html">http://www.simonkrek.si/blog/decision/index.html</a>
(the original is linked on the same page).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">The main ideas in the
decision are the following:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL">- although all the material
in the corpus had already been published before and can be
found in libraries and in archives of particular
newspapers/magazines, the corpus represents a NEW STRUCTURED
collection which contains personal data, and as such it
cannot be compared with the original publication in
newspaper/magazine, which had a different PURPOSE
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL">- a very important issue in
this decision is "EASE OF ACCESS" as it takes only a few
seconds to find personal data in the corpus whereas more
effort is needed to access or collect the same data in
newspaper articles in libraries or other places.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">I would be very grateful
for hints about any comparable legal considerations or
decisions elsewhere, particularly in EU countries.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL">Best regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL">Simon Krek<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:'Courier New'"
lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:'Courier New'"
lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:'Courier New'"
lang="SL">-----------------------<br>
Amebis, d.o.o., Kamnik<br>
Bakovnik 3<br>
SI-1241 Kamnik<br>
Slovenia<br>
<br>
Jozef Stefan Institute<br>
Artificial Intelligence Laboratory<br>
Jamova 39<br>
SI-1000 Ljubljana<br>
Slovenia<br>
<br>
skype: simon.krek.jsi<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:'Courier New'"
lang="SL">twitter: @SimonKrek<br>
-----------------------<br>
</span><span lang="SL"><a moz-do-not-send="true"
href="http://www.simonkrek.si/"><span
style="font-size:10.0pt;font-family:'Courier New'">http://www.simonkrek.si/</span></a></span><span
style="font-size:10.0pt;font-family:'Courier New'"
lang="SL"><br>
</span><span lang="SL"><a moz-do-not-send="true"
href="http://www.slovenscina.eu/"><span
style="font-size:10.0pt;font-family:'Courier New'">http://www.slovenscina.eu/</span></a></span><span
lang="SL"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="SL"><o:p> </o:p></span></p>
</div>
</blockquote>
<br>
<div class="moz-signature">-- <br>
<b> Khalid Choukri </b>
<br>
ELRA General secretary &amp; ELDA CEO
<br>
email: <a class="moz-txt-link-abbreviated" href="mailto:choukri@elda.org">choukri@elda.org</a>; <br>
Web: <a class="moz-txt-link-abbreviated" href="http://www.elra.info">www.elra.info</a> <a class="moz-txt-link-abbreviated" href="http://www.elda.org">www.elda.org</a>
<br>
Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30
<br>
<br>
<b> ***************************************************<br>
** Info on LREC 2012 : <a class="moz-txt-link-abbreviated" href="http://www.lrec-conf.org">www.lrec-conf.org</a> <br>
***************************************************<br>
</b></div>
</body>
</html>