[Corpora-List] Legal issue - privacy protection

Khalid CHOUKRI choukri at elda.org
Wed Oct 24 11:24:23 UTC 2012


Dear Simon

I went through your email and the translations you provided, and I must 
confess that this is the first time (in over 15 years) that I have seen 
such a case (except perhaps for a German case that I can discuss with you 
off-line).

The Commissioner's decision is unfair as it refers to the Privacy Directive 
of the EU ("Commissioner on this point makes it clear that the 
definition of the term "personal information" taken from the second 
Article of the Directive 95/46/EC and means that all data that is in any 
way related to the individual, the rules of protection of personal 
data...."). This is unfair because the Privacy Directive regulates the 
collection and compilation of "personal data"; here we are talking about 
Slovene texts that illustrate the use of the Slovene language, "real" 
texts that have already been published.

Do you know if a similar action has been taken against the 
newspapers/blogs that store the archived data from which you took your 
pieces? What about the official journal of the Slovene Republic (it should 
contain a lot of names: all nominations of ministers, ambassadors, 
etc.)? They are "public" personalities, but the EU Directive applies to 
them as well. Can we find it on the web and search it via Google or the 
like?

If you do not mind, I would like to share this with our lawyers and see 
if someone can come up with a proposal.

I guess hosting the site outside Slovenia is an obvious option, but 
according to many laws/regulations it is not the "geographic" location 
that matters but the "Editor/Publisher", so this should be considered 
only as a last resort.

Best regards
Khalid Choukri
European Language Resource Association



Mcenery, Tony wrote, On 04/10/2012 10:49:
>
> Dear Simon,
>
> I have not encountered that approach to privacy before and find it 
> somewhat perverse as the privacy is clearly breached already.
>
> The only analogies I can think of relate to non-corpus cases, notably 
> the discussion in the UK over the summer of whether a picture of a 
> member of royalty (naked) should be printed in the UK Press. It was 
> noted at the time that the pictures were available elsewhere on the 
> web, but this did not stop a discussion regarding whether a 
> legislative bubble should isolate the .uk domain, so to speak. That 
> case did not lead to a legal ruling, but there are similar examples 
> which did, I guess – where something is not legal in country A, but is 
> in country B, so cannot be viewed legally on the web in country A, but 
> is visible from country A on sites in country B.
>
> Copyright might be an area where legislative bubbles could give rise 
> to an issue directly similar to that which you have encountered – 
> there are different jurisdictions in operation which permit different 
> behaviours. Might be worth looking at more closely.
>
> Prior to the internet, cases like yours were more common – ‘Spycatcher’ 
> was a book that was banned in the UK but was freely available elsewhere 
> (or at least in Australia, from memory). That led to legislative fun 
> and games. So – legislative bubbles like this were known in the 
> pre-internet age and do crop up in the internet age also. If you are 
> in the bubble I guess there is little you can do but comply. I daresay 
> there may be exciting and imaginative ways of trying to sidestep the 
> bubble, but I would take very careful advice before you tried any of 
> those ideas, if I were you. Sorry to be unhelpful (and at some 
> length!). Best wishes,
>
> Tony
>
> *From:*corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] *On 
> Behalf Of *Simon Krek
> *Sent:* 03 October 2012 20:47
> *To:* Corpora at uib.no
> *Subject:* [Corpora-List] Legal issue - privacy protection
>
> Dear all,
>
> I would like to ask for your help with a legal problem that cropped up 
> in Slovenia during this summer and might be interesting also for others.
>
> In July, the Slovenian Information Commissioner 
> (https://www.ip-rs.si/?id=195) issued a decision on the "Nova beseda" 
> corpus (http://bos.zrc-sazu.si/a_beseda.html), which contains 318 
> million words from newspapers, magazines, books, etc. and is available in 
> a web concordancer, accessible without authentication. The decision 
> contains the obligation that all personal names in the corpus should 
> be either anonymised or excluded from the results in the online 
> concordancer because of the protection of personal data (mainly in 
> newspaper articles). After some negotiation it is now possible to 
> search for a name but not for a combination of names (and/or 
> surnames). The list of prohibited combinations is based on the first 
> name and family name database of the Statistical Office of the 
> Republic of Slovenia. For instance, if you search for a combination of 
> my name and surname, you get the following result:
>
> http://bos.zrc-sazu.si/c/ada.exe?hits_shown=100&clm=22&crm=22&expression=simon+krek&clm=22&crm=22&wth=0&hits_shown=100&sel=%28all%29&name=a
>
> In our corpora community, we view this solution as unacceptable as it 
> severely limits the use of corpora on the web and, on the other hand, 
> brings no additional protection of privacy, as the same information is 
> available through search engines which are outside the jurisdiction of 
> the Slovenian Information Commissioner.
>
> My question is whether anybody involved in corpus creation has 
> encountered or considered this kind of problem before us. I am interested in any 
> experience that involves **protecting personal privacy in corpus 
> material already published before** which is simultaneously accessible 
> in (digital) libraries and most of it also elsewhere on the web in 
> archives of particular newspapers etc. Perhaps it should be emphasized 
> that this is NOT in any way a question of copyright or the status of 
> web crawled data in WaCs, it concerns only the laws on protection of 
> personal data.
>
> I Google-translated the decision and put it on my page: 
> http://www.simonkrek.si/blog/decision/index.html (the original is 
> linked on the same page).
>
> The main ideas in the decision are the following:
>
> - although all the material in the corpus had already been published 
> before and can be found in libraries and in archives of particular 
> newspapers/magazines, the corpus represents a NEW STRUCTURED 
> collection which contains personal data, and as such it cannot be 
> compared with the original publication in newspaper/magazine, which 
> had a different PURPOSE
>
> - a very important issue in this decision is "EASE OF ACCESS", as it 
> takes only a few seconds to find personal data in the corpus, whereas 
> more effort is needed to access or collect the same data in newspaper 
> articles in libraries or other places.
>
> I would be very grateful for hints about any comparable legal 
> considerations or decisions elsewhere, particularly in EU countries.
>
> Best regards,
>
> Simon Krek
>
> -----------------------
> Amebis, d.o.o., Kamnik
> Bakovnik 3
> SI-1241 Kamnik
> Slovenia
>
> Jozef Stefan Institute
> Artificial Intelligence Laboratory
> Jamova 39
> SI-1000 Ljubljana
> Slovenia
>
> skype: simon.krek.jsi
>
> twitter: @SimonKrek
> -----------------------
> http://www.simonkrek.si/
> http://www.slovenscina.eu/
>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

-- 
*Khalid Choukri *
ELRA General secretary & ELDA CEO
email: choukri at elda.org;
Web: www.elra.info www.elda.org
Tel. +33 1 43 13 33 33 - Fax. +33 1 43 13 33 30

****************************************************
** Info on LREC 2012 : www.lrec-conf.org
****************************************************