[Corpora-List] Building a corpus from Twitter & Tw's privacy concerns

Adam Kilgarriff adam at lexmasterclass.com
Thu Jul 18 08:26:04 UTC 2013


Miles,

> acts as a barrier to research.  Additionally one could argue that
> preventing people from having access to static Tweet corpora
> undermines doing reproducible research.

You can argue all you like but it's a bit irrelevant -  the data privacy
battleground is the whole wide world, with hi-tech companies, politicians
and the media playing for big prizes, and they really won't care one jot
what us worker ants think (or if they trample us)

adam

On 18 July 2013 08:55, Miles Osborne <miles at inf.ed.ac.uk> wrote:

> Basically Twitter's insistence on distributing IDs and not raw Tweets
> stems from the fact that third parties need to honour deletion requests.
>
> If you pass around raw Tweets then there is no way for Twitter to argue
> that a deleted Tweet is deleted. If instead you force people to recrawl
> them each time then Tweets can be deleted at source and all subsequent
> access requests will not return that deleted Tweet.
>
> Personally I think this way of distributing Tweets in bulk is not scalable
> and acts as a barrier to research.  Additionally one could argue that
> preventing people from having access to static Tweet corpora undermines
> doing reproducible research.
>
> Miles
>
> --
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================