[Corpora-List] Building a corpus from Twitter & Tw's privacy concerns

Leon Derczynski leon at dcs.shef.ac.uk
Thu Jul 18 09:05:54 UTC 2013


On 18 July 2013 10:33, Miguel Almeida <miguelbalmeida at gmail.com> wrote:

> Adam, Miles,
>
> I think another reason is so that Twitter can "black out" everyone else at
> any time in the future. It's a great (and very selfish and narrow-minded)
> idea: let the research community publish papers with your data, showing you
> how to find interesting stuff in your data (using taxpayer money!), and
> then if at some point you want to black them out, use the kill switch.
>
> I don't think Twitter's owners care that much about reproducible research.
> ;)
>

Mind you, they do seem to be quite lackadaisical when it comes to enforcing
their policy - the only two instances of this that I've heard of came after
large corpora (millions of documents) were distributed conspicuously for a
number of years, and the enforcements didn't involve court fees, suing for
damages or anything like that; in fact, the rumour was that they were
fairly low-key affairs. I'm sure list members can tell us if that was not
the case.

Leon