[Corpora-List] Corpus of threats?
Tristan Miller
miller at ukp.informatik.tu-darmstadt.de
Fri Nov 2 15:20:19 UTC 2012
Greetings.
On 01/11/12 05:59 PM, Tyler Schnoebelen wrote:
> I was looking over the records of searches that led to my corpus blog
> (http://corplinguistics.wordpress.com) and came across:
>
> “death threat corpus linguistics”
>
> This actually is a pretty interesting idea for a corpus. Does anyone
> know about such a corpus or something similar that would help
> researchers investigate the language of threatening/intimidation?
You might be able to construct one yourself semi-automatically using
Wikipedia. Editors sometimes post death threats against other editors
or against the organization which hosts the encyclopedia. Since this
contravenes Wikipedia's policies, other editors often remove these
threats, leaving clues in their edit summary such as "rv death threat".
If you obtain a Wikipedia database dump that includes the full revision
history, along with a suitable API for processing it (e.g., JWPL), you
could identify and extract these removal edits (including the exact text
that was removed).
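The extraction step can be sketched roughly as follows. This is a minimal Python sketch, not JWPL. It streams the MediaWiki XML export format (e.g. a full-history dump) with the standard library and assumes each page's revisions appear in chronological order. It flags edits using a hypothetical edit-summary pattern, `REMOVAL_SUMMARY`, for reverts such as "rv death threat". The "removed text" is approximated as the lines deleted relative to the preceding revision.

```python
import difflib
import re
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, NamedTuple

# Hypothetical pattern for edit summaries such as "rv death threat" or
# "Reverted death threats"; real summaries vary, so hits are only candidates.
REMOVAL_SUMMARY = re.compile(
    r"\b(rvv?|revert(ed)?|removed?)\b.*\bdeath threats?\b", re.IGNORECASE
)


class Removal(NamedTuple):
    page_title: str
    revision_id: str
    comment: str
    removed_text: str


def local_name(tag: str) -> str:
    """Strip the XML namespace, whose version differs between dumps."""
    return tag.rsplit("}", 1)[-1]


def removed_lines(before: str, after: str) -> str:
    """Return the lines of `before` that are missing from `after`."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return "\n".join(line[2:] for line in diff if line.startswith("- "))


def end_events(chunks: Iterable[bytes]) -> Iterator[ET.Element]:
    """Incrementally parse XML byte chunks, yielding each closed element."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _, elem in parser.read_events():
            yield elem
    parser.close()  # flush anything the parser was still buffering
    for _, elem in parser.read_events():
        yield elem


def find_threat_removals(chunks: Iterable[bytes]) -> Iterator[Removal]:
    """Yield the text removed by each edit whose summary matches the pattern.

    Assumes each page's revisions appear in chronological order, so the
    previous revision's text is the baseline for the diff.
    """
    title, previous_text = "", ""
    for elem in end_events(chunks):
        tag = local_name(elem.tag)
        if tag == "title":
            # A new page begins: reset the diff baseline.
            title, previous_text = elem.text or "", ""
        elif tag == "revision":
            fields = {local_name(child.tag): child.text or "" for child in elem}
            text = fields.get("text", "")
            comment = fields.get("comment", "")
            if REMOVAL_SUMMARY.search(comment):
                yield Removal(title, fields.get("id", ""), comment,
                              removed_lines(previous_text, text))
            previous_text = text
            elem.clear()  # release the revision's children once processed
        elif tag == "page":
            elem.clear()
```

A dump opened with `open(path, "rb")`, or `bz2.open(path, "rb")` for a compressed one, can be passed directly to `find_threat_removals`, since iterating over it yields byte lines. Because the diff works line by line, a hit also contains anything else deleted in the same edit, and the summary pattern will miss threats removed under other wording, so the output is a candidate list to check by hand rather than a finished corpus.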
Legal threats are also against Wikipedia policy, but they're not usually
removed by other editors, so they're not as easy to identify
automatically. It's no problem identifying editors who have been
blocked or banned for issuing legal threats, since this information is
normally included in the block message posted on their user page, but
identifying which of their edits constituted the threat itself would be
problematic.
Regards,
Tristan
--
Tristan Miller, Doctoral Researcher
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora