[Corpora-List] British corpus containing instances of profanity?

Tue Feb 25 21:33:35 UTC 2014

Dear Michaël,

You may want to look at and try to emulate the approach of Mike
Thelwall's "Fk yea I swear", summarized with links at
http://academiclogbook.blogspot.com/2011/09/mikethelwallfk-yea-i-swear2008.html

He looked at UK MySpace (raise your hand if you still remember MySpace)
profiles and made use of the demographic data there.

Regards,
Bill Fletcher

On Tue, Feb 25, 2014 at 9:21 AM, Michaël GAUTHIER
<mic.gauthier at hotmail.fr> wrote:

>  Dear all,
>
> I am contacting the whole CORPORA list to try to get information on a
> corpus which could suit my needs, because so far all my efforts to find a
> suitable one have been in vain.
>
> I am a PhD student investigating the use and perception of profanity among
> British speakers. One immediate difficulty is that instances of profanity
> are not easy to record, and there are other factors I need to take into
> consideration. The corpus would therefore have to:
>
> - Be very recent (after 2000), since the phenomenon on which I focus is a
> relatively new one
> - Focus on the U.K.
> - Be composed of naturally occurring conversations to be able to grasp
> instances of profanity
> - Provide at least basic information on the informants (such as age,
> gender, location, socio-economic situation, ethnic origin...)
> - Provide contextual information regarding the conversation and the
> link(s) between speakers
>
> I know this is a lot to ask, but these are the requirements I would have in
> an ideal situation. As I said, none of the corpora I have reviewed so far
> meets them. A short list of the main corpora I have
> reviewed would be: the BNC, Bank of English, Collins Corpus (this one seems
> great, with 5 billion words, but it is apparently only available to the
> lexicographers from Collins, I contacted them but got no answer...), COLT,
> CANCODE, Longman British Spoken Corpus, Limerick Corpus, Scottish Corpus of
> Texts and Speech, IViE, London-Lund Corpus of Spoken English, Cambridge
> English Corpus (same thing as the Collins Corpus...), International Corpus
> of English, Diachronic Corpus of Present-day Spoken English, British
> English Speech Dat.
>
> That is it for the main ones, but as I said, none of them corresponded
> perfectly. I would therefore be more than happy if someone could point me
> to a corpus I may have missed, even if it does not correspond perfectly. At
> this point, any new hint would be very welcome. If nothing comes up, I
> think I will have to "sacrifice" some of my requirements to be able to
> carry out this study. Since it is a pilot study, that would not be too
> tragic, but if I have the opportunity to find something which corresponds
> perfectly, so much the better!
>
> Sorry for the length of this email; I just tried to be as clear as
> possible. I hope I was.
>
> Thank you in advance for any idea/hint/plan/solution/revelation any one of
> you may have!
>
> Best regards
>
> Michaël GAUTHIER
> Université Lumière Lyon 2
> France
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>