[Corpora-List] British corpus containing instances of profanity?

Krishnamurthy, Ramesh r.krishnamurthy at aston.ac.uk
Tue Feb 25 20:43:03 UTC 2014


Hi Michaël

A quick way would be to use a program such as WebBootCaT:
use your 'profanity' words as seed words, and restrict your search
to sites in the .uk domain and to dates after 2000.

But this method would NOT get you the participant details you require:
- Provide at least basic information on the informants (such as age, gender, location, socio-economic situation, ethnic origin...)
- Provide contextual information regarding the conversation and the link(s) between speakers

So... I think you will have to reduce your list of criteria in order to get any corpus at all.

best
ramesh

--------------
Date: Tue, 25 Feb 2014 15:21:23 +0100
From: Michaël GAUTHIER <mic.gauthier at hotmail.fr>
Subject: [Corpora-List] British corpus containing instances of
        profanity?
To: "Corpora at uib.no" <corpora at uib.no>




Dear all,

I am writing to the whole CORPORA list for information on a corpus that could suit my needs, because so far all my efforts to find a suitable one have been in vain.

I am a PhD student investigating the use and perception of profanity among British speakers. One immediate difficulty is that instances of profanity are not easy to record, but there are other factors I need to take into consideration. Ideally, the corpus would have to:

- Be very recent (after 2000), since the phenomenon on which I focus is a relatively new one
- Focus on the U.K.
- Be composed of naturally occurring conversations to be able to grasp instances of profanity
- Provide at least basic information on the informants (such as age, gender, location, socio-economic situation, ethnic origin...)
- Provide contextual information regarding the conversation and the link(s) between speakers

I know this is a lot to ask, but these are the requirements I would have in an ideal situation. As I said, none of the corpora I have reviewed so far meets them. The main corpora I have reviewed are: the BNC, Bank of English, Collins Corpus (this one seems great, with 5 billion words, but it is apparently only available to the lexicographers at Collins; I contacted them but got no answer...), COLT, CANCODE, Longman British Spoken Corpus, Limerick Corpus, Scottish Corpus of Texts and Speech, IViE, London-Lund Corpus of Spoken English, Cambridge English Corpus (same situation as the Collins Corpus...), International Corpus of English, Diachronic Corpus of Present-day Spoken English, British English Speech Dat.

Those are the main ones, but as I said, none of them matched perfectly. I would therefore be more than happy if someone could point me to a corpus I may have missed, even if it does not match perfectly. At this point, any new hint would be very welcome. If nothing comes up, I think I will have to "sacrifice" some of my requirements to be able to carry out this study. It is a pilot study, so that would not be too tragic, but if I can find something that matches perfectly, so much the better!

Sorry for the length of this email; I just tried to be as clear as possible... I hope I was...

Thank you in advance for any idea/hint/plan/solution/revelation any one of you may have!

Best regards

Michaël GAUTHIER
Université Lumière Lyon 2
France


