[Corpora-List] Looking for corpora suitable for research on language and gender...

Diana Maynard d.maynard at dcs.shef.ac.uk
Thu Mar 25 16:48:20 UTC 2004


Hi Ute,
There is some gender information in the spoken texts of the BNC.

There are codes sdesex1 and sdesex2 representing Spoken:Demographic: Respondent
Sex (male = 1, female = 2).
I don't remember any more without looking at the files, but I think you can find
the info in the bncfinder.dat file.

I think there is probably only gender info for the demographic texts, i.e. where
there is a single person speaking and/or responding.

Regards
Diana Maynard



On Thursday 25 Mar 2004 3:22 pm, Ute Römer wrote:
> Dear All,
>
> I am in the process of preparing an introductory course on language and
> gender and was thinking about compiling a "language and gender studies
> corpus sampler" for my students so they can carry out some small-scale
> empirical research projects to base their term papers on. For this sampler
> it would be ideal to have spoken and/or written corpora with (roughly
> comparable) male and female subsections, or just all-male/all-female
> talk/writing corpora, or maybe even collections of exclusively gay and/or
> lesbian language.
>
> I'm going to include a couple of small and specialised home-made corpora
> (literary texts, book reviews, pop/rap song lyrics...), but would also like
> to use larger and less specialised ones, such as COLT and (parts of) the
> BNC. Does anyone know about a possibility to extract from these corpora
> all-female and all-male conversations or male/female authored texts
> (without having to read the headers of 4,000+ text files)? I had a look at
> David Lee's "BNC Index" Excel spreadsheet but couldn't find sex indicators
> for spoken texts (maybe most of them are mixed sex anyway). Also, I would
> be grateful for pointers to other corpora which might be appropriate for
> L&G-related research (MICASE online is already on my list; and I've
> subdivided the transcript files of the Santa Barbara Corpus of Spoken
> American English into male/female/mixed groups).
>
> Best wishes and thanks in advance... Ute
>
>
> ************************************************************
>
> Ute Römer
> English Department
> University of Hanover
> Königsworther Platz 1
> 30167 Hannover
> Germany
>
> Phone: +49 (0)511 762 2997
> Fax: +49 (0)511 762 2996
> E-mail: ute.roemer at anglistik.uni-hannover.de
> http://www.fbls.uni-hannover.de/angli/


