[Corpora-List] fiction dialogue in corpora

Martin Wynne martin.wynne at oucs.ox.ac.uk
Mon Apr 10 13:24:56 UTC 2006


You can calculate precisely the proportion of dialogue in fiction in 
the small corpus which we assembled and tagged in Lancaster, known as 
the Speech, Thought and Writing Presentation Corpus. All forms of direct 
speech (as well as many other forms of discourse presentation) were 
manually annotated in the corpus. Out of approximately 260,000 words of 
contemporary British narrative writing (fiction, news reports and 
biography), we found that roughly 16% of the words were some form of 
direct speech. The proportions varied greatly between individual texts, 
between text types and between 'serious' and 'popular' varieties. In 
fiction, the proportion was 23%. (I'd need to check these statistics a 
little more carefully if you want to use them, but from a quick glance 
at the tables which I have, this seems to be the right answer.)
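
For anyone who wants to repeat this kind of arithmetic on their own data, 
here is a minimal sketch in Python. It assumes a hypothetical inline 
markup in which direct speech is wrapped in <DS>...</DS>; that tag name, 
the function names and the sample sentence are illustrative assumptions, 
not the corpus's actual annotation scheme. The last function shows how a 
raw hit count might be normalised to the estimated number of dialogue 
words, which is probably what a comparison with spoken data would need.

```python
import re

# Hypothetical markup: <DS>...</DS> wraps direct speech.
# A real corpus will use its own scheme; adapt both patterns to it.
DS_SPAN = re.compile(r"<DS>(.*?)</DS>", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def count_words(text):
    """Count whitespace-separated tokens after stripping markup."""
    return len(ANY_TAG.sub(" ", text).split())


def direct_speech_proportion(annotated_text):
    """Return the share of words that fall inside <DS> spans."""
    total = count_words(annotated_text)
    if total == 0:
        return 0.0
    in_speech = sum(count_words(s) for s in DS_SPAN.findall(annotated_text))
    return in_speech / total


def per_million_dialogue_words(hits, corpus_words, proportion):
    """Normalise a raw hit count to the estimated dialogue word count."""
    if proportion <= 0:
        raise ValueError("proportion of dialogue must be positive")
    return hits / (corpus_words * proportion) * 1_000_000


# Invented sample sentence, for illustration only.
sample = "<DS>You are late, aren't you?</DS> she said. He shrugged."
print(direct_speech_proportion(sample))
```

With a dialogue proportion of 0.23, a call such as 
per_million_dialogue_words(hits, corpus_words, 0.23) rescales a count 
from a fiction sub-corpus so that it is relative to dialogue words only. 
Counting whitespace-separated tokens is a simplification; a proper 
tokeniser would give slightly different totals.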

Looking more widely than simply direct speech, we found that more than 
50% of the words in the corpus were in passages of some form of speech, 
thought and writing presentation, e.g. direct speech, indirect speech, 
direct thought, indirect thought, direct writing, indirect writing, free 
indirect speech, etc.

For you and anyone else interested, the corpus can be made available 
from the Oxford Text Archive. Just email me if you're interested.

You can read more about the corpus and the analysis of it, and check the 
statistics yourself, in 'Corpus Stylistics' by Elena Semino and Mick 
Short, Routledge, 2004.

Best wishes,
Martin Wynne


Karin Axelsson wrote:
>  
> Dear Corpora-List members,
>  
> I’m a PhD student planning to study tag questions in the BNC. I’d like 
> to compare the use in fiction dialogue to that in spoken conversation. 
> In order to compare frequencies I would need to know the proportion of 
> dialogue, i.e. direct speech, in a sub-corpus of the written part 
> restricted to the imaginative domain and book as medium of text 
> (probably also restricted to the UK and Ireland as domicile of author 
> and maybe also restricted to the latest period of time: 1985-1993).
>  
> Does anybody know how large the proportion of direct speech is in this 
> sub-corpus?
>  
> If not, does anybody know how to find out the proportion of direct 
> speech in such a sub-corpus?
>  
> Are there any corpora of just British fiction dialogue?
>  
> The best alternative I know of is the English original fiction part of 
> the English-Norwegian Parallel Corpus, where direct speech is a possible 
> search restriction. Unfortunately, there are as yet no figures for the 
> proportion of direct speech in it.
>  
> Has anybody done research on the proportion of direct speech in British 
> fiction? There appear to be large differences for different authors (and 
> probably differences between fiction in different genres and different 
> languages).
>  
> Many thanks in advance for answers to any of these questions. 
> 
> 
> Best regards,
> 
> Karin Axelsson
> 
> PhD student
> 
> English Department
> 
> Göteborg University
> 
> Sweden
> 
> 


-- 
Martin Wynne
Head of the Oxford Text Archive and
AHDS Literature, Languages and Linguistics

Oxford University Computing Services
13 Banbury Road
Oxford
UK - OX2 6NN
Tel: +44 1865 283299
Fax: +44 1865 273275
martin.wynne at oucs.ox.ac.uk


