Corpora: Legal corpus?

David Lee david_lee00 at hotmail.com
Sat Aug 26 00:12:58 UTC 2000


[Hasn't someone asked this before, not long ago? Anyway..]

Pulie,

Assuming you're working with English, there are about 127,331 words (not
a lot, by today's standards) in 13 files of courtroom proceedings
(hearings/trials, including judges' summations) in the BNC, transcribed
from spoken recordings. (I haven't come across any police interviews in
the BNC.)

However (and that's a big 'however'), it seems to me that some of the
trials/court proceedings were split between 2 or more recordings
and thus landed up in different files. This means that the 13 files
probably only represent around 7 different 'cases' (estimate: I've
obviously not checked in detail). This may or may not be a problem,
depending on your research.

The other *huge* problem (for anyone wanting to do even the most basic
sociolinguistic research) is the almost complete absence of information
about the participants recorded (i.e. age, sex, social class, etc.). The
most we get is whether they were male or female (and more than half the
time, we don't even get that) and their role (judge, solicitor, witness,
plaintiff, defendant). (Plea to future corpus compilers: please
scrupulously collect and record all the information you can get your
hands on about your participants!)


You might also want to look at the ICE-GB corpus:

Legal cross-exams (dialogue)    - 10 texts; 21,179 words
Legal presentations (monologue) - 10 texts; 21,735 words
Total: 42,914 words

Thankfully, there is more information on the participants in (some of)
the ICE-GB texts (fewer than half of them), but not by much. It would
seem 'Unknown' or '---' is an acceptable value for sociolinguistic
categories in many contemporary corpora... how sad. Confidentiality and
difficulty in obtaining personal information from large numbers of
strangers certainly constitute problems, but surely these are not
insurmountable?

Anyway, hope this helps.


David Lee
-----------------------------------------------------------------
David YW Lee           ***************************************
Dept of Linguistics    *   Stop the narrowing of minds       *
Lancaster University   *   Affirm the diversity of life      *
Lancaster LA1 4YT      ***************************************
England, UK.

Email: david_lee00 at hotmail.com
-----------------------------------------------------------------


