[Corpora-List] What is corpora and what is not?

Trevor Jenkins trevor.jenkins at suneidesis.com
Wed Oct 3 18:21:20 UTC 2012


On 3 Oct 2012, at 18:56, Graham White <graham at eecs.qmul.ac.uk> wrote:

> So the Corpus Iuris Civilis is not a corpus? …

In the same way that Stonehenge is not technically a henge … despite its name being the origin of the word.

A corpus is usually compiled with some purpose in mind, so the example someone used earlier of the novels of Charles Dickens would constitute a corpus if one were analysing his fiction. A more formal definition of corpus that I use is quoted in Atkins and Rundell's "Oxford Guide to Practical Lexicography" (p54), viz "a corpus is a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research."  Indeed the original questioner might do well to read chapter 3 of that book in its entirety.

However, including the Dickens Journal Online material alongside the novels might stop the dataset being considered a formal corpus. Likewise, a bunch of texts that included some Dickens, Austen and Eliot (whether George or T S or the sisters) simply because the analyst likes them would not make the result a corpus --- unless they are representative of some other usage, for example language variation in 19th century fiction over time.

Worse would be a collection of texts in different languages just because the analyst likes to read them --- unless it is the same text in those different languages and the purpose is to analyse the translation process.

So to the original questioner, what is your purpose in wanting a corpus? What are your criteria for texts being included? What analysis are you likely to apply to those texts?

Regards, Trevor.

<>< Re: deemed!

