[Corpora-List] What is corpora and what is not?

Williams williams at univ-ubs.fr
Wed Oct 3 18:47:56 UTC 2012


Trevor sums this up beautifully.

The story of why corpus linguistics was so named is known. It might not be ideal, but that is the way it is. As in all disciplines, it is good to read the foundation texts, and ours only date back to the 80s.

Different disciplines may use the word corpus in different ways; for a lawyer it has a particular usage. For those of us playing with computers, corpus and corpora take on a precise meaning. The wheel has turned.

Best

Geoffrey 

Sent from my iPad

On 3 Oct 2012, at 20:21, Trevor Jenkins <trevor.jenkins at suneidesis.com> wrote:

> On 3 Oct 2012, at 18:56, Graham White <graham at eecs.qmul.ac.uk> wrote:
> 
>> So the Corpus Iuris Civilis is not a corpus? …
> 
> In the same way that Stonehenge is not technically a henge … despite its name being the origin of the word.
> 
> A corpus is usually compiled with some purpose in mind, so the example someone used earlier of the novels of Charles Dickens would constitute a corpus if one were analysing his fiction. A more formal definition of corpus that I use is quoted in Atkins and Rundell's "Oxford Guide to Practical Lexicography" (p54), viz "a corpus is a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research."  Indeed the original questioner might do well to read chapter 3 of that book in its entirety.
> 
> However, including the Dickens Journal Online material alongside the novels might stop the dataset being considered a formal corpus. Likewise, a bunch of texts that included some Dickens, Austen and Eliot (whether George or T S or the sisters) simply because the analyst likes them would not make the result a corpus --- unless they are representative of some other usage, for example language variance in 19th-century fiction over time.
> 
> Worse would be a collection of texts in different languages just because the analyst likes to read them --- unless it is the same text in those different languages and the purpose is to analyse the translation process.
> 
> So to the original questioner, what is your purpose in wanting a corpus? What are your criteria for texts being included? What analysis are you likely to apply to those texts?
> 
> Regards, Trevor.
> 
> <>< Re: deemed!
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
