[Corpora-List] What is corpora and what is not?

WILLIAMS Geoffrey williams at univ-ubs.fr
Wed Oct 3 17:43:22 UTC 2012

Are we not slightly reinventing the wheel?

The nature of corpora has been discussed for years, and EAGLES was largely
about defining it. In 2005, John Sinclair enlarged upon the 1996 definition
when he wrote:

> A corpus is a collection of pieces of language text in electronic 
> format, selected according to external criteria to represent, as far 
> as possible, a language or language variety as a source of data for 
> linguistic research.
Sinclair, J. McH. 2005. ‘Corpus and Text: Basic Principles’. In Wynne, M.
(ed.). Developing Linguistic Corpora: A Guide to Good Practice. Oxford:
AHDS, pp. 1-16.

It is also on the web!

Surely anyone involved in corpora has read the seminal works and does
not need reminding that corpora are machine-readable, may be samples or
whole works, etc. What has changed is the rise of internet corpora, but
here too Kilgarriff and others have commented on the situation in a way
that both NLP and corpus linguistics users can feel at home with.

Corpora mailing list
Corpora at uib.no