[Corpora-List] What is corpora and what is not?
WILLIAMS Geoffrey
williams at univ-ubs.fr
Wed Oct 3 17:43:22 UTC 2012
Are we not slightly reinventing the wheel?
The nature of corpora has been discussed for years; EAGLES was about
defining it. In 2005, John Sinclair enlarged upon the 1996 definition
when he wrote:
> A corpus is a collection of pieces of language text in electronic
> format, selected according to external criteria to represent, as far
> as possible, a language or language variety as a source of data for
> linguistic research.
Sinclair, J. McH. 2005. 'Corpus and Text: Basic Principles'. In Wynne,
M. (ed.) 2005. Developing Linguistic Corpora: A Guide to Good Practice.
Oxford: AHDS, pp. 1-16.
It is also on the web!
Surely anyone involved in corpora has read the seminal works and does
not need reminding that corpora are machine-readable, may be samples or
whole works, etc. What has changed is the rise of internet corpora, but
here too Kilgarriff and others have commented on the situation in a way
that both NLP and corpus linguistics users can feel at home with.
Best
Geoffrey
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora