[Corpora-List] What is corpora and what is not?

Graham White graham at eecs.qmul.ac.uk
Wed Oct 3 17:56:59 UTC 2012


So the Corpus Iuris Civilis is not a corpus? This seems an unusual way 
to define things, firstly because it unduly privileges the medium of 
representation (and, as a computer scientist, that seems to me to be a 
mistake), and, secondly, because it rather orphans corpora which happen 
to be machine-readable: it ignores the considerable continuities between 
what scholars do with machine-readable texts and what scholars do with 
non-machine-readable texts. Why, after all, do we have machine-readable
corpora, if not because we are interested in human linguistic practices?
Machine-readable corpora don't drop from space.

Graham

On 03/10/12 18:43, WILLIAMS Geoffrey wrote:
> Are we not slightly reinventing the wheel?
>
> The nature of corpora has been discussed for years; EAGLES was about
> defining it. In 2005, John Sinclair enlarged upon the 1996 definition
> when he wrote:
>
>> A corpus is a collection of pieces of language text in electronic
>> format, selected according to external criteria to represent, as far
>> as possible, a language or language variety as a source of data for
>> linguistic research.
> Sinclair J. McH. 2005. ‘Corpus and Text: Basic Principles’. In Wynne,
> M (ed). 2005. pp. 1-16. Wynne, M (ed). 2005. Developing Linguistic
> Corpora: A Guide to Good Practice. Oxford: AHDS 6 -
>
> It is also on the web!
>
> Surely anyone involved in corpora has read the seminal works and does
> not need reminding that corpora are machine-readable, may be samples or
> whole works, etc. What has changed is the rise of internet corpora, but
> here too Kilgarriff and others have commented on the situation in a way
> that both NLP and corpus linguistic users can feel at home with.
>
> Best
>
> Geoffrey
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

