[Corpora-List] What is corpora and what is not?

Ciarán Ó Duibhín ciaran at oduibhin.freeserve.co.uk
Wed Oct 3 15:48:07 UTC 2012


In the context of computer corpora, an important distinction (it seems to
me!) is between corpora where the full text is included and those where,
normally for legal reasons, the full text is withheld.  The latter
typically consist of the text in encrypted form, together with a word-index
and some software which uses the index to retrieve and decrypt short
segments of the text in response to queries.
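
To make the arrangement concrete, here is a minimal Python sketch of such a
package: the text held only in encrypted form, a word-index built before
encryption, and a query routine that decrypts just a short window around
each hit.  Everything in it is an assumption made for illustration (the
class name RestrictedCorpus, the concordance method, the width parameter,
and above all the toy SHA-256 XOR cipher, which is not secure and only
stands in for whatever encryption a real distribution would use).

```python
import hashlib
import re
from collections import defaultdict

BLOCK = 32  # SHA-256 digest size in bytes


def keystream(key: bytes, start: int, length: int) -> bytes:
    """Return `length` keystream bytes beginning at byte offset `start`.

    Toy construction (NOT secure): block i is SHA-256(key + i), so any
    segment can be produced without generating the whole stream.
    """
    offset = start % BLOCK
    block = start // BLOCK
    out = bytearray()
    while len(out) < offset + length:
        out += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(out[offset:offset + length])


def xor_at(data: bytes, key: bytes, start: int) -> bytes:
    """XOR `data` with the keystream at byte offset `start`.

    The same call both encrypts and decrypts.
    """
    ks = keystream(key, start, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


class RestrictedCorpus:
    """Hypothetical 'withheld text' corpus: ciphertext plus a word-index."""

    def __init__(self, text: str, key: bytes):
        self._key = key
        self._length = len(text)
        # Word-index: lower-cased word -> character offsets of each token.
        self._index = defaultdict(list)
        for match in re.finditer(r"\w+", text):
            self._index[match.group().lower()].append(match.start())
        # UTF-32 uses exactly 4 bytes per character, so character offsets
        # map directly onto byte offsets in the ciphertext.
        self._ciphertext = xor_at(text.encode("utf-32-le"), key, 0)

    def concordance(self, word: str, width: int = 20) -> list:
        """Decrypt only a short window of text around each occurrence."""
        hits = []
        for pos in self._index.get(word.lower(), []):
            first = max(0, pos - width)
            last = min(self._length, pos + len(word) + width)
            segment = self._ciphertext[4 * first:4 * last]
            hits.append(xor_at(segment, self._key, 4 * first).decode("utf-32-le"))
        return hits
```

A user of such a package can ask for RestrictedCorpus(text, key).concordance("cat")
and see each occurrence in context, but is never handed the legible full
text, which is the property that sets it apart from a corpus in the
ordinary sense.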

For want of a better term, I have always called this second kind "corpora"
too, but it is possible to dispute the applicability of the term.  It is
certainly a source of confusion, and sometimes of disappointment.  Is there
a better term?  Such a package is more than just an index to a corpus,
since in addition to the index it may include (a) the full text, though not
in legible form; and (b) software to facilitate access.

Ciarán Ó Duibhín

