[Corpora-List] What is corpora and what is not?
Ciarán Ó Duibhín
ciaran at oduibhin.freeserve.co.uk
Wed Oct 3 15:48:07 UTC 2012
In the context of computer corpora, an important distinction (it seems to
me!) is between corpora where the full text is included, and those where,
normally for legal reasons, the full text is withheld. These latter
typically consist of the text in encrypted form, together with a word-index
and some software that accesses the index and retrieves and decrypts short
segments of the text in response to queries.
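The arrangement described above (encrypted text, a word-index, and access software that decrypts only short segments around each hit) can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular distributed corpus actually works. It assumes whitespace tokenisation, a plaintext index from normalised word forms to token positions, and a toy SHA-256 keystream standing in for a real cipher. The class and method names (WithheldCorpus, concordance, width) are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def _keystream(key: bytes, position: int, length: int) -> bytes:
    """Toy keystream for the token at `position` (SHA-256 in counter mode).

    Illustration only, NOT a vetted cipher: a real system would use an
    authenticated cipher from a proper cryptography library.
    """
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + position.to_bytes(8, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


class WithheldCorpus:
    """Hypothetical sketch: encrypted tokens plus a plaintext word index."""

    def __init__(self, text: str, key: bytes) -> None:
        self._key = key           # in practice hidden inside the access software
        self._cipher = []         # encrypted tokens, in text order
        self.index = defaultdict(list)  # normalised word -> token positions
        for pos, token in enumerate(text.split()):
            raw = token.encode("utf-8")
            self._cipher.append(_xor(raw, _keystream(key, pos, len(raw))))
            word = re.sub(r"\W+", "", token).lower()  # strip punctuation, fold case
            if word:
                self.index[word].append(pos)

    def _decrypt(self, pos: int) -> str:
        ct = self._cipher[pos]
        return _xor(ct, _keystream(self._key, pos, len(ct))).decode("utf-8")

    def concordance(self, word: str, width: int = 3) -> list:
        """Decrypt only a window of `width` tokens either side of each hit."""
        hits = []
        for pos in self.index.get(word.lower(), []):
            start = max(0, pos - width)
            end = min(len(self._cipher), pos + width + 1)
            hits.append(" ".join(self._decrypt(p) for p in range(start, end)))
        return hits


if __name__ == "__main__":
    corpus = WithheldCorpus(
        "The full text is withheld, normally for legal reasons.", b"secret"
    )
    print(corpus.concordance("text", width=2))
    # ['The full text is withheld,']
```

Each query returns only a short window of context (at most width tokens either side of the hit), which is the sense in which the full text stays withheld even though it is physically present in the distributed data.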
For want of a better term, I have always called the latter "corpora", but it
is possible to dispute the applicability of the term. It is certainly a
source of confusion, and sometimes of disappointment. Is there a better
term? Such a resource is more than just an index to a corpus, since in
addition to the index it may include (a) the full text, albeit not in
legible form, and (b) software to facilitate access.
Ciarán Ó Duibhín
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora