[Corpora-List] What is corpora and what is not?

Kevin B. Cohen kevin.cohen at gmail.com
Tue Oct 2 14:59:21 UTC 2012


Hi, Yuri,

Different people have differing definitions of what constitutes a
corpus.  Here are a couple of them:

Meyer:

"a collection of texts or parts of texts upon which some general
linguistic analysis can be conducted"
"a body of text made available in computer-readable form for purposes
of linguistic analysis"

McEnery and Wilson:

(i) (loosely) any body of text
(ii) (most commonly) a body of machine-readable text
(iii) (more strictly) a finite collection of machine-readable text,
sampled to be maximally representative of a language or variety

You'll notice that a common element of the definitions is the notion
of machine-readability.

Some people distinguish between a "document collection" and a corpus.
In this case, the difference is that a corpus has some sort of
annotations, while a document collection is a set of unannotated
documents.  Sorry I don't have a citation for this.

Kev


On Tue, Oct 2, 2012 at 6:21 AM, Yuri Tambovtsev <yutamb at mail.ru> wrote:
> Dear corpora members, I do not understand, what corpora is and what corpora
> is not. Is the set the text of books by Charles Dickens is a Dickens
> corpora? What about the books of Ernst Hemingway and other writers? Looking
> forward to hearing your opinion to yutamb at mail.ru  Yours sincerely Yuri
> Tambovtsev, Novosibirsk, Russia
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.ucdenver.edu/Hunter_lab/Cohen
