[Corpora-List] What is corpora and what is not?
Kevin B. Cohen
kevin.cohen at gmail.com
Tue Oct 2 14:59:21 UTC 2012
Hi, Yuri,
Different people have differing definitions of what constitutes a
corpus. Here are a couple of them:
Meyer:
"a collection of texts or parts of texts upon which some general
linguistic analysis can be conducted"
"a body of text made available in computer-readable form for purposes
of linguistic analysis"
McEnery and Wilson:
(i) (loosely) any body of text
(ii) (most commonly) a body of machine-readable text
(iii) (more strictly) a finite collection of machine-readable text,
sampled to be maximally representative of a language or variety
You'll notice that a common element of the definitions is the notion
of machine-readability.
Some people distinguish between a "document collection" and a corpus.
On this view, the difference is that a corpus carries some sort of
annotation, while a document collection is a set of unannotated
documents. Sorry, I don't have a citation for this.
Kev
On Tue, Oct 2, 2012 at 6:21 AM, Yuri Tambovtsev <yutamb at mail.ru> wrote:
> Dear corpora members, I do not understand what a corpus is and what it is
> not. Is the set of texts of the books by Charles Dickens a Dickens corpus?
> What about the books of Ernest Hemingway and other writers? Looking
> forward to hearing your opinion (yutamb at mail.ru). Yours sincerely, Yuri
> Tambovtsev, Novosibirsk, Russia
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
--
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.ucdenver.edu/Hunter_lab/Cohen