[Corpora-List] What I came away with from the "What is a Corpus" discussion

John F Sowa sowa at bestweb.net
Sat Oct 6 16:33:53 UTC 2012


On 10/6/2012 12:08 PM, amsler at cs.utexas.edu wrote:
> The simplest summary I came away with is that a corpus is a set of
> texts that has a proposed purpose of study. At least one person must
> have an intention for the collection to serve a purpose.

I agree.  This summary is very close to Adam's definition:

AK
> a corpus is a collection of texts/speech.  We call it a corpus when
> we view it as an object of linguistics or literary research.

And the following point is true of many (most?) words in natural languages:

RA
> This definition of a corpus means that it may not be recognized as
> a corpus by anyone else other than its collector/creator.

Yes.  That's why nearly every reference to a corpus on this email list
puts some name or other qualifier in front of the word 'corpus'.

RA
> How to make a corpus that adheres to "best practices" would be more
> useful than deciding on whether someone's purposeful collection of text
> qualified to be called a corpus by everyone.

I agree.  Then the name given to the rules for those practices
could be placed in front of the word 'corpus'.

John Sowa
