Corpora: What is a corpus

Ute Römer ute.roemer at uni-koeln.de
Fri Jan 28 11:54:01 UTC 2000


I would like to add something to the discussion of proverbs and corpus
composition. I think a proverb cannot be regarded as a "component" of a
corpus (although proverbs are of course included in corpora), because a
proverb is not a "kind of text" in the way that newspaper articles, novels,
or telephone conversations are "text types" (sorry, I can't find a more
adequate expression for this).

Ute Römer


-----Original Message-----
From: Lou Burnard <lou.burnard at computing-services.oxford.ac.uk>
To: Lucian Galescu <galescu at cs.rochester.edu>
Cc: CORPORA at hd.uib.no <CORPORA at hd.uib.no>
Date: Friday, 28 January 2000 11:50
Subject: Re: Corpora: What is a corpus


>This turns out to be quite an interesting discussion, since it really
>hinges on what a "proverb" is. If Francois had said (say) a corpus of
>sermons, or a corpus of advertisements, or a corpus of texts composed
>by 18th-century French expatriate seamen with wooden legs, I don't
>think Oliver would have turned a hair (well, maybe in the last
>example) because all of those things are definable as types of text or
>artefact or entity or whatever. But proverbs don't seem to fit in with
>that list of things somehow. Where would you look for proverbs? They
>don't typically appear in isolation -- you don't go to the book shop
>and say "What proverbs have been published lately?" -- the newspapers
>don't have lists of today's hot proverbs -- no-one ever says "I think
>I'll create a proverb today" -- all of which makes me think that a
>proverb is not a text, but a judgment about a bit of a text. A
>collection of things-judged-proverbial is an interesting text,
>certainly, but it doesn't seem to be a corpus as we currently think of
>them.
>
>So while I agree with Lucian (and everyone else) that it's the act of
>filtering which defines a corpus, I feel the need to define the nature
>of the holes in the filter a bit more precisely. In other words, I
>think we need a definition for the *components* of a corpus, which
>would accept (say) a classified advert or a conversation with a travel
>agent but reject a metaphor or a proverb or even (here I feel the
>ground a bit shaky) a sentence containing a past tense verb.
>
>Lou
>
> ----------------------------------------------------------------
> Lou Burnard                           http://users.ox.ac.uk/~lou
> ----------------------------------------------------------------
>
>


