[Corpora-List] What is corpora and what is not?

Michal Ptaszynski ptaszynski at media.eng.hokudai.ac.jp
Fri Oct 5 00:58:04 UTC 2012

I was wondering if someone knows/remembers when the word "corpus" was
first used in the context of linguistics, and by whom.


From: Piotr Pezik <pezik at uni.lodz.pl>
Cc: "CORPORA at hd.uib.no" <CORPORA at hd.uib.no>
To: Trevor Jenkins <trevor.jenkins at suneidesis.com>
Date: Thu, 4 Oct 2012 11:38:43 +0200
Subject: Re: [Corpora-List] What is corpora and what is not?

Having been involved in the process of acquiring both conversational and  
"on-air" spoken language data for the National Corpus of Polish (NKJP),  
I'd have to strongly agree with Trevor's remarks.
I think the American Soap Operas Corpus, although a very valuable resource  
in its own right, represents written-to-be-spoken rather than spoken  
language. Soap opera scripts are essentially their authors' impressions
of casual spoken language, not that much different from linguistically
realistic dialogues you might find in a novel or a play. They often are an
accurate reflection of (a particular breed of) spoken language and  
sometimes they are even an exaggerated impression, which is why you might  
find them to be more spoken than the conversational part of the BNC (the  
plus-catholique-que-le-pape effect), but they're not the real thing simply  
because they are written and edited, not produced under the real-time
constraints of casual spoken discourse.
Live TV shows are closer to casual spoken discourse, although still very  
different, if you consider their pragmatic discourse structure among other  
dimensions of comparison. For example, it is fairly obvious that while  
speaking to anyone in the studio, politicians and celebrities generally  
tend to "communicate" to their viewers/voters. On-air spoken language is
different from what you get when the cameras and microphones are switched  

