[Corpora-List] What is corpora and what is not?

evalacroix at free.fr evalacroix at free.fr
Fri Oct 5 05:32:37 UTC 2012


Readers of French can find a chapter about the history of corpora and the definition of "corpus" in my Ph.D. thesis.


Answer to Michal:
in my Ph.D. thesis, page 18: "Grouping texts with similar characteristics is a longstanding practice, as shown by the Codex Manesse, which contains songs by German poets collected in the 14th century" ("Le regroupement de textes ayant des caractéristiques communes, est une pratique ancienne, comme en témoigne par exemple l'existence du Codex Manesse comportant des chansons de poètes allemands rassemblées au 14ème siècle."). By the way, concordancing was not invented by corpus linguists either. Bible researchers have been doing it for at least 500 years. I am referring once again to my Ph.D. thesis (pages 36/37):
"The TLFi (Trésor de la Langue Française, an online dictionary of French) tells us that in 1564, 'concordance' is defined as an alphabetical list of words used in the Bible, with indication of the texts which contain them (...). In 1680, its application to language learning is mentioned: Concordance. Small essentials with a syntax [...] for children beginning to study Latin."



"Le TLFi (Trésor de la Langue Française) nous apprend qu'en 1564, on définit la concordance comme
table alphabétique des mots employés dans la Bible, avec indication des textes qui les contiennent.
En 1680, une application pour l'enseignement des langues est évoquée:
Concordance. Petit rûdiment avec un sintaxe [...] pour instruire les enfans qui commençent à aprendre le latin."


Best regards, Eva. 



----- Original message -----

From: "Michal Ptaszynski" <ptaszynski at media.eng.hokudai.ac.jp>
To: corpora at uib.no, corpora-request at uib.no
Cc: ptaszynski at hgu.jp
Sent: Friday, 5 October 2012 02:58:04
Subject: Re: [Corpora-List] What is corpora and what is not?

I was wondering if someone knows/remembers when the word "corpus" was
first used in the context of linguistics and by whom.

Best, 
Michal 

---------------- 
From: Piotr Pezik <pezik at uni.lodz.pl>
Cc: "CORPORA at hd.uib.no" <CORPORA at hd.uib.no>
To: Trevor Jenkins <trevor.jenkins at suneidesis.com>
Date: Thu, 4 Oct 2012 11:38:43 +0200
Subject: Re: [Corpora-List] What is corpora and what is not?

Having been involved in the process of acquiring both conversational and 
"on-air" spoken language data for the National Corpus of Polish (NKJP), 
I'd have to strongly agree with Trevor's remarks. 
I think the American Soap Operas Corpus, although a very valuable resource 
in its own right, represents written-to-be-spoken rather than spoken 
language. Soap opera scripts are essentially their authors' impressions
of casual spoken language, not that much different from linguistically
realistic dialogues you might find in a novel or a play. They often are an
accurate reflection of (a particular breed of) spoken language and 
sometimes they are even an exaggerated impression, which is why you might 
find them to be more spoken than the conversational part of the BNC (the 
plus-catholique-que-le-pape effect), but they're not the real thing simply 
because they are written and edited and not produced with the real time 
constraints of casual spoken discourse. 
Live TV shows are closer to casual spoken discourse, although still very 
different, if you consider their pragmatic discourse structure among other 
dimensions of comparison. For example, it is fairly obvious that while 
speaking to anyone in the studio, politicians and celebrities generally 
tend to "communicate" to their viewers/voters. On-air spoken language is
different from what you get when the cameras and microphones are switched 
off. 
Regards, 
Piotr 

_______________________________________________ 
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora 
Corpora mailing list 
Corpora at uib.no 
http://mailman.uib.no/listinfo/corpora 
