<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 12pt; color: #000000'>Hi<div>Whether soaps etc. are close to informal, unscripted, conversational speech is surely an empirical question. I'm also surprised, but there seems to be increasing evidence that soap dialogue is more faithful to it than we might have expected (well, than I might have, anyway). But no doubt it also depends on what you're looking at: words, n-grams, conversational turns, non-explicit speech, 'dysfluencies', intonation patterns, etc.</div><div>There will always be a compromise between what you really want and what you can reasonably collect.</div><div>Best</div><div>alex<br><div><br><div><span name="x"></span><font size="2" style="color: rgb(0, 0, 0); font-family: arial, helvetica, sans-serif; font-size: 12pt; "><b><span style="font-family: Arial;">
</span></b><span style="font-family: Arial;">_____________________________</span><b><span style="font-family: Arial;"><br>Alex
Boulton</span><br>
</b></font><div style="color: rgb(0, 0, 0); font-family: arial, helvetica, sans-serif; font-size: 12pt; text-align: left; "><font size="2"><span style="font-family: Arial;">Université de Lorraine</span><br><span style="font-family: Arial;">
<span class="Object" id="OBJ_PREFIX_DWT517"><a href="mailto:boulton@univ-nancy2.fr" target="_blank"><br></a></span></span></font>
</div><p style="text-align: left; " class="MsoNormal"><font size="2"><span class="SpellE" style="color: rgb(0, 0, 0); font-family: Arial; ">homepage</span><font face="Arial">: </font><span class="Object" id="OBJ_PREFIX_DWT518"><font color="#0000ee" face="Arial"><u>bit.ly/STZegS</u></font><font face="Arial" size="3"> </font></span></font></p>
<p class="MsoNormal" style="color: rgb(0, 0, 0); font-family: arial, helvetica, sans-serif; font-size: 12pt; "><font size="2"><span style="font-family: Arial;"><font color="#ff0000"><br></font></span></font></p><p class="MsoNormal" style="color: rgb(0, 0, 0); font-family: arial, helvetica, sans-serif; font-size: 12pt; "><font size="2"><span style="font-family: Arial;"><font color="#ff0000">ATILF: CNRS, UL (Crapel team)</font></span><span style="font-family: Arial;"> <font color="#ff0000">ERUDI: UL</font><br>
Tel: (+33) 03 54 50 51 12 Tel: (+33) 03 54 50 46 70<br><span class="Object" id="OBJ_PREFIX_DWT521"><a href="http://www.univ-nancy2.fr/CRAPEL/" target="_blank">www.atilf.fr</a></span> </span></font><span class="Object" id="OBJ_PREFIX_DWT521" style="font-family: Arial; font-size: small; "><a href="http://www.univ-nancy2.fr/CRAPEL/" target="_blank">www.univ-nancy2.fr/erudi</a></span></p><p class="MsoNormal" style="color: rgb(0, 0, 0); font-family: arial, helvetica, sans-serif; font-size: 12pt; "><span class="Apple-style-span" style="font-size: small; "><br></span></p><p class="MsoNormal" style="color: rgb(0, 0, 0); font-family: arial, helvetica, sans-serif; font-size: 12pt; "><span class="Apple-style-span" style="font-size: small; ">NB New email address</span><span class="Apple-style-span" style="font-size: small; ">: </span><b style="font-size: small; "><font class="Apple-style-span" color="#ff0000">alex.boulton@univ-lorraine.fr</font></b></p><span name="x"></span><br></div><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"Trevor Jenkins" &lt;trevor.jenkins@suneidesis.com&gt;<br><b>To: </b>CORPORA@hd.uib.no<br><b>Sent: </b>Thursday 4 October 2012 10:31:18<br><b>Subject: </b>Re: [Corpora-List] What is corpora and what is not?<br><br><div><div>On 4 Oct 2012, at 01:18, Mark Davies &lt;<a href="mailto:Mark_Davies@byu.edu" target="_blank">Mark_Davies@byu.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote><blockquote><blockquote>Even the newly appeared American Soap Operas corpus on Mark Davies' site is still constructed and ultimately "high-brow".<br></blockquote></blockquote><br>Maybe, but see:<br><br><a href="http://corpus2.byu.edu/soap/overview.asp" target="_blank">http://corpus2.byu.edu/soap/overview.asp</a> (comparison to the spoken portion of the BNC; in many respects, much more colloquial than the BNC spoken)<br><a 
href="http://corpus2.byu.edu/soap/" target="_blank">http://corpus2.byu.edu/soap/</a> (Soap Opera corpus itself; 100 million words)<br></blockquote><br></div><div>I think you make my point in that overview with the observations about the costs of compiling spoken corpora.</div><div><br></div><div>In the description of American Soap Operas there are these claims…</div><div><br></div><div><blockquote>the theory that the dialogue
in most TV shows and movies represents the spoken language pretty well.</blockquote>and </div><div><blockquote>we would suggest that subtitles from
<strong>informal </strong>TV shows and movies does represent the
informal, everyday language quite well -- especially <strong>soap operas</strong>. </blockquote><br></div><div>It's that theory I challenge. I don't believe that scripted and rehearsed productions represent spoken language very well at all. Soap operas are not *<b>informal</b>* at all, since they are scripted productions created in highly formalised environments (writers' room, studio, post-production suite, etc.). Soap operas are no more informal than presidential candidate debates are unscripted. Soaps' *<b>settings</b>* may be informal (homes, offices, schools), but the language is every bit as formal as that in the printed material collected in other corpora.</div><div><br></div><div><div>It's a 2-space issue: formal/informal x language/setting. Setting is no guide to language. Sadly, many confuse the informal nature of the setting with the informal use of language; I encounter this with teachers of (British) sign language, where an informal /setting/ is often labelled as informal language. The problem is more likely to be an N-space one, as one has to account for L1/L2 interaction, varying ages of participants, differing status of participants (parent/child, employer/employee, teacher/pupil, lecturer/students) and many other factors, not least the Humpty Dumpty effect. </div><div><br></div></div><div>The credits of all 10 soap exemplars you use include the telling phrase "written by". While the language use may reflect some current informal phrase usages, none of the content is a primary source. We also have intrusion: the British sketch shows Little Britain and the Catherine Tate Show pretty much created new phrases. 
The "yeah but no but yeah" marker of Little Britain's character Vicky Pollard and Catherine Tate's "am I bovered" comment were taken up with great alacrity by schoolchildren and young adults.</div><div><br></div><div>There is a British sitcom called Outnumbered that has some improvised dialogue because of the ages of some of the cast (when the series started, the oldest child cast member was 11, the youngest 5 or 6, and the third 7 or 8). Their contributions were guided in rehearsal but not explicitly scripted, so their specific lexical choices represent the social class the children were born into. But that is still somewhat high-brow, as the parents are porn film producers, sports reporters and actors. Even with the freedom to improvise, the language reflects a specific class (A/B1) rather than being representative of "the 99%".</div><div><br></div><div>A corpus of transcripts of confrontational shows such as those of Jerry Springer, perhaps Ricki Lake, or the British Jeremy Kyle, in which more vernacular language is included (although often censored by the sound department), might meet those two claims (good representation of normal spoken English and informal usage). Or, possibly better, transcripts of the unredacted 24-hour live feeds of reality shows like Big Brother. But there's still a selection process involved, which skews the language used. </div><div><br></div><div>Now the irony is that until a large-scale corpus of truly informal, unrehearsed, unscripted utterances exists, we won't be able to make any comparisons with the lexical choices and grammatical constructions of normal language.</div><div><br></div><div>Regards, Trevor.<div><br></div><div>&lt;&gt;&lt; Re: deemed!</div>
</div>
</div><br></div></div></div></body></html>