<div>Dear all,</div>
<div>This thread and the second blog link Brett included both remind me of a related issue: exposure to textual input (reading). My interest in this has a story that goes like this:</div>
<div> </div>
<div>There is growing pressure in graduate programs, in countries and universities where English is not the medium of education, for graduate students to publish in international journals (in English), say in journals listed in SSCI, SCI, A&HCI etc. At the same time, those imposing this requirement (at least where I work) show precious little awareness of the enormity of such an expectation. As a consequence, the resources devoted to the language instruction, mentoring, and practice needed to bring these students to the level of proficiency required for such publications are scandalously underestimated... I think.</div>
<div> </div>
<div>So I'm wondering if there are any estimates of how much exposure to, say, English text (reading) authors who publish in the sorts of journals these students are expected to publish in can reasonably be assumed to have under their belts. My real agenda is that such estimates would help make a concrete case for devoting more language education resources to these students (perhaps esp. for reading).</div>
<div> </div>
<div>Thanks.</div>
<div>David Wible<br><br> </div>
<div><span class="gmail_quote">On 8/18/10, <b class="gmail_sendername">Ali SH</b> <<a href="mailto:asaegyn%2Bout@gmail.com">asaegyn+out@gmail.com</a>> wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">Hi all,<br><br>Thanks for the responses. I got a number of offline responses; here is a collation of all the pointers I received.<br>
While there doesn't seem to be one conclusive or even complete study, piecing together various studies I think gives a pretty good estimate. See the second blog post below for a fairly thorough <span style="COLOR: rgb(153,153,153)">(and sourced!)</span> analysis.<br>
<br>====<br>Brett Reynolds:<br><br>All I can offer you is my own back-of-the-envelope calculation here:<br><br><<a onclick="return top.js.OpenExtLink(window,event,this)" href="http://english-jack.blogspot.com/2007/07/idioms-interpreting-frequencies.html" target="_blank">http://english-jack.blogspot.com/2007/07/idioms-interpreting-frequencies.html</a>><br>
<br>Best,<br>Brett<br><br>====<br><br>A comment on his post also led to this very interesting blog (with relevant information):<br><a onclick="return top.js.OpenExtLink(window,event,this)" href="http://learnalanguageortwo.blogspot.com/2009/06/alls-well-in-tv-land.html" target="_blank">http://learnalanguageortwo.blogspot.com/2009/06/alls-well-in-tv-land.html</a><br>
<br>===<br><br>There is also the Human Speechome Project<br><a onclick="return top.js.OpenExtLink(window,event,this)" href="http://en.wikipedia.org/wiki/Human_Speechome_Project" target="_blank">http://en.wikipedia.org/wiki/Human_Speechome_Project</a><br>
<br>and lastly the links provided by Marco below.<br><br>Cheers,<br>Ali<br><br>
<div class="gmail_quote">On Tue, Aug 17, 2010 at 12:49 PM, Marco Baroni <span dir="ltr"><<a onclick="return top.js.OpenExtLink(window,event,this)" href="mailto:marco.baroni@unitn.it" target="_blank">marco.baroni@unitn.it</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0pt 0pt 0pt 0.8ex; BORDER-LEFT: rgb(204,204,204) 1px solid">Hi there.<br><br>I asked a similar question a few years ago, without much success. I paste the summary below.<br>
<br>If you find out more, please keep me posted!<br><br>Regards,<br><br>Marco<br><br>Dear all,<br><br>Two weeks ago I asked if somebody knew of work reporting estimates of how<br>many words/sentences/etc. (adult) speakers of a language hear/write.<br>
<br>I paste below the responses I got.<br><br>Thanks a lot to all who responded!<br><br>Regards,<br><br>Marco<br><br><br>******************************************<br>Reinhard Rapp<br>******************************************<br>
<br>Dear Marco,<br><br>I am also interested in the answer to your question. Some discussion<br>can be found in a Psychological Review paper by Landauer & Dumais<br>(1997) which is on the web at<br><br><a onclick="return top.js.OpenExtLink(window,event,this)" href="http://lsa.colorado.edu/papers/plato/plato.annote.html" target="_blank">http://lsa.colorado.edu/papers/plato/plato.annote.html</a><br>
<br>This is a citation from the most relevant part, which is footnote 6:<br><br>----------- start citation ------------<br><br>From his log-normal model of word frequency distribution and the<br>observations in Carroll et al. (1971), Carroll estimated a total vocabulary of 609,000 words in the<br>universe of text to which students through highschool might be exposed.<br>Dahl (1979), whose distribution function agrees with a different but<br>
smaller sample of Howes (1966), found 17,871 word types in 1,058,888 tokens<br>of spoken American English, compared to 50,406 in the comparable sized<br>adult sample of Kucera & Francis (1967). By Carroll's (1971) model, Dahl's<br>
data imply a total of roughly 150,000 word types in spoken English, thus<br>approximately one-fourth the total, less to the extent that there are<br>spoken words that do not appear in print. Moreover, the ratio of spoken to<br>
printed words to which a particular individual is exposed must be even more<br>lopsided because local, ethnic and family usage undoubtedly restrict the<br>variety of vocabulary more than published works intended for the general<br>
school-aged readership.<br>If we assume that our seventh-grader has met a total of 50 million word<br>tokens of spoken English (140 minutes a day at 100 words per minute for 10<br>years) then the expected number of occasions on which she would have<br>
heard a spoken word of mean frequency would be about 370. Carroll's<br>estimate for the total vocabulary of seventh grade texts is 280,000, and we<br>estimate below that the typical student would have read about 3.8 million<br>
words of print. Thus, the mean number of times she would have seen a<br>printed word to which she might be exposed is only about 14. The rest of<br>the frequency distributions for heard and seen words, while not<br>proportional, would, at every point, show that spoken words have already<br>
had much greater opportunity to be learned than printed words, so will<br>profit much less from an additional occurrence.<br><br>----------- end citation ------------<br><br>...<br><br>With kind regards,<br><br>Reinhard<br>
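<div>A minimal Python sketch of the arithmetic in the footnote quoted above. The 365-day year and the helper name <code>mean_exposures</code> are assumptions made for illustration; they are not part of Landauer & Dumais's text. Dividing tokens by word types is only one simple reading of "a word of mean frequency".</div>

```python
# Figures taken from the Landauer & Dumais (1997) footnote quoted above.
MINUTES_PER_DAY = 140    # daily exposure to speech
WORDS_PER_MINUTE = 100   # speech rate used in the footnote
YEARS = 10
DAYS_PER_YEAR = 365      # assumption: the footnote does not state the day count


def mean_exposures(tokens, types):
    """Average number of times each word type is met: tokens / types."""
    return tokens / types


# Total spoken word tokens heard over the ten years.
spoken_tokens = MINUTES_PER_DAY * WORDS_PER_MINUTE * DAYS_PER_YEAR * YEARS

# 50 million spoken tokens over roughly 150,000 spoken word types.
spoken_mean = mean_exposures(50_000_000, 150_000)

# 3.8 million printed tokens over 280,000 printed word types.
printed_mean = mean_exposures(3_800_000, 280_000)

print(f"spoken tokens over {YEARS} years: {spoken_tokens:,}")
print(f"mean exposures per spoken type:  {spoken_mean:.0f}")
print(f"mean exposures per printed type: {printed_mean:.0f}")
```

<div>Under these assumptions the spoken total comes to 51.1 million tokens, which the footnote rounds to 50 million, and the printed figure comes to about 14, as quoted. The simple token/type division gives about 333 for speech rather than the quoted 370, so the footnote presumably relied on a somewhat different type count or averaging method.</div>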
<br><br><br>******************************************<br>Paula Newman<br>******************************************<br><br>Marco,<br>That's an interesting question. A little googling suggested that a lower<br>bound might come from data on the average number of hours of TV watching<br>
per adult (multiplied by average words per minute on TV broadcasts).<br>Paula<br><br><br><br>******************************************<br>Paul Bennett<br>******************************************<br><br><br>Geoffrey Pullum and Barbara Scholz (in Linguistic Review 19, 2002, p44) cite<br>
evidence that by the age of three a child in a professional household might<br>have heard 30 million word tokens (but far fewer for children in other social<br>classes). I know this relates to children rather than adults, but presumably<br>
the amount of language heard does not differ much by age.<br><br>Their source is B. Hart and T. Risley: Meaningful Differences in the Everyday<br>Experiences of Young Children (Paul H Brookes, 1995). I haven't read this, but<br>
I guess this would be a place to look for more information.<br><br>Paul Bennett<br><br><br><br>******************************************<br>Ilana Bromberg<br>******************************************<br><br><br>Marco,<br>
<br>There is some information regarding how much school-age children (up<br>through HS I think) read in the following article. It's possible that some<br>of the sources they cite may have more information about adults.<br>
<br>Landauer, Thomas K and Dumais, Susan T. 1997. A Solution to Plato's<br>Problem: The Latent Semantic Analysis Theory of the Acquisition, Induction,<br>and Representation of Knowledge. Psychological Review, 104:2, 211-240.<br>
<br>Good luck,<br>Ilana<br><br><br>* Then, there was somebody who wanted to remain anonymous, who answered:<br><br><br>I was interested in your query to the list, but had nothing scientific to offer. Nevertheless, for many years I have had to make estimates of how much of a person's experience of language is represented by a corpus of such-and-such a size. It has been necessary to wow the public by suggesting that a query to EDIT scans several years of an individual's language experience, and, on the other hand, to convince sponsors that even half a billion words is just chickenfeed compared with the amount of text produced in a speech community.<br>
<br>In EDIT 15 years ago we established a monitor corpus with 100mw of The Times, and discovered that the weekly output of that paper, including The Sunday Times, was over half a million words. Genuine neologisms, and not just trivial variations or proper names, were coming in at around a dozen every day. But of course not even the most devoted reader gets through anything like the whole paper.<br>
<br>Back when I was doing discourse analysis I read somewhere that speech is produced at an average of 1500 clauses an hour, and in speech, by my calculations at the time, a clause seemed to average 5 to 6 words. I imagine that reading is not very different from that, maybe towards the faster end, but I haven't checked. Then you have to guess how many hours a day, on average, people are engaged in communicative activity, which I put at 12 hours. 1500 x 6 x 12 gives an estimate of 108,000 words daily and, multiplied by 365, 39,420,000 annually.<br>
<br>If you are suspicious about any of the assumptions, you can just change them.<br><br></blockquote></div>
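<div>The anonymous respondent's back-of-the-envelope estimate above can be written as a small Python function so that each assumption is easy to change. This is only a sketch: the function name <code>language_exposure</code> and its parameter names are hypothetical, and the 365-day year is inferred from the annual figure quoted in the message.</div>

```python
def language_exposure(clauses_per_hour=1500, words_per_clause=6,
                      hours_per_day=12, days_per_year=365):
    """Return (daily, annual) word-token estimates from the stated assumptions."""
    daily = clauses_per_hour * words_per_clause * hours_per_day
    annual = daily * days_per_year
    return daily, annual


# Defaults follow the figures given in the message.
daily, annual = language_exposure()
print(f"{daily:,} words daily, {annual:,} annually")

# The lower end of the 5-to-6-words-per-clause range.
low_daily, low_annual = language_exposure(words_per_clause=5)
print(f"{low_daily:,} words daily, {low_annual:,} annually")
```

<div>With the defaults this reproduces the 108,000 daily and 39,420,000 annual figures; at 5 words per clause the same assumptions give 90,000 daily and 32,850,000 annually.</div>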
<div><span class="e" id="q_12a858be90533c22_1"><br><br clear="all"><br>-- <br><a onclick="return top.js.OpenExtLink(window,event,this)" href="http://www.reseed.ca/" target="_blank">www.reseed.ca</a><br><a onclick="return top.js.OpenExtLink(window,event,this)" href="http://www.pinkarmy.org/" target="_blank">www.pinkarmy.org</a><br>
<br>(•`'·.¸(`'·.¸(•)¸.·'´)¸.·'´•) .,., <br></span></div><br>_______________________________________________<br>Corpora mailing list<br><a onclick="return top.js.OpenExtLink(window,event,this)" href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
<a onclick="return top.js.OpenExtLink(window,event,this)" href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br><br></blockquote></div><br>