random sampling - longitudinal corpus

Brian MacWhinney macw at cmu.edu
Mon Feb 20 16:13:58 UTC 2012


Dear Coralie,

     I would vote for sampling with replacement.  But perhaps the issue your supervisors raise is not the most important one.  Our transcripts are, of course, already samples, taken from the larger population of all things said at home and in the other recording situations.  The biggest methodological problem here involves the sampling of situations.  To address this problem, projects like the one organized by Gordon Wells turned on the tape recorder at random times during the day under program control.  Or one could try to implement methods such as those used by Deb Roy, or perhaps the LENA system, to record everything all the time (but then you still end up sampling).
   How much this all matters may depend on what you are studying.  Jean Berko Gleason showed convincingly that vocabulary from different situation types can be highly non-overlapping.  We know that the "noun bias" is much reduced during certain types of play.  And so on.  There is somewhat less evidence for situational bias in things like syntax.  But I would certainly not exclude it.
   However, bypassing these issues, it is still fair enough to think about sampling with and without replacement.  The advantage of sampling with replacement is that you do not change the shape of the pool by extracting utterances or examples.  The pool (itself a sample) remains the same throughout, so every draw is made from the same distribution.  You just get to roll the dice again.  If the pool is very small, sampling without replacement could have a biasing effect.  However, as the pool gets larger relative to the number of utterances drawn, the contrast between the methods should become minimal.
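   As a rough illustration of the two options, here is a minimal sketch in Python (not R).  The pool of utterance labels, the pool and sample sizes, and the function names sample_utterances and prob_no_repeats are all made up for the example; nothing here comes from an actual corpus.  The second function only puts a number on how often sampling with replacement would draw the same utterance twice, which is one way of seeing why the two methods converge as the pool grows.

```python
import random


def sample_utterances(pool, k, replace, seed=None):
    """Draw k items from pool, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # Every draw is made from the full, unchanged pool, so repeats can occur.
        return rng.choices(pool, k=k)
    # Each item can be drawn at most once; requires k <= len(pool).
    return rng.sample(pool, k)


def prob_no_repeats(pool_size, k):
    """Probability that k draws with replacement contain no repeated item."""
    p = 1.0
    for i in range(k):
        p *= (pool_size - i) / pool_size
    return p


# Hypothetical pool: placeholder labels standing in for maternal utterances.
pool = [f"utt_{i:04d}" for i in range(2000)]

with_repl = sample_utterances(pool, 100, replace=True, seed=1)
without_repl = sample_utterances(pool, 100, replace=False, seed=1)

# Number of distinct utterances in each sample of 100.
print(len(set(with_repl)), len(set(without_repl)))

# Chance of drawing 100 utterances with replacement and seeing no repeats,
# for pools of increasing size.
for n in (200, 2000, 200000):
    print(n, round(prob_no_repeats(n, 100), 3))
```

   In R, the same choice is made through the replace argument of sample().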
   For a real answer on this second issue, one should always, of course, consult a statistician.

-- Brian MacWhinney

On Feb 20, 2012, at 3:10 AM, Coralie Herve wrote:

> Dear Childes Community,
> 
> For my PhD on cross-linguistic influence in French-English bilingual children, I am using my own longitudinal corpus. In addition to analysing the children's productions, I would like to examine a sample of the mothers' productions.
> My supervisors and I were thinking of using the R software to select a random sample of maternal utterances. I have two options: randomly sample with replacement, or randomly sample without replacement.
> 
> What do you think are the pros and cons of sampling with versus without replacement when randomly sampling maternal utterances?
> 
> 
> Best wishes,
> 
> Coralie
> 
> _________________________________________________
> Coralie Hervé
> PhD Candidate
> School of Psychological Sciences | University of Manchester | Manchester | M13 9PL | U.K.
> E-mail: coralie.herve at postgrad.manchester.ac.uk
> 
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups "Info-CHILDES" group.
> To post to this group, send email to info-childes at googlegroups.com.
> To unsubscribe from this group, send email to info-childes+unsubscribe at googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/info-childes?hl=en.



