Large scale combining CHILDES files

Brian MacWhinney macw at cmu.edu
Tue Mar 21 17:23:06 UTC 2017


Dear Amanda,

  From what you write, the problem occurs during your use of FLO.  For us (Leonid or me) to replicate the problem, we would need the complete collection of input files for this 0-15 month period (the set that yields only 340 output files).  It could be that some particular file is causing the problem, but it could also be that you are running up against a machine limitation or a CLAN limitation.  In any case, we would need to receive the collection that triggers the problem, along with the exact command you are using.  You could send this to me or, better, to Leonid (spektor at andrew.cmu.edu) as a zipped email attachment, preserving the folder structure you are using.  Before sending it to us, please make sure that the problem is replicable on your side.  You might also want to test on a second computer.  Also, please make sure you are using a current version of CLAN.
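
One way to bundle the collection while preserving its folder structure is sketched below in Python. This is only an illustration, not part of CLAN: the function name `zip_collection` and both path arguments are hypothetical, and it assumes every transcript ends in `.cha`.

```python
import zipfile
from pathlib import Path


def zip_collection(source_root, zip_path):
    """Zip every .cha file under source_root, keeping subfolder paths.

    Returns the number of files written, so the count can be compared
    with the number of transcripts you expect to send.
    """
    source_root = Path(source_root)
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_root.rglob("*.cha")):
            # Store each file under its path relative to the collection root,
            # so the month-by-month folder layout survives unzipping.
            archive.write(path, arcname=path.relative_to(source_root).as_posix())
            count += 1
    return count
```

Calling `zip_collection("CHILDES by Age Folder", "collection.zip")` from the parent folder would be one way to use it; the returned count is a quick check that nothing was left out.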

--Brian

From: <chibolts at googlegroups.com> on behalf of Amanda Owen Van Horne <ajowen at gmail.com>
Reply-To: "chibolts at googlegroups.com" <chibolts at googlegroups.com>
Date: Wednesday, March 22, 2017 at 1:10 AM
To: "chibolts at googlegroups.com" <chibolts at googlegroups.com>
Subject: Large scale combining CHILDES files

Hi,

I'm trying to combine all available English, non-clinical CHILDES files based on the target child's age.  I've organized my files (by hand) into folders binned by month, based on the child's age reported in the header information.  Now I would like to strip CHILDES codes from the speaker tier and output all of those files into a TEMP folder; I will then use the files in that folder to create a single file containing only adult or only child speakers.  The trouble I am running into is that, as the number of files I am working with gets larger, CLAN seems to skip files.  When I run for 0-12 months, I get the (expected) 192 files following FLO.  When I run for 0-15 months, I get 340 files in the TEMP folder, when I should be getting 372.  This dropping of files continues and becomes more problematic as we move to broader and broader age ranges.  It's hard to track down the individual files that might be contributing because so many files are involved.  Can anyone provide any guidance?

Amanda

working directory: CHILDES by Age Folder
output directory: TEMP

FLO *.cha -t% +d +r1 +re +ffin

  *   FLO -- command to strip codes from main tier
  *   *.cha -- apply to all files in working directory
  *   -t% - get rid of non-speaker related tiers like mor and spa
  *   +d - output in chat format
  *   +r1 - if something is in () remove () and keep content (e.g., (be)cause = because)
  *   +re - work recursively through subfolders
  *   +ffin - output to a file with the code .fin before .cex

output (TEMP) will fill with *.fin.cex files (one per original file)
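
Since one output file is expected per original file, the skipped files can be listed by comparing names. The Python sketch below is a diagnostic aid, not a CLAN feature. It assumes that `foo.cha` produces `foo.fin.cex`, as described for `+ffin` above; the function names and folder arguments are hypothetical. It also reports source files in different subfolders that would map to the same output name, which is only one possible (unconfirmed) explanation for a shortfall when everything is written to a single folder.

```python
from collections import defaultdict
from pathlib import Path


def expected_output_name(cha_path, tag="fin"):
    # Assumed naming convention: foo.cha -> foo.fin.cex
    return f"{cha_path.stem}.{tag}.cex"


def audit_flo_output(source_root, temp_dir, tag="fin"):
    """Return (missing, collisions) for a FLO run.

    missing:    source .cha files with no matching output in temp_dir
    collisions: expected output names shared by more than one source file
    """
    source_root = Path(source_root)
    temp_dir = Path(temp_dir)
    produced = {p.name for p in temp_dir.glob("*.cex")}

    # Group source files by the output name each one should produce.
    by_output = defaultdict(list)
    for cha in sorted(source_root.rglob("*.cha")):
        by_output[expected_output_name(cha, tag)].append(cha)

    missing = sorted(
        str(path)
        for name, paths in by_output.items()
        if name not in produced
        for path in paths
    )
    collisions = {
        name: [str(path) for path in paths]
        for name, paths in by_output.items()
        if len(paths) > 1
    }
    return missing, collisions
```

Running `audit_flo_output("CHILDES by Age Folder", "TEMP")` and printing both results would narrow a 32-file gap down to specific transcripts, which can then be tested one at a time.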

Then change your working directory to the TEMP folder and reset your output directory to someplace memorable.

KWAL *.cex -t*CHI +d +r1 +x>0w +u +f

  *   KWAL - keyword analysis; with no keyword specified, it outputs all content
  *   *.cex - all files in working directory
  *   -t*CHI - exclude the *CHI tier, leaving only adult speakers
  *   +d - in chat format
  *   +r1 - if something is in () remove () and keep content (e.g., (be)cause = because)
  *   +x>0w - only lines with 1 or more words; no empty utterances or utterances that only have info on other tiers
  *   +u - combine all output into one file
  *   +f - print to file (not to the screen)

Final output from these two processes will end with *.fin.kwal.cex (a single combined file)


Amanda J. Owen Van Horne, PhD CCC-SLP
Associate Professor
University of Iowa
amanda-owen-vanhorne at uiowa.edu


--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
To post to this group, send email to chibolts at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/CA%2BUfwo47syFFvAc9T-F9m%3DxNhRt8FxmOPBEK9okjaP3iBG%2BTdQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
