Large scale combining CHILDES files

Amanda Owen Van Horne ajowen at gmail.com
Tue Mar 21 17:10:44 UTC 2017


Hi,

I'm trying to combine all available English, non-clinical CHILDES files based
on the target child's age.  I've organized my files (by hand) into folders
binned by month, based on the child's age reported in the header information.
Now I would like to strip CHILDES codes from the speaker tier and output all
of those files into a TEMP folder, then use the contents of that folder to
create a single file containing only adult or only child speakers.  The
trouble I am running into is that as the number of files gets larger, CLAN
seems to skip files.  When I run FLO for 0-12 months I get the (expected) 192
files.  When I run it for 0-15 months I get 340 files in the TEMP folder,
when I should be getting 372.  Files continue to be dropped, and the problem
gets worse as we move to broader and broader age ranges.  It's hard to track
down the individual files that might be responsible because so many files are
involved.  Can anyone provide any guidance?
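
The by-hand binning by age could in principle be scripted. The following is
a minimal Python sketch, not part of CLAN. It assumes each transcript header
carries an @ID line whose role field is Target_Child and whose fourth field
holds the age in CHAT's years;months.days form. The function name is
hypothetical, and days are simply dropped when converting to months.

```python
import re


def target_child_age_months(chat_text):
    """Return the target child's age in whole months, or None if not found.

    Assumes an @ID header of the form
    language|corpus|code|age|sex|group|SES|role|education|custom|
    with the age written as years;months.days (days are ignored here).
    """
    for line in chat_text.splitlines():
        if not line.startswith("@ID:"):
            continue
        fields = line[len("@ID:"):].strip().split("|")
        # Field 7 is the role, field 3 is the age
        if len(fields) > 7 and fields[7] == "Target_Child":
            match = re.match(r"(\d+);(\d+)?", fields[3])
            if match:
                years = int(match.group(1))
                months = int(match.group(2) or 0)
                return years * 12 + months
    return None
```

A file whose header gives the age as 1;03.10 would come back as 15 and could
then be copied into the matching month folder.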

Amanda

working directory: CHILDES by Age Folder
output directory: TEMP

FLO *.cha -t% +d +r1 +re +ffin

   - FLO - command to strip codes from the main tier
   - *.cha - apply to all files in the working directory
   - -t% - get rid of non-speaker tiers like %mor and %spa
   - +d - output in CHAT format
   - +r1 - if something is in (), remove the () and keep the content (e.g.,
   (be)cause = because)
   - +re - work recursively through subfolders
   - +ffin - output to a file with the extension .fin before .cex

output (TEMP) will fill with *.fin.cex files (one per original file)
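
Since FLO should write one *.fin.cex file per original, the gap between the
expected and actual counts can be narrowed down by comparing file names
directly. Below is a rough Python sketch, not a CLAN feature. It assumes each
output is named after its input with .cha replaced by .fin.cex, and the
function name is made up for illustration. It also lists base names that
occur in more than one subfolder, on the untested guess that same-named files
from different folders could overwrite one another when everything is written
to a single output directory.

```python
from collections import defaultdict
from pathlib import Path


def audit_flo_output(source_root, temp_dir, suffix=".fin.cex"):
    """Compare the .cha inputs under source_root with the outputs in temp_dir.

    Returns (missing, collisions):
      missing    - sorted base names with no matching output file
      collisions - base names found in more than one input folder,
                   mapped to the list of input paths that share them
    """
    by_stem = defaultdict(list)
    for cha in sorted(Path(source_root).rglob("*.cha")):
        by_stem[cha.stem].append(cha)

    # Assumed naming rule: name.cha -> name.fin.cex
    produced = {
        p.name[: -len(suffix)] for p in Path(temp_dir).glob("*" + suffix)
    }

    missing = sorted(stem for stem in by_stem if stem not in produced)
    collisions = {
        stem: paths for stem, paths in by_stem.items() if len(paths) > 1
    }
    return missing, collisions
```

Calling audit_flo_output("CHILDES by Age Folder", "TEMP") and printing both
results should show whether the shortfall comes from inputs that never
produced an output or from inputs that share a name.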

Then change your working directory to the TEMP folder and reset your output
directory to someplace memorable.

KWAL *.cex -t*CHI +d +r1 +x>0w +u +f

   - KWAL - keyword analysis; with no keyword specified it outputs all content
   - *.cex - all files in the working directory
   - -t*CHI - exclude the *CHI tier, leaving only the adult speakers
   - +d - in CHAT format
   - +r1 - if something is in (), remove the () and keep the content (e.g.,
   (be)cause = because)
   - +x>0w - only lines with 1 or more words; no empty utterances or
   utterances that only have info on other tiers
   - +u - combine all output into one file
   - +f - print to a file (not to the screen)
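
For spot-checking the combined file, the intended effect of this step can be
approximated outside CLAN. The Python sketch below is only an illustration
under simplifying assumptions, not KWAL's actual behavior: it treats any
token containing a letter as a word, ignores continuation lines, and uses a
hypothetical function name.

```python
import re


def non_child_utterances(chat_text, exclude="CHI"):
    """Keep main-tier lines from speakers other than `exclude`
    that contain at least one word-like token."""
    kept = []
    for line in chat_text.splitlines():
        match = re.match(r"\*([A-Za-z0-9]+):\s*(.*)", line)
        if not match:
            continue  # skip @ headers, % tiers, and continuation lines
        speaker, utterance = match.groups()
        if speaker == exclude:
            continue
        # Simplification: a "word" is any token with at least one letter
        if any(re.search(r"[A-Za-z]", tok) for tok in utterance.split()):
            kept.append(f"*{speaker}:\t{utterance}")
    return kept
```

Running this over a handful of the *.fin.cex files and comparing the result
with the matching stretch of the combined output is one way to confirm that
a given file actually made it in.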

Final output from these two processes will end with *.fin.kwal.cex (a
single combined file)


Amanda J. Owen Van Horne, PhD CCC-SLP
Associate Professor
University of Iowa
amanda-owen-vanhorne at uiowa.edu
