Large scale combining CHILDES files

Amanda Owen Van Horne ajowen at gmail.com
Tue Mar 21 19:34:59 UTC 2017


Hi Leonid, 
  Thank you - the commands in step 1 identified a set of corrupted files. 
I'm very grateful. It will take me some time to restore those files to the 
right directories, but I suspect that will solve the problem. 

Amanda 
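
For anyone hitting the same mismatch, the check in step 1 can also be approximated outside CLAN. The Python sketch below is only an illustration: the name scan_cha_files is made up, and the rules it applies (a name ending in exactly lowercase ".cha", and a non-empty file) are assumptions about what might go wrong, not CLAN's actual matching logic.

```python
import os


def scan_cha_files(root):
    """Walk root recursively and return (matched, suspects).

    matched  : paths whose file name ends in exactly ".cha"
    suspects : paths that contain ".cha" (any case) in the name without
               ending in ".cha", plus ".cha" files that are 0 bytes long.
    These rules are assumptions for illustration, not CLAN's own logic.
    """
    matched, suspects = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".cha"):
                matched.append(path)
                # An empty transcript is worth a second look.
                if os.path.getsize(path) == 0:
                    suspects.append(path)
            elif ".cha" in name.lower():
                # e.g. "x.CHA" or "x.cha.txt"
                suspects.append(path)
    return sorted(matched), sorted(suspects)
```

Calling scan_cha_files on the 0-15 months folder and comparing len(matched) with the number reported by "dir -r *.cha" would show whether the two counts agree; the suspects list gives a short set of files to inspect by hand.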

On Tuesday, March 21, 2017 at 1:48:02 PM UTC-5, Spektor, Leonid: CMU wrote:
>
> Amanda,
>
>     I want you to try two things. 
>
> 1. Please set the working directory to the 0-15 months folder (inside 
> "CHILDES by Age Folder") and run the command "dir -r *.cha". At the end of 
> the output in the "CLAN Output" window you will see how many files CLAN 
> has found. If the number is 340, then for some reason (maybe a bad file 
> extension, a bad directory name, or file protection) CLAN can't see the 
> other files as .cha files. In that case, run the command "dir -r -n *.cha" 
> and you will see the files that CLAN doesn't recognize as .cha files.
>
> 2. If the "dir -r *.cha" command finds 372 files, then the problem might 
> be with the FLO command or the "+re" option. Please get the data from our 
> server at "http://childes.talkbank.org/data/Eng-NA/Braunwald.zip". Unzip 
> it and, in CLAN, set the working directory to the unzipped Braunwald 
> directory. Set the output directory to an empty TEMP directory and run the 
> command "FLO *.cha -t% +d +r1 +re +ffin". On my Mac and Windows 10 PC I 
> get 900 .fin.cex files in the TEMP directory. If you get the same number, 
> then something is wrong with the files in your 0-15 months set. If you get 
> a different number, then make sure you have the latest CLAN. Maybe even 
> reboot your computer and try the same command again. 
>
> If you still get fewer than 900 files in the TEMP directory, then please 
> email me directly the full output of the CLAN Output window after you run 
> the "FLO *.cha -t% +d +r1 +re +ffin" command, and tell me whether you are 
> using a Mac or a PC.
>
> If you get 900 files in TEMP, but you still can't figure out why in step 1 
> you get 340 files, then zip and email your 0-15 months directory to me and 
> I will see if I can figure out what is wrong.
>
> Leonid.
>
>
> On 21-03-17 13:23, Brian MacWhinney wrote:
>
> Dear Amanda,
>
>  
>
>   From what you write, the problem occurs during your use of FLO.  For us 
> (Leonid or me) to replicate the problem, we would need the complete 
> collection of 340 files for this 0-15 months period.  It could be that some 
> particular file is causing the problem, but it could also be the case that 
> you are running up against a machine limitation or a CLAN limitation.  In 
> any case, we would need to receive the collection that triggers the 
> problem, along with the command you are using to replicate the problem.  
> You could send this to me or, better, Leonid (spe... at andrew.cmu.edu) 
> as a zipped email attachment, preserving the folder 
> structure you are using.  Before sending to us,  please make sure that this 
> problem is replicable on your side.  You might also want to test on a 
> second computer.  Also please make sure you are using a current version of 
> CLAN.
>
>  
>
> --Brian
>
>  
>
> *From: *<chi... at googlegroups.com> on behalf of Amanda Owen Van Horne 
> <aj... at gmail.com>
> *Reply-To: *"chi... at googlegroups.com" <chi... at googlegroups.com>
> *Date: *Wednesday, March 22, 2017 at 1:10 AM
> *To: *"chi... at googlegroups.com" <chi... at googlegroups.com>
> *Subject: *Large scale combining CHILDES files
>
>  
>
> Hi,  
>
>  
>
> I'm trying to combine all available English, non-clinical CHILDES files 
> based on the target child's age. I've organized my files (by hand) into 
> folders binned by month, based on the child's age reported in the header 
> information. Now I would like to strip CHILDES codes from the speaker tier 
> and output all of those files into a TEMP folder; I will then use the 
> files in that folder to create a single file of only adult or only child 
> speakers. The trouble I am running into is that, as the number of files I 
> am working with gets larger, CLAN seems to skip files. When I run for 0-12 
> months I get the (expected) 192 files following FLO. When I run for 0-15 
> months I get 340 files in the TEMP folder, when I should be getting 372. 
> This dropping of files continues and becomes more problematic as we move 
> to broader and broader age ranges. It's hard to track down the individual 
> files that might be contributing because so many files are involved. Can 
> anyone provide any guidance? 
>
>  
>
> Amanda 
>
>  
>
> working directory: CHILDES by Age Folder
>
> output directory: TEMP
>
>  
>
> FLO *.cha -t% +d +r1 +re +ffin
>
>    - FLO - command to strip codes from the main tier 
>    - *.cha - apply to all files in the working directory 
>    - -t% - exclude dependent (non-speaker) tiers such as %mor and %spa 
>    - +d - output in CHAT format 
>    - +r1 - if something is in (), remove the () and keep the content (e.g., 
>    (be)cause = because) 
>    - +re - work recursively through subfolders 
>    - +ffin - output to a file with the code .fin before .cex 
>
> The output directory (TEMP) will fill with *.fin.cex files (one per 
> original file). 
>
> Then change your working directory to the TEMP folder and reset your 
> output directory to someplace memorable.
>
> KWAL *.cex -t*CHI +d +r1 +x>0w +u +f
>
>    - KWAL - keyword analysis; with no keyword specified, it outputs all 
>    content 
>    - *.cex - all files in the working directory 
>    - -t*CHI - exclude the *CHI tier, leaving only adult speakers 
>    - +d - in CHAT format 
>    - +r1 - if something is in (), remove the () and keep the content (e.g., 
>    (be)cause = because) 
>    - +x>0w - only lines with 1 or more words; no empty utterances or 
>    utterances that only have info on other tiers 
>    - +u - combine all output into one file 
>    - +f - print to file (not to the screen) 
>
> Final output from these two processes will end with *.fin.kwal.cex (a 
> single combined file) 
>
>  
>
>
> Amanda J. Owen Van Horne, PhD CCC-SLP
>
> Associate Professor
>
> University of Iowa
>
> amanda-owe... at uiowa.edu 
>
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to chibolts+u... at googlegroups.com.
> To post to this group, send email to chib... at googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/chibolts/CA%2BUfwo47syFFvAc9T-F9m%3DxNhRt8FxmOPBEK9okjaP3iBG%2BTdQ%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
>
>
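
The count mismatch described in the original message (372 .cha files expected, 340 .fin.cex files in TEMP) can also be narrowed down by comparing file names. This is a rough Python sketch, assuming that each input name.cha yields an output name.fin.cex in the output directory. That naming is inferred from the "+ffin" description in this thread and has not been checked against the CLAN manual. find_missing_outputs and its parameters are hypothetical names. The sketch also lists base names that occur in more than one subfolder, since under the assumed naming they would map to the same output file name.

```python
import os
from collections import defaultdict


def find_missing_outputs(input_root, output_dir,
                         in_ext=".cha", out_ext=".fin.cex"):
    """Compare input transcripts with FLO output files by base name.

    Assumes (not verified against CLAN) that "name.cha" produces
    "name.fin.cex" in output_dir. Returns (missing, duplicates):
    missing    : input paths with no matching output file
    duplicates : {base name: [input paths]} for base names found in
                 more than one folder under input_root
    """
    by_stem = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(input_root):
        for name in filenames:
            if name.endswith(in_ext):
                stem = name[:-len(in_ext)]
                by_stem[stem].append(os.path.join(dirpath, name))

    # Base names of the output files that actually exist.
    produced = {name[:-len(out_ext)]
                for name in os.listdir(output_dir)
                if name.endswith(out_ext)}

    missing = sorted(path
                     for stem, paths in by_stem.items()
                     if stem not in produced
                     for path in paths)
    duplicates = {stem: sorted(paths)
                  for stem, paths in by_stem.items()
                  if len(paths) > 1}
    return missing, duplicates
```

Running this with the 0-15 months folder as input_root and TEMP as output_dir would give a list of candidate files to inspect, rather than a diagnosis; whether duplicate base names actually cause lost output in CLAN is not something this thread establishes.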
