Large scale combining CHILDES files
Amanda Owen Van Horne
ajowen at gmail.com
Tue Mar 21 19:34:59 UTC 2017
Hi Leonid,
Thank you - the commands in step 1 identified a set of corrupted files.
I'm very grateful. It will take me some time to restore those files to the
right directories, but I suspect that will solve the problem.
Amanda
On Tuesday, March 21, 2017 at 1:48:02 PM UTC-5, Spektor, Leonid: CMU wrote:
>
> Amanda,
>
> I want you to try two things.
>
> 1. Please set working directory: CHILDES by Age Folder to 0-15 months and
> run command "dir -r *.cha". At the end of the output in the "CLAN Output"
> window you will see how many files CLAN has found. If the number is 340
> files, then for some reason (maybe a bad file extension, a bad directory
> name, or file protection) CLAN can't see the other files as .cha files. In
> this case run command "dir -r -n *.cha" and you will see the files that CLAN
> doesn't recognize as .cha files.
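
As an independent cross-check on the count from step 1, the same survey can be sketched outside CLAN. This is a minimal Python sketch, not a description of what CLAN's "dir" command does internally: the function name survey_cha_files is hypothetical, and the idea that a near-miss name such as "x.CHA" or "x.cha.txt" is what hides a file is only an assumption based on the "bad file extension" possibility mentioned above.

```python
from pathlib import Path


def survey_cha_files(root):
    """Split files under `root` into exact '.cha' names and near misses.

    Hypothetical helper: it only inspects file names and does not
    reproduce CLAN's own file-recognition rules.
    """
    recognized, suspicious = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        name = path.name
        if name.endswith(".cha"):
            # Name ends with exactly ".cha" (lower case, nothing after it).
            recognized.append(path)
        elif ".cha" in name.lower():
            # Near miss, e.g. "x.CHA", "x.cha.txt", or "x.cha " with a
            # trailing space.
            suspicious.append(path)
    return recognized, suspicious
```

Calling survey_cha_files on the 0-15 months folder and comparing len(recognized) with the number CLAN reports would show whether the two counts agree; the suspicious list is a starting point for files to inspect by hand.
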
>
> 2. If "dir -r *.cha" command finds 372 files, then the problem might be
> with FLO command or "+re" function. Please get data from our server at URL
> "http://childes.talkbank.org/data/Eng-NA/Braunwald.zip". Unzip it and in
> CLAN set working directory to unzipped Braunwald directory. Set output to
> TEMP directory that is empty and run command "FLO *.cha -t% +d +r1 +re
> +ffin". On my Mac and Windows 10 PC I get 900 .fin.cex files in TEMP
> directory. If you get the same number, then something is wrong with files
> in your 0-15 months set. If you get a different number, then make sure you
> have the latest CLAN. Maybe even reboot your computer and try the same
> above command again.
>
> If you still get fewer than 900 files in TEMP directory, then please email
> me directly the full output of the CLAN Output window after you run the "FLO
> *.cha -t% +d +r1 +re +ffin" command, and tell me whether you are using Mac or PC.
>
> If you get 900 files in TEMP, but you still can't figure out why in step 1
> you get 340 files, then zip and email your 0-15 months directory to me and
> I will see if I can figure out what is wrong.
>
> Leonid.
>
>
> On 21-03-17 13:23, Brian MacWhinney wrote:
>
> Dear Amanda,
>
>
>
> From what you write, the problem occurs during your use of FLO. For us
> (Leonid or me) to replicate the problem, we would need the complete
> collection of 340 files for this 0-15 months period. It could be that some
> particular file is causing the problem, but it could also be the case that
> you are running up against a machine limitation or a CLAN limitation. In
> any case, we would need to receive the collection that triggers the
> problem, along with the command you are using to replicate the problem.
> You could send this to me or, better, Leonid (spe... at andrew.cmu.edu)
> as a zipped email attachment, preserving the folder
> structure you are using. Before sending to us, please make sure that this
> problem is replicable on your side. You might also want to test on a
> second computer. Also please make sure you are using a current version of
> CLAN.
>
>
>
> --Brian
>
>
>
> *From: *<chi... at googlegroups.com> on behalf of Amanda Owen
> Van Horne <aj... at gmail.com>
> *Reply-To: *"chi... at googlegroups.com" <chi... at googlegroups.com>
> *Date: *Wednesday, March 22, 2017 at 1:10 AM
> *To: *"chi... at googlegroups.com" <chi... at googlegroups.com>
> *Subject: *Large scale combining CHILDES files
>
>
>
> Hi,
>
>
>
> I'm trying to combine all available English, non-clinical CHILDES files based
> on the target child's age. I've organized my files (by hand) into folders
> binned by month based on the child's age reported in the header information,
> and now I would like to strip CHILDES codes from the speaker tier and
> output all of those files into a TEMP folder. I will then use the files in
> that folder to create a single file of only adult or only child speakers.
> The trouble I am running into is that as the number of files I am working
> with gets larger, CLAN seems to skip files. When I run for 0-12 months I get
> the (expected) 192 files following FLO. When I run for 0-15 months I get 340
> files in the TEMP folder, when I should be getting 372. This dropping of
> files continues and becomes more problematic as we move to broader and
> broader age ranges. It's hard to track down individual files that might be
> contributing because so many files are involved. Can anyone provide any
> guidance?
>
>
>
> Amanda
>
>
>
> working directory: CHILDES by Age Folder
>
> output directory: TEMP
>
>
>
> FLO *.cha -t% +d +r1 +re +ffin
>
> - FLO -- command to strip codes from the main tier
> - *.cha -- apply to all files in the working directory
> - -t% -- exclude dependent (non-speaker) tiers such as %mor and %spa
> - +d -- output in CHAT format
> - +r1 -- if something is in (), remove the () and keep the content (e.g.,
>   (be)cause = because)
> - +re -- work recursively through subfolders
> - +ffin -- write output to a file with .fin inserted before .cex
>
> output (TEMP) will fill with *.fin.cex files (one per original file)
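
Since the expectation above is one *.fin.cex file per original file, one way to narrow down which inputs were dropped is to compare the two folders by name. The following is a rough Python sketch under stated assumptions: the function name find_missing_outputs is hypothetical, all output is assumed to land directly in the TEMP folder (not in subfolders), and an input named "foo.cha" is assumed to produce "foo.fin.cex". It does not detect two inputs in different subfolders that share the same base name.

```python
from pathlib import Path


def find_missing_outputs(source_root, temp_dir, suffix=".fin.cex"):
    """Return .cha files under source_root with no matching file in temp_dir.

    Hypothetical helper; assumes "foo.cha" should yield "foo.fin.cex"
    directly inside temp_dir.
    """
    # Names of every regular file currently sitting in the output folder.
    produced = {p.name for p in Path(temp_dir).iterdir() if p.is_file()}
    missing = []
    for cha in sorted(Path(source_root).rglob("*.cha")):
        expected = cha.stem + suffix  # "foo.cha" -> "foo.fin.cex"
        if expected not in produced:
            missing.append(cha)
    return missing
```

Printing the returned paths after a FLO run gives a short list of candidate files to open and inspect, which is easier than scanning several hundred transcripts by hand.
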
>
> Then change your working directory to the TEMP folder and reset your output
> directory to someplace memorable.
>
> KWAL *.cex -t*CHI +d +r1 +x>0w +u +f
>
> - KWAL -- keyword analysis; with no keyword specified, it outputs all
>   content
> - *.cex -- all files in the working directory
> - -t*CHI -- exclude the *CHI tier, keeping only the other (adult) speakers
> - +d -- output in CHAT format
> - +r1 -- if something is in (), remove the () and keep the content (e.g.,
>   (be)cause = because)
> - +x>0w -- only lines with 1 or more words; no empty utterances or
>   utterances that only have info on other tiers
> - +u -- combine all output into one file
> - +f -- print to file (not to the screen)
>
> Final output from these two processes will end with *.fin.kwal.cex (a
> single combined file)
>
>
>
>
> Amanda J. Owen Van Horne, PhD CCC-SLP
>
> Associate Professor
>
> University of Iowa
>
> amanda-owe... at uiowa.edu
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to chibolts+u... at googlegroups.com.
> To post to this group, send email to chib... at googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/chibolts/CA%2BUfwo47syFFvAc9T-F9m%3DxNhRt8FxmOPBEK9okjaP3iBG%2BTdQ%40mail.gmail.com
> For more options, visit https://groups.google.com/d/optout.
>