Frequency of words

Leonid Spektor spektor at andrew.cmu.edu
Wed Sep 13 20:51:36 UTC 2017


Brie,

	The answer depends on whether you are interested in words on the speaker tier or lemmas on the %mor tier. Your command lines ask for both. I have changed your sample file "Content_Tester.cha" by adding an occurrence of the word "fairy" that is neither an error nor a replacement, so you should get a count of 1 for "fairy" in the output of each of the following two command lines:

For "fairy" lemmas, excluding errors and target replacements, you want:

freq -sm** -sm@* +sm;fairy Content_Tester.cha
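
To make the lemma idea concrete: on the %mor tier each word appears as part-of-speech|stem plus any affixes, so "stairs" shows up as n|stair-PL and is counted under "stair". The Python sketch below only illustrates that idea and is not how FREQ works internally. The function name mor_lemmas and the sample %mor line are made up for the example, compounds are not handled, and the sketch does not apply the error and replacement exclusions that the -s options above ask for.

```python
import re
from collections import Counter


def mor_lemmas(mor_line):
    """Return the stem of each item on one %mor tier line (hypothetical helper)."""
    # Assumes the tier label and the content are separated by a tab.
    body = mor_line.split("\t", 1)[-1]
    lemmas = []
    for item in body.split():
        # Clitic groups such as pro|it~aux|be&3S hold more than one word.
        for word in re.split(r"[~$]", item):
            if "|" not in word:
                continue  # punctuation such as "." has no pos|stem form
            stem = word.split("|", 1)[1]
            # Cut suffixes (-PL), fusional marks (&PAST) and glosses (=...).
            stem = re.split(r"[-&=]", stem, maxsplit=1)[0]
            lemmas.append(stem)
    return lemmas


# Made-up %mor line, not taken from Content_Tester.cha.
sample = "%mor:\tn:prop|Cinderella v|walk-PAST prep|up det:art|the n|stair-PL ."
counts = Counter(mor_lemmas(sample))
print(counts["stair"])  # prints 1
```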

For "fairy" words on the speaker tier, excluding errors and target replacements, you want:

freq -s<**> -s<:*> +sfairy Content_Tester.cha
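
As a rough picture of what this exclusion means on the speaker tier: a word followed by a replacement code such as [: fairy] or by an error code such as [*] is dropped, and the text inside the brackets is never counted. The Python sketch below is an assumption-laden illustration, not FREQ's actual algorithm. The function name spoken_words and the sample utterance are invented, and codes scoped over several words with angle brackets are not handled.

```python
import re
from collections import Counter

# A token is either a whole bracketed code or a run of non-space characters.
TOKEN = re.compile(r"\[[^\]]*\]|[^\s\[\]]+")


def spoken_words(main_line):
    """Return speaker-tier words not marked as errors or replaced (hypothetical helper)."""
    # Assumes the speaker label and the utterance are separated by a tab.
    body = main_line.split("\t", 1)[-1]
    words = []  # [word, keep] pairs, so repeated codes flag the same word once
    for tok in TOKEN.findall(body):
        if tok.startswith("["):
            # "[: target]" is a replacement, "[*...]" an error mark:
            # either one flags the word immediately before it.
            if words and (tok.startswith("[:") or tok.startswith("[*")):
                words[-1][1] = False
        elif any(ch.isalpha() for ch in tok):
            words.append([tok, True])  # skips bare punctuation such as "."
    return [word for word, keep in words if keep]


# Made-up utterance, not taken from Content_Tester.cha.
sample = "*PAR:\tthe furry [: fairy] [*] helped Cinderella and the fairy smiled ."
counts = Counter(spoken_words(sample))
print(counts["fairy"])  # prints 1: only the correctly spoken "fairy"
```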


Leonid.




> On Sep 13, 2017, at 13:49, Brielle Stark <brielle.stark at gmail.com> wrote:
> 
> Hello all.
> 
> I have a question about calculating word frequency. We're working with aphasia participants who often make mistakes, and when they do, we put the intended word into [: target] if we know what the intention was. However, I do not want to count [: target] words in the frequency tally. For example, if someone said furry [: fairy] in one instance, and I am looking for a frequency count of the correctly spoken 'fairy,' I want the frequency of 'fairy' to be 0, so that the word in the target is ignored. I'd also like to count lemmas rather than inflected forms. In other words, if I'm looking for "stair," I want 'stairs' to be counted in the frequency of 'stair' usage.
> 
> Detail:
> 
> When I run the command:
> 
> freq -sm** -sm@* +sCinderella +sstair +sfairy
> 
> on the attached transcript [completely made up, by the way], it evaluates the %mor line but doesn't ignore the target [: target] words like I thought it would. It does do the correct job in tagging 'stair' even though the participant said 'stairs,' a correct usage from the %mor line. Output of frequency for this command was:
> Cinderella: 1
> stair: 1
> fairy: 1
> 
> However, as I said, I wouldn't want the incorrect furry [: fairy] to count. So, I tried:
> 
> freq -sm** -sm@* +t*PAR +sCinderella +sstair +sfairy
> 
> Now that I've told CLAN to stick to the speaker tier, it ignores 'stair' because 'stairs' was written, which isn't what we were going for. However, it correctly does not look within the [: target] and correctly states that 'fairy' was said 0 times. As an added point, I've also found that when I run the above command on transcripts, it sometimes gets the counts wrong. For this command, I get the count:
> Cinderella: 1
> stair: 0
> fairy: 0
> 
> So basically, is there any way to tell CLAN to run the analysis on the %mor tier for frequencies of words [specifically, lemmas], but somehow to specify to ignore [: target] words on the speaker tier? 
> 
> In an ideal world, from the attached transcript, I'd be getting the frequency counts as:
> Cinderella: 1
> stair: 1
> fairy: 0
> 
> Thank you very much,
> 
> Brie
> 
> -- 
> Brielle Stark, PhD
> Post Doctoral Fellow in Communication Sciences and Disorders, University of South Carolina
> t: +1 803-777-9240, alternate email: stark2 at mailbox.sc.edu
> Aphasia Lab: http://web.asph.sc.edu/aphasia/
> Center for the Study of Aphasia Recovery: http://web.asph.sc.edu/cstar/
> 
> -- 
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com <mailto:chibolts+unsubscribe at googlegroups.com>.
> To post to this group, send email to chibolts at googlegroups.com <mailto:chibolts at googlegroups.com>.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/CAEs2yToSuaOv1de5DWc4CS3h6HR7YEdgUYm0SQ1oxBDC1%2BRcFg%40mail.gmail.com <https://groups.google.com/d/msgid/chibolts/CAEs2yToSuaOv1de5DWc4CS3h6HR7YEdgUYm0SQ1oxBDC1%2BRcFg%40mail.gmail.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Content_Tester.cha
Type: application/octet-stream
Size: 499 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20170913/c980508f/attachment-0002.obj>

