Running FREQ for bilingual transcripts

Lulu lulusong at gmail.com
Wed Jun 29 20:56:36 UTC 2016


Dear Brian,

But I'd like to get separate counts for English and Chinese words. Let me 
rephrase my question. In a predominantly English transcript, I'd like to 
get a count of ALL English words, including the ones embedded in [- zho] 
lines marked with @s. I can now achieve this by running two separate 
commands (TCH is teacher):

freq +tTCH -s"[- zho-yue]" -s"[- zho]" *.cha
freq +tTCH +s*@s* *.cha

There are two issues with the 2-command solution:
1. I get two sets of counts that need to be summed manually.
2. The same word with and without @s is counted as two types.

I was wondering if there's a way to combine these two commands and resolve 
these two issues (or at least one). Below is an excerpt of my transcript 
(TCH is teacher) (for IRB reasons, I cannot provide the whole transcript):

*TCH:    you can take this heart .
*TCH:    [- zho] this@s$n 星 .
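
As a cross-check outside CLAN, the per-word language rule implied by this excerpt can be sketched in Python. This is only an illustration, not a CLAN feature: the function name count_by_language, the labels "eng" and "zho", and the punctuation set are hypothetical, and the sketch assumes that any [- ...] precode marks a Chinese utterance and that @s flips a word to the other language of the pair.

```python
import re
from collections import Counter

# Assumption: language precodes look like "[- zho]" or "[- zho-yue]".
PRECODE = re.compile(r"^\[-\s*([a-z]{3}(?:-[a-z]+)?)\]\s*")
# Assumption: these standalone tokens are utterance punctuation, not words.
PUNCT = {".", "?", "!", ",", ";"}

def count_by_language(lines, speaker="TCH", base="eng", other="zho"):
    """Return {language: Counter} for one speaker's main-tier lines."""
    counts = {base: Counter(), other: Counter()}
    prefix = f"*{speaker}:"
    for line in lines:
        if not line.startswith(prefix):
            continue  # skip other speakers and dependent tiers
        text = line[len(prefix):].strip()
        match = PRECODE.match(text)
        # A precode switches the language of the whole utterance.
        line_lang = other if match else base
        if match:
            text = text[match.end():]
        for token in text.split():
            if token in PUNCT:
                continue
            # Drop "@s" and anything after it (e.g. a "$n" tag).
            word, marker, _ = token.partition("@s")
            if marker:
                lang = base if line_lang == other else other
            else:
                lang = line_lang
            counts[lang][word.lower()] += 1
    return counts
```

Because the @s marker is stripped before counting, a word on a [- zho] line and the same word on a plain English line fall under one type in the English counter.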

Thank you so much for your patience and kindness.

Lulu
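
Until a single FREQ command is available, the two issues listed above (manual summing, and @s forms counted as separate types) could be handled by post-processing the two outputs. The following Python sketch is one possible approach, not part of CLAN. It assumes each frequency entry in the saved output is a line of the form "<count> <word>"; the function names parse_freq and merge_freq are hypothetical.

```python
import re
from collections import Counter

# Assumption: a frequency entry is exactly "<count> <word>", e.g. "  3 heart".
# Lines with more fields (summaries) or no leading count are ignored.
ENTRY = re.compile(r"^\s*(\d+)\s+(\S+)\s*$")

def parse_freq(text):
    """Parse one saved FREQ output into a Counter keyed by the bare word."""
    counts = Counter()
    for line in text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # command echo, separators, summary lines
        number, word = match.groups()
        # Strip "@s" and any tag after it so "this@s$n" counts as "this".
        bare = word.split("@s", 1)[0]
        counts[bare] += int(number)
    return counts

def merge_freq(*outputs):
    """Sum the counts from several FREQ outputs into one Counter."""
    total = Counter()
    for output in outputs:
        total += parse_freq(output)
    return total
```

With the outputs of the two commands saved as text, merge_freq(first, second) gives one set of token counts, and len() of the result gives the number of types after @s forms are merged.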

On Wednesday, June 29, 2016 at 2:41:46 PM UTC-4, Brian MacWhinney wrote:
>
> Dear Lulu,
>
> I think you want +s"[- zho]" in this case, not -s"[- zho]". When I run
>
> freq +s"[- yue]" +t*CHI *.cha +u
>
> on CharlotteEng, I get both the English words marked as @s and the 
> Cantonese.
>
> --Brian
>
> *From: *ChiBolts <chib... at googlegroups.com> on behalf of 
> Lulu <lulu... at gmail.com>
> *Reply-To: *ChiBolts <chib... at googlegroups.com>
> *Date: *Tuesday, June 28, 2016 at 10:34 PM
> *To: *ChiBolts <chib... at googlegroups.com>
> *Subject: *Re: Running FREQ for bilingual transcripts
>
> Hi Brian,
>
> I tried to run the reverse command on the same transcript (mostly English 
> with a dozen words in Chinese)
>
> freq +tTCH -s"[- zho]" +s"*@s*" *.cha
>
> (I added * after @s because my transcript also tags whether the @s word is 
> a noun or a verb)
>
> hoping to add the few @s English words embedded in [- zho] lines to the 
> English word counts, but only got 0's. With +s"*@s*" removed, I get good 
> results which don't include the @s English words. Not sure how I can fix 
> this.
>
> Thanks!
>
> Lulu
>
> On Tuesday, June 28, 2016 at 10:22:40 PM UTC-4, Lulu wrote: 
>
> Dear Brian,
>
> That just did magic! Thank you so much!
>
> Best,
> Lulu
>
> On Tuesday, June 28, 2016 at 10:15:26 PM UTC-4, Brian MacWhinney wrote: 
>
> Dear Lulu,
>
> Without seeing your transcripts, I can’t say exactly what is wrong.  
> However, if you run this similar command on the CharlotteEng folder in the 
> YipMatthews corpus, you get good results:
>
> freq +t*CHI +s"[- yue]" *.cha
>
> The idea is that this will include all words on the [- yue] lines, 
> including those with @s, although the latter are pretty rare.  If you want 
> to exclude those, just add -s"*@s".
>
> -- Brian MacWhinney
>
> *From: *ChiBolts <chib... at googlegroups.com> on behalf of Lulu <
> lulu... at gmail.com>
> *Reply-To: *ChiBolts <chib... at googlegroups.com>
> *Date: *Tuesday, June 28, 2016 at 5:10 PM
> *To: *ChiBolts <chib... at googlegroups.com>
> *Subject: *Running FREQ for bilingual transcripts
>
> Hi Brian and team members,
>
> I ran the freq command
>
> freq +tTCH +s"[- zho]" *.cha
>
> for transcripts that contain bilingual utterances (e.g., *TCH:    [- zho] 
> this@s$n 星). The dominant language of the transcripts was English so we 
> marked utterances that contained Chinese with [- zho]. The output types and 
> tokens included all the English words that were marked @s. I thought I 
> would get the types and tokens of all the Chinese words by running the 
> above command. Is the problem with the transcript or the command?
>
> Thank you!
>
> Lulu
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to chibolts+u... at googlegroups.com.
> To post to this group, send email to chib... at googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/chibolts/0e5d867c-79b1-4d36-87be-1303f390a83b%40googlegroups.com 
> <https://groups.google.com/d/msgid/chibolts/0e5d867c-79b1-4d36-87be-1303f390a83b%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
