Excluding stories and songs from corpus

Brian MacWhinney macw at andrew.cmu.edu
Wed Apr 10 01:25:25 UTC 2019


Dear Simge,
    I'm not sure that I fully understand your criteria for excluding utterances with repeated words.  For example, what if a common word like "the" or "of" is used in both utterances?  Do you then really want to exclude the second one?  There is a program called CHIP that carefully analyzes the overlap between sentences in terms of repeated words, but it might not do exactly what you want.
    I am curious why you think it is important to conduct these different types of exclusions.  What exactly are you looking for?  What hypothesis might you be testing?
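    To make the repeated-word question concrete, here is a rough Python sketch that works outside CLAN.  It is not what CHIP or KWAL do internally, and it does not parse CHAT files: it assumes the turns have already been extracted as (speaker, text) pairs.  The function-word list and the names non_repeated_hits and shared_content_words are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical stop list; a real study would have to choose its own function words.
FUNCTION_WORDS = {"the", "of", "a", "an", "and", "to", "is", "it"}

def tokens(utterance):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", utterance.lower())

def shared_content_words(prev, curr):
    """Words that occur in both utterances, excluding function words."""
    return (set(tokens(prev)) & set(tokens(curr))) - FUNCTION_WORDS

def non_repeated_hits(turns, target):
    """Indices of turns containing `target`, skipping a turn when a
    different speaker used `target` in the immediately preceding turn."""
    hits = []
    for i, (speaker, text) in enumerate(turns):
        if target not in tokens(text):
            continue
        if i > 0:
            prev_speaker, prev_text = turns[i - 1]
            if prev_speaker != speaker and target in tokens(prev_text):
                continue  # likely a repetition of the other speaker's word
        hits.append(i)
    return hits

# Made-up example turns, not taken from any corpus.
turns = [
    ("CHI", "look at the doggie"),
    ("MOT", "yes that is a doggie"),
    ("MOT", "do you want the ball"),
    ("CHI", "ball"),
]

print(non_repeated_hits(turns, "doggie"))
print(shared_content_words(turns[0][1], turns[2][1]))
```

    In this sketch, two utterances that share only "the" have no content-word overlap, so whether they count as repetition depends entirely on how the function-word list is defined.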

-- Brian MacWhinney

> On Apr 9, 2019, at 8:38 PM, sit591 at g.harvard.edu wrote:
> 
> Hi Prof. MacWhinney,
> 
> Thanks for your reply! Well, I guess it will take me a while to do this.
> 
> I have another question regarding the same study. Right now, I am using the command kwal +sX -w10 +w5 -t*CHI, where X is a placeholder for the words that I am interested in searching for in the input. Ideally, however, I would prefer to select a stretch of talk like this only if the target utterance that contains the word X is not a repetition of the immediately preceding line (e.g., the parent only uses X because another speaker said X in the immediately preceding line). My question is pretty much the same as above: is there a practical way to exclude repetitive utterances of this sort?
> 
> Thank you so much!
> 
> Simge
> 
> 
> 
> On Monday, April 8, 2019 at 5:00:42 PM UTC-4, sit... at g.harvard.edu wrote:
> Hi all,
> 
> I am doing a corpus study using the Providence corpus right now. For the purposes of this study, I am interested in analyzing only the utterances that are produced by the speakers during their natural conversational exchanges, but the corpus also includes many stretches of talk that consist of the stories that parents read to the children, or songs and nursery rhymes they sing, etc. Is there a practical way to weed out these parts from the corpus or do I have to face the gargantuan task of eliminating them manually?
> 
> Thanks in advance for your help!
> 
> Simge Topaloglu
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com <mailto:chibolts+unsubscribe at googlegroups.com>.
> To post to this group, send email to chibolts at googlegroups.com <mailto:chibolts at googlegroups.com>.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/53006962-d369-49f6-9e9e-809ca73708b5%40googlegroups.com <https://groups.google.com/d/msgid/chibolts/53006962-d369-49f6-9e9e-809ca73708b5%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.
