adult-adult conversation, Santa Barbara corpus

Brian MacWhinney macw at cmu.edu
Fri Oct 14 18:49:17 UTC 2011


Dear Virginia,

    Thanks for your feedback on the SBCSAE. Right now, I am still working on remapping some of the SBCSAE comment fields to the traditional CA (Conversation Analysis) codes. After that, I agree that it would be interesting to tag the corpus using MOR and POST. I may end up retracing a few of Paul's steps when I do that. In general, we should probably be applying MOR and POST more often to these various adult corpora, to facilitate comparative analyses of the type you are running.

-- Brian MacWhinney

On Oct 14, 2011, at 12:51 PM, Virginia Valian wrote:

> Dear Colleagues,
> 
> I sent a query about sources of adult-adult conversations earlier this year.  My thanks to those of you who responded.  Here is a follow-up about what we did.  We settled on the Santa Barbara Corpus of Spoken American English (SBCSAE), but we are also looking into the Buckeye corpus.
> 
> Information about the SBCSAE can be found here:  http://www.linguistics.ucsb.edu/research/sbcorpus.html
> 
> And here:
> Du Bois, John W., Chafe, Wallace L., Meyer, Charles, and Thompson, Sandra A. 2000. Santa Barbara corpus of spoken American English, Part 1. Philadelphia: Linguistic Data Consortium. ISBN 1-58563-164-7.
> 
> Du Bois, John W., Chafe, Wallace L., Meyer, Charles, Thompson, Sandra A., and Martey, Nii. 2003. Santa Barbara corpus of spoken American English, Part 2. Philadelphia: Linguistic Data Consortium. ISBN 1-58563-272-4.
> 
> Du Bois, John W., and Englebretson, Robert. 2004. Santa Barbara corpus of spoken American English, Part 3. Philadelphia: Linguistic Data Consortium. ISBN 1-58563-308-9.
> 
> Du Bois, John W., and Englebretson, Robert. 2005. Santa Barbara corpus of spoken American English, Part 4. Philadelphia: Linguistic Data Consortium. ISBN 1-58563-348-8.
> 
> There were various glitches in the Santa Barbara files that prevented us from using them as they were.  We had to clean them.
> 
> The 60 cleaned and tagged Santa Barbara files that we used, in both CHAT (.cha) and XML format, are here, if people want to access them:
> http://www.hunter.cuny.edu/littlelinguist/data/SBCSAE/
> 
> Paul Feitzinger, the excellent computer scientist in the Language Acquisition Research Center who cleaned the files, has this to say about how he proceeded:
> We wanted to tag the SBCSAE quickly and convert it to XML using Chatter, so that we could run custom analysis scripts on it.
> We removed all occurrences of "ʔ", trailing and compound-joining "-", and trailing "'" before tagging.
> After running MOR and POST, we converted all instances of word|? into word|unk, because any "?" tag would cause the file to fail CHECK and break Chatter.
> After some hand disambiguation, the files passed CHECK and could run through Chatter.
> There was an issue in a couple of spots (e.g., 40.cha: lines 673, 1124) where a "." on the main tier would be represented on the MOR tier with "none", which CHECK and Chatter rejected.
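The cleanup steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Paul's actual script: the function names (`clean_token`, `clean_main_tier`, `retag_unknown`, `find_none_lines`) are hypothetical, it assumes standard CHAT line prefixes (`*SPK:` plus a tab for main tiers, `%mor:` plus a tab for MOR tiers), and it reads "compound-joining '-'" as a hyphen between two word characters, which is simply deleted (so `ice-cream` becomes `icecream`). Because of that reading, CHAT codes such as `&-uh`, where the hyphen follows a non-word character, are left alone.

```python
import re

GLOTTAL_STOP = "\u0294"  # the "ʔ" character
# Assumed reading of "compound-joining '-'": a hyphen between two word characters.
JOINING_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")
# Unknown-category MOR item written as "word|?". Requiring "|" before the "?"
# leaves a bare "?" utterance terminator untouched.
UNKNOWN_ITEM = re.compile(r"(\S+)\|\?(?=\s|$)")


def clean_token(token: str) -> str:
    """Apply the pre-tagging deletions to a single main-tier token."""
    token = token.replace(GLOTTAL_STOP, "")
    token = token.rstrip("-'")  # trailing "-" and trailing "'"
    return JOINING_HYPHEN.sub("", token)


def clean_main_tier(line: str) -> str:
    """Clean the text of a '*SPK:<tab>...' line; any other line is returned unchanged."""
    if not line.startswith("*") or ":\t" not in line:
        return line
    speaker, text = line.split(":\t", 1)
    tokens = (clean_token(t) for t in text.split())
    # Tokens that were nothing but deleted characters (e.g. a lone "ʔ") are dropped.
    return speaker + ":\t" + " ".join(t for t in tokens if t)


def retag_unknown(mor_line: str) -> str:
    """After MOR/POST: rewrite 'word|?' items as 'word|unk' (a '?' tag fails CHECK)."""
    return UNKNOWN_ITEM.sub(r"\1|unk", mor_line)


def find_none_lines(lines: list[str]) -> list[int]:
    """Return 1-based numbers of %mor lines that contain a bare 'none' item."""
    return [
        n for n, line in enumerate(lines, start=1)
        if line.startswith("%mor:") and "none" in line.split()
    ]
```

For example, `clean_main_tier("*JOA:\tʔuh I was goin' to the ice-cream wh- place .")` returns `"*JOA:\tuh I was goin to the icecream wh place ."`. The `find_none_lines` helper only locates the "none" items described above so they can be fixed by hand; it does not decide what the correct MOR item should be.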
> There are conceptual issues about which examples of adult-adult speech should be compared with adult-child speech, and we have not addressed them directly.  Our comparisons are ongoing, but in our *syntactic* analyses of part-of-speech bigrams, we see little difference between adults talking to adults and adults talking to children, as reported in our poster at AMLaP in September of this year:
> 
> Quirk, E., Feitzinger, P., Richter, C., Zeitlin, M., Chodorow, M., & Valian, V.  (2011, September).  A computational analysis of grammar change and grammar similarity.  Poster presented at AMLaP, Paris, France.
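The poster's bigram method is not spelled out in this message. As a hypothetical illustration only, a comparison of part-of-speech bigrams could be sketched like this: take the tag before "|" in each %mor item, count adjacent tag pairs within each utterance, and compare the counts from two samples with cosine similarity. The function names and the choice of cosine similarity are assumptions for this sketch, not the Quirk et al. procedure.

```python
import math
from collections import Counter


def pos_tags(mor_line: str) -> list[str]:
    """POS tags of one %mor line, e.g. 'pro|you v|want n|ball ?' -> ['pro', 'v', 'n']."""
    text = mor_line.split(":\t", 1)[-1]
    # Items without "|" (utterance terminators, punctuation) are skipped.
    return [item.split("|", 1)[0] for item in text.split() if "|" in item]


def pos_bigrams(mor_lines: list[str]) -> Counter:
    """Count adjacent POS-tag pairs, never crossing an utterance boundary."""
    counts: Counter = Counter()
    for line in mor_lines:
        tags = pos_tags(line)
        counts.update(zip(tags, tags[1:]))
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two bigram count vectors (1.0 = same proportions)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Usage would be along the lines of `cosine_similarity(pos_bigrams(adult_mor_lines), pos_bigrams(child_directed_mor_lines))`, where both line lists are hypothetical %mor tiers drawn from each sample. Cosine similarity compares proportions rather than raw totals, which helps when the samples differ in length; a real comparison would likely also want a baseline or significance test.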
> 
> Best wishes,
> 
> VVV
> -- 
> Virginia Valian
> Distinguished Professor
> Department of Psychology, Hunter College
> PhD Programs in Linguistics, Psychology, and Speech-Language-Hearing Sciences, CUNY Grad Center
> vvvstudents at gmail.com
> 
> -- 
> You received this message because you are subscribed to the Google Groups "Info-CHILDES" group.
> To post to this group, send email to info-childes at googlegroups.com.
> To unsubscribe from this group, send email to info-childes+unsubscribe at googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/info-childes?hl=en.

