adult-adult conversation, Santa Barbara corpus
Brian MacWhinney
macw at
Fri Oct 14 18:49:17 UTC 2011
Dear Virginia,
Thanks for your feedback on the SBCSAE corpus. Right now, I am still working on remapping some of the SBCSAE comment fields to the traditional CA codes. After that, I agree that it would be interesting to tag the corpus using MOR and POST. I may end up retracing a few of Paul's steps when I do that. In general, we probably should be doing a bit more application of MOR and POST to these various adult corpora in order to facilitate interesting comparative analyses of the type you are running.
-- Brian MacWhinney
On Oct 14, 2011, at 12:51 PM, Virginia Valian wrote:
> Dear Colleagues,
> I sent a query about sources of adult-adult conversations earlier this year. My thanks to those of you who responded. Here is a follow-up about what we did. We settled on the Santa Barbara Corpus of Spoken American English (SBCSAE), but we are also looking into the Buckeye corpus.
> Information about the SBCSAE can be found here:
> And here:
> Du Bois, John W., Chafe, Wallace L., Meyer, Charles, and Thompson, Sandra A. 2000. Santa Barbara corpus of spoken American English, Part 1. Philadelphia: Linguistic Data Consortium. ISBN 1-58563-164-7.
> Du Bois, John W., Chafe, Wallace L., Meyer, Charles, Thompson, Sandra A., and Martey, Nii. 2003. Santa Barbara corpus of spoken American English, Part 2. Philadelphia: Linguistic Data Consortium. ISBN 1-58563-272-4.
> Du Bois, John W., and Englebretson, Robert. 2004. Santa Barbara corpus of spoken American English, Part 3. Philadelphia: Linguistic Data Consortium. ISBN 1-58563-308-9.
> Du Bois, John W., and Englebretson, Robert. 2005. Santa Barbara corpus of spoken American English, Part 4. Philadelphia: Linguistic Data Consortium. ISBN 1-58563-348-8.
> There were various glitches in the Santa Barbara files that prevented us from using them as they were. We had to clean them.
> The 60 cleaned cha and XML tagged Santa Barbara files that we used are here, if people want to access them:
> Paul Feitzinger, the excellent computer scientist in the Language Acquisition Research Center who cleaned the files, has this to say about how he proceeded:
> We wanted to quickly tag the SBCSAE and convert it to XML using Chatter, so that we could run custom analysis scripts on it.
> We removed all occurrences of "ʔ", trailing and compound-joining "-", and trailing " ' " before tagging.
> After running MOR and POST, we converted all instances of word|? into word|unk. An appearance of "?" would cause the file to fail CHECK and break Chatter.
> After some hand disambiguation, the files passed CHECK and could run through Chatter.
> There was an issue in a couple of spots (e.g., 40.cha: lines 673, 1124) where a "." on the main tier would be represented on the MOR tier with "none", which CHECK and Chatter rejected.
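The cleanup steps described above can be sketched roughly as follows. This is only an illustration, not the script Paul actually used: the function names (`clean_word`, `clean_main_tier`, `fix_unknown_tags`) are hypothetical, and the sketch assumes that main-tier lines begin with `*` and a tab-separated speaker code, that the tagged tier begins with `%mor:`, and that "removing" a compound-joining hyphen means deleting it outright.

```python
import re

GLOTTAL_STOP = "\u0294"  # the character "ʔ"


def clean_word(word: str) -> str:
    """Pre-tagging cleanup for one token (hypothetical helper)."""
    # Remove every occurrence of the glottal-stop character.
    word = word.replace(GLOTTAL_STOP, "")
    # Assumption: a compound-joining hyphen is one with a word
    # character on both sides, and it is simply deleted.
    word = re.sub(r"(?<=\w)-(?=\w)", "", word)
    # Remove trailing hyphens and trailing apostrophes.
    return word.rstrip("-'")


def clean_main_tier(line: str) -> str:
    """Apply clean_word to each token of a main-tier line; leave other lines alone."""
    if not line.startswith("*"):
        return line
    speaker, sep, utterance = line.partition("\t")
    tokens = [clean_word(tok) for tok in utterance.split()]
    # Drop tokens that became empty after cleaning.
    return speaker + sep + " ".join(tok for tok in tokens if tok)


def fix_unknown_tags(line: str) -> str:
    """After MOR and POST: rewrite word|? as word|unk on the %mor tier."""
    if not line.startswith("%mor:"):
        return line
    return re.sub(r"\|\?(?=\s|$)", "|unk", line)
```

The two passes are kept separate because the first runs before tagging and the second runs on the tagger output. Neither addresses the "none" terminator problem noted for 40.cha, which would still need attention by hand.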
> There are conceptual issues about which examples of adult-adult speech should be compared with adult-child speech. We have not addressed that directly. Our comparisons are on-going, but in our *syntactic* analyses of part-of-speech bigrams, we see little difference between adults talking to adults and adults talking to children, per our poster at AMLaP in September of this year:
> Quirk, E., Feitzinger, P., Richter, C., Zeitlin, M., Chodorow, M., & Valian, V. (2011, September). A computational analysis of grammar change and grammar similarity. Poster presented at AMLaP, Paris, France.
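For readers who want to try something similar, counting part-of-speech bigrams from tagged tiers might look like the sketch below. It is not the analysis reported in the poster: the helper names (`pos_tags`, `pos_bigrams`) are hypothetical, and it assumes each `%mor` token has the form `pos|stem`, with tokens lacking a `|` (such as utterance terminators) skipped.

```python
from collections import Counter


def pos_tags(mor_line: str) -> list[str]:
    """Return the part-of-speech labels from one %mor tier line (hypothetical helper)."""
    _, _, body = mor_line.partition("\t")
    tags = []
    for token in body.split():
        # Assumption: tagged tokens look like pos|stem; anything
        # without a "|" (e.g. a final ".") is not a tagged word.
        if "|" in token:
            tags.append(token.split("|", 1)[0])
    return tags


def pos_bigrams(mor_lines) -> Counter:
    """Count adjacent part-of-speech pairs within each utterance."""
    counts = Counter()
    for line in mor_lines:
        tags = pos_tags(line)
        # Pair each tag with its successor; bigrams do not cross utterances.
        counts.update(zip(tags, tags[1:]))
    return counts
```

Bigram counts from an adult-adult corpus and an adult-child corpus could then be normalized and compared with whatever similarity measure suits the question at hand.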
> Best wishes,
> --
> Virginia Valian
> Distinguished Professor
> Department of Psychology, Hunter College
> PhD Programs in Linguistics, Psychology, and Speech-Language-Hearing Sciences, CUNY Grad Center
> vvvstudents at
> --
> You received this message because you are subscribed to the Google Groups "Info-CHILDES" group.
> To post to this group, send email to info-childes at
> To unsubscribe from this group, send email to info-childes+unsubscribe at
> For more options, visit this group at