CHAT/CLAN discussion
Brian MacWhinney
macw at mac.com
Fri Jun 17 16:01:45 UTC 2005
Dear Bracha, Yahya, and Katherine,
Thanks for these comments. Bracha's clarifications on this were
very useful and much appreciated. Katherine's notes, while briefer,
were equally accurate and helpful. Let me add a few points on the
three issues Yahya raises.
1. Import and export from CHAT files. Katherine is right that there
are advantages to using CHAT from the beginning, but Bracha is also
right that you can use Word if you prefer. However,
if you use Word, you have to take great care to export frequently to
make sure that, when you run CHECK in CLAN, you are not making CHAT
errors. If you want to link audio to transcripts, work with video,
or use CA notation, you have to stay within CLAN. But if you are
just typing, you can create text-only files in Word and then open
them in CLAN. Word and other word processors will open CHAT files
with no problem at all. You just open the file as text only.
2. Regarding numerical analysis, Yahya is right that most analysis
involves counting things. I think that many CLAN users do not
realize how easy it is to cut and paste between CLAN and Excel. The
main trick is that you have to know how to use the Excel import
function and to use tabs as the delimiters between fields.
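
As a rough illustration of the tab-delimited step, here is a minimal
Python sketch. It assumes the output being pasted consists of lines of
the form "count item", which is only an assumption about the layout;
the function name freq_to_tsv and the sample words are hypothetical.

```python
def freq_to_tsv(freq_text):
    """Turn lines of the form '<count> <item>' into tab-delimited rows.

    The '<count> <item>' layout is an assumption made for this sketch;
    lines that do not start with a number are skipped.
    """
    rows = ["item\tcount"]
    for line in freq_text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines, headers, and anything without a leading count.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        count, item = parts
        rows.append(f"{item.strip()}\t{count}")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = "  3 cookie\n  1 doggie\n 12 want\n"
    print(freq_to_tsv(sample))
```

Text in this form can be pasted into a spreadsheet, or opened through
the import function with tab chosen as the field delimiter.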
3. As Bracha notes, CLAN does fine with highly nested hierarchies.
A good example of this is the Ninio-Snow-Pan-Rollins INCA speech act
coding system which is usually elaborated in terms of a three-level
hierarchy. We have also defined four-level hierarchies for speech
errors. These codes are typically placed on something like a %cod
line. They can be inserted by Coder's Editor or by hand. It is true
that CLAN takes a very different approach to counting across merged
and unmerged levels. The CLAN approach uses the wildcards * and %
to aggregate in +s search strings. I have never worked in
Excel to aggregate and disaggregate, but I can imagine that it works
easily there. The disadvantage of something like Excel is that you
have no real link between the transcript (or the media) and the
codes. You can't click on a cell in an Excel spreadsheet and then
replay the original transcript to verify the accuracy of your codes.
Of course, if you have tested for the reliability of your coding, and
that rate is extremely high across all codes, then you are safe. But
my own experience with real-life coding suggests that things are
seldom all that simple. This is a core methodological danger
involved in relying primarily on an Excel-based format.
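
To make the contrast between merged and unmerged counts concrete, here
is a small Python sketch. It is not CLAN's implementation and does not
use the +s syntax; it simply assumes that each code is a string of
colon-separated levels. The codes shown ($A:B:C and so on) are
placeholders, not real INCA codes.

```python
from collections import Counter


def count_codes(codes, depth=None):
    """Count hierarchical codes, optionally merged to the first `depth` levels.

    Assumes levels are separated by ':' (an assumption of this sketch).
    depth=None counts each full code separately (unmerged).
    """
    counts = Counter()
    for code in codes:
        levels = code.split(":")
        key = ":".join(levels if depth is None else levels[:depth])
        counts[key] += 1
    return counts


if __name__ == "__main__":
    codes = ["$A:B:C", "$A:B:D", "$A:E:C", "$F:B:C"]  # placeholder codes
    print(count_codes(codes))           # unmerged: every full code counted
    print(count_codes(codes, depth=2))  # merged at the second level
    print(count_codes(codes, depth=1))  # merged at the top level
```

Merging at a shallower depth pools all codes that share the same
higher-level categories, which is the kind of aggregation at issue
here.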
We have often thought about the possibility of developing improved
methods of exporting numerical data to Excel. It is possible that
the hierarchical codes of the %cod line could be exported in this
way. If people have suggestions about what Excel formats would be
useful, we would like to hear about them in detail. Katherine points
to a specific CHILDES > Excel conversion program. Perhaps that is
something I should learn more about.
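
One possible shape for such an export, offered only as a sketch for
discussion: one row per code, with the utterance number and speaker
kept alongside one column per level, so that a row can still be traced
back to its place in the transcript. The Python below assumes
colon-separated levels on the %cod line; the function names, the
column names, and the sample codes are all hypothetical.

```python
def cod_rows(chat_text, max_levels=3):
    """Collect one row per %cod code: utterance number, speaker, levels."""
    rows = []
    speaker, utt_no = None, 0
    for line in chat_text.splitlines():
        if line.startswith("*"):
            # Main tier: '*SPK:<tab>utterance'
            speaker = line[1:].split(":", 1)[0]
            utt_no += 1
        elif line.startswith("%cod:") and speaker is not None:
            for code in line[len("%cod:"):].split():
                levels = code.split(":")[:max_levels]
                # Pad shallow codes so every row has the same width.
                levels += [""] * (max_levels - len(levels))
                rows.append([str(utt_no), speaker] + levels)
    return rows


def to_tsv(rows, max_levels=3):
    """Join rows into tab-delimited text with a header line."""
    header = ["utterance", "speaker"]
    header += [f"level{i}" for i in range(1, max_levels + 1)]
    return "\n".join("\t".join(r) for r in [header] + rows)


if __name__ == "__main__":
    sample = (
        "*CHI:\twant cookie .\n"
        "%cod:\t$A:B:C\n"
        "*MOT:\tokay .\n"
        "%cod:\t$A:D\n"
    )
    print(to_tsv(cod_rows(sample)))
```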
Finally, Yahya, let me note that projects that choose to code in
Excel and SPSS or to transcribe in unstructured Word documents will
produce data that is not amenable to data-sharing. Research in
language learning has benefitted greatly from the willingness of
researchers to engage in data-sharing. If we can provide tools that
correctly address your analytic needs, we hope that you can use these
tools to produce data that will be shared with the larger community.
--Brian MacWhinney, CMU