the transition to XML and Unicode

Brian MacWhinney macw at cmu.edu
Thu Oct 24 02:59:08 UTC 2002


Dear Colleagues,

  The CHILDES database is now available in XML format from
http://xml.talkbank.org.

XML is the new "language" of the World Wide Web.  It is linked to a wide
range of powerful new tools for running analyses over the web, and we will
be building those tools over the coming months.  Right now, you can only
view the database over the web, but with the new tools you will be able to
run analyses directly.  Eventually, it may also be possible to support some
forms of streaming audio or video directly from transcripts.

However, to meet the requirements of XML, it was necessary to devise an XML
Schema for the CHAT format and to apply that newer, more restrictive format
to the whole database.  It was also necessary to convert dozens of earlier
font encodings to the single Unicode standard.  This was a really big job.
Except for English files that do not use IPA, all of the CHILDES files are
now in Unicode.

The CLAN editor can now handle Unicode on the Macintosh.  On Windows, the
editor can display Unicode but cannot yet fully edit it; we hope to have
that facility available soon.  In the meantime, as a stopgap, you can use
Windows editors such as MS-Word to edit CLAN files.

We have also tightened up the CHECK program so that it more closely matches
the requirements of the new XML Schema.  Nothing has actually changed in
CHAT itself.  Rather, CHECK now fully enforces all of the details of CHAT.

If you have any questions about these new facilities, please feel free to
send me a note.  I will also soon post a description of these new
developments, linked from the CHILDES home page.

Best wishes,

Brian MacWhinney



More information about the Info-childes mailing list