Corpora: Santa Barbara Corpus

Chris Manning manning at CS.Stanford.EDU
Mon Aug 7 15:50:28 UTC 2000


On 7 August 2000, Lou Burnard wrote:
 > Hmm. So instead of using pre-existing standards which at least have a
 > chance of being implemented across different computer platforms, it's
 > better to make up an entirely arbitrary set of codes of your own for
 > which *everyone* has to write their own software?

This is a little harsh.  The transcription format used has existed and
been developed for many years in the conversational/discourse analysis
community -- and versions of it can be found in books such as Edwards
and Lampert's Talking Data: Transcription and Coding in Discourse
Research or Schiffrin's Approaches to Discourse.

At most, the LDC could be faulted for leaving the data in such a format
-- one clearly designed more for human observation than easy computer
manipulation -- rather than converting it to a more computer-friendly
standard markup.

Chris Manning


