Corpora: Santa Barbara Corpus
Chris Manning
manning at CS.Stanford.EDU
Mon Aug 7 15:50:28 UTC 2000
On 7 August 2000, Lou Burnard wrote:
> Hmm. So instead of using pre-existing standards which at least have a
> chance of being implemented across different computer platforms, it's
> better to make up an entirely arbitrary set of codes of your own for
> which *everyone* has to write their own software?
This is a little harsh. The transcription format used has existed and
been developed for many years in the conversational/discourse analysis
community -- and versions of it can be found in books such as Edwards'
Talking Data: Transcription and Coding in Discourse Research or
Schiffrin's Approaches to Discourse.
At most the LDC could be faulted for leaving the data in such a format
-- one clearly designed more for human reading than for easy computer
manipulation -- rather than converting it to a more computer-friendly
standard markup.
Chris Manning