corpus based research at Stanford

Brian MacWhinney macw at CMU.EDU
Wed Dec 15 23:59:20 UTC 1999


Marianne, Joan, and other people interested in corpora,

  Marianne is right in saying that "there may be considerably more active,
corpus-based linguistics going on than many realize."  But why is this
pivotal aspect of linguistics so low-profile?
  One way of increasing the profile of corpus-based work is for linguists and
their allies to begin to develop more effective ways of sharing corpora,
including spoken language corpora.  One move in this direction is the new
TalkBank project, which NSF (KDI/Linguistics/SBE) has recently funded (see
http://talkbank.org).
  The goal of TalkBank is to provide computational tools that support
corpus-based linguistics and related efforts in about a dozen disciplines
devoted to the study of spoken communication.
  On Dec 4-5, we held a first TalkBank workshop that explored the
construction of a database for the study of language used in classrooms and
tutorial interactions (see http://www.talkbank.org/meetings.html).
  The next TalkBank meeting is devoted to Linguistic Exploration (or what
some people might call "field linguistics").  If you are attending LSA this
year and are interested in sharing data on spoken communications, please take
a look at the program for January 6 at
http://www.talkbank.org/exploration.html
   My guess is that there is a wealth of fantastic spoken language data out
there, from languages such as "Mohawk, Tuscarora, Cayuga, Seneca, Caddo,
Central Pomo, Central Alaskan Yup'ik, Kapampangan, Mandarin, Korean, and
Japanese," to name just a few, that would greatly benefit the progress of
empirically grounded research across linguistics and allied areas.  With our
new computational tools we can access these data directly over the Internet
(while respecting confidentiality as required).  Sounds can be directly
linked to transcripts and data can be elaborated with collaborative
commentary.
   TalkBank can provide us all with a way of gaining shared access to these
data.  In this way, we can also gain a better understanding of the actual
data our colleagues are looking at.

--Brian MacWhinney


