[Corpora] [Corpora-List] KTH Human-Computer Map Task Corpora

Raveesh Meena raveesh at csc.kth.se
Thu Nov 20 09:09:32 UTC 2014


*KTH Human-Computer Map Task Corpora <http://www.speech.kth.se/maptask/>*

A common procedure in modelling human-like dialogue systems is to collect
data on human–human dialogue and then train models that predict the
behaviour of the interlocutors. However, we think that it might be
problematic to use a corpus of human–human dialogue as a basis for
implementing dialogue system components. One problem is the interactive
nature of the task. If the system produces a slightly different behaviour
than what was found in the original data, this would likely result in a
different behaviour in the interlocutor. Another problem is that humans are
likely to behave differently towards a system than towards another human
(even if a more human-like behaviour is being modelled). Yet another
problem is that much dialogue behaviour is optional, which makes the
observed behaviour hard to use as a gold standard.

The KTH Human-Computer Map Task Corpora have been collected as part of our
efforts towards building data-driven models for Response Location
Detection (RLD): detecting when in the user's speech it is appropriate for a
system to provide a response (Skantze, 2012; Meena et al., 2013a; Meena et
al., 2014). Map Task is a common experimental paradigm for studying
human-human dialogue, where one subject (the information giver) is given
the task of describing a route on a map to another subject (the information
follower) (Anderson et al., 1991). For example, Cathcart et al. (2003) used
the HCRC Map Task data to train a shallow model for prediction of
backchannel continuers in an interaction. Similarly, Koiso et al. (1998)
analysed Japanese Map Task dialogues and presented their findings on
turn-transition and backchannel-relevant places. In the KTH Human-Computer
Map Task, the user acts as the giver and the system (a dialogue system) as
the follower. Our main objective was to collect a corpus and build
data-driven models for detecting when in the user's speech it is
appropriate for a system to provide a response. The nature of the
response could be anything: a back-channel, a clarification request or a
question. However, it was not the objective of the study to identify the
nature of the response. We only wanted to predict the appropriateness in
terms of timing.


We believe the data could be useful for researchers in the dialogue system
community and have now made it public. The corpora can be downloaded from
http://www.speech.kth.se/maptask/ . They comprise two data sets. The
first is the Training-Set, which was collected to train various data-driven
models of RLD (Skantze, 2012). The trained model was then integrated into
the same system (used for data collection) and evaluated with new users
in the same Map Task interaction (Meena et al., 2013a; Meena et al.,
2013b; Meena et al., 2014). The interaction data collected from the user
evaluation constitutes the second data set.


The corpora are released only for research purposes and presentation at
scientific conferences. If you use these corpora in your research, please
cite the following article:

   - Meena, R., Skantze, G., & Gustafson, J. (2014). Data-driven Models for
   timing feedback responses in a Map Task dialogue system. Computer Speech
   and Language, 28(4), 903-922.

Please contact Raveesh Meena (raveeshATcsc.kth.se) or
Gabriel Skantze (gskantzeATspeech.csc.kth) if you have any questions.


Further instructions about the corpora are available at
http://www.speech.kth.se/maptask/

Anderson, A., Bader, M., Bard, E., Boyle, E., Doherty, G., Garrod, S.,
Isard, S., Kowtko, J., McAllister, J., Miller, J., Sotillo, C., Thompson,
H., & Weinert, R. (1991). The HCRC Map Task corpus. Language and Speech, 34(4),
351-366.

Cathcart, N., Carletta, J., & Klein, E. (2003). A shallow model of
backchannel continuers in spoken dialogue. In 10th Conference of the
European Chapter of the Association for Computational Linguistics. Budapest.

Koiso, H., Horiuchi, Y., Tutiya, S., Ichikawa, A., & Den, Y. (1998). An
analysis of turn-taking and backchannels based on prosodic and syntactic
features in Japanese Map Task dialogs. Language and Speech, 41, 295-321.

Meena, R., Skantze, G., & Gustafson, J. (2013a). A Data-driven Model for
Timing Feedback in a Map Task Dialogue System. In 14th Annual Meeting of
the Special Interest Group on Discourse and Dialogue - SIGdial (pp.
375-383). Metz, France.

Meena, R., Skantze, G., & Gustafson, J. (2013b). The Map Task Dialogue
System: A Test-bed for Modelling Human-Like Dialogue. In 14th Annual
Meeting of the Special Interest Group on Discourse and Dialogue - SIGdial
(pp. 366-368). Metz, France.

Meena, R., Skantze, G., & Gustafson, J. (2014). Data-driven Models for
timing feedback responses in a Map Task dialogue system. Computer Speech
and Language, 28(4), 903-922.

Skantze, G. (2012). A Testbed for Examining the Timing of Feedback using a
Map Task. In Proceedings of the Interdisciplinary Workshop on Feedback
Behaviors in Dialog. Portland, OR.

Best,
Raveesh

-- 
Raveesh Meena
PhD / Graduate Student in CS

Department of Speech, Music and Hearing
Royal Institute of Technology (KTH)
Lindstedtsvägen 24
SE-100 44 Stockholm,
Sweden

Phone: +46-(0)-8-790 7872
Fax : +46-(0)-8-790 7854

Email:  raveesh[at]csc.kth.se
http://www.speech.kth.se/~raveesh