19.2459, Diss: Comp Ling: Rieser: 'Bootstrapping Reinforcement ...'

Fri Aug 8 19:06:18 UTC 2008

LINGUIST List: Vol-19-2459. Fri Aug 08 2008. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Evelyn Richter <evelyn at linguistlist.org>
================================================================  

===========================Directory==============================  

1)
Date: 08-Aug-2008
From: Verena Rieser < vrieser at inf.ed.ac.uk >
Subject: Bootstrapping Reinforcement Learning-Based Dialogue Strategies from Wizard-of-Oz Data

-------------------------Message 1 ---------------------------------- 
Date: Fri, 08 Aug 2008 15:04:56
From: Verena Rieser [vrieser at inf.ed.ac.uk]
Subject: Bootstrapping Reinforcement Learning-Based Dialogue Strategies from Wizard-of-Oz Data

Institution: Saarland University 
Program: Department of Computational Linguistics and Phonetics 
Dissertation Status: Completed 
Degree Date: 2008 

Author: Verena Rieser

Dissertation Title: Bootstrapping Reinforcement Learning-Based Dialogue
Strategies from Wizard-of-Oz Data 

Dissertation URL:  http://homepages.inf.ed.ac.uk/vrieser/thesis.html

Linguistic Field(s): Computational Linguistics

Dissertation Director(s):
Oliver Lemon
Manfred Pinkal

Dissertation Abstract:

In my PhD thesis, I develop a framework to optimise multimodal dialogue
strategies from small amounts of Wizard-of-Oz (WOZ) data.

Designing a spoken dialogue system can be a time-consuming and challenging
process. To facilitate strategy development, recent research investigates
the use of Reinforcement Learning (RL) methods applied to automatic
dialogue strategy optimisation from real data. For new application domains
where a system is designed from scratch, however, there is often no
suitable in-domain data available, leaving the developer with a classic
chicken-and-egg problem.

This thesis proposes to learn dialogue strategies by simulation-based RL,
where the simulated environment is learned from small amounts of
Wizard-of-Oz data. Using WOZ data rather than data from real Human-Computer
Interaction allows us to learn optimal strategies for new application areas
beyond the scope of existing dialogue systems. Optimised learned strategies
are then available from the first moment of online operation, and tedious
handcrafting of dialogue strategies is avoided entirely. We call this method
'bootstrapping'.

Our results show that a dialogue policy constructed using this framework
significantly outperforms a non-optimised data-driven policy (constructed
via Supervised Learning) in terms of subjective user ratings and
objective dialogue performance measures. For example, RL leads to an almost
50% increase in perceived Task Ease and an almost 20% increase in Future Use.

The technical contributions of this thesis are new methods and techniques
introduced to learn a simulated learning environment from small amounts of
WOZ data. For example, the thesis introduces a new method to learn and
evaluate user simulations, as well as non-linear reward functions. The
overall contribution is an end-to-end data-driven framework to design and
evaluate RL-based dialogue strategies, from data collection to user testing.
