Empirical Inference

Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

2011

Article



Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search, however, often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments.
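To make the idea concrete, here is a minimal sketch (not the authors' implementation) of a single reward-weighted-regression update in which previously collected samples are reused via importance weighting. The Gaussian linear policy, the function name, and all variable names are illustrative assumptions.

```python
import numpy as np

def rwr_update(phi, actions, rewards, logp_old, logp_current):
    """One EM-style reward-weighted regression update of the policy mean
    parameters theta, for a Gaussian policy a ~ N(theta^T phi(s), sigma^2).

    phi          : (n, d) state features of previously collected samples
    actions      : (n,)   actions taken when the samples were collected
    rewards      : (n,)   non-negative (e.g. exponentiated) returns
    logp_old     : (n,)   log-probabilities of the actions under the
                          data-collecting (old) policies
    logp_current : (n,)   log-probabilities under the current policy
    """
    # Importance weights correct for the mismatch between the current
    # policy and the older policies that generated the reused samples.
    iw = np.exp(logp_current - logp_old)
    w = rewards * iw
    # M-step: weighted least squares, theta = (Phi^T W Phi)^-1 Phi^T W a
    W = np.diag(w)
    theta = np.linalg.solve(phi.T @ W @ phi, phi.T @ W @ actions)
    return theta
```

With equal old and current log-probabilities the importance weights reduce to one and the update becomes plain reward-weighted least squares; in an actual learning loop the weights down-weight samples whose collecting policy has drifted far from the current one.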

Author(s): Hachiya, H. and Peters, J. and Sugiyama, M.
Journal: Neural Computation
Volume: 23
Number (issue): 11
Pages: 2798-2832
Year: 2011
Month: November

Department(s): Empirical Inference
Bibtex Type: Article (article)

DOI: 10.1162/NECO_a_00199


BibTex

@article{HachiyaPS2011,
  title = {Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning},
  author = {Hachiya, H. and Peters, J. and Sugiyama, M.},
  journal = {Neural Computation},
  volume = {23},
  number = {11},
  pages = {2798--2832},
  month = nov,
  year = {2011},
  doi = {10.1162/NECO_a_00199}
}