Empirical Inference

Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

2011

Article



Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search, however, often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments.
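To make the idea concrete, here is a minimal sketch (not the authors' implementation) of a single reward-weighted-regression update in which previously collected samples are reused via importance weighting. The Gaussian linear policy, the function name, and all variable names are illustrative assumptions.

```python
import numpy as np

def rwr_update(phi, actions, rewards, logp_old, logp_current):
    """One EM-style reward-weighted regression update of the policy mean
    parameters theta, for a Gaussian policy a ~ N(theta^T phi(s), sigma^2).

    phi          : (n, d) state features of previously collected samples
    actions      : (n,)   actions taken when the samples were collected
    rewards      : (n,)   non-negative (e.g. exponentiated) returns
    logp_old     : (n,)   log-probabilities of the actions under the
                          data-collecting (old) policies
    logp_current : (n,)   log-probabilities under the current policy
    """
    # Importance weights correct for the mismatch between the current
    # policy and the older policies that generated the reused samples.
    iw = np.exp(logp_current - logp_old)
    w = rewards * iw
    # M-step: weighted least squares, theta = (Phi^T W Phi)^-1 Phi^T W a
    W = np.diag(w)
    theta = np.linalg.solve(phi.T @ W @ phi, phi.T @ W @ actions)
    return theta
```

With equal old and current log-probabilities the importance weights reduce to one and the update becomes plain reward-weighted least squares; in an actual learning loop the weights down-weight samples whose collecting policy has drifted far from the current one.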

Author(s): Hachiya, H. and Peters, J. and Sugiyama, M.
Journal: Neural Computation
Volume: 23
Number (issue): 11
Pages: 2798-2832
Year: 2011
Month: November

Department(s): Empirical Inference
Bibtex Type: Article (article)

DOI: 10.1162/NECO_a_00199


BibTex

@article{HachiyaPS2011,
  title = {Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning},
  author = {Hachiya, H. and Peters, J. and Sugiyama, M.},
  journal = {Neural Computation},
  volume = {23},
  number = {11},
  pages = {2798--2832},
  month = nov,
  year = {2011},
  doi = {10.1162/NECO_a_00199}
}