Reinforcement Learning with Bounded Information Loss

2011

Article

Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, policy search has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant or natural policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest two reinforcement learning methods, i.e., a model-based and a model-free algorithm, that bound the loss in relative entropy while maximizing their return. The resulting methods differ significantly from previous policy gradient approaches and yield an exact update step. They work well on typical reinforcement learning benchmark problems as well as in novel evaluations in robotics. We also show a Bayesian bound motivation of this new approach [8].

Author(s):	Peters, J. and Peters, J. and Mülling, K. and Altun, Y.
Journal:	AIP Conference Proceedings
Volume:	1305
Number (issue):	1
Pages:	365-372
Year:	2011

Department(s):	Empirical Inference
Bibtex Type:	Article (article)

DOI:	10.1063/1.3573639

BibTex
@article{PetersMSA2011,
  title = {Reinforcement Learning with Bounded Information Loss},
  author = {Peters, J. and Peters, J. and M{\"u}lling, K. and Altun, Y.},
  journal = {AIP Conference Proceedings},
  volume = {1305},
  number = {1},
  pages = {365-372},
  year = {2011},
  doi = {10.1063/1.3573639}
}