
A Non-Parametric Approach to Dynamic Programming


Conference Paper


In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin's method. We also present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.
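To illustrate one of the methods unified by the Galerkin view, the following is a minimal sketch of Least-Squares Temporal Difference (LSTD) learning on a toy chain MDP. It is a hypothetical illustration only, not the paper's implementation: LSTD solves A w = b with A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s) r, and with tabular (one-hot) features it recovers the exact value function.

```python
import numpy as np

def lstd(transitions, phi, n_features, gamma=0.9):
    """LSTD: accumulate A and b over samples, then solve A w = b."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for s, s_next, r, done in transitions:
        f = phi(s)
        # Terminal successor states contribute zero features.
        f_next = np.zeros(n_features) if done else phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A, b)

# Toy 4-state chain: the policy always moves right; reward 1 is
# received on the transition into the terminal state.
n = 4
phi = lambda s: np.eye(n)[s]  # one-hot (tabular) features
transitions = [(0, 1, 0.0, False), (1, 2, 0.0, False),
               (2, 3, 0.0, False), (3, None, 1.0, True)]
w = lstd(transitions, phi, n)
# With tabular features, w[s] equals the exact value gamma**(3 - s).
```

With a kernel-based feature map in place of the one-hot features, the same linear system yields a kernelized variant, which is the sense in which the paper treats these methods within one Galerkin framework.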

Author(s): Kroemer, O. and Peters, J.
Book Title: Advances in Neural Information Processing Systems 24
Pages: 1719-1727
Year: 2011
Editors: J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. Pereira and K. Q. Weinberger

Department(s): Empirical Inference
Research Project(s): Reinforcement Learning
Bibtex Type: Conference Paper (inproceedings)

Event Name: Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS 2011)
Event Place: Granada, Spain



@inproceedings{KroemerPeters2011,
  title = {A Non-Parametric Approach to Dynamic Programming},
  author = {Kroemer, O. and Peters, J.},
  booktitle = {Advances in Neural Information Processing Systems 24},
  pages = {1719--1727},
  editor = {J. Shawe-Taylor and R. S. Zemel and P. Bartlett and F. Pereira and K. Q. Weinberger},
  year = {2011}
}