Jan Peters (Project Leader),
Carl Edward Rasmussen,
Marc P. Deisenroth,
Philipp Hennig,
Oliver Kroemer,
Jens Kober,
Yevgeny Seldin,
Abdeslam Boularias

Reinforcement learning ranks among the biggest challenges for machine learning. Controlling a known dynamical system is already hard; interacting with an unknown system poses even harder decision problems, such as the infamous exploration-exploitation tradeoff. Most research in this area is still confined to theoretical analysis and simplistic experiments, but the promise of autonomous machines justifies the effort. Over the past years, members of the department have contributed to reinforcement learning in both theory and experiment.

**PAC-Bayesian Inequalities for Martingales**
*IEEE Transactions on Information Theory*, 58(12):7086-7093, June 2012 (article)

**Structured Apprenticeship Learning**
In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2012 (inproceedings)

**Hierarchical Relative Entropy Policy Search**
In *Fifteenth International Conference on Artificial Intelligence and Statistics*, JMLR Proceedings vol. 22, pages: 273-281, (Editors: Lawrence, N. D. and Girolami, M.), JMLR.org, AISTATS, April 2012 (inproceedings)

**Policy Search for Motor Primitives in Robotics**
*Machine Learning*, 84(1-2):171-203, July 2011 (article)

**Relative Entropy Inverse Reinforcement Learning**
In *JMLR Workshop and Conference Proceedings Volume 15: AISTATS 2011*, pages: 182-189, (Editors: Gordon, G., D. Dunson, M. Dudík), MIT Press, Cambridge, MA, USA, Fourteenth International Conference on Artificial Intelligence and Statistics, April 2011 (inproceedings)

**PILCO: A Model-Based and Data-Efficient Approach to Policy Search**
In *Proceedings of the 28th International Conference on Machine Learning, ICML 2011*, pages: 465-472, (Editors: L Getoor and T Scheffer), Omnipress, 2011 (inproceedings)

**Optimal Reinforcement Learning for Gaussian Systems**
In *Advances in Neural Information Processing Systems 24*, pages: 325-333, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

**A Non-Parametric Approach to Dynamic Programming**
In *Advances in Neural Information Processing Systems 24*, pages: 1719-1727, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

**PAC-Bayesian Analysis of Contextual Bandits**
In *Advances in Neural Information Processing Systems 24*, pages: 1683-1691, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

**Relative Entropy Policy Search**
In *Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence*, pages: 1607-1612, (Editors: Fox, M., D. Poole), AAAI Press, Menlo Park, CA, USA, Twenty-Fourth National Conference on Artificial Intelligence (AAAI-10), July 2010 (inproceedings)

**Gaussian Process Dynamic Programming**
*Neurocomputing*, 72(7-9):1508-1524, March 2009 (article)