Research Group Leader

Office: 223

Spemannstr. 38

72076 Tübingen

+49 7071 601 572

+49 7071 601 552

For more information about our work, see also our group's homepage.

NEWS:

- Our paper on Gauss-Markov Runge-Kutta methods has been selected for an oral presentation at NIPS.
- An emerging community of people interested in assigning uncertainty to the result of deterministic computations met for the roundtable on probabilistic numerics, at the MPI, on 21/22 August. See also Mike Osborne's blog posts. This is an exciting time for our young circle.
- We have launched a community webpage: http://www.probabilistic-numerics.org
- Videos of my Gaussian Process tutorial at the RNLS summer school at the ETH Zürich are online. This is the most recent and most compact version of my GP tutorial. If you want to learn about GPs in 90 minutes, I recommend watching this one instead of the older ones listed below.
- I recently organised the Machine Learning Summer School 2013 (videos and slides)
- Videos of my talks at the MLSS are up (slide animations only work in Adobe Reader).
- I also recently spoke at the Gaussian Process Winter School 2013. Here is a video, and the slides of this talk (slide animations only work in Adobe Reader).
- I'm co-teaching the lecture course Intelligent Systems I at the University of Tübingen
- I recently co-organized the 2012 NIPS workshop on Probabilistic Numerics

See also my CV for more information on myself.

MY WORK:

Intelligence is the ability to *act* under *uncertainty*. It exists across a broad range of scales, not just physical but also computational: from simple strategies like gradient descent, which may be how a microbe moves closer to a source of nutrients, up to an adult human's reasoning about career goals. Much of modern research in machine learning and artificial intelligence aims for the top of this hierarchy: algorithms capable of building highly structured models and making complicated decisions, at high computational cost. I believe there is still plenty of room for improvement at the bottom, too.

Algorithms at the bottom end of the intelligence hierarchy are those constructed by **numerical mathematics**: methods that estimate the result of intractable computations. *Optimizers* estimate the location of (local or global) extrema. *Quadrature methods* estimate the values of integrals. *Differential equation solvers* estimate curves fulfilling constraints on their derivatives. In this sense, all these methods can be seen as *inference algorithms*, estimating latent quantities from the results of tractable computations.
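
To make the inference view concrete, here is a minimal Bayesian quadrature sketch (function names, kernel, and parameter choices are illustrative, not taken from any specific paper): a Gaussian process prior with an RBF kernel is placed on the integrand, and the posterior mean of the integral becomes a weighted sum of the observed function values.

```python
import numpy as np
from math import erf, sqrt, pi

def bq_estimate(xs, ys, a=0.0, b=1.0, ell=0.3):
    """Bayesian quadrature sketch: GP prior (RBF kernel) on the integrand,
    condition on evaluations ys at xs, return the posterior mean of the
    integral over [a, b] -- a weighted sum of the observed values."""
    K = np.exp(-0.5 * (xs[:, None] - xs[None, :]) ** 2 / ell**2)
    K += 1e-9 * np.eye(len(xs))  # jitter for numerical stability
    # z_i = integral of k(x, x_i) over [a, b]; closed form for the RBF kernel
    z = np.array([ell * sqrt(pi / 2)
                  * (erf((b - xi) / (sqrt(2) * ell)) - erf((a - xi) / (sqrt(2) * ell)))
                  for xi in xs])
    return z @ np.linalg.solve(K, ys)  # posterior mean of the integral

xs = np.linspace(0.0, 1.0, 9)
est = bq_estimate(xs, np.sin(np.pi * xs))  # true value: 2/pi ≈ 0.6366
```

The point of the exercise is that the quadrature rule's weights fall out of the prior, rather than being fixed in advance.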

These algorithms are the building blocks of the more complex, top-level intelligence. So they have to be *modular*, to be re-usable. They have to be *robust*, because their failure may cause big problems upstream. And of course they have to be *cheap*. In my work, I try to address these requirements. Our analysis starts with the observation that many existing, basic numerical methods can be interpreted precisely as maximum a posteriori estimators under Gaussian priors and likelihoods. This includes real classics, like the conjugate gradient algorithm [Hennig, 2014], the BFGS rule [Hennig & Kiefel, 2013], Gaussian quadrature (including Romberg's rule and others), and the family of Runge-Kutta methods [Schober, Duvenaud & Hennig, 2014]. Taking this interpretation as a starting point, we extend the established methods to address contemporary challenges of large-scale learning: modelling noise in observations, and building structured priors for improved performance on specific tasks. A particularly promising area of generalization is the sharing of information among related problems, and the propagation of information from one computational step to the next. Over time, we plan to extend this notion into a general-purpose framework for uncertain computation.

Here is an older selection of some of my work. See "publications" for pdfs and detailed citations, my CV (link above) for more recent information, and http://probabilistic-numerics.org for a frequently updated overview of our emerging community:

Runge-Kutta methods, the bedrock of numerical ODE solution, can be interpreted as the posterior mean of a specific kind of linear state-space model under a Wiener drift [*Schober, Duvenaud & Hennig*, "Probabilistic ODE Solvers with Runge-Kutta Means", NIPS 2014]. This insight can be used to construct methods assigning probability measures over the solution of an ODE. This measure can be visualised as an uncertainty [*Schober et al.*, "Probabilistic Shortest Path Tractography in DTI Using Gaussian Process ODE Solvers", MICCAI 2014], and used to control the computational cost of ODE solvers [*Hennig & Hauberg*, "Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics", AISTATS 2014].
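
The constructions in those papers are more general, but the flavour can be sketched as a Kalman filter under a once-integrated Wiener-process prior, with the vector field evaluated at the predicted mean. All names and constants below are illustrative assumptions of mine, not the papers' exact algorithms:

```python
import numpy as np

def prob_ode_solve(f, y0, t0, t1, n=100, q=1.0):
    """Sketch of a filtering-based probabilistic ODE solver for y' = f(t, y):
    a Kalman filter whose prior on y is a once-integrated Wiener process,
    and whose 'observations' are evaluations of f at the predicted mean."""
    h = (t1 - t0) / n
    A = np.array([[1.0, h], [0.0, 1.0]])       # transition of the state [y, y']
    Q = q * np.array([[h**3 / 3, h**2 / 2],
                      [h**2 / 2, h]])          # process noise from the Wiener drift
    H = np.array([[0.0, 1.0]])                 # we "observe" the derivative slot
    m = np.array([y0, f(t0, y0)])              # exact initialization
    P = np.zeros((2, 2))
    t = t0
    for _ in range(n):
        m = A @ m                              # predict
        P = A @ P @ A.T + Q
        t += h
        z = f(t, m[0])                         # vector field at the predicted mean
        S = H @ P @ H.T                        # innovation variance (noise-free obs.)
        K = (P @ H.T) / S                      # Kalman gain
        m = m + (K * (z - m[1])).ravel()       # correct the state
        P = P - K @ H @ P
    return m[0], P[0, 0]                       # posterior mean and variance of y(t1)

m_end, v_end = prob_ode_solve(lambda t, y: -y, 1.0, 0.0, 1.0)  # compare to exp(-1)
```

On this linear test problem the posterior mean behaves like a classical low-order Runge-Kutta step, while the variance additionally tracks a notion of accumulated numerical uncertainty.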

Together with Mark Bangert at the German Cancer Research Centre, we are developing a toolchain for the analytic propagation of setup uncertainties through the optimization chain used in radiation therapy treatment planning, aiming to find more robust treatment plans that reduce the incidence of complications and side-effects in radiation tumor therapy. See *Bangert, Hennig & Oelfke*, "Analytical probabilistic modeling for radiation therapy treatment planning", *Physics in Medicine and Biology* **58**(16):5401-5419.

Stochastic gradient descent is still the dominant algorithm for training many online learning systems, such as neural networks. Is that just because more elaborate ideas, like quasi-Newton methods, cannot deal with noise? See what can be done about that: *Hennig*, "Fast Probabilistic Optimization from Noisy Gradients". ICML 2013

Did you know that BFGS is a least-squares regressor? See what happens when you make it nonparametric: *Hennig & Kiefel*. "Quasi-Newton methods, a new direction". ICML 2012
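
The regression view rests on the secant equation: each optimization step yields one "observation" y ≈ B s of the Hessian, which BFGS fits exactly. A minimal numpy sketch (variable names are mine, not from the paper) of the classical rank-two update, showing it reproduces the observation:

```python
import numpy as np

def bfgs_update(B, s, y):
    """Classical rank-two BFGS update of a Hessian estimate B from a step s
    and gradient difference y; the result satisfies the secant equation
    B_new @ s = y by construction."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)        # a synthetic SPD "true" Hessian
s = rng.standard_normal(n)
y = H @ s                          # on a quadratic, y = H s exactly
B1 = bfgs_update(np.eye(n), s, y)  # B1 now reproduces the observation: B1 @ s == y
```

Seen this way, the update is a regression step, and the nonparametric generalization replaces the implicit parametric model with a Gaussian process.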

When optimizing experimental parameters in search of a global optimum, algorithms shouldn't try evaluating close to the optimum. They should try to evaluate where they *expect to learn most about the optimum*. *Hennig & Schuler*, "Entropy Search for Information Efficient Global Optimization". JMLR 13 (2012).
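
The object Entropy Search reasons about is the belief over the *location* of the optimum induced by a Gaussian process posterior. A rough Monte Carlo sketch of that belief (grid, kernel, and all names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def p_min(xs, ys, grid, ell=0.2, noise=1e-6, n_samples=2000, seed=0):
    """Monte Carlo estimate of the belief over the minimizer's location:
    draw functions from the GP posterior (RBF kernel) on a grid and record
    where each draw attains its minimum."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)
    K = k(xs, xs) + noise * np.eye(len(xs))
    Ks = k(grid, xs)
    mu = Ks @ np.linalg.solve(K, ys)                        # posterior mean on grid
    cov = k(grid, grid) - Ks @ np.linalg.solve(K, Ks.T)     # posterior covariance
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(grid)))  # jitter for stability
    rng = np.random.default_rng(seed)
    draws = mu[:, None] + L @ rng.standard_normal((len(grid), n_samples))
    idx = np.argmin(draws, axis=0)                          # minimizer of each draw
    return np.bincount(idx, minlength=len(grid)) / n_samples

xs = np.array([0.1, 0.5, 0.9])
ys = np.array([1.0, -0.5, 0.8])      # the lowest observation sits in the middle
grid = np.linspace(0.0, 1.0, 50)
p = p_min(xs, ys, grid)              # mass concentrates near x = 0.5
```

An information-efficient optimizer then evaluates wherever the next observation is expected to shrink the entropy of this distribution the most, rather than at its mode.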

Probability theory offers a uniquely coherent view on the infamous exploration/exploitation tradeoff: from the Bayesian view, reinforcement learning is about modelling the effect of possible future observations on the optimality of decisions taken in the present. In general, this decision process is intractable. But under Gaussian process assumptions (which, depending on how you look at it, are either a quite general or a quite limited set of assumptions), the right answer moves within reach of numerical analysis. *Hennig*, "Optimal Reinforcement Learning for Gaussian Systems", NIPS 2011

Topic modelling is currently a very popular area of machine learning. Documents come with metadata, and topics change over time and from document to document, depending on the author, the subject, and many other features. The probabilistic extension of topic models that allows modelling such effects requires an algorithmic link between discrete distributions and continuous domains, often realised as a set of "dependent Dirichlets". We pointed out how to do this in a numerically extremely efficient way. *Hennig, Stern, Herbrich and Graepel*, "Kernel Topic Models". AISTATS 2011.

Tree search, finding the optimal leaf of a tree, is exponentially hard in the depth of the tree, because trees are exponentially big in their depth. But what happens during that exponentially long search? If you hold a probabilistic belief over the value and location of the optimal leaf and obtain one more observation of an individual leaf's value, shouldn't updating that belief cost only linear time? It does. *Hennig, Stern and Graepel*, "Coherent Inference on Optimal Play in Game Trees". AISTATS 2010

PUBLICATIONS:

**Automatic LQR Tuning Based on Gaussian Process Optimization: Early Experimental Results** In: *Machine Learning in Planning and Control of Robot Motion Workshop at the IEEE/RSJ International Conference on Intelligent Robots and Systems*
(Conference)

**Probabilistic numerics and uncertainty in computations** *Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences*, 471(2179)
(Article)

**A Random Riemannian Metric for Probabilistic Shortest-Path Tractography** In: *18th International Conference on Medical Image Computing and Computer Assisted Intervention*, MICCAI 2015
(In Proceedings)

**Gaussian Process Based Predictive Control for Periodic Error Correction** *IEEE Transactions on Control Systems Technology*
(Article)

**Inference of Cause and Effect with Unsupervised Inverse Regression** In: *Proceedings of the 18th International Conference on Artificial Intelligence and Statistics*, JMLR.org, AISTATS 2015
(In Proceedings)

**Probabilistic Interpretation of Linear Solvers** *SIAM Journal on Optimization*, 25(1):234-260
(Article)

**Probabilistic Progress Bars** In: *Conference on Pattern Recognition (GCPR)*, 8753, 331–341, Springer, GCPR 2014
(In Proceedings)

**Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics** In: *Proceedings of the 17th International Conference on Artificial Intelligence and Statistics*, 33, 347-355, Microtome Publishing, Brookline, MA, AISTATS 2014
(In Proceedings)

**Probabilistic ODE Solvers with Runge-Kutta Means** In: *Advances in Neural Information Processing Systems 27*, 739-747, Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS 2014)
(In Proceedings)

**Active Learning of Linear Embeddings for Gaussian Processes** In: *Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence*, 230-239, AUAI Press, Corvallis, Oregon, UAI 2014
(In Proceedings)

**Probabilistic Shortest Path Tractography in DTI Using Gaussian Process ODE Solvers** In: *Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, Lecture Notes in Computer Science Vol. 8675*, 265-272, Springer, Heidelberg, MICCAI 2014
(In Proceedings)

Meier, F., Hennig, P., Schaal, S. (2014). **Local Gaussian Regression** In: *arXiv preprint*
(Miscellaneous)

**Incremental Local Gaussian Regression** In: *Advances in Neural Information Processing Systems 27*, 972–980, 28th Annual Conference on Neural Information Processing Systems (NIPS 2014)
(In Proceedings)

**Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature** In: *Advances in Neural Information Processing Systems 27*, 2789–2797, Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS 2014)
(In Proceedings)

**Efficient Bayesian Local Model Learning for Control** In: *Proceedings of the IEEE International Conference on Intelligent Robotics Systems*, IROS 2014
(In Proceedings)

**Quasi-Newton Methods: A New Direction** *Journal of Machine Learning Research*, 14(1):843–865
(Article)

**The Randomized Dependence Coefficient** In: *Advances in Neural Information Processing Systems 26*, 1–9, 27th Annual Conference on Neural Information Processing Systems (NIPS 2013)
(In Proceedings)

**Fast Probabilistic Optimization from Noisy Gradients** In: *Proceedings of the 30th International Conference on Machine Learning, JMLR W&CP 28(1)*, 62-70, ICML 2013
(In Proceedings)

**Nonparametric dynamics estimation for time periodic systems** In: *Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing*, 486-493
(In Proceedings)

**The Randomized Dependence Coefficient** Neural Information Processing Systems (NIPS 2013)
(Poster)
