Emmy Noether Group on Probabilistic Numerics

Philipp Hennig (Group Leader), Simon Bartels, Hans Kersting, Maren Mahsereci, Michael Schober, Edgar Klenske

Probabilistic Numerical Methods: A Summary of our Work

Artificial intelligent systems build models of their environment from observations, and choose actions that they predict will have beneficial effect on the environment's state. A central challenge is that the mathematical models used in this process call for computations that have no closed, analytic solution. Learning machines thus rely on a whole toolbox of numerical methods: High-dimensional integration routines are used for marginalization and conditioning in probabilistic models. Fitting of parameters poses nonlinear (and often non-convex) optimization problems. And the prediction of dynamic changes in the environment involves various forms of differential equations. In addition, there are special cases for each of these areas in which the computation amounts to large-scale linear algebra (i.e. Gaussian conditioning, least-squares optimization, linear differential equations). Traditionally, machine learning researchers have served these needs by taking numerical methods "off the shelf" and treating them as black boxes.

Since the 1970s, researchers like Wahba, Diaconis and O'Hagan repeatedly pointed out that, in fact, numerical methods can themselves be interpreted as statistical estimation methods -- as acting, learning machines: Numerical methods estimate an unknown, intractable, latent quantity given computable, tractable, observable quantities. For example, an integration method estimates the value of an integral given access to evaluations of the integrand. This is a rather abstract observation, but Diaconis and O'Hagan separately made a precise connection between inference and compuation in the case of integration: Several classic quadrature rules, e.g. the trapezoidal rule, can be interpreted as the maximum a posteriori (MAP) estimator arising from a certain family of Gaussian process priors on the integrand.

Over recent years, the research group on probabilistic numerics at the MPI IS (since 2015 supported by an Emmy Noether grant of the DFG) has been able to add more such bridges between computation and inference, across the domains of numerical computation, by showing that various basic numerical methods are MAP estimates under equally basic probabilistic priors. Hennig and Kiefel [ ] showed that quasi-Newton methods, such as the BFGS rule, arise as the mean of a Gaussian distribution over the elements of the inverse Hessian matrix of an optimization objective. Hennig extended this result to linear solvers [ ], in particular the method of conjugate gradients for linear problems (which amounts to Gaussian regression on the elements of the inverse of a symmetric matrix. And regarding ordinary differential equations, Schober, Duvenaud and Hennig [ ] showed that various Runge-Kutta methods can be interpreted as autoregressive filters, returning a Gaussian process posterior over the solution of a differential equation.

The picture emerging from these connections is a mathematically precise description of computation itself as the active collection of information. In this view, the analytic description of a numerical task provides a prior probability measure over possible solutions; which can be shrunk towards the limit of a point measure through conditioning on the result of tractable computations. Many concepts and philosophical problems from statistics carry over to computation quite naturally, with two notable differences: First, in numerical ``inference'' tasks, the validity of the prior can be analysed to a higher formal degree than in inference from physical data sources, because the task is specified in a formal (programming) language. Secondly, since numerical routines are the bottom, ``mechanistic'' layer of artificial intelligence, the ``inner loop'', they are subject to strict requirements on computational complexity. Internally, a numerical method can only use tractable floating-point operations. This translates into a constraint on acceptable probabilistic models -- most basic numerical methods make Gaussian assumptions.

In the context of machine learning, this description of numerical computation as the collection of information opens a number of exciting directions:

Once it is clear that a numerical method uses an implicit prior, it is natural to adapt this prior to reflect available knowledge about the integrand. This design of "customized numerics" was used in a collaboration with colleagues at Oxford to build an efficient active integration method that outperforms Monte Carlo integration methods in wall-clock time on integration problems of moderate dimensionality [ ].
Many numerical problems are defined relative to a setup that is itself uncertain to begin with. Once numerical methods are defined as probabilistic inference, such uncertainty can often be captured quite naturally. In a collaboration with colleagues in Copenhagen, Hauberg, Schober, Hennig and others [ ] showed how uncertainty arising from a medical imaging process can be propagated in an approximate inference fashion to more completely model uncertainty over neural pathways in the human brain.
Explicit representations of uncertainty can also be used to increase robustness of a numerical computation itself. Addressing a pertinent issue in deep learning, Mahsereci and Hennig [ ] constructed a line search method---an elementary building block of nonlinear optimization methods---that is able to work with gradient evaluations corrupted by noise. This method drastically simplifies the problem of choosing a step size for the stochastic gradient descent methods used to train deep neural networks.
- More generally, it is possible to define probabilistic numerical methods, which accept probability measures over a numerical problem as inputs, and return another probability measure over the solution of the problem, which reflects both the effect of the input uncertainty, and uncertainty arising from the finite precision of the internal computation itself. A position paper by Hennig, Osborne and Girolami [ ] motivates this class of methods and suggests their use for the control of computational effort across heterogeneous, composite chains of computations, such as those that make up intelligent machines. A DFG-funded collaboration between Mark Bangert's group at the German Cancer Research Centre and the Emmy Noether group develops an approximate inference framework to propagate physical uncertainties through the planning (optimization) pipeline for radiation treatment, to lower the risk of unnecessary complications for cancer patients [ ].

In a separate but related development, a community has also arisen around the formulation of global optimization as inference, and the formulation of sample-efficient optimization methods. These Bayesian Optimization methods can, for example, be used to structurally optimize, automate the design of machine learning models themselves. We contributed to this area with the development of the Entropy Search [ ] algorithm that automatically performs experiments expected to provide maximal information about the location of a function's extremum.

The Emmy Noether group on Probabilistic Numerics keeps close collaborative contact with the groups of Michael A Osborne in Oxford and Mark Girolami in Warwick. Together with the colleagues in Oxford, Philipp Hennig hosts the community portal probabilistic-numerics.org. Members of the group have co-organized a growing number of international meetings on probabilistic numerics, including, among others, several workshops at NIPS (including the very first workshop on Probabilistic Numerics), in Tübingen, at the DALI conference, and at the University of Warwick.

Probability Theory is an integral part of machine learning as a discipline, and continues to play a major role in the development of the ﬁeld. The modularity of probabilistic models has repeatedly allowed for them to be extended and developed into general frameworks

for large classes of problems, like regression and density estimation. In many areas of statistics and machine learning, probabilistic formulations were not the ﬁrst to the table, but still helped improve understanding of implicit assumptions of the established statistical algorithms, helped with theoretical analysis, and lit the way to algorithmic extensions. Over the years, members of the department of empirical inference have contributed substantially to the ﬁeld of probabilistic learning.

Read more

ei pn

Philipp Hennig (Group leader)

Alumni

Simon Bartels

Alumni

Hans Kersting

Alumni

Maren Mahsereci

Alumni

Michael Schober

Alumni

ei pn zwe-sw

Edgar Klenske

Alumni

23 results

2015

Probabilistic Line Searches for Stochastic Optimization

Mahsereci, M., Hennig, P.

In Advances in Neural Information Processing Systems 28, pages: 181-189, (Editors: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama and R. Garnett), Curran Associates, Inc., 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015 (inproceedings)

Abstract

In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent. [You can find the matlab research code under `attachments' below. The zip-file contains a minimal working example. The docstring in probLineSearch.m contains additional information. A more polished implementation in C++ will be published here at a later point. For comments and questions about the code please write to mmahsereci@tue.mpg.de.]

Matlab research code link (url) [BibTex]

2015

Mahsereci, M., Hennig, P. Probabilistic Line Searches for Stochastic Optimization In Advances in Neural Information Processing Systems 28, pages: 181-189, (Editors: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama and R. Garnett), Curran Associates, Inc., 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015 (inproceedings)

Matlab research code link (url) [BibTex]

Probabilistic numerics and uncertainty in computations

Hennig, P., Osborne, M. A., Girolami, M.

Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 471(2179), 2015 (article)

Abstract

We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data have led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimizers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.

PDF DOI [BibTex]

Hennig, P., Osborne, M. A., Girolami, M. Probabilistic numerics and uncertainty in computations Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 471(2179), 2015 (article)

PDF DOI [BibTex]

A Random Riemannian Metric for Probabilistic Shortest-Path Tractography

Hauberg, S., Schober, M., Liptrot, M., Hennig, P., Feragen, A.

In 18th International Conference on Medical Image Computing and Computer Assisted Intervention, 9349, pages: 597-604, Lecture Notes in Computer Science, MICCAI, 2015 (inproceedings)

PDF DOI [BibTex]

Hauberg, S., Schober, M., Liptrot, M., Hennig, P., Feragen, A. A Random Riemannian Metric for Probabilistic Shortest-Path Tractography In 18th International Conference on Medical Image Computing and Computer Assisted Intervention, 9349, pages: 597-604, Lecture Notes in Computer Science, MICCAI, 2015 (inproceedings)

PDF DOI [BibTex]

Probabilistic Interpretation of Linear Solvers

Hennig, P.

SIAM Journal on Optimization, 25(1):234-260, 2015 (article)

Web PDF link (url) DOI [BibTex]

Hennig, P. Probabilistic Interpretation of Linear Solvers SIAM Journal on Optimization, 25(1):234-260, 2015 (article)

Web PDF link (url) DOI [BibTex]

2014

Probabilistic Progress Bars

Kiefel, M., Schuler, C., Hennig, P.

In Conference on Pattern Recognition (GCPR), 8753, pages: 331-341, Lecture Notes in Computer Science, (Editors: Jiang, X., Hornegger, J., and Koch, R.), Springer, GCPR, September 2014 (inproceedings)

Abstract

Predicting the time at which the integral over a stochastic process reaches a target level is a value of interest in many applications. Often, such computations have to be made at low cost, in real time. As an intuitive example that captures many features of this problem class, we choose progress bars, a ubiquitous element of computer user interfaces. These predictors are usually based on simple point estimators, with no error modelling. This leads to fluctuating behaviour confusing to the user. It also does not provide a distribution prediction (risk values), which are crucial for many other application areas. We construct and empirically evaluate a fast, constant cost algorithm using a Gauss-Markov process model which provides more information to the user.

website+code pdf DOI [BibTex]

2014

Kiefel, M., Schuler, C., Hennig, P. Probabilistic Progress Bars In Conference on Pattern Recognition (GCPR), 8753, pages: 331-341, Lecture Notes in Computer Science, (Editors: Jiang, X., Hornegger, J., and Koch, R.), Springer, GCPR, September 2014 (inproceedings)

website+code pdf DOI [BibTex]

Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics

Hennig, P., Hauberg, S.

In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 33, pages: 347-355, JMLR: Workshop and Conference Proceedings, (Editors: S Kaski and J Corander), Microtome Publishing, Brookline, MA, AISTATS, April 2014 (inproceedings)

Abstract

We study a probabilistic numerical method for the solution of both boundary and initial value problems that returns a joint Gaussian process posterior over the solution. Such methods have concrete value in the statistics on Riemannian manifolds, where non-analytic ordinary differential equations are involved in virtually all computations. The probabilistic formulation permits marginalising the uncertainty of the numerical solution such that statistics are less sensitive to inaccuracies. This leads to new Riemannian algorithms for mean value computations and principal geodesic analysis. Marginalisation also means results can be less precise than point estimates, enabling a noticeable speed-up over the state of the art. Our approach is an argument for a wider point that uncertainty caused by numerical calculations should be tracked throughout the pipeline of machine learning algorithms.

pdf Youtube Supplements Project page link (url) [BibTex]

Hennig, P., Hauberg, S. Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 33, pages: 347-355, JMLR: Workshop and Conference Proceedings, (Editors: S Kaski and J Corander), Microtome Publishing, Brookline, MA, AISTATS, April 2014 (inproceedings)

pdf Youtube Supplements Project page link (url) [BibTex]

Active Learning of Linear Embeddings for Gaussian Processes

Garnett, R., Osborne, M., Hennig, P.

In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages: 230-239, (Editors: NL Zhang and J Tian), AUAI Press , Corvallis, Oregon, UAI2014, 2014, another link: http://arxiv.org/abs/1310.6740 (inproceedings)

PDF Web [BibTex]

Garnett, R., Osborne, M., Hennig, P. Active Learning of Linear Embeddings for Gaussian Processes In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages: 230-239, (Editors: NL Zhang and J Tian), AUAI Press , Corvallis, Oregon, UAI2014, 2014, another link: http://arxiv.org/abs/1310.6740 (inproceedings)

PDF Web [BibTex]

Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature

Gunter, T., Osborne, M., Garnett, R., Hennig, P., Roberts, S.

In Advances in Neural Information Processing Systems 27, pages: 2789-2797, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014 (inproceedings)

Web link (url) [BibTex]

Gunter, T., Osborne, M., Garnett, R., Hennig, P., Roberts, S. Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature In Advances in Neural Information Processing Systems 27, pages: 2789-2797, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014 (inproceedings)

Web link (url) [BibTex]

Probabilistic Shortest Path Tractography in DTI Using Gaussian Process ODE Solvers

Schober, M., Kasenburg, N., Feragen, A., Hennig, P., Hauberg, S.

In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, Lecture Notes in Computer Science Vol. 8675, pages: 265-272, (Editors: P. Golland, N. Hata, C. Barillot, J. Hornegger and R. Howe), Springer, Heidelberg, MICCAI, 2014 (inproceedings)

DOI [BibTex]

Schober, M., Kasenburg, N., Feragen, A., Hennig, P., Hauberg, S. Probabilistic Shortest Path Tractography in DTI Using Gaussian Process ODE Solvers In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, Lecture Notes in Computer Science Vol. 8675, pages: 265-272, (Editors: P. Golland, N. Hata, C. Barillot, J. Hornegger and R. Howe), Springer, Heidelberg, MICCAI, 2014 (inproceedings)

DOI [BibTex]

Probabilistic ODE Solvers with Runge-Kutta Means

Schober, M., Duvenaud, D., Hennig, P.

In Advances in Neural Information Processing Systems 27, pages: 739-747, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014 (inproceedings)

Web link (url) [BibTex]

Schober, M., Duvenaud, D., Hennig, P. Probabilistic ODE Solvers with Runge-Kutta Means In Advances in Neural Information Processing Systems 27, pages: 739-747, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), Curran Associates, Inc., 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014 (inproceedings)

Web link (url) [BibTex]

Incremental Local Gaussian Regression

Meier, F., Hennig, P., Schaal, S.

In Advances in Neural Information Processing Systems 27, pages: 972-980, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014, clmc (inproceedings)

PDF link (url) [BibTex]

Meier, F., Hennig, P., Schaal, S. Incremental Local Gaussian Regression In Advances in Neural Information Processing Systems 27, pages: 972-980, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014, clmc (inproceedings)

PDF link (url) [BibTex]

Efficient Bayesian Local Model Learning for Control

Meier, F., Hennig, P., Schaal, S.

In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pages: 2244 - 2249, IROS, 2014, clmc (inproceedings)

Abstract

Model-based control is essential for compliant controland force control in many modern complex robots, like humanoidor disaster robots. Due to many unknown and hard tomodel nonlinearities, analytical models of such robots are oftenonly very rough approximations. However, modern optimizationcontrollers frequently depend on reasonably accurate models,and degrade greatly in robustness and performance if modelerrors are too large. For a long time, machine learning hasbeen expected to provide automatic empirical model synthesis,yet so far, research has only generated feasibility studies butno learning algorithms that run reliably on complex robots.In this paper, we combine two promising worlds of regressiontechniques to generate a more powerful regression learningsystem. On the one hand, locally weighted regression techniquesare computationally efficient, but hard to tune due to avariety of data dependent meta-parameters. On the other hand,Bayesian regression has rather automatic and robust methods toset learning parameters, but becomes quickly computationallyinfeasible for big and high-dimensional data sets. By reducingthe complexity of Bayesian regression in the spirit of local modellearning through variational approximations, we arrive at anovel algorithm that is computationally efficient and easy toinitialize for robust learning. Evaluations on several datasetsdemonstrate very good learning performance and the potentialfor a general regression learning tool for robotics.

PDF link (url) DOI [BibTex]

Meier, F., Hennig, P., Schaal, S. Efficient Bayesian Local Model Learning for Control In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pages: 2244 - 2249, IROS, 2014, clmc (inproceedings)

PDF link (url) DOI [BibTex]

2013

Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

Journal of Machine Learning Research, 14(1):843-865, March 2013 (article)

Abstract

Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.

website+code pdf link (url) [BibTex]

2013

Hennig, P., Kiefel, M. Quasi-Newton Methods: A New Direction Journal of Machine Learning Research, 14(1):843-865, March 2013 (article)

website+code pdf link (url) [BibTex]

Fast Probabilistic Optimization from Noisy Gradients

Hennig, P.

In Proceedings of The 30th International Conference on Machine Learning, JMLR W&CP 28(1), pages: 62–70, (Editors: S Dasgupta and D McAllester), ICML, 2013 (inproceedings)

PDF [BibTex]

Hennig, P. Fast Probabilistic Optimization from Noisy Gradients In Proceedings of The 30th International Conference on Machine Learning, JMLR W&CP 28(1), pages: 62–70, (Editors: S Dasgupta and D McAllester), ICML, 2013 (inproceedings)

PDF [BibTex]

Nonparametric dynamics estimation for time periodic systems

Klenske, E., Zeilinger, M., Schölkopf, B., Hennig, P.

In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, pages: 486-493 , 2013 (inproceedings)

PDF DOI [BibTex]

Klenske, E., Zeilinger, M., Schölkopf, B., Hennig, P. Nonparametric dynamics estimation for time periodic systems In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, pages: 486-493 , 2013 (inproceedings)

PDF DOI [BibTex]

Analytical probabilistic proton dose calculation and range uncertainties

Bangert, M., Hennig, P., Oelfke, U.

In 17th International Conference on the Use of Computers in Radiation Therapy, pages: 6-11, (Editors: A. Haworth and T. Kron), ICCR, 2013 (inproceedings)

[BibTex]

Bangert, M., Hennig, P., Oelfke, U. Analytical probabilistic proton dose calculation and range uncertainties In 17th International Conference on the Use of Computers in Radiation Therapy, pages: 6-11, (Editors: A. Haworth and T. Kron), ICCR, 2013 (inproceedings)

[BibTex]

Analytical probabilistic modeling for radiation therapy treatment planning

Bangert, M., Hennig, P., Oelfke, U.

Physics in Medicine and Biology, 58(16):5401-5419, 2013 (article)

PDF DOI [BibTex]

Bangert, M., Hennig, P., Oelfke, U. Analytical probabilistic modeling for radiation therapy treatment planning Physics in Medicine and Biology, 58(16):5401-5419, 2013 (article)

PDF DOI [BibTex]

Animating Samples from Gaussian Distributions

Hennig, P.

(8), Max Planck Institute for Intelligent Systems, Tübingen, Germany, 2013 (techreport)

PDF [BibTex]

Hennig, P. Animating Samples from Gaussian Distributions (8), Max Planck Institute for Intelligent Systems, Tübingen, Germany, 2013 (techreport)

PDF [BibTex]

2012

Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

In Proceedings of the 29th International Conference on Machine Learning, pages: 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, ICML, July 2012 (inproceedings)

Abstract

Four decades after their invention, quasi- Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.

website+code pdf link (url) [BibTex]

2012

Hennig, P., Kiefel, M. Quasi-Newton Methods: A New Direction In Proceedings of the 29th International Conference on Machine Learning, pages: 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, ICML, July 2012 (inproceedings)

website+code pdf link (url) [BibTex]

Entropy Search for Information-Efficient Global Optimization

Hennig, P., Schuler, C.

Journal of Machine Learning Research, 13, pages: 1809-1837, -, June 2012 (article)

Abstract

Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly adresses the decision problem of maximizing information gain from each evaluation.

PDF Web Project Page Project Page [BibTex]

Hennig, P., Schuler, C. Entropy Search for Information-Efficient Global Optimization Journal of Machine Learning Research, 13, pages: 1809-1837, -, June 2012 (article)

PDF Web Project Page Project Page [BibTex]

Kernel Topic Models

Hennig, P., Stern, D., Herbrich, R., Graepel, T.

In Fifteenth International Conference on Artificial Intelligence and Statistics, 22, pages: 511-519, JMLR Proceedings, (Editors: Lawrence, N. D. and Girolami, M.), JMLR.org, AISTATS , 2012 (inproceedings)

Abstract

Latent Dirichlet Allocation models discrete data as a mixture of discrete distributions, using Dirichlet beliefs over the mixture weights. We study a variation of this concept, in which the documents' mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space, admitting kernel topic models (KTM), modelling temporal, spatial, hierarchical, social and other structure between documents. The main challenge is efficient approximate inference on the latent Gaussian. We present an approximate algorithm cast around a Laplace approximation in a transformed basis. The KTM can also be interpreted as a type of Gaussian process latent variable model, or as a topic model conditional on document features, uncovering links between earlier work in these areas.

PDF Web [BibTex]

Hennig, P., Stern, D., Herbrich, R., Graepel, T. Kernel Topic Models In Fifteenth International Conference on Artificial Intelligence and Statistics, 22, pages: 511-519, JMLR Proceedings, (Editors: Lawrence, N. D. and Girolami, M.), JMLR.org, AISTATS , 2012 (inproceedings)

PDF Web [BibTex]

Approximate Gaussian Integration using Expectation Propagation

Cunningham, J., Hennig, P., Lacoste-Julien, S.

In pages: 1-11, -, January 2012 (inproceedings) Submitted

Abstract

While Gaussian probability densities are omnipresent in applied mathematics, Gaussian cumulative probabilities are hard to calculate in any but the univariate case. We offer here an empirical study of the utility of Expectation Propagation (EP) as an approximate integration method for this problem. For rectangular integration regions, the approximation is highly accurate. We also extend the derivations to the more general case of polyhedral integration regions. However, we find that in this polyhedral case, EP's answer, though often accurate, can be almost arbitrarily wrong. These unexpected results elucidate an interesting and non-obvious feature of EP not yet studied in detail, both for the problem of Gaussian probabilities and for EP more generally.

Web [BibTex]

Cunningham, J., Hennig, P., Lacoste-Julien, S. Approximate Gaussian Integration using Expectation Propagation In pages: 1-11, -, January 2012 (inproceedings) Submitted

Web [BibTex]

2011

Optimal Reinforcement Learning for Gaussian Systems

Hennig, P.

In Advances in Neural Information Processing Systems 24, pages: 325-333, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract

The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finitedimensional projection gives an impression for how this result may be helpful.

PDF Web [BibTex]

2011

Hennig, P. Optimal Reinforcement Learning for Gaussian Systems In Advances in Neural Information Processing Systems 24, pages: 325-333, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

PDF Web [BibTex]

Recent News

Probabilistic Numerical Methods: A Summary of our Work

2015

Probabilistic Line Searches for Stochastic Optimization

2015

Probabilistic numerics and uncertainty in computations

A Random Riemannian Metric for Probabilistic Shortest-Path Tractography

Probabilistic Interpretation of Linear Solvers

2014

Probabilistic Progress Bars

2014

Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics

Active Learning of Linear Embeddings for Gaussian Processes

Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature

Probabilistic Shortest Path Tractography in DTI Using Gaussian Process ODE Solvers

Probabilistic ODE Solvers with Runge-Kutta Means

Incremental Local Gaussian Regression

Efficient Bayesian Local Model Learning for Control

2013

Quasi-Newton Methods: A New Direction

2013

Fast Probabilistic Optimization from Noisy Gradients

Nonparametric dynamics estimation for time periodic systems

Analytical probabilistic proton dose calculation and range uncertainties

Analytical probabilistic modeling for radiation therapy treatment planning

Animating Samples from Gaussian Distributions

2012

Quasi-Newton Methods: A New Direction

2012

Entropy Search for Information-Efficient Global Optimization

Kernel Topic Models

Approximate Gaussian Integration using Expectation Propagation

2011

Optimal Reinforcement Learning for Gaussian Systems

2011

Latest News

Links

Contact Us