In machine learning, the standard explanation of Ockham's razor is to minimize predictive risk. But prediction is interpreted passively---one may not rely on predictions to change the probability distribution used for training. That limitation may be overcome by studying alternatively manipulated systems in randomized experimental trials, but experiments on multivariate systems or on human subjects are often infeasible or immoral. Happily, the past three decades have witnessed the development of a range of statistical techniques for discovering causal relations from non-experimental data. One characteristic of such methods is a strong Ockham bias toward simpler causal theories---i.e., theories with fewer causal connections among the variables of interest. Our question is what Ockham's razor has to do with finding true (rather than merely plausible) causal theories from non-experimental data. The traditional story of minimizing predictive risk does not apply, because uniform consistency is often infeasible in non-experimental causal discovery: without strong and implausible assumptions, the probability of erroneous causal orientation may be arbitrarily high at any sample size. The standard justification for causal discovery methods is point-wise consistency, or convergence in probability to the true causes. But Ockham's razor is not necessary for point-wise convergence: a Bayesian with a strong prior bias toward a complex model would also be point-wise consistent. Either way, the crucial Ockham bias remains disconnected from learning performance. A method reverses its opinion in probability when it probably says A at some sample size and probably says B incompatible with A at a higher sample size. A method cycles in probability when it probably says A, then probably says B incompatible with A, and then probably says A again. Uniform consistency allows for no reversals or cycles in probability. Point-wise consistency allows for arbitrarily many. Lying plausibly between those two extremes is straightest possible convergence to the truth, which allows for only as many cycles and reversals in probability as are necessary to solve the learning problem at hand. We show that Ockham's razor is necessary for cycle-minimal convergence and that patience, or waiting for nature to choose among simplest theories, is necessary for reversal-minimal convergence. The idea yields very tight constraints on inductive statistical methods, both classical and Bayesian, with causal discovery methods as an important special case. It also provides a valid interpretation of significance and power when tests are used to fish inductively for models. The talk is self-contained for a general scientific audience. Novel concepts are illustrated amply with figures and simulations.
Facebook serves close to a billion people every day, who are only able to consume a small subset of the information available to them. In this talk I will give some examples of how machine learning is used to personalize people’s Facebook experience. I will also present some data science experiments with fairly counter-intuitive results.
Stochastic differential equations (SDEs) arise naturally as descriptions of continuous time dynamical systems. My talk addresses the problem of inferring the dynamical state and parameters of such systems from observations taken at discrete times. I will discuss the application of approximate inference methods such as the variational method and expectation propagation and show how higher dimensional systems can be treated by a mean field approximation. In the second part of my talk I will discuss the nonparametric estimation of the drift (i.e. the deterministic part of the ‘force’ which governs the dynamics) as a function of the state using Gaussian process approaches.
The recent theory of compressive sensing predicts that (approximately) sparse vectors can be recovered from vastly incomplete linear measurements using efficient algorithms. This principle has a large number of potential applications in signal and image processing, machine learning and more. Optimal measurement matrices in this context known so far are based on randomness. Recovery algorithms include convex optimization approaches (l1-minimization) as well as greedy methods. Gaussian and Bernoulli random matrices are provably optimal in the sense that the smallest possible number of samples is required. Such matrices, however, are of limited practical interest because of the lack of any structure. In fact, applications demand for certain structure so that there is only limited freedom to inject randomness. We present recovery results for various structured random matrices including random partial Fourier matrices and partial random circulant matrices. We will also review recent extensions of compressive sensing for recovering matrices of low rank from incomplete information via efficient algorithms such as nuclear norm minimization. This principle has recently found applications for phaseless estimation, i.e., in situations where only the magnitude of measurements is available. Another extension considers the recovery of low rank tensors (multi-dimensional arrays) from incomplete linear information. Several obstacles arise when passing from matrices and tensors such as the lack of a singular value decomposition which shares all the nice properties of the matrix singular value decomposition. Although only partial theoretical results are available, we discuss algorithmic approaches for this problem.
Organizers: Michel Besserve
This talk reviews differential equations on manifolds of matrices or tensors of low rank. They serve to approximate, in a low-rank format, large time-dependent matrices and tensors that are either given explicitly via their increments or are unknown solutions of differential equations. Furthermore, low-rank differential equations are used in novel algorithms for eigenvalue optimisation, for instance in robust-stability problems.
Organizers: Philipp Hennig
Gaussian process regression is a non-parametric Bayesian machine learning paradigm, where instead of estimating parameters of fixed-form functions, we model the whole unknown functions as Gaussian processes. Gaussian processes are also commonly used for representing uncertainties in models of dynamic systems in many applications such as tracking, navigation, and automatic control systems. The latter models are often formulated as state-space models, where the use of non-linear Kalman filter type of methods is common. The aim of this talk is to discuss connections of Kalman filtering methods and Gaussian process regression. In particular, I discuss representations of Gaussian processes as state-space models, which enable the use of computationally efficient Kalman-filter-based (or more general Bayesian-filter-based) solutions to Gaussian process regression problems. This also allows for computationally efficient inference in latent force models (LFM), which are models combining first-principles mechanical models with non-parametric Gaussian process regression models.
Organizers: Philipp Hennig
(joint work with Jan. C. Neddermeyer) A technique for online estimation of spot volatility for high-frequency data is developed. The algorithm works directly on the transaction data and updates the volatility estimate immediately after the occurrence of a new transaction. Furthermore, a nonlinear market microstructure noise model is proposed that reproduces several stylized facts of high frequency data. A computationally efficient particle filter is used that allows for the approximation of the unknown efficient prices and, in combination with a recursive EM algorithm, for the estimation of the volatility curve. We neither assume that the transaction times are equidistant nor do we use interpolated prices. We also make a distinction between volatility per time unit and volatility per transaction and provide estimators for both. More precisely we use a model with random time change where spot volatility is decomposed into spot volatility per transaction times the trading intensity - thus highlighting the influence of trading intensity on volatility.
Organizers: Michel Besserve
Our ability to understand a scene is central to how we interact with our environment and with each other. Classic research on visual scene perception has focused on how people "know what is where by looking", but this talk will explore people's ability to infer the "hows" and "whys" of their world, and in particular, how they form a physical understanding of a scene. From a glance we can know so much: not only what objects are where, but whether they are movable, fragile, slimy, or hot; whether they were made by hand, by machine, or by nature; whether they are broken and how they could be repaired; and so on. I posit that these common-sense physical intuitions are made possible by the brain's sophisticated capacity for constructing and manipulating a rich mental representation of a scene via a mechanism of approximate probabilistic simulation -- in short, a physics engine in the head. I will present a series of recent and ongoing studies that develop and test this computational model in a variety of prediction, inference, and planning tasks. Our model captures various aspects of people's experimental judgments, including the accuracy of their performance as well as several illusions and errors. These results help explain core aspects of human mental models that are instrumental to how we understand and act in our everyday world. They also open new directions for developing robotic and AI systems that can perceive, reason, and act the way people do.
Organizers: Michel Besserve