I studied physics at LMU Munich and Cambridge University and obtained my master's degree in 2014. Throughout my studies I was mostly interested in quantum condensed matter physics and quantum information theory, but also in classical statistical mechanics. In my master's project with Erwin Frey (LMU Munich), I worked on the emergent behaviour of a population of bacteria whose interactions are described by game theory (Prisoner's dilemma).
Over the course of my PhD I have worked on several different topics in machine learning, ranging from theoretical and fundamental to quite applied. My main focus is on (Bayesian) probabilistic modelling, (deep) generative models, and variational inference.
Deep Probabilistic Models
(Deep) probabilistic models are a class of generative models that learn the distribution of observations using unobserved, stochastic latent variables. They thus explain (typically high-dimensional) observations through (typically low-dimensional) latent factors. They are specified by a prior distribution on the latents and a likelihood model that connects these latents to the observations. Deep probabilistic models mostly employ flexible deep neural networks (DNNs) to parameterise these mappings, and variational autoencoders (VAEs) are the best-known model class. In addition to a DNN for the likelihood, usually referred to as the decoder, VAEs use a second DNN, the encoder, to perform approximate inference in this otherwise intractable model. Deep Gaussian Processes, which use individual Gaussian Processes as stackable building blocks, are another example of a deep probabilistic model.
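For concreteness, the model structure described above is commonly written as follows. The notation is an assumption made here for illustration: x denotes an observation, z the latent variables, p(z) the prior, p_θ(x | z) the likelihood (decoder) and q_φ(z | x) the approximate posterior (encoder). The second line is the standard variational lower bound that VAEs maximise.

```latex
\begin{align}
  p_\theta(x) &= \int p(z)\, p_\theta(x \mid z)\, \mathrm{d}z, \\
  \log p_\theta(x) &\ge \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
    - \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right).
\end{align}
```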
I am interested in (variational) inference in such deep probabilistic models as well as their design, properties, limitations, and applications.
In recent work (done during an internship at DeepMind), we proposed Learned Accept/Reject Sampling (LARS), a method for constructing richer priors using rejection sampling with a learned acceptance function. This work is motivated by recent analyses of the VAE objective, which pointed out that commonly used simple priors can lead to underfitting.
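The idea of rejection sampling with an acceptance function can be illustrated with a minimal sketch. This is not the LARS implementation: the acceptance function below is a fixed sigmoid standing in for a learned network, the proposal is a one-dimensional standard normal, and the names `acceptance` and `sample_resampled_prior` as well as the fallback after `max_tries` rejections are assumptions of this example.

```python
import math
import random


def acceptance(z):
    """Hypothetical stand-in for a learned acceptance function a(z) in [0, 1].

    In a learned setting this would be a trained neural network; a fixed
    sigmoid is used here purely for illustration.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * z))


def sample_resampled_prior(rng, max_tries=100):
    """Draw one sample by rejection sampling from a standard normal proposal.

    A proposal z ~ N(0, 1) is accepted with probability a(z), so accepted
    samples follow a density proportional to N(z; 0, 1) * a(z).
    """
    for _ in range(max_tries):
        z = rng.gauss(0.0, 1.0)
        if rng.random() < acceptance(z):
            return z
    # Assumption of this sketch: after too many rejections, return the
    # last proposal so that sampling always terminates.
    return z


rng = random.Random(0)
samples = [sample_resampled_prior(rng) for _ in range(5000)]
mean = sum(samples) / len(samples)
```

Because this acceptance function favours positive values, the accepted samples are shifted towards positive z relative to the symmetric proposal, which shows how reweighting a simple proposal can produce a richer distribution.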
Probabilistic Transfer and Meta-Learning
I am involved in two projects on transfer and meta-learning: one, now concluded, focussed on transfer learning, and one ongoing with a focus on meta-learning. In both cases, we use probabilistic models to account for the uncertainty that arises in settings where only very little training data is available. This is either because the data distribution has shifted and we haven't seen many new instances yet (e.g. we trained on cats and dogs but now want to classify tigers and bears) or because each new task only comes with little data. In both cases, we want to leverage previous knowledge and data to improve performance on new tasks.
While the field of meta-learning is moving very quickly, we still have a long way to go, as current approaches are not very data-efficient and still generalise surprisingly poorly.
Computational Photography: Lens Quality Assessment from Photographs
To learn about neural networks and because I am interested in photography, I worked on a project in computational photography. Simply put, computational photography aims to improve the images produced by a camera system by means of intelligent post-processing. The most common examples are smartphone cameras that perform almost on par with much larger and much more expensive DSLR cameras. More generally, computational photography applies to all aspects of the imaging pipeline and, for example, also influences lens/camera design.
In our project, we performed automatic lens quality assessment directly from photographs. Optical lenses (even for expensive DSLR cameras) are often far from perfect; they might be good in the centre, but often pictures are blurred in the corners. A quality measure that captures this blur is the modulation transfer function (MTF), which describes how well black and white stripe patterns of different sizes can be resolved by the camera. For example, in the centre both very coarse and very fine line patterns can be resolved, whereas in the corners the coarse pattern still looks good but the fine pattern is blurred and individual lines can no longer be resolved. Typically it is expensive and time-consuming to measure the MTF in a lab, so we built a system that can estimate the MTF directly from photographs taken with that lens.
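The relationship between blur and resolvable stripe patterns can be illustrated with a small simulation. This is only a sketch of what the MTF measures, not the estimation system described above: it assumes a one-dimensional sinusoidal stripe pattern, models lens blur as a Gaussian, and reports the remaining contrast. The function names, the blur widths and the frequencies (in cycles per pixel) are all hypothetical choices.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised discrete Gaussian blur kernel of width 2 * radius + 1."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blurred_contrast(frequency, sigma, n=512):
    """Contrast of a full-contrast stripe pattern after Gaussian blur.

    The input pattern oscillates between 0 and 1, so the contrast
    (max - min) / (max + min) of the blurred pattern approximates the
    MTF of the blur at the given spatial frequency.
    """
    pattern = [0.5 + 0.5 * math.sin(2 * math.pi * frequency * i) for i in range(n)]
    radius = int(4 * sigma) + 1
    kernel = gaussian_kernel(sigma, radius)
    # Convolve only where the kernel fits entirely inside the pattern.
    blurred = [
        sum(k * pattern[i + j - radius] for j, k in enumerate(kernel))
        for i in range(radius, n - radius)
    ]
    return (max(blurred) - min(blurred)) / (max(blurred) + min(blurred))


# Assumed blur widths: mild blur in the centre, stronger blur in the corner.
centre_coarse = blurred_contrast(frequency=0.02, sigma=0.5)
centre_fine = blurred_contrast(frequency=0.2, sigma=0.5)
corner_coarse = blurred_contrast(frequency=0.02, sigma=2.0)
corner_fine = blurred_contrast(frequency=0.2, sigma=2.0)
```

With these assumed settings, the coarse pattern keeps most of its contrast under both blur widths, while the fine pattern keeps its contrast only under the milder blur, mirroring the centre-versus-corner behaviour described in the text.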
I am generally interested in inference with Gaussian Processes as they (i) are probabilistic (provide uncertainties), (ii) are data-efficient, and (iii) have strong inductive biases that can be designed by an expert or learned.
In my first year, my collaborators and I investigated commonly used Gaussian Process approximations (so-called sparse inducing point methods) and showed that some approximations show pathological behaviour whereas a variational approach behaves similarly to the original model. However, optimisation dynamics for variational approaches can sometimes lead to underfitting.
More recently, we investigated the kernel design of a Gaussian Process to account for invariances in the data. Using recent advances in GP inference, we proposed a kernel that can automatically learn to be invariant or insensitive to certain input transformations present in the training set. An alternative interpretation of our results is that we can learn a form of data augmentation via the marginal likelihood of a Gaussian Process.
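One common way to build a kernel that is invariant to a set of transformations is to average a base kernel over transformed copies of both inputs. The sketch below illustrates this construction under simplifying assumptions: the transformations are fixed rather than learned, they consist only of the identity and a sign flip, and the base kernel is a squared-exponential kernel. It is not the kernel or inference scheme from the work described above, and all names are hypothetical.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential base kernel on tuples of floats."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * d2 / lengthscale ** 2)


def flip(x):
    """Sign-flip transformation x -> -x."""
    return tuple(-a for a in x)


# Assumed transformation set: identity and sign flip (a two-element group).
TRANSFORMS = (lambda x: x, flip)


def invariant_kernel(x, y):
    """Average the base kernel over all pairs of transformed inputs."""
    total = 0.0
    for g in TRANSFORMS:
        for h in TRANSFORMS:
            total += rbf(g(x), h(y))
    return total / len(TRANSFORMS) ** 2


x = (0.5, -1.0)
y = (1.5, 0.3)
k_plain = invariant_kernel(x, y)
k_flipped = invariant_kernel(flip(x), y)
```

Because the two transformations form a group, flipping either input only reorders the terms of the double sum, so `k_plain` and `k_flipped` agree. A function drawn from a GP with this kernel therefore takes the same value at x and at -x.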
Inference in Variational Autoencoders
Applications of ML in physics and other sciences
Structure Learning and inductive biases for data-efficient learning
Oct 2014: Master in Physics, Ludwig-Maximilians-University (LMU) Munich
Nov 2014 - Jun 2015: Research Assistant in the group of Erwin Frey, LMU Munich
van der Wilk, M., Bauer, M., John, S. T., Hensman, J.
Learning Invariances using the Marginal Likelihood
Advances in Neural Information Processing Systems 31, pages: 9960-9970, (Editors: S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett), Curran Associates, Inc., 32nd Annual Conference on Neural Information Processing Systems, December 2018 (conference)
Bauer, M., van der Wilk, M., Rasmussen, C. E.
Understanding Probabilistic Sparse Gaussian Process Approximations
Advances in Neural Information Processing Systems 29, pages: 1533-1541, (Editors: D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett), Curran Associates, Inc., 30th Annual Conference on Neural Information Processing Systems, December 2016 (conference)