I’ll start with a concept from 1990 that has become popular: unsupervised learning through two adversarial neural networks (NNs) that duel in a minimax game, where one NN minimizes the objective function maximized by the other. The first NN generates data through its output actions, and the second NN predicts the data. The second NN minimizes its error, thus becoming a better predictor. But it is a zero-sum game: the first NN tries to find actions that maximize the error of the second NN. The system exhibits what I called “artificial curiosity” because the first NN is motivated to invent actions that yield data that the second NN still finds surprising, until the data becomes familiar and eventually boring. A similar adversarial zero-sum game was used for another unsupervised method called “predictability minimization” (since 1991), where two NNs fight each other to discover a disentangled code of the incoming data, remarkably similar to codes found in biological brains. I’ll also discuss passive unsupervised learning through predictive coding of an agent’s observation stream (since 1991) to overcome the fundamental deep learning problem through data compression. I’ll offer thoughts as to why most current commercial applications don’t use unsupervised learning, and whether that will change in the future.
Organizers: Bernhard Schölkopf
In an effort to improve the performance of deep neural networks in data-scarce, non-i.i.d., or unsupervised settings, much recent research has been devoted to encoding invariance under symmetry transformations into neural network architectures. We treat the neural network input and output as random variables, and consider group invariance from the perspective of probabilistic symmetry. Drawing on tools from probability and statistics, we establish a link between functional and probabilistic symmetry, and obtain functional representations of probability distributions that are invariant or equivariant under the action of a compact group. Those representations characterize the structure of neural networks that can be used to represent such distributions and yield a general program for constructing invariant stochastic or deterministic neural networks. We develop the details of the general program for exchangeable sequences and arrays, recovering a number of recent examples as special cases. This is work in collaboration with Yee Whye Teh. https://arxiv.org/abs/1901.06082
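To make the idea of an invariant functional representation concrete for the exchangeable-sequence case, here is a minimal sketch of one well-known construction: apply the same map to every element, pool by summation, then apply a second map to the pooled value. It is not taken from the paper; the functions `phi` and `invariant_net` and the fixed weights are hypothetical, chosen only to illustrate that the output cannot depend on the order of the inputs.

```python
import itertools
import math


def phi(x: float) -> tuple:
    """Hypothetical per-element feature map (an assumption for illustration)."""
    return (x, x * x)


def invariant_net(xs) -> float:
    """Permutation-invariant function of the form rho(sum_i phi(x_i)).

    math.fsum returns the correctly rounded sum, so the pooled features are
    bit-for-bit identical for every ordering of xs.
    """
    pooled = [math.fsum(column) for column in zip(*(phi(x) for x in xs))]
    # Hypothetical fixed "weights" standing in for a learned outer map rho.
    return math.tanh(0.5 * pooled[0] - 0.1 * pooled[1])


xs = [0.3, -1.2, 2.0]
# Evaluate the function on every reordering of the same sequence.
outputs = {invariant_net(list(p)) for p in itertools.permutations(xs)}
print(len(outputs))  # a single distinct value means the output ignores order
```

Sum pooling is only one way to obtain invariance; the point of the sketch is that the function sees the sequence solely through an order-free summary.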
Machine learning with artificial neural networks is revolutionizing science. The most advanced challenges require discovering answers autonomously. In the domain of reinforcement learning, control strategies are improved according to a reward function. The power of neural-network-based reinforcement learning has been highlighted by spectacular recent successes such as playing Go, but its benefits for physics are yet to be demonstrated. Here, we show how a network-based "agent" can discover complete quantum-error-correction strategies, protecting a collection of qubits against noise. These strategies require feedback adapted to measurement outcomes. Finding them from scratch without human guidance and tailored to different hardware resources is a formidable challenge due to the combinatorially large search space. To solve this challenge, we develop two ideas: two-stage learning with teacher and student networks and a reward quantifying the capability to recover the quantum information stored in a multiqubit system. Beyond its immediate impact on quantum computation, our work more generally demonstrates the promise of neural-network-based reinforcement learning in physics.
Organizers: Matthias Bauer
Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. Typically, BO relies on conventional Gaussian process regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, Gaussian process-based BO cannot leverage large numbers of past function evaluations, for example, to warm-start related BO runs. After a brief intro to BO and an overview of several use cases at Amazon, I will discuss a multi-task adaptive Bayesian linear regression model, whose computational complexity is attractive (linear) in the number of function evaluations and able to leverage information of related black-box functions through a shared deep neural net. Experimental results show that the neural net learns a representation suitable for warm-starting related BO runs and that they can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss). The proposed method was found to be at least one order of magnitude faster than competing neural network-based methods recently published in the literature. This is joint work with Valerio Perrone, Rodolphe Jenatton, and Matthias Seeger.
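The complexity argument above can be illustrated with a minimal sketch of Bayesian linear regression on a fixed feature map. This is not the model from the talk: the polynomial `features` function is a hypothetical stand-in for the shared neural net representation, and the precisions `alpha` (prior) and `beta` (noise) are assumed values. What the sketch does show is that each evaluation contributes one O(d^2) update, so the cost grows linearly in the number of evaluations, with a single d-by-d solve at the end.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def features(x):
    """Hypothetical fixed basis; the talk uses a learned neural net instead."""
    return [1.0, x, x * x]


def fit_blr(xs, ys, alpha=1e-3, beta=100.0):
    """Posterior mean and precision matrix of the weights."""
    d = len(features(0.0))
    precision = [[alpha if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for x, y in zip(xs, ys):  # one O(d^2) update per function evaluation
        f = features(x)
        for i in range(d):
            rhs[i] += beta * f[i] * y
            for j in range(d):
                precision[i][j] += beta * f[i] * f[j]
    return solve(precision, rhs), precision


def predict(x, mean, precision, beta=100.0):
    """Predictive mean and variance at x (beta must match the fit)."""
    f = features(x)
    mu = sum(fi * mi for fi, mi in zip(f, mean))
    var = 1.0 / beta + sum(fi * vi for fi, vi in zip(f, solve(precision, f)))
    return mu, var


# Toy black-box function with its minimum at x = 0.3 (assumed for the demo).
xs = [i / 10 for i in range(11)]
ys = [(x - 0.3) ** 2 for x in xs]
mean, precision = fit_blr(xs, ys)
mu, var = predict(0.3, mean, precision)
best_x = min(xs, key=lambda x: predict(x, mean, precision)[0])
```

A Gaussian process on the same data would need to factor an N-by-N kernel matrix, which is where the cubic cost in the number of evaluations comes from.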
This talk first introduces the double machine learning framework, which allows inference on parameters in high-dimensional settings. It then presents two applications, namely transformation models and Gaussian graphical models in high-dimensional settings. Both kinds of models are widely used by practitioners. As high-dimensional data sets become increasingly available, it is important to allow for situations where the number of parameters is large compared to the sample size.
Organizers: Philipp Geiger
Sequential Monte Carlo (SMC) methods (including particle filters and smoothers) allow us to compute probabilistic representations of the unknown objects in models used to represent, for example, nonlinear dynamical systems. This talk has three connected parts: 1. A (hopefully pedagogical) introduction to probabilistic modelling of dynamical systems and an explanation of the SMC method. 2. When learning unknown parameters in nonlinear state-space models by maximum likelihood, it is natural to use SMC to compute unbiased estimates of the intractable likelihood. The challenge is that the resulting optimization problem is stochastic, which recently inspired us to construct a new solution to this problem. 3. A challenge with the above (and in fact with most uses of SMC) is that it all quickly becomes very technical. This is indeed the key challenge in spreading the use of SMC methods to a wider group of users. At the same time, there are many researchers who would benefit a lot from having access to these methods in their daily work, and for those of us already working with them it is essential to reduce the amount of time spent on new problems. We believe that the solution to this can be provided by probabilistic programming. We are currently developing a new probabilistic programming language that we call Birch. A pre-release is available from birch-lang.org/. It allows users to use SMC methods without having to implement the algorithms on their own.
Organizers: Philipp Hennig
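As a rough illustration of the SMC method described in the abstract above, here is a minimal sketch of a bootstrap particle filter in plain Python. It is not the speaker's algorithm and has nothing to do with Birch's API: the model (a scalar linear-Gaussian state-space model), the function name `bootstrap_particle_filter`, and all parameter values are assumptions made for the example. The filter returns filtered state means and the logarithm of the likelihood estimate; the likelihood estimate itself is unbiased, its logarithm is not.

```python
import math
import random


def bootstrap_particle_filter(ys, a=0.9, q=1.0, r=1.0, n_particles=500, seed=0):
    """Bootstrap particle filter for the assumed toy model

        x_t = a * x_{t-1} + v_t,  v_t ~ N(0, q)
        y_t = x_t + e_t,          e_t ~ N(0, r)

    Returns the filtered means and the log of the likelihood estimate.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_lik = 0.0
    means = []
    for y in ys:
        # Propagate every particle through the state dynamics.
        particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # Weight each particle by the observation log-density.
        log_w = [-0.5 * math.log(2.0 * math.pi * r) - 0.5 * (y - x) ** 2 / r
                 for x in particles]
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        # The average unnormalized weight estimates p(y_t | y_1, ..., y_{t-1}).
        log_lik += top + math.log(total / n_particles)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # Multinomial resampling according to the normalized weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, log_lik


# Simulate data from the same toy model, then run the filter on it.
sim = random.Random(1)
x, ys = 0.0, []
for _ in range(50):
    x = 0.9 * x + sim.gauss(0.0, 1.0)
    ys.append(x + sim.gauss(0.0, 1.0))

means, log_lik = bootstrap_particle_filter(ys)
```

Because the likelihood estimate changes with the random seed, maximizing it over parameters such as `a` is a stochastic optimization problem, which is the difficulty the second part of the talk refers to.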
In this talk I will describe the main types of research questions and neuroimaging tools used in my work in human cognitive neuroscience (with foci in audition and sleep), some of the existing approaches used to analyze our data, and their limitations. I will then discuss the main practical obstacles to applying machine learning methods in our field. Several of my ongoing and planned projects include research questions that could be addressed and perhaps considerably extended using machine learning approaches; I will describe some specific datasets and problems, with the goal of exploring ideas and potentially opportunities for collaboration.
Organizers: Mara Cascianelli
In academic and policy circles, there has been considerable interest in the impact of “big data” on firm performance. We examine how the amount of data affects the accuracy of machine-learned models of weekly retail product forecasts, using a proprietary data set obtained from Amazon. We examine the accuracy of forecasts in two relevant dimensions: the number of products (N), and the number of time periods for which a product is available for sale (T). Theory suggests diminishing returns to larger N and T, with relative forecast errors diminishing at rate 1/sqrt(N) + 1/sqrt(T). Empirical results indicate gains in forecast improvement in the T dimension; as more and more data is available for a particular product, demand forecasts for that product improve over time, though with diminishing returns to scale. In contrast, we find an essentially flat N effect across the various lines of merchandise: with a few exceptions, expansion in the number of retail products within a category does not appear to be associated with increases in forecast performance. We do find that the firm’s overall forecast performance, controlling for N and T effects across product lines, has improved over time, suggesting gradual improvements in forecasting from the introduction of new models and improved technology.
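The diminishing returns implied by the theoretical rate can be seen with a few lines of arithmetic. The sketch below only evaluates the expression 1/sqrt(N) + 1/sqrt(T) quoted above; the function name `relative_error_rate` and the sample values of N and T are assumptions, and any constants in front of the two terms are ignored.

```python
import math


def relative_error_rate(n_products: int, n_periods: int) -> float:
    """Theoretical rate 1/sqrt(N) + 1/sqrt(T), up to unspecified constants."""
    return 1.0 / math.sqrt(n_products) + 1.0 / math.sqrt(n_periods)


# Hold N fixed and quadruple T twice: each step halves only the T term,
# so the absolute gain shrinks from one step to the next.
n = 100
rates = [relative_error_rate(n, t) for t in (25, 100, 400)]
gains = [rates[i] - rates[i + 1] for i in range(len(rates) - 1)]
print([round(g, 3) for g in gains])
```

The same holds in the N dimension by symmetry; the empirical finding reported in the abstract is that the N effect is essentially flat in practice.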
In this talk, I'd like to discuss the intertwined importance of, and connections among, the three principles of data science named in the title. They will be demonstrated in the context of two collaborative projects, in neuroscience and genomics respectively. The first project, in neuroscience, uses transfer learning to integrate convolutional neural networks (CNNs) fitted on ImageNet with regression methods to provide predictive and stable characterizations of neurons from the challenging visual cortex area V4. The second project proposes iterative random forests (iRF) as a stabilized RF to seek predictable and interpretable high-order interactions among biomolecules.
Organizers: Michel Besserve
Optic flow offers a rich source of information about an organism’s environment. Flies, for instance, are thought to make use of motion vision to control and stabilise their course during acrobatic airborne manoeuvres. How these computations are implemented in neural hardware and how such circuits cope with the visual complexity of natural scenes, however, remain open questions. This talk outlines some of the progress we have made in unraveling the computational substrate underlying optic flow processing in Drosophila. In particular, I will focus on our efforts to connect neural mechanisms and real-world demands via task-driven modelling.
Organizers: Michel Besserve