Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. Typically, BO relies on conventional Gaussian process regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, Gaussian process-based BO cannot leverage large numbers of past function evaluations, for example, to warm-start related BO runs. After a brief introduction to BO and an overview of several use cases at Amazon, I will discuss a multi-task adaptive Bayesian linear regression model whose computational complexity is linear in the number of function evaluations and which can leverage information from related black-box functions through a shared deep neural net. Experimental results show that the neural net learns a representation suitable for warm-starting related BO runs, and that BO runs can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss). The proposed method was found to be at least one order of magnitude faster than competing neural network-based methods recently published in the literature. This is joint work with Valerio Perrone, Rodolphe Jenatton, and Matthias Seeger.
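The abstract does not give the model equations. The sketch below only illustrates where the linear-in-N cost of Bayesian linear regression comes from: with a fixed feature map of dimension D, the posterior needs an O(N D^2) accumulation and an O(D^3) factorization, instead of the O(N^3) cost of a Gaussian process. The random tanh layer `features` is a hypothetical stand-in for the shared deep net, and the prior precision `alpha` and noise precision `beta` are fixed assumptions here rather than learned. In a multi-task setting one would presumably attach one such head per task to the shared features; only a single task is shown.

```python
import numpy as np


def features(x, W, b):
    # Hypothetical stand-in for the shared deep net: one fixed random tanh layer.
    return np.tanh(x @ W + b)


def blr_posterior(Phi, y, alpha=1.0, beta=100.0):
    # Weight prior w ~ N(0, I/alpha); Gaussian observation noise with precision beta.
    D = Phi.shape[1]
    A = alpha * np.eye(D) + beta * Phi.T @ Phi  # O(N D^2): linear in N
    L = np.linalg.cholesky(A)                   # O(D^3): independent of N
    m = beta * np.linalg.solve(A, Phi.T @ y)    # posterior mean of the weights
    return m, L


def predict(Phi_star, m, L, beta=100.0):
    # Predictive mean and variance at new feature vectors (one per row).
    mean = Phi_star @ m
    v = np.linalg.solve(L, Phi_star.T)          # L^{-1} phi, so ||v||^2 = phi^T A^{-1} phi
    var = 1.0 / beta + np.sum(v ** 2, axis=0)
    return mean, var


# Toy single-task usage: fit a noisy 1-D function.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * x[:, 0]) + 0.1 * rng.normal(size=200)
W = rng.normal(scale=3.0, size=(1, 50))
b = rng.uniform(-3.0, 3.0, size=50)
Phi = features(x, W, b)
m, L = blr_posterior(Phi, y)
mean, var = predict(Phi, m, L)
```

In BO, `mean` and `var` would feed an acquisition function such as expected improvement; that step is omitted here.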
Organizers: Isabel Valera
This talk first introduces the double machine learning framework, which allows inference on parameters in high-dimensional settings. It then presents two applications: transformation models and Gaussian graphical models in high-dimensional settings. Both kinds of models are widely used by practitioners. As high-dimensional data sets become more and more available, it is important to handle situations where the number of parameters is large compared to the sample size.
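The abstract does not spell out an estimator. As a hedged illustration of the double machine learning idea, the sketch below implements the standard partialling-out estimator with cross-fitting for a partially linear model Y = θD + g(X) + ε: both Y and D are regressed on X using held-out folds, and θ is estimated from the resulting residuals. Ridge regression is a swapped-in nuisance learner chosen only to keep the example dependency-free; in genuinely high-dimensional settings a lasso or another regularized learner would be the usual choice. The simulated data and the names `dml_plr` and `ridge_fit_predict` are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_tr, y_tr, X_te, lam=1.0):
    # Ridge regression with an unpenalized intercept (swapped-in nuisance learner).
    X1 = np.column_stack([np.ones(len(X_tr)), X_tr])
    P = lam * np.eye(X1.shape[1])
    P[0, 0] = 0.0
    coef = np.linalg.solve(X1.T @ X1 + P, X1.T @ y_tr)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef


def dml_plr(Y, D, X, n_folds=2, seed=0):
    """Partialling-out DML estimate of theta in Y = theta*D + g(X) + eps."""
    n = len(Y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    u, v = np.empty(n), np.empty(n)
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Cross-fitting: nuisances are fit on the other folds, residuals taken on this fold.
        u[test] = Y[test] - ridge_fit_predict(X[train], Y[train], X[test])
        v[test] = D[test] - ridge_fit_predict(X[train], D[train], X[test])
    theta = np.sum(v * u) / np.sum(v * v)
    # Standard error from the score psi = (u - theta*v) * v.
    psi = (u - theta * v) * v
    se = np.sqrt(np.mean(psi ** 2) / np.mean(v ** 2) ** 2 / n)
    return theta, se


# Simulated example with true theta = 0.5 (hypothetical data).
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))
coef_d = np.zeros(p)
coef_d[:5] = 1.0
coef_y = np.zeros(p)
coef_y[:5] = -1.0
D = X @ coef_d + rng.normal(size=n)
Y = 0.5 * D + X @ coef_y + rng.normal(size=n)
theta, se = dml_plr(Y, D, X)
```

Cross-fitting keeps the data used to estimate the nuisance functions separate from the data used to estimate θ, which is what lets flexible, regularized learners be plugged in without biasing the inference on θ.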
Organizers: Philipp Geiger
Abstract: Sequential Monte Carlo (SMC) methods (including particle filters and smoothers) allow us to compute probabilistic representations of the unknown objects in models used to represent, for example, nonlinear dynamical systems. This talk has three connected parts: 1. A (hopefully pedagogical) introduction to probabilistic modelling of dynamical systems and an explanation of the SMC method. 2. When learning unknown parameters in nonlinear state-space models using maximum likelihood, it is natural to use SMC to compute unbiased estimates of the intractable likelihood. The challenge is that the resulting optimization problem is stochastic, which recently inspired us to construct a new solution to this problem. 3. A challenge with the above (and in fact with most uses of SMC) is that it all quickly becomes very technical. This is indeed the key challenge in spreading the use of SMC methods to a wider group of users. At the same time, many researchers would benefit greatly from having access to these methods in their daily work, and for those of us already working with them it is essential to reduce the amount of time spent on new problems. We believe that probabilistic programming can provide the solution. We are currently developing a new probabilistic programming language that we call Birch. A pre-release is available from birch-lang.org/ and allows users to apply SMC methods without having to implement the algorithms themselves.
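Part 2 above relies on SMC producing unbiased likelihood estimates. A minimal bootstrap particle filter illustrates this, assuming a hypothetical scalar linear-Gaussian state-space model so that the estimate can be checked against the exact Kalman-filter log-likelihood. The product of the per-step average weights is an unbiased estimate of the likelihood; its logarithm, which the code returns, is slightly biased downward. This is a textbook sketch, not the speakers' new optimization method or Birch code.

```python
import numpy as np


def bootstrap_pf_loglik(y, a, q, r, n_particles=1000, seed=0):
    """Bootstrap particle filter for x_t = a x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)           # x_0 ~ N(0, 1) (assumed prior)
    loglik = 0.0
    for yt in y:
        # Propagate with the transition density (the "bootstrap" proposal).
        x = a * x + rng.normal(0.0, np.sqrt(q), n_particles)
        # Weight each particle by the observation density N(y_t; x_t, r).
        logw = -0.5 * (np.log(2.0 * np.pi * r) + (yt - x) ** 2 / r)
        shift = logw.max()
        w = np.exp(logw - shift)
        # The average weight estimates p(y_t | y_{1:t-1}); accumulate its log.
        loglik += shift + np.log(w.mean())
        # Multinomial resampling.
        x = x[rng.choice(n_particles, size=n_particles, p=w / w.sum())]
    return loglik


def kalman_loglik(y, a, q, r):
    """Exact log-likelihood of the same linear-Gaussian model, for comparison."""
    m, P, ll = 0.0, 1.0, 0.0
    for yt in y:
        m, P = a * m, a * a * P + q                 # predict
        S = P + r                                   # predictive variance of y_t
        ll += -0.5 * (np.log(2.0 * np.pi * S) + (yt - m) ** 2 / S)
        K = P / S                                   # Kalman gain
        m, P = m + K * (yt - m), (1.0 - K) * P      # update
    return ll


# Simulate data from the model and compare the two log-likelihoods.
rng = np.random.default_rng(1)
a, q, r, T = 0.9, 0.5, 1.0, 50
x_true, ys = rng.normal(), []
for _ in range(T):
    x_true = a * x_true + rng.normal(0.0, np.sqrt(q))
    ys.append(x_true + rng.normal(0.0, np.sqrt(r)))
y = np.array(ys)
pf_ll = bootstrap_pf_loglik(y, a, q, r, n_particles=2000)
kf_ll = kalman_loglik(y, a, q, r)
```

Because the estimate is random, an optimizer over `a`, `q`, or `r` built on it faces exactly the stochastic optimization problem mentioned in part 2.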
Organizers: Philipp Hennig
In this talk I will describe the main types of research questions and neuroimaging tools used in my work in human cognitive neuroscience (with a focus on audition and sleep), some of the existing approaches used to analyze our data, and their limitations. I will then discuss the main practical obstacles to applying machine learning methods in our field. Several of my ongoing and planned projects include research questions that could be addressed, and perhaps considerably extended, using machine learning approaches; I will describe some specific datasets and problems, with the goal of exploring ideas and potential opportunities for collaboration.
Organizers: Mara Cascianelli
In academic and policy circles, there has been considerable interest in the impact of “big data” on firm performance. We examine how the amount of data affects the accuracy of machine-learned models of weekly retail product forecasts, using a proprietary data set obtained from Amazon. We examine forecast accuracy along two relevant dimensions: the number of products (N), and the number of time periods for which a product is available for sale (T). Theory suggests diminishing returns to larger N and T, with relative forecast errors diminishing at rate 1/sqrt(N) + 1/sqrt(T). Empirical results indicate forecast improvements in the T dimension: as more and more data becomes available for a particular product, demand forecasts for that product improve over time, though with diminishing returns to scale. In contrast, we find an essentially flat N effect across the various lines of merchandise: with a few exceptions, expanding the number of retail products within a category does not appear to be associated with better forecast performance. We do find that the firm’s overall forecast performance, controlling for N and T effects across product lines, has improved over time, suggesting gradual improvements in forecasting from the introduction of new models and improved technology.
In this talk, I'd like to discuss the intertwined importance of, and connections among, the three principles of data science in the title. They will be demonstrated in the context of two collaborative projects, in neuroscience and genomics respectively. The first project, in neuroscience, uses transfer learning to combine convolutional neural networks (CNNs) fitted on ImageNet with regression methods, providing predictive and stable characterizations of neurons in the challenging primate visual cortex area V4. The second project proposes iterative random forests (iRF), a stabilized version of random forests (RF), to find predictable and interpretable high-order interactions among biomolecules.
Organizers: Michel Besserve
Optic flow offers a rich source of information about an organism’s environment. Flies, for instance, are thought to make use of motion vision to control and stabilise their course during acrobatic airborne manoeuvres. How these computations are implemented in neural hardware and how such circuits cope with the visual complexity of natural scenes, however, remain open questions. This talk outlines some of the progress we have made in unraveling the computational substrate underlying optic flow processing in Drosophila. In particular, I will focus on our efforts to connect neural mechanisms and real-world demands via task-driven modelling.
Organizers: Michel Besserve
Probabilistic modeling is the method of choice when it comes to reasoning under uncertainty. However, one of the main practical downsides of probabilistic models is that inference, i.e., the process of using the model to answer statistical queries, is notoriously hard in general. This has led to the common folklore that probabilistic models allowing exact inference are necessarily simplistic and too weak to model any practical task. In this talk, I will present sum-product networks (SPNs), a recently proposed architecture that represents a rich and expressive class of probability distributions while also allowing exact and efficient computation of many inference tasks. I will discuss representational properties, inference routines, and learning approaches for SPNs. Furthermore, I will provide some examples of practical applications of SPNs.
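To make "exact and efficient inference" concrete, here is a minimal sketch of a toy SPN over two binary variables, assuming Bernoulli leaves and hand-picked weights (all hypothetical). Because product nodes combine children over disjoint variables and sum nodes mix children over the same variables, any marginal is computed exactly in a single bottom-up pass by setting the leaves of marginalized variables to 1.

```python
import itertools


def leaf(var, p):
    # Bernoulli leaf over one binary variable; a value of None means "marginalized out".
    def f(x):
        if x[var] is None:
            return 1.0
        return p if x[var] == 1 else 1.0 - p
    return f


def product(*children):
    # Product node: children must cover disjoint sets of variables (decomposability).
    def f(x):
        out = 1.0
        for child in children:
            out *= child(x)
        return out
    return f


def weighted_sum(weights, children):
    # Sum node: weights are non-negative and sum to 1; children share the same variables (completeness).
    def f(x):
        return sum(w * child(x) for w, child in zip(weights, children))
    return f


spn = weighted_sum([0.3, 0.7], [
    product(leaf(0, 0.9), leaf(1, 0.2)),
    product(leaf(0, 0.1), leaf(1, 0.6)),
])

p_x0 = spn([1, None])  # marginal P(X0 = 1) = 0.3*0.9 + 0.7*0.1 = 0.34, in one pass
total = sum(spn(list(xs)) for xs in itertools.product([0, 1], repeat=2))
```

The marginal costs one evaluation of the network, linear in its size, rather than a sum over all joint states; the explicit sum in `total` is only there as a check that the network defines a normalized distribution.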
Machine learning has become a popular application domain for modern optimization techniques, pushing the algorithmic frontier of optimization. The need for large-scale optimization algorithms that can handle millions of dimensions or data points, typical of the big data era, has brought a resurgence of interest in first-order algorithms, making us revisit the venerable stochastic gradient method [Robbins-Monro 1951] as well as the Frank-Wolfe algorithm [Frank-Wolfe 1956]. In this talk, I will review recent improvements to these algorithms that exploit the structure of modern machine learning approaches. I will explain why the Frank-Wolfe algorithm has become so popular lately, and present a surprising tweak to the stochastic gradient method that yields a fast linear convergence rate. Motivating applications will include weakly supervised video analysis and structured prediction problems.
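The abstract does not detail the recent improvements it reviews. The sketch below only shows the classic Frank-Wolfe iteration, assuming the feasible set is the probability simplex and using the standard step size 2/(k+2); the function name and the quadratic test objective are hypothetical. Each step calls a linear minimization oracle instead of a projection, which for the simplex simply picks the vertex with the smallest gradient coordinate.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, n_iters=1000):
    """Frank-Wolfe over the probability simplex {x >= 0, sum(x) = 1}."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, <g, s> is minimized at a vertex.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (k + 2.0)            # classic step-size schedule
        x = (1.0 - gamma) * x + gamma * s  # convex combination: stays feasible, no projection
    return x


# Example: minimize ||x - c||^2 over the simplex, with c inside the simplex.
c = np.array([0.2, 0.5, 0.3])
x_hat = frank_wolfe_simplex(lambda x: 2.0 * (x - c), x0=[1.0, 0.0, 0.0], n_iters=5000)
```

With this step size the suboptimality decays as O(1/k), slower than projected methods in some regimes, but each iteration only needs a linear oracle and the iterates are sparse convex combinations of vertices, which is what makes the method attractive for structured constraint sets.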
Organizers: Philipp Hennig
Under acute threat, biological agents need to choose adaptive actions to survive. In my talk, I will provide a decision-theoretic view of this problem and ask which computational algorithms could underlie this choice and how they are implemented in neural circuits. Rational design principles and non-human animal data tentatively suggest a specific architecture that relies heavily on algorithms tailored to specific threat scenarios. Virtual reality computer games provide an opportunity to translate non-human animal tasks to humans and to investigate these algorithms across species. I will discuss the specific challenges for empirical inference on the underlying neural circuits given such an architecture.
Organizers: Michel Besserve