2011


Learning Output Kernels with Block Coordinate Descent

Dinuzzo, F., Ong, C. S., Gehler, P., Pillonetto, G.

In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages: 49-56, (Editors: L Getoor and T Scheffer), ACM, New York, NY, USA, June 2011 (inproceedings)

data+code pdf [BibTex]

Finding dependencies between frequencies with the kernel cross-spectral density

Besserve, M., Janzing, D., Logothetis, N., Schölkopf, B.

In pages: 2080-2083, IEEE, Piscataway, NJ, USA, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011 (inproceedings)

Abstract
Cross-spectral density (CSD) is widely used to find linear dependencies between two real- or complex-valued time series. We define a non-linear extension of this measure by mapping the time series into two Reproducing Kernel Hilbert Spaces. The dependency is quantified by the Hilbert-Schmidt norm of a cross-spectral density operator between these two spaces. We prove that, by choosing a characteristic kernel for the mapping, this quantity detects any pairwise dependency between the time series. We then provide a fast estimator for the Hilbert-Schmidt norm based on the Fast Fourier Transform. We demonstrate the usefulness of this approach by quantifying non-linear dependencies between frequency bands of simulated signals and intra-cortical neural recordings.
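
The estimator admits a compact finite-dimensional illustration. The sketch below is not the paper's RKHS estimator: it swaps the exact feature maps for random Fourier features, estimates the cross-spectral density matrix between feature coordinates via Bartlett-averaged FFT cross-periodograms (a single periodogram is rank one and uninformative), and takes the Frobenius (Hilbert-Schmidt) norm per frequency bin. All function names and parameters are hypothetical.

```python
import numpy as np

def random_fourier_features(x, n_feat=64, gamma=1.0, seed=0):
    """Approximate an RBF feature map for a 1-D signal (not the paper's exact RKHS map)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=np.sqrt(2 * gamma), size=n_feat)
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(np.outer(x, w) + b)   # (T, n_feat)

def kernel_csd_hs_norm(x, y, seg_len=128, n_feat=64):
    """HS norm of an (approximate) cross-spectral density operator per frequency bin."""
    phi_x = random_fourier_features(x, n_feat, seed=0)
    phi_y = random_fourier_features(y, n_feat, seed=1)
    phi_x -= phi_x.mean(axis=0)          # center the feature maps
    phi_y -= phi_y.mean(axis=0)
    n_seg = len(x) // seg_len
    S = np.zeros((seg_len, n_feat, n_feat), dtype=complex)
    for s in range(n_seg):               # Bartlett averaging over segments
        sl = slice(s * seg_len, (s + 1) * seg_len)
        Fx = np.fft.fft(phi_x[sl], axis=0)
        Fy = np.fft.fft(phi_y[sl], axis=0)
        S += np.einsum('fi,fj->fij', Fx.conj(), Fy) / seg_len
    S /= n_seg
    return np.linalg.norm(S, axis=(1, 2))   # Frobenius = Hilbert-Schmidt norm per bin

t = np.arange(4096)
x = np.sin(0.2 * t) + 0.1 * np.random.randn(4096)
y = x ** 2 + 0.1 * np.random.randn(4096)    # nonlinear dependence, invisible to linear CSD
print(kernel_csd_hs_norm(x, y)[:10])
```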

Web DOI [BibTex]

Trajectory Planning for Optimal Robot Catching in Real-Time

Lampariello, R., Nguyen-Tuong, D., Castellini, C., Hirzinger, G., Peters, J.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2011), pages: 3719-3726, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2011 (inproceedings)

Abstract
Many real-world tasks require fast planning of highly dynamic movements for their execution in real-time. The success often hinges on quickly finding one of the few plans that can achieve the task at all. A further challenge is to quickly find a plan which optimizes a desired cost. In this paper, we will discuss this problem in the context of catching small flying targets efficiently. This can be formulated as a non-linear optimization problem where the desired trajectory is encoded by an adequate parametric representation. The optimizer generates an energy-optimal trajectory by efficiently using the robot kinematic redundancy while taking into account maximal joint motion, collision avoidance and local minima. To enable the resulting method to work in real-time, examples of the global planner are generalized using nearest neighbour approaches, Support Vector Machines and Gaussian process regression, which are compared in this context. Evaluations indicate that the presented method is highly efficient in complex tasks such as ball-catching.

PDF Web DOI [BibTex]

Implementation of a teleoperation system to test control of the haptic master of a surgical robot

Wang, H., Hong, A., Cho, JH., Lee, DY.

In Institute of Control, Robotics and Systems, Bucheon, South Korea, 26th ICROS Annual Conference (ICROS), May 2011 (inproceedings)

[BibTex]

Fronto-Parietal Gamma-Oscillations are a Cause of Performance Variation in Brain-Computer Interfacing

Grosse-Wentrup, M.

In pages: 384-387, IEEE, Piscataway, NJ, USA, 5th International IEEE/EMBS Conference on Neural Engineering (NER), May 2011 (inproceedings)

Abstract
In recent work, we have provided evidence that fronto-parietal γ-oscillations of the electromagnetic field of the brain modulate the sensorimotor rhythm. It is unclear, however, what impact this effect may have on explaining and addressing within-subject performance variations of brain-computer interfaces (BCIs). In this paper, we provide evidence that group-average classification accuracies in a two-class motor-imagery paradigm differ by up to 22.2% depending on the state of fronto-parietal γ-power. As such, this effect may have a large impact on the design of future BCI systems. We further investigate whether adapting classification procedures to the current state of γ-power improves classification accuracy, and discuss other approaches to exploiting this effect.

PDF Web DOI [BibTex]

A Flexible Hybrid Framework for Modeling Complex Manipulation Tasks

Kroemer, O., Peters, J.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2011), pages: 1856-1861, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2011 (inproceedings)

Abstract
Future service robots will need to perform a wide range of tasks using various objects. In order to perform complex tasks, robots require a suitable internal representation of the task. We propose a hybrid framework for representing manipulation tasks, which combines continuous motion planning and discrete task-level planning. In addition, we use a mid-level planner to optimize individual actions according to the plan. The proposed framework incorporates biologically-inspired concepts, such as affordances and motor primitives, in order to efficiently plan for manipulation tasks. The final framework is modular, can generalize well to different situations, and is straightforward to expand. Our demonstrations also show how the use of affordances and mid-level planning can lead to improved performance.

PDF Web DOI [BibTex]

Fast Convergent Algorithms for Expectation Propagation Approximate Bayesian Inference

Seeger, M., Nickisch, H.

In JMLR Workshop and Conference Proceedings Volume 15: AISTATS 2011, pages: 652-660, (Editors: G Gordon and D Dunson and M Dudík), MIT Press, Cambridge, MA, USA, 14th International Conference on Artificial Intelligence and Statistics, April 2011 (inproceedings)

Abstract
We propose a novel algorithm to solve the expectation propagation relaxation of Bayesian inference for continuous-variable graphical models. In contrast to most previous algorithms, our method is provably convergent. By marrying convergent EP ideas from (Opper & Winther, 2005) with covariance decoupling techniques (Wipf & Nagarajan, 2008; Nickisch & Seeger, 2009), it runs at least an order of magnitude faster than the most common EP solver.

PDF Web [BibTex]

Active Exploration for Robot Parameter Selection in Episodic Reinforcement Learning

Kroemer, O., Peters, J.

In Proceedings of the 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2011), pages: 25-31, IEEE, Piscataway, NJ, USA, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), April 2011 (inproceedings)

Abstract
As the complexity of robots and other autonomous systems increases, it becomes more important that these systems can adapt and optimize their settings actively. However, such optimization is rarely trivial. Sampling from the system is often expensive in terms of time and other costs, and excessive sampling should therefore be avoided. The parameter space is also usually continuous and multi-dimensional. Given the inherent exploration-exploitation dilemma of the problem, we propose treating it as an episodic reinforcement learning problem. In this reinforcement learning framework, the policy is defined by the system's parameters and the rewards are given by the system's performance. The rewards accumulate during each episode of a task. In this paper, we present a method for efficiently sampling and optimizing in continuous multidimensional spaces. The approach is based on Gaussian process regression, which can represent continuous non-linear mappings from parameters to system performance. We employ an upper confidence bound policy, which explicitly manages the trade-off between exploration and exploitation. Unlike many other policies for this kind of problem, we do not rely on a discretization of the action space. The presented method was evaluated on a real robot. The robot had to learn grasping parameters in order to adapt its grasping execution to different objects. The proposed method was also tested on a more general gain tuning problem. The results of the experiments show that the presented method can quickly determine suitable parameters and is applicable to real online learning applications.
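
The core loop the abstract describes — Gaussian process regression over the parameter space plus an upper-confidence-bound selection rule — is compact enough to sketch. The following is a minimal illustration, not the authors' implementation; the kernel, noise level, beta, and the synthetic reward function are assumptions.

```python
import numpy as np

def rbf(A, B, ell=0.25):
    """Squared-exponential kernel between rows of parameter settings."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def reward(theta):                       # stand-in for one robot trial
    return float(np.exp(-8 * ((theta - 0.7) ** 2).sum()) + 0.05 * np.random.randn())

cand = np.random.rand(300, 2)            # candidate parameter settings
X = [np.random.rand(2)]
y = [reward(X[0])]
noise, beta = 1e-2, 2.0
for episode in range(30):
    Xa, ya = np.array(X), np.array(y)
    K = rbf(Xa, Xa) + noise * np.eye(len(Xa))
    ks = rbf(Xa, cand)                   # (n_obs, n_cand)
    mu = ks.T @ np.linalg.solve(K, ya)   # GP posterior mean on candidates
    var = 1.0 - np.sum(ks * np.linalg.solve(K, ks), axis=0)
    ucb = mu + beta * np.sqrt(np.maximum(var, 0))
    theta = cand[np.argmax(ucb)]         # explore/exploit via the UCB rule
    X.append(theta); y.append(reward(theta))
print("best parameters found:", X[int(np.argmax(y))])
```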

PDF Web DOI [BibTex]

Relative Entropy Inverse Reinforcement Learning

Boularias, A., Kober, J., Peters, J.

In JMLR Workshop and Conference Proceedings Volume 15: AISTATS 2011, pages: 182-189, (Editors: G Gordon and D Dunson and M Dudík), MIT Press, Cambridge, MA, USA, Fourteenth International Conference on Artificial Intelligence and Statistics, April 2011 (inproceedings)

Abstract
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is optimally acting in a Markov Decision Process (MDP). Most of the past work on IRL requires that a (near)-optimal policy can be computed for different reward functions. However, this requirement can hardly be satisfied in systems with a large, or continuous, state space. In this paper, we propose a model-free IRL algorithm, where the relative entropy between the empirical distribution of the state-action trajectories under a uniform policy and their distribution under the learned policy is minimized by stochastic gradient descent. We compare this new approach to well-known IRL algorithms using approximate MDP models. Empirical results on simulated car racing, gridworld and ball-in-a-cup problems show that our approach is able to learn good policies from a small number of demonstrations.
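
For a reward that is linear in trajectory features, R(τ) = θᵀf(τ), the relative-entropy objective yields a stochastic gradient that compares the expert's empirical feature expectation with an importance-weighted expectation over the uniform-policy samples. A minimal sketch of that update; the feature shapes, step size, and toy data are assumptions:

```python
import numpy as np

def reirl_update(theta, F_expert, F_uniform, lr=0.1):
    """One stochastic-gradient step of relative entropy IRL.

    F_expert:  (n_demo, d)  trajectory features of expert demonstrations
    F_uniform: (n_samp, d)  trajectory features sampled under a uniform policy
    """
    logw = F_uniform @ theta             # importance weights of the learned
    w = np.exp(logw - logw.max())        # trajectory distribution over the
    w /= w.sum()                         # uniform-policy samples
    grad = F_expert.mean(axis=0) - w @ F_uniform   # match feature expectations
    return theta + lr * grad

d = 5
theta = np.zeros(d)
F_expert = np.random.randn(20, d) + 1.0   # toy stand-ins for real trajectory features
F_uniform = np.random.randn(500, d)
for _ in range(200):
    theta = reirl_update(theta, F_expert, F_uniform)
print("learned reward weights:", theta.round(2))
```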

PDF Web [BibTex]

Removing noise from astronomical images using a pixel-specific noise model

Burger, H., Schölkopf, B., Harmeling, S.

In pages: 8, (Editors: H Lensch and SL Narasimhan and ME Testorf), IEEE, Piscataway, NJ, USA, IEEE International Conference on Computational Photography (ICCP), April 2011 (inproceedings)

Abstract
For digital photographs of astronomical objects, where exposure times are usually long and ISO settings high, the so-called dark-current is a significant source of noise. Dark-current refers to thermally generated electrons and is therefore present even in the absence of light. This paper presents a novel approach for denoising astronomical images that have been corrupted by dark-current noise. Our method relies on a probabilistic description of the dark-current of each pixel of a given camera. The noise model is then combined with an image prior which is adapted to astronomical images. In a laboratory environment, we use a black and white CCD camera containing a cooling unit and show that our method is superior to existing methods in terms of root mean squared error. Furthermore, we show that our method is practically relevant by providing visually more appealing results on astronomical photographs taken with a single lens reflex CMOS camera.
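
A stripped-down version of the idea — fit a per-pixel dark-current distribution from a stack of dark frames, then denoise by MAP estimation under that likelihood — can be sketched as follows. The quadratic smoothness prior used here is a stand-in; the paper's prior is adapted to astronomical images, and all data below are synthetic.

```python
import numpy as np

# Per-pixel noise model estimated from a stack of dark frames (shutter closed)
dark = np.random.gamma(2.0, 3.0, size=(50, 64, 64))   # synthetic stand-in for real darks
mu_d = dark.mean(axis=0)                 # pixel-specific dark-current mean
var_d = dark.var(axis=0) + 1e-6          # pixel-specific dark-current variance

clean = np.zeros((64, 64)); clean[28:36, 28:36] = 40.0   # toy "star"
y = clean + np.random.normal(mu_d, np.sqrt(var_d))       # observed image

# MAP estimate: minimize sum_i (y_i - x_i - mu_i)^2 / var_i + lam * ||grad x||^2
# by gradient descent (much simpler than the paper's astronomical image prior).
x = np.maximum(y - mu_d, 0.0)
lam, step = 0.5, 0.05
for _ in range(300):
    lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0)
           + np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)
    x = np.maximum(x - step * ((x + mu_d - y) / var_d - lam * lap), 0.0)

print("RMSE naive subtraction:", np.sqrt(((np.maximum(y - mu_d, 0) - clean) ** 2).mean()))
print("RMSE MAP estimate:     ", np.sqrt(((x - clean) ** 2).mean()))
```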

Web DOI [BibTex]

Cooperative Cuts

Jegelka, S.

COSA Workshop: Combinatorial Optimization, Statistics, and Applications, March 2011 (talk)

Abstract
Combinatorial problems with submodular cost functions have recently drawn interest. In a standard combinatorial problem, the sum-of-weights cost is replaced by a submodular set function. The result is a powerful model that is, however, very hard to solve. In this talk, I will introduce cooperative cuts, minimum cuts with submodular edge weights. I will outline methods to approximately solve this problem, and show an application in computer vision. If time permits, the talk will also sketch regret-minimizing online algorithms for submodular-cost combinatorial problems. This is joint work with Jeff Bilmes (University of Washington).

Web [BibTex]

Towards Motor Skill Learning for Robotics

Peters, J., Mülling, K., Kober, J., Nguyen-Tuong, D., Kroemer, O.

In Robotics Research, pages: 469-482, (Editors: Pradalier, C. , R. Siegwart, G. Hirzinger), Springer, Berlin, Germany, 14th International Symposium on Robotics Research (ISRR), January 2011 (inproceedings)

Abstract
Learning robots that can acquire new motor skills and refine existing ones have been a long-standing vision of robotics, artificial intelligence, and the cognitive sciences. Early steps towards this goal in the 1980s made clear that reasoning and human insight alone will not suffice. Instead, new hope has been offered by the rise of modern machine learning approaches. To date, however, it has become increasingly clear that off-the-shelf machine learning approaches will not suffice for motor skill learning, as these methods often do not scale into the high-dimensional domains of manipulator and humanoid robotics, nor do they fulfill the real-time requirements of our domain. As an alternative, we propose to break the generic skill learning problem into parts that we understand well from a robotics point of view. After designing appropriate learning approaches for these basic components, they will serve as the ingredients of a general approach to motor skill learning. In this paper, we discuss our recent and current progress in this direction. To do so, we present our work on learning to control and on learning elementary movements, as well as our steps towards learning complex tasks. We show several evaluations, both using real robots and physically realistic simulations.

PDF Web DOI [BibTex]

Learning Visual Representations for Interactive Systems

Piater, J., Jodogne, S., Detry, R., Kraft, D., Krüger, N., Kroemer, O., Peters, J.

In Robotics Research, pages: 399-416, (Editors: Pradalier, C. , R. Siegwart, G. Hirzinger), Springer, Berlin, Germany, 14th International Symposium on Robotics Research (ISRR), January 2011 (inproceedings)

Abstract
We describe two quite different methods for associating action parameters to visual percepts. Our RLVC algorithm performs reinforcement learning directly on the visual input space. To make this very large space manageable, RLVC interleaves the reinforcement learner with a supervised classification algorithm that seeks to split perceptual states so as to reduce perceptual aliasing. This results in an adaptive discretization of the perceptual space based on the presence or absence of visual features. Its extension RLJC also handles continuous action spaces. In contrast to the minimalistic visual representations produced by RLVC and RLJC, our second method learns structural object models for robust object detection and pose estimation by probabilistic inference. To these models, the method associates grasp experiences autonomously learned by trial and error. These experiences form a non-parametric representation of grasp success likelihoods over gripper poses, which we call a grasp density. Thus, object detection in a novel scene simultaneously produces suitable grasping options.

PDF Web DOI [BibTex]

A Non-Parametric Approach to Dynamic Programming

Kroemer, O., Peters, J.

In Advances in Neural Information Processing Systems 24, pages: 1719-1727, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin's method. Furthermore, we also present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.
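
Of the methods the paper unifies under Galerkin's method, Least-Squares Temporal Difference learning has the most compact form, so a sketch of LSTD(0) is given below as a reference point. This is not the paper's kernel-density estimator; the feature map and the toy chain are assumptions.

```python
import numpy as np

def lstd(Phi, Phi_next, r, gamma=0.95, reg=1e-6):
    """LSTD(0): solve the Galerkin/projected Bellman equation for value weights w.

    Phi:      (n, d) features of visited states s_t
    Phi_next: (n, d) features of successor states s_{t+1}
    r:        (n,)   observed rewards
    """
    A = Phi.T @ (Phi - gamma * Phi_next) + reg * np.eye(Phi.shape[1])
    b = Phi.T @ r
    return np.linalg.solve(A, b)         # V(s) is approximated by phi(s) @ w

# Toy 1-D chain: states drift right in [0, 1], reward near the right end
n, d = 2000, 10
s = np.random.rand(n)
s_next = np.clip(s + 0.05 + 0.02 * np.random.randn(n), 0, 1)
centers = np.linspace(0, 1, d)
feat = lambda x: np.exp(-0.5 * ((x[:, None] - centers) / 0.1) ** 2)
w = lstd(feat(s), feat(s_next), (s > 0.9).astype(float))
print("estimated values at 0.0 / 0.5 / 1.0:", feat(np.array([0.0, 0.5, 1.0])) @ w)
```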

PDF Web [BibTex]

Transfer Learning with Copulas

Lopez-Paz, D., Hernandez-Lobato, J.

In pages: 2, NIPS, Workshop on Copulas in Machine Learning, 2011 (inproceedings)

PDF [BibTex]

Denoising sparse noise via online dictionary learning

Cherian, A., Sra, S., Papanikolopoulos, N.

In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, pages: 2060-2063, IEEE, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011 (inproceedings)

DOI [BibTex]

Support Vector Machines for finding deletions and short insertions using paired-end short reads

Grimm, D., Hagmann, J., König, D., Weigel, D., Borgwardt, KM.

International Conference on Intelligent Systems for Molecular Biology (ISMB), 2011 (poster)

Web [BibTex]

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Deisenroth, MP., Rasmussen, CE.

In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pages: 465-472, (Editors: L Getoor and T Scheffer), Omnipress, 2011 (inproceedings)

Abstract
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.

Web [BibTex]

Kernel Bayes’ Rule

Fukumizu, K., Song, L., Gretton, A.

In Advances in Neural Information Processing Systems 24, pages: 1737-1745, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

PDF [BibTex]

Optimal Reinforcement Learning for Gaussian Systems

Hennig, P.

In Advances in Neural Information Processing Systems 24, pages: 325-333, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first-order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finite-dimensional projection gives an impression of how this result may be helpful.

PDF Web [BibTex]

Efficient inference in matrix-variate Gaussian models with iid observation noise

Stegle, O., Lippert, C., Mooij, J., Lawrence, N., Borgwardt, K.

In Advances in Neural Information Processing Systems 24, pages: 630-638, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
Inference in matrix-variate Gaussian models has major applications for multi-output prediction and joint learning of row and column covariances from matrix-variate data. Here, we discuss an approach for efficient inference in such models that explicitly accounts for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility on applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders.
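
The Kronecker trick the abstract refers to rests on a standard identity: if R = U_R D_R U_Rᵀ and C = U_C D_C U_Cᵀ, then (C ⊗ R + σ²I)⁻¹ = (U_C ⊗ U_R)(D_C ⊗ D_R + σ²I)⁻¹(U_C ⊗ U_R)ᵀ, so the full n·p × n·p covariance never has to be formed. A hedged numpy sketch, with shapes and the toy covariances as assumptions:

```python
import numpy as np

def kron_gaussian_solve(R, C, Y, sigma2):
    """Solve (C kron R + sigma2*I) vec(X) = vec(Y) without forming the Kronecker product.

    R: (n, n) row/sample covariance;  C: (p, p) column/feature covariance;  Y: (n, p).
    """
    dr, Ur = np.linalg.eigh(R)
    dc, Uc = np.linalg.eigh(C)
    Yt = Ur.T @ Y @ Uc                        # rotate into the joint eigenbasis
    Xt = Yt / (np.outer(dr, dc) + sigma2)     # diagonal solve, one scalar per entry
    return Ur @ Xt @ Uc.T

n, p, sigma2 = 40, 30, 0.1
A = np.random.randn(n, n); R = A @ A.T / n + np.eye(n)
B = np.random.randn(p, p); C = B @ B.T / p + np.eye(p)
Y = np.random.randn(n, p)

X = kron_gaussian_solve(R, C, Y, sigma2)
# Check against the dense O((np)^3) solve (column-major vec matches C kron R)
dense = np.linalg.solve(np.kron(C, R) + sigma2 * np.eye(n * p), Y.flatten(order='F'))
print("max abs error:", np.abs(X.flatten(order='F') - dense).max())
```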

PDF Web [BibTex]

Expectation Propagation for the Estimation of Conditional Bivariate Copulas

Hernandez-Lobato, J., Lopez-Paz, D., Ghahramani, Z.

In pages: 2, NIPS, Workshop on Copulas in Machine Learning, 2011 (inproceedings)

PDF [BibTex]

Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence

Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N.

In IEEE International Conference on Computer Vision, ICCV 2011, pages: 2399-2406, (Editors: DN Metaxas and L Quan and A Sanfeliu and LJ Van Gool), IEEE, 13th International Conference on Computer Vision (ICCV), 2011 (inproceedings)

DOI [BibTex]

Introducing the detection of auditory error responses based on BCI technology for passive interaction

Zander, TO., Klippel, DM., Scherer, R.

In Proceedings of the 5th International Brain–Computer Interface Conference, pages: 252-255, (Editors: GR Müller-Putz and R Scherer and M Billinger and A Kreilinger and V Kaiser and C Neuper), Graz: Verlag der Technischen Universität, 2011 (inproceedings)

[BibTex]

Statistical estimation for optimization problems on graphs

Langovoy, M., Sra, S.

Empirical Inference Symposium, 2011 (poster)

[BibTex]


Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval

Sra, S., Cherian, A.

In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, LNCS vol 6913, Part III, pages: 318-332, (Editors: D Gunopulos and T Hofmann and D Malerba and M Vazirgiannis), Springer, 22th European Conference on Machine Learning (ECML), 2011 (inproceedings)

DOI [BibTex]

Restricted Boltzmann machines as a useful tool for detecting oscillatory EEG components

Balderas, D., Zander, TO., Bachl, F., Neuper, C., Scherer, R.

In Proceedings of the 5th International Brain–Computer Interface Conference, pages: 68-71, (Editors: GR Müller-Putz and R Scherer and M Billinger and A Kreilinger and V Kaiser and C Neuper), Graz: Verlag der Technischen Universität, 2011 (inproceedings)

[BibTex]

Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation

Görnitz, N., Widmer, C., Zeller, G., Kahles, A., Sonnenburg, S., Rätsch, G.

In Advances in Neural Information Processing Systems 24, pages: 2690-2698, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and FCN Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

PDF [BibTex]

Phase transition in the family of p-resistances

Alamgir, M., von Luxburg, U.

In Advances in Neural Information Processing Systems 24, pages: 379-387, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We study the family of p-resistances on graphs for p ≥ 1. This family generalizes the standard resistance distance. We prove that for any fixed graph, for p=1 the p-resistance coincides with the shortest path distance, for p=2 it coincides with the standard resistance distance, and for p → ∞ it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase transition takes place. There exist two critical thresholds p^* and p^** such that if p < p^*, then the p-resistance depends on meaningful global properties of the graph, whereas if p > p^**, it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p^* = 1 + 1/(d-1) and p^** = 1 + 1/(d-2), where d is the dimension of the underlying space (we believe that the small gap between p^* and p^** is an artifact of our proofs). We also relate our findings to Laplacian regularization and suggest using q-Laplacians as regularizers, where q satisfies 1/p^* + 1/q = 1.
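
The p=1 and p=2 special cases are easy to check numerically on a small graph: reading the p-resistance as the minimum over unit s-t flows i of Σ_e r_e |i_e|^p (our reading; normalization conventions vary), the 1-resistance is the shortest-path distance with respect to edge resistances and the 2-resistance is the standard effective resistance from the Laplacian pseudoinverse. A toy sketch with arbitrary edge weights:

```python
import numpy as np

# Toy weighted graph: edges (u, v, resistance)
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 3.0), (2, 3, 0.5)]
n = 4

# p = 1: shortest-path distance w.r.t. edge resistances (Floyd-Warshall)
D = np.full((n, n), np.inf); np.fill_diagonal(D, 0)
for u, v, r in edges:
    D[u, v] = D[v, u] = min(D[u, v], r)
for k in range(n):
    D = np.minimum(D, D[:, [k]] + D[[k], :])

# p = 2: effective resistance via the Laplacian pseudoinverse
L = np.zeros((n, n))
for u, v, r in edges:
    w = 1.0 / r
    L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
Lp = np.linalg.pinv(L)
s, t = 0, 3
e = np.zeros(n); e[s], e[t] = 1, -1
print("R_1(0,3) = shortest path:       ", D[s, t])
print("R_2(0,3) = effective resistance:", e @ Lp @ e)
d = 3   # critical thresholds from the paper, e.g. for intrinsic dimension 3
print("p* =", 1 + 1 / (d - 1), " p** =", 1 + 1 / (d - 2))
```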

PDF Web [BibTex]

On Fast Approximate Submodular Minimization

Jegelka, S., Lin, H., Bilmes, J.

In Advances in Neural Information Processing Systems 24, pages: 460-468, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We are motivated by an application to extract a representative subset of machine learning training data and by the poor empirical performance we observe of the popular minimum norm algorithm. In fact, for our application, minimum norm can have a running time of about O(n^7) (O(n^5) oracle calls). We therefore propose a fast approximate method to minimize arbitrary submodular functions. For a large sub-class of submodular functions, the algorithm is exact. Other submodular functions are iteratively approximated by tight submodular upper bounds, and then repeatedly optimized. We show theoretical properties, and empirical results suggest significant speedups over minimum norm while retaining higher accuracies.

PDF Web [BibTex]

PAC-Bayesian Analysis of Contextual Bandits

Seldin, Y., Auer, P., Laviolette, F., Shawe-Taylor, J., Ortner, R.

In Advances in Neural Information Processing Systems 24, pages: 1683-1691, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$. If the algorithm uses all the side information, the regret bound scales as $\sqrt{N \ln K}$, where $K$ is the number of actions (arms). However, if the side information $I_{\rho_t}(S;A)$ is not fully used, the regret bound is significantly tighter. In the extreme case, when $I_{\rho_t}(S;A) = 0$, the dependence on the number of states reduces from linear to logarithmic. Our analysis allows us to provide the algorithm with a large amount of side information, let the algorithm decide which side information is relevant for the task, and penalize it only for the side information that it actually uses. We also present an algorithm for multiarmed bandits with side information whose computational complexity is linear in the number of actions.

PDF PDF Web [BibTex]

Fast projections onto L1,q-norm balls for grouped feature selection

Sra, S.

In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, LNCS vol 6913, Part III, pages: 305-317, (Editors: D Gunopulos and T Hofmann and D Malerba and M Vazirgiannis), Springer, 22th European Conference on Machine Learning (ECML), 2011 (inproceedings)

DOI [BibTex]

Kernel Belief Propagation

Song, L., Gretton, A., Bickson, D., Low, Y., Guestrin, C.

In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Vol. 15, pages: 707-715, (Editors: G Gordon and D Dunson and M Dudík), JMLR, AISTATS, 2011 (inproceedings)

PDF [BibTex]

On Causal Discovery with Cyclic Additive Noise Models

Mooij, J., Janzing, D., Schölkopf, B., Heskes, T.

In Advances in Neural Information Processing Systems 24, pages: 639-647, (Editors: J Shawe-Taylor and RS Zemel and PL Bartlett and FCN Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We study a particular class of cyclic causal models, where each variable is a (possibly nonlinear) function of its parents and additive noise. We prove that the causal graph of such models is generically identifiable in the bivariate, Gaussian-noise case. We also propose a method to learn such models from observational data. In the acyclic case, the method reduces to ordinary regression, but in the more challenging cyclic case, an additional term arises in the loss function, which makes it a special case of nonlinear independent component analysis. We illustrate the proposed method on synthetic data.

PDF Web [BibTex]

Additive Gaussian Processes

Duvenaud, D., Nickisch, H., Rasmussen, C.

In Advances in Neural Information Processing Systems 24, pages: 226-234, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We introduce a Gaussian process model of functions which are additive. An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks.
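
The efficient evaluation of all interaction orders rests on the Newton-Girard recursion over elementary symmetric polynomials of the per-dimension base kernels. A hedged sketch for a single pair of inputs; the squared-exponential base kernel, lengthscale, and order weights are assumptions:

```python
import numpy as np

def additive_kernel(x, xp, order_weights, ell=1.0):
    """Additive kernel k(x, x') = sum_r w_r * e_r(z_1, ..., z_D), where z_d is a
    1-D squared-exponential base kernel and e_r is the r-th elementary symmetric
    polynomial, evaluated via the Newton-Girard recursion in O(D*R) time rather
    than by summing all C(D, r) interaction terms explicitly."""
    z = np.exp(-0.5 * (x - xp) ** 2 / ell ** 2)   # one base kernel value per dimension
    R = len(order_weights)
    p = [np.sum(z ** k) for k in range(R + 1)]    # power sums p_k = sum_d z_d^k
    e = [1.0]                                     # e_0 = 1
    for r in range(1, R + 1):                     # e_r = (1/r) sum_k (-1)^(k-1) e_{r-k} p_k
        e.append(sum((-1) ** (k - 1) * e[r - k] * p[k] for k in range(1, r + 1)) / r)
    return sum(w * e[r] for r, w in enumerate(order_weights, start=1))

x, xp = np.random.randn(5), np.random.randn(5)    # D = 5 input dimensions
print(additive_kernel(x, xp, order_weights=[1.0, 0.5, 0.25]))   # orders 1..3
```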

PDF Web [BibTex]

k-NN Regression Adapts to Local Intrinsic Dimension

Kpotufe, S.

In Advances in Neural Information Processing Systems 24, pages: 729-737, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
Many nonparametric regressors were recently shown to converge at rates that depend only on the intrinsic dimension of data. These regressors thus escape the curse of dimension when high-dimensional data has low intrinsic dimension (e.g. a manifold). We show that k-NN regression is also adaptive to intrinsic dimension. In particular our rates are local to a query x and depend only on the way masses of balls centered at x vary with radius. Furthermore, we show a simple way to choose k = k(x) locally at any x so as to nearly achieve the minimax rate at x in terms of the unknown intrinsic dimension in the vicinity of x. We also establish that the minimax rate does not depend on a particular choice of metric space or distribution, but rather that this minimax rate holds for any metric space and doubling measure.
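
The estimator itself is ordinary k-NN regression; the paper's contribution is the analysis and the local choice of k = k(x). A toy sketch in which data lie on a 1-D manifold embedded in 10-D, with a fixed k ~ n^(2/(2+d)) heuristic (d the intrinsic dimension) standing in for the paper's query-local rule:

```python
import numpy as np

def knn_predict(Xtr, ytr, x, k):
    """Plain k-NN regression: average targets of the k nearest training points."""
    d = np.linalg.norm(Xtr - x, axis=1)
    return ytr[np.argsort(d)[:k]].mean()

n, D = 500, 10
t = np.random.rand(n)                         # data lie on a 1-D manifold in 10-D
Xtr = np.outer(t, np.ones(D)) + 0.01 * np.random.randn(n, D)
ytr = np.sin(4 * t) + 0.1 * np.random.randn(n)

# Crude global choice k ~ n^(2/(2+d)) with the *intrinsic* dimension d = 1, not
# the ambient D = 10; the paper instead balances bias and variance per query x.
k = int(n ** (2 / 3))
x_query = 0.5 * np.ones(D)
print("prediction:", knn_predict(Xtr, ytr, x_query, k), " truth ~", np.sin(2.0))
```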

PDF Web [BibTex]

Fast Newton-type Methods for Total-Variation with Applications

Barbero, A., Sra, S.

In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pages: 313-320, (Editors: L Getoor and T Scheffer), Omnipress, 28th International Conference on Machine Learning (ICML), 2011 (inproceedings)

[BibTex]

Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees

Gonzalez, J., Low, Y., Gretton, A., Guestrin, C.

In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Vol. 15, pages: 324-332, (Editors: G Gordon and D Dunson and M Dudík), JMLR, AISTATS, 2011 (inproceedings)

PDF [BibTex]

Transfer Learning with Copulas

Lopez-Paz, D., Hernandez-Lobato, J.

Neural Information Processing Systems (NIPS), 2011 (poster)

PDF [BibTex]

Access to Unlabeled Data can Speed up Prediction Time

Urner, R., Shalev-Shwartz, S., Ben-David, S.

In Proceedings of the 28th International Conference on Machine Learning, pages: 641-648, ICML, 2011 (inproceedings)

link (url) [BibTex]

Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

Gehler, P., Rother, C., Kiefel, M., Zhang, L., Schölkopf, B.

In Advances in Neural Information Processing Systems 24, pages: 765-773, (Editors: Shawe-Taylor, John and Zemel, Richard S. and Bartlett, Peter L. and Pereira, Fernando C. N. and Weinberger, Kilian Q.), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that without edge information high-quality results can be achieved, that are on par with methods exploiting this source of information. Finally, we are able to improve on state-of-the-art results by integrating edge information into our model. We believe that our new approach is an excellent starting point for future developments in this field.

website + code pdf poster Project Page [BibTex]


2005


Spectral clustering and transductive inference for graph data

Zhou, D.

NIPS Workshop on Kernel Methods and Structured Domains, December 2005 (talk)

PDF Web [BibTex]

Kernel ICA for Large Scale Problems

Jegelka, S., Gretton, A., Achlioptas, D.

In NIPS Workshop on Large Scale Kernel Machines, December 2005 (inproceedings)

Web [BibTex]

Infinite dimensional exponential families by reproducing kernel Hilbert spaces

Fukumizu, K.

In IGAIA 2005, pages: 324-333, 2nd International Symposium on Information Geometry and its Applications, December 2005 (inproceedings)

Abstract
The purpose of this paper is to propose a method of constructing exponential families on Hilbert manifolds, on which estimation theory can be built. Although there have been works on infinite-dimensional exponential families of Banach manifolds (Pistone and Sempi, 1995; Gibilisco and Pistone, 1998; Pistone and Rogantin, 1999), they are not appropriate for discussing statistical estimation with a finite number of samples; the likelihood function with finite samples is not continuous on the manifold. In this paper we use a reproducing kernel Hilbert space as a functional space for constructing an exponential manifold. A reproducing kernel Hilbert space is defined as a Hilbert space of functions such that evaluation of a function at an arbitrary point is a continuous functional on the Hilbert space. Since we can discuss the value of a function with this space, it is very natural to use a manifold associated with a reproducing kernel Hilbert space as a basis for estimation theory. We focus on maximum likelihood estimation (MLE) with the exponential manifold of a reproducing kernel Hilbert space. As in many non-parametric estimation methods, a straightforward extension of MLE to an infinite-dimensional exponential manifold suffers from the problem of ill-posedness, caused by the fact that the estimator must be chosen from an infinite-dimensional space with only a finite number of constraints given by the data. To solve this problem, a pseudo-maximum likelihood method is proposed by restricting the infinite-dimensional manifold to a series of finite-dimensional submanifolds, which enlarge as the number of samples increases. Some asymptotic results in the limit of infinite samples are shown, including the consistency of the pseudo-MLE.
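
In symbols, the construction described above is roughly the following (a sketch in our own notation, with p_0 a base density; this is not quoted from the paper):

```latex
p_f(x) \;=\; \exp\bigl(f(x) - A(f)\bigr)\,p_0(x), \qquad f \in \mathcal{H}_k,
\qquad A(f) \;=\; \log \int \exp\bigl(f(x)\bigr)\,p_0(x)\,dx,
\]
\[
\hat f_n \;=\; \operatorname*{arg\,max}_{f \in \mathcal{H}_n}\;
\frac{1}{n}\sum_{i=1}^n f(x_i) \;-\; A(f),
\qquad \mathcal{H}_n \subset \mathcal{H}_k
\;\text{finite-dimensional, growing with}\; n.
```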

PDF Web [BibTex]

Some thoughts about Gaussian Processes

Chapelle, O.

NIPS Workshop on Open Problems in Gaussian Processes for Machine Learning, December 2005 (talk)

PDF Web [BibTex]

Shortest-path kernels on graphs

Borgwardt, KM., Kriegel, H-P.

In pages: 74-81, IEEE Computer Society, Los Alamitos, CA, USA, Fifth International Conference on Data Mining (ICDM), November 2005 (inproceedings)

Abstract
Data mining algorithms are facing the challenge to deal with an increasing number of complex objects. For graph data, a whole toolbox of data mining algorithms becomes available by defining a kernel function on instances of graphs. Graph kernels based on walks, subtrees and cycles in graphs have been proposed so far. As a general problem, these kernels are either computationally expensive or limited in their expressiveness. We try to overcome this problem by defining expressive graph kernels which are based on paths. As the computation of all paths and longest paths in a graph is NP-hard, we propose graph kernels based on shortest paths. These kernels are computable in polynomial time, retain expressivity and are still positive definite. In experiments on classification of graph models of proteins, our shortest-path kernels show significantly higher classification accuracy than walk-based kernels.
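
The construction is concise: compute all-pairs shortest paths per graph (e.g., Floyd-Warshall), then sum a base kernel over all pairs of path lengths. A hedged sketch with a delta base kernel on lengths; node labels, which the protein experiments use, are omitted here:

```python
import numpy as np

def floyd_warshall(A):
    """All-pairs shortest path lengths from an adjacency matrix (0 = no edge)."""
    n = len(A)
    D = np.where(A > 0, A.astype(float), np.inf)
    np.fill_diagonal(D, 0)
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def shortest_path_kernel(A1, A2):
    """Sum of a delta base kernel over all pairs of finite shortest-path lengths."""
    d1 = floyd_warshall(A1)[np.triu_indices(len(A1), k=1)]
    d2 = floyd_warshall(A2)[np.triu_indices(len(A2), k=1)]
    d1, d2 = d1[np.isfinite(d1)], d2[np.isfinite(d2)]
    return sum(int(np.sum(d2 == a)) for a in d1)   # delta kernel: count equal lengths

A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path graph on 3 nodes
A2 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # triangle
print("k(G1, G2) =", shortest_path_kernel(A1, A2))
```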

Web DOI [BibTex]

Kernel methods for dependence testing in LFP-MUA

Gretton, A., Belitski, A., Murayama, Y., Schölkopf, B., Logothetis, N.

35(689.17), 35th Annual Meeting of the Society for Neuroscience (Neuroscience), November 2005 (poster)

Abstract
A fundamental problem in neuroscience is determining whether or not particular neural signals are dependent. The correlation is the most straightforward basis for such tests, but considerable work also focuses on the mutual information (MI), which is capable of revealing dependence of higher orders that the correlation cannot detect. That said, there are other measures of dependence that share with the MI an ability to detect dependence of any order, but which can be easier to compute in practice. We focus in particular on tests based on the functional covariance, which derive from work originally accomplished in 1959 by Rényi. Conceptually, our dependence tests work by computing the covariance between (infinite-dimensional) vectors of nonlinear mappings of the observations being tested, and then determining whether this covariance is zero – we call this measure the constrained covariance (COCO). When these vectors are members of universal reproducing kernel Hilbert spaces, we can prove this covariance to be zero only when the variables being tested are independent. The greatest advantage of these tests, compared with the mutual information, is their simplicity – when comparing two signals, we need only take the largest eigenvalue (or the trace) of a product of two matrices of nonlinearities, where these matrices are generally much smaller than the number of observations (and are very simple to construct). We compare the mutual information, the COCO, and the correlation in the context of finding changes in dependence between the LFP and MUA signals in the primary visual cortex of the anaesthetized macaque, during the presentation of dynamic natural stimuli. We demonstrate that the MI and COCO reveal dependence which is not detected by the correlation alone (which we prove by artificially removing all correlation between the signals, and then testing their dependence with COCO and the MI); and that COCO and the MI give results consistent with each other on our data.
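
The recipe in the abstract — "take the largest eigenvalue ... of a product of two matrices of nonlinearities" — translates into a few lines: center the two Gram matrices and take the top eigenvalue of their product. A sketch, with the kernel choice and the 1/n normalization as assumptions about the exact convention:

```python
import numpy as np

def rbf_gram(x, gamma=1.0):
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-gamma * d2)

def coco(x, y, gamma=1.0):
    """Empirical constrained covariance: largest eigenvalue of the product
    of the centered Gram matrices of the two signals."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ rbf_gram(x, gamma) @ H
    Lc = H @ rbf_gram(y, gamma) @ H
    lam = np.max(np.linalg.eigvals(Kc @ Lc).real)   # product of PSD matrices: real, >= 0
    return np.sqrt(max(lam, 0.0)) / n

x = np.random.randn(300)
print("independent:          ", coco(x, np.random.randn(300)))
print("dependent (y = x^2):  ", coco(x, x ** 2))    # invisible to plain correlation
```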

Web [BibTex]

Training Support Vector Machines with Multiple Equality Constraints

Kienzle, W., Schölkopf, B.

In Proceedings of the 16th European Conference on Machine Learning, Lecture Notes in Computer Science, Vol. 3720, pages: 182-193, (Editors: JG Carbonell and J Siekmann), Springer, Berlin, Germany, ECML, November 2005 (inproceedings)

Abstract
In this paper we present a primal-dual decomposition algorithm for support vector machine training. As with existing methods that use very small working sets (such as Sequential Minimal Optimization (SMO), Successive Over-Relaxation (SOR) or the Kernel Adatron (KA)), our method scales well, is straightforward to implement, and does not require an external QP solver. Unlike SMO, SOR and KA, the method is applicable to a large number of SVM formulations regardless of the number of equality constraints involved. The effectiveness of our algorithm is demonstrated on a more difficult SVM variant in this respect, namely semi-parametric support vector regression.

PDF DOI [BibTex]
