Header logo is ei


2011


no image
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits

Seldin, Y., Laviolette, F., Shawe-Taylor, J., Peters, J., Auer, P.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, May 2011 (techreport)

Abstract
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that enables to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to Hoeffding-Azuma inequality to bound concentration of martingale values. Our second approach is based on integration of Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situation of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many other fields, where martingales and limited feedback are encountered.

PDF Web [BibTex]

2011

PDF Web [BibTex]


no image
Non-stationary Correction of Optical Aberrations

Schuler, C., Hirsch, M., Harmeling, S., Schölkopf, B.

(1), Max Planck Institute for Intelligent Systems, Tübingen, Germany, May 2011 (techreport)

Abstract
Taking a sharp photo at several megapixel resolution traditionally relies on high grade lenses. In this paper, we present an approach to alleviate image degradations caused by imperfect optics. We rely on a calibration step to encode the optical aberrations in a space-variant point spread function and obtain a corrected image by non-stationary deconvolution. By including the Bayer array in our image formation model, we can perform demosaicing as part of the deconvolution.

PDF [BibTex]

PDF [BibTex]


no image
Multiple Kernel Learning: A Unifying Probabilistic Viewpoint

Nickisch, H., Seeger, M.

Max Planck Institute for Biological Cybernetics, March 2011 (techreport)

Abstract
We present a probabilistic viewpoint to multiple kernel learning unifying well-known regularised risk approaches and recent advances in approximate Bayesian inference relaxations. The framework proposes a general objective function suitable for regression, robust regression and classification that is lower bound of the marginal likelihood and contains many regularised risk approaches as special cases. Furthermore, we derive an efficient and provably convergent optimisation algorithm.

Web [BibTex]

Web [BibTex]


no image
Multiple testing, uncertainty and realistic pictures

Langovoy, M., Wittich, O.

(2011-004), EURANDOM, Technische Universiteit Eindhoven, January 2011 (techreport)

Abstract
We study statistical detection of grayscale objects in noisy images. The object of interest is of unknown shape and has an unknown intensity, that can be varying over the object and can be negative. No boundary shape constraints are imposed on the object, only a weak bulk condition for the object's interior is required. We propose an algorithm that can be used to detect grayscale objects of unknown shapes in the presence of nonparametric noise of unknown level. Our algorithm is based on a nonparametric multiple testing procedure. We establish the limit of applicability of our method via an explicit, closed-form, non-asymptotic and nonparametric consistency bound. This bound is valid for a wide class of nonparametric noise distributions. We achieve this by proving an uncertainty principle for percolation on nite lattices.

PDF [BibTex]

PDF [BibTex]


no image
Nonconvex proximal splitting: batch and incremental algorithms

Sra, S.

(2), Max Planck Institute for Intelligent Systems, Tübingen, Germany, 2011 (techreport)

Abstract
Within the unmanageably large class of nonconvex optimization, we consider the rich subclass of nonsmooth problems having composite objectives (this includes the extensively studied convex, composite objective problems as a special case). For this subclass, we introduce a powerful, new framework that permits asymptotically non-vanishing perturbations. In particular, we develop perturbation-based batch and incremental (online like) nonconvex proximal splitting algorithms. To our knowledge, this is the rst time that such perturbation-based nonconvex splitting algorithms are being proposed and analyzed. While the main contribution of the paper is the theoretical framework, we complement our results by presenting some empirical results on matrix factorization.

PDF [BibTex]

PDF [BibTex]

2010


no image
Computationally efficient algorithms for statistical image processing: Implementation in R

Langovoy, M., Wittich, O.

(2010-053), EURANDOM, Technische Universiteit Eindhoven, December 2010 (techreport)

Abstract
In the series of our earlier papers on the subject, we proposed a novel statistical hy- pothesis testing method for detection of objects in noisy images. The method uses results from percolation theory and random graph theory. We developed algorithms that allowed to detect objects of unknown shapes in the presence of nonparametric noise of unknown level and of un- known distribution. No boundary shape constraints were imposed on the objects, only a weak bulk condition for the object's interior was required. Our algorithms have linear complexity and exponential accuracy. In the present paper, we describe an implementation of our nonparametric hypothesis testing method. We provide a program that can be used for statistical experiments in image processing. This program is written in the statistical programming language R.

PDF [BibTex]

2010

PDF [BibTex]


no image
Fast Convergent Algorithms for Expectation Propagation Approximate Bayesian Inference

Seeger, M., Nickisch, H.

Max Planck Institute for Biological Cybernetics, December 2010 (techreport)

Abstract
We propose a novel algorithm to solve the expectation propagation relaxation of Bayesian inference for continuous-variable graphical models. In contrast to most previous algorithms, our method is provably convergent. By marrying convergent EP ideas from (Opper&Winther 05) with covariance decoupling techniques (Wipf&Nagarajan 08, Nickisch&Seeger 09), it runs at least an order of magnitude faster than the most commonly used EP solver.

Web [BibTex]

Web [BibTex]


no image
A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

Seldin, Y.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2010 (techreport)

Abstract
We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering a more accurate way to deal with finite sample issues. We derive a bound minimization algorithm and show that it provides good results in real-life problems and that the derived PAC-Bayesian bound is reasonably tight.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Sparse nonnegative matrix approximation: new formulations and algorithms

Tandon, R., Sra, S.

(193), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2010 (techreport)

Abstract
We introduce several new formulations for sparse nonnegative matrix approximation. Subsequently, we solve these formulations by developing generic algorithms. Further, to help selecting a particular sparse formulation, we briefly discuss the interpretation of each formulation. Finally, preliminary experiments are presented to illustrate the behavior of our formulations and algorithms.

PDF [BibTex]

PDF [BibTex]


no image
Robust nonparametric detection of objects in noisy images

Langovoy, M., Wittich, O.

(2010-049), EURANDOM, Technische Universiteit Eindhoven, September 2010 (techreport)

Abstract
We propose a novel statistical hypothesis testing method for detection of objects in noisy images. The method uses results from percolation theory and random graph theory. We present an algorithm that allows to detect objects of unknown shapes in the presence of nonparametric noise of unknown level and of unknown distribution. No boundary shape constraints are imposed on the object, only a weak bulk condition for the object's interior is required. The algorithm has linear complexity and exponential accuracy and is appropriate for real-time systems. In this paper, we develop further the mathematical formalism of our method and explore im- portant connections to the mathematical theory of percolation and statistical physics. We prove results on consistency and algorithmic complexity of our testing procedure. In addition, we address not only an asymptotic behavior of the method, but also a nite sample performance of our test.

PDF [BibTex]

PDF [BibTex]


no image
Large Scale Variational Inference and Experimental Design for Sparse Generalized Linear Models

Seeger, M., Nickisch, H.

Max Planck Institute for Biological Cybernetics, August 2010 (techreport)

Abstract
Many problems of low-level computer vision and image processing, such as denoising, deconvolution, tomographic reconstruction or super-resolution, can be addressed by maximizing the posterior distribution of a sparse linear model (SLM). We show how higher-order Bayesian decision-making problems, such as optimizing image acquisition in magnetic resonance scanners, can be addressed by querying the SLM posterior covariance, unrelated to the density's mode. We propose a scalable algorithmic framework, with which SLM posteriors over full, high-resolution images can be approximated for the first time, solving a variational optimization problem which is convex iff posterior mode finding is convex. These methods successfully drive the optimization of sampling trajectories for real-world magnetic resonance imaging through Bayesian experimental design, which has not been attempted before. Our methodology provides new insight into similarities and differences between sparse reconstruction and approximate Bayesian inference, and has important implications for compressive sensing of real-world images.

Web [BibTex]


no image
Cooperative Cuts for Image Segmentation

Jegelka, S., Bilmes, J.

(UWEETR-1020-0003), University of Washington, Washington DC, USA, August 2010 (techreport)

Abstract
We propose a novel framework for graph-based cooperative regularization that uses submodular costs on graph edges. We introduce an efficient iterative algorithm to solve the resulting hard discrete optimization problem, and show that it has a guaranteed approximation factor. The edge-submodular formulation is amenable to the same extensions as standard graph cut approaches, and applicable to a range of problems. We apply this method to the image segmentation problem. Specifically, Here, we apply it to introduce a discount for homogeneous boundaries in binary image segmentation on very difficult images, precisely, long thin objects and color and grayscale images with a shading gradient. The experiments show that significant portions of previously truncated objects are now preserved.

Web [BibTex]

Web [BibTex]


no image
Fast algorithms for total-variationbased optimization

Barbero, A., Sra, S.

(194), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, August 2010 (techreport)

Abstract
We derive a number of methods to solve efficiently simple optimization problems subject to a totalvariation (TV) regularization, under different norms of the TV operator and both for the case of 1-dimensional and 2-dimensional data. In spite of the non-smooth, non-separable nature of the TV terms considered, we show that a dual formulation with strong structure can be derived. Taking advantage of this structure we develop adaptions of existing algorithms from the optimization literature, resulting in efficient methods for the problem at hand. Experimental results show that for 1-dimensional data the proposed methods achieve convergence within good accuracy levels in practically linear time, both for L1 and L2 norms. For the more challenging 2-dimensional case a performance of order O(N2 log2 N) for N x N inputs is achieved when using the L2 norm. A final section suggests possible extensions and lines of further work.

PDF [BibTex]

PDF [BibTex]


no image
Gaussian Mixture Modeling with Gaussian Process Latent Variable Models

Nickisch, H., Rasmussen, C.

Max Planck Institute for Biological Cybernetics, June 2010 (techreport)

Abstract
Density modeling is notoriously difficult for high dimensional data. One approach to the problem is to search for a lower dimensional manifold which captures the main characteristics of the data. Recently, the Gaussian Process Latent Variable Model (GPLVM) has successfully been used to find low dimensional manifolds in a variety of complex data. The GPLVM consists of a set of points in a low dimensional latent space, and a stochastic map to the observed space. We show how it can be interpreted as a density model in the observed space. However, the GPLVM is not trained as a density model and therefore yields bad density estimates. We propose a new training strategy and obtain improved generalisation performance and better density estimates in comparative evaluations on several benchmark data sets.

Web [BibTex]

Web [BibTex]


no image
Generalized Proximity and Projection with Norms and Mixed-norms

Sra, S.

(192), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, May 2010 (techreport)

Abstract
We discuss generalized proximity operators (GPO) and their associated generalized projection problems. On inputs of size n, we show how to efficiently apply GPOs and generalized projections for separable norms and distance-like functions to accuracy e in O(n log(1/e)) time. We also derive projection algorithms that run theoretically in O(n log n log(1/e)) time but can for suitable parameter ranges empirically outperform the O(n log(1/e)) projection method. The proximity and projection tasks are either separable, and solved directly, or are reduced to a single root-finding step. We highlight that as a byproduct, our analysis also yields an O(n log(1/e)) (weakly linear-time) procedure for Euclidean projections onto the l1;1-norm ball; previously only an O(n log n) method was known. We provide empirical evaluation to illustrate the performance of our methods, noting that for the l1;1-norm projection, our implementation is more than two orders of magnitude faster than the previously known method.

PDF [BibTex]

PDF [BibTex]


no image
Cooperative Cuts: Graph Cuts with Submodular Edge Weights

Jegelka, S., Bilmes, J.

(189), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, March 2010 (techreport)

Abstract
We introduce a problem we call Cooperative cut, where the goal is to find a minimum-cost graph cut but where a submodular function is used to define the cost of a subsets of edges. That means, the cost of an edge that is added to the current cut set C depends on the edges in C. This generalization of the cost in the standard min-cut problem to a submodular cost function immediately makes the problem harder. Not only do we prove NP hardness even for nonnegative submodular costs, but also show a lower bound of Omega(|V|^(1/3)) on the approximation factor for the problem. On the positive side, we propose and compare four approximation algorithms with an overall approximation factor of min { |V|/2, |C*|, O( sqrt(|E|) log |V|), |P_max|}, where C* is the optimal solution, and P_max is the longest s, t path across the cut between given s, t. We also introduce additional heuristics for the problem which have attractive properties from the perspective of practical applications and implementations in that existing fast min-cut libraries may be used as subroutines. Both our approximation algorithms, and our heuristics, appear to do well in practice.

PDF [BibTex]

PDF [BibTex]


no image
Information-theoretic inference of common ancestors

Steudel, B., Ay, N.

Computing Research Repository (CoRR), abs/1010.5720, pages: 18, 2010 (techreport)

Web [BibTex]

Web [BibTex]

2006


no image
A New Projected Quasi-Newton Approach for the Nonnegative Least Squares Problem

Kim, D., Sra, S., Dhillon, I.

(TR-06-54), Univ. of Texas, Austin, December 2006 (techreport)

PDF [BibTex]

2006

PDF [BibTex]


no image
Probabilistic inference for solving (PO)MDPs

Toussaint, M., Harmeling, S., Storkey, A.

(934), School of Informatics, University of Edinburgh, December 2006 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Minimal Logical Constraint Covering Sets

Sinz, F., Schölkopf, B.

(155), Max Planck Institute for Biological Cybernetics, Tübingen, December 2006 (techreport)

Abstract
We propose a general framework for computing minimal set covers under class of certain logical constraints. The underlying idea is to transform the problem into a mathematical programm under linear constraints. In this sense it can be seen as a natural extension of the vector quantization algorithm proposed by Tipping and Schoelkopf. We show which class of logical constraints can be cast and relaxed into linear constraints and give an algorithm for the transformation.

PDF [BibTex]

PDF [BibTex]


no image
New Methods for the P300 Visual Speller

Biessmann, F.

(1), (Editors: Hill, J. ), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2006 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Geometric Analysis of Hilbert Schmidt Independence criterion based ICA contrast function

Shen, H., Jegelka, S., Gretton, A.

(PA006080), National ICT Australia, Canberra, Australia, October 2006 (techreport)

Web [BibTex]

Web [BibTex]


no image
A tutorial on spectral clustering

von Luxburg, U.

(149), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. Nevertheless, on the first glance spectral clustering looks a bit mysterious, and it is not obvious to see why it works at all and what it really does. This article is a tutorial introduction to spectral clustering. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

PDF [BibTex]

PDF [BibTex]


no image
Towards the Inference of Graphs on Ordered Vertexes

Zien, A., Raetsch, G., Ong, C.

(150), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
We propose novel methods for machine learning of structured output spaces. Specifically, we consider outputs which are graphs with vertices that have a natural order. We consider the usual adjacency matrix representation of graphs, as well as two other representations for such a graph: (a) decomposing the graph into a set of paths, (b) converting the graph into a single sequence of nodes with labeled edges. For each of the three representations, we propose an encoding and decoding scheme. We also propose an evaluation measure for comparing two graphs.

PDF [BibTex]

PDF [BibTex]


no image
Nonnegative Matrix Approximation: Algorithms and Applications

Sra, S., Dhillon, I.

Univ. of Texas, Austin, May 2006 (techreport)

[BibTex]

[BibTex]


no image
An Automated Combination of Sequence Motif Kernels for Predicting Protein Subcellular Localization

Zien, A., Ong, C.

(146), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006 (techreport)

Abstract
Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. We propose an elegant and fully automated approach to building a prediction system for protein subcellular localization. We propose a new class of protein sequence kernels which considers all motifs including motifs with gaps. This class of kernels allows the inclusion of pairwise amino acid distances into their computation. We further propose a multiclass support vector machine method which directly solves protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we generalize our method to optimize over multiple kernels at the same time. We compare our automated approach to four other predictors on three different datasets.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Training a Support Vector Machine in the Primal

Chapelle, O.

(147), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006, The version in the "Large Scale Kernel Machines" book is more up to date. (techreport)

Abstract
Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and there is no reason for ignoring it. Moreover, from the primal point of view, new families of algorithms for large scale SVM training can be investigated.

PDF [BibTex]

PDF [BibTex]


no image
Cross-Validation Optimization for Structured Hessian Kernel Methods

Seeger, M., Chapelle, O.

Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, February 2006 (techreport)

Abstract
We address the problem of learning hyperparameters in kernel methods for which the Hessian of the objective is structured. We propose an approximation to the cross-validation log likelihood whose gradient can be computed analytically, solving the hyperparameter learning problem efficiently through nonlinear optimization. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels or to large datasets. When applied to the problem of multi-way classification, our method scales linearly in the number of classes and gives rise to state-of-the-art results on a remote imaging task.

PDF Web [BibTex]

PDF Web [BibTex]


Thumb xl screen shot 2012 06 06 at 11.31.38 am
Implicit Wiener Series, Part II: Regularised estimation

Gehler, P., Franz, M.

(148), Max Planck Institute, 2006 (techreport)

pdf [BibTex]

2002


no image
Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

(98), Max Planck Institute for Biological Cybernetics, August 2002 (techreport)

Abstract
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. Output kernels also make it possible to encode prior information and/or invariances in the loss function in an elegant way. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.

PDF [BibTex]

2002

PDF [BibTex]


no image
Global Geometry of SVM Classifiers

Zhou, D., Xiao, B., Zhou, H., Dai, R.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2002 (techreport)

Abstract
We construct an geometry framework for any norm Support Vector Machine (SVM) classifiers. Within this framework, separating hyperplanes, dual descriptions and solutions of SVM classifiers are constructed by a purely geometric fashion. In contrast with the optimization theory used in SVM classifiers, we have no complicated computations any more. Each step in our theory is guided by elegant geometric intuitions.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Computationally Efficient Face Detection

Romdhani, S., Torr, P., Schölkopf, B., Blake, A.

(MSR-TR-2002-69), Microsoft Research, June 2002 (techreport)

Web [BibTex]

Web [BibTex]


no image
Kernel-based nonlinear blind source separation

Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K.

EU-Project BLISS, January 2002 (techreport)

GZIP [BibTex]

GZIP [BibTex]


no image
A compression approach to support vector model selection

von Luxburg, U., Bousquet, O., Schölkopf, B.

(101), Max Planck Institute for Biological Cybernetics, 2002, see more detailed JMLR version (techreport)

Abstract
In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds we construct ``compression coefficients'' for SVMs, which measure the amount by which the training labels can be compressed by some classification hypothesis. The main idea is to relate the coding precision of this hypothesis to the width of the margin of the SVM. The compression coefficients connect well known quantities such as the radius-margin ratio R^2/rho^2, the eigenvalues of the kernel matrix and the number of support vectors. To test whether they are useful in practice we ran model selection experiments on several real world datasets. As a result we found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.

[BibTex]

[BibTex]


no image
Feature Selection and Transduction for Prediction of Molecular Bioactivity for Drug Design

Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A., Schölkopf, B.

Max Planck Institute for Biological Cybernetics / Biowulf Technologies, 2002 (techreport)

Web [BibTex]

Web [BibTex]


no image
Observations on the Nyström Method for Gaussian Process Prediction

Williams, C., Rasmussen, C., Schwaighofer, A., Tresp, V.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2002 (techreport)

Abstract
A number of methods for speeding up Gaussian Process (GP) prediction have been proposed, including the Nystr{\"o}m method of Williams and Seeger (2001). In this paper we focus on two issues (1) the relationship of the Nystr{\"o}m method to the Subset of Regressors method (Poggio and Girosi 1990; Luo and Wahba, 1997) and (2) understanding in what circumstances the Nystr{\"o}m approximation would be expected to provide a good approximation to exact GP regression.

PostScript [BibTex]

PostScript [BibTex]