2011


Client–Server Multitask Learning From Distributed Datasets

Dinuzzo, F., Pillonetto, G., De Nicolao, G.

IEEE Transactions on Neural Networks, 22(2):290-303, February 2011 (article)

Abstract
A client-server architecture to simultaneously solve multiple learning tasks from distributed datasets is described. In such an architecture, each client corresponds to an individual learning task and the associated dataset of examples. The goal of the architecture is to perform information fusion from multiple datasets while preserving privacy of individual data. The role of the server is to collect data in real time from the clients and codify the information in a common database. Such information can be used by all the clients to solve their individual learning task, so that each client can exploit the information content of all the datasets without actually having access to private data of others. The proposed algorithmic framework, based on regularization and kernel methods, uses a suitable class of “mixed effect” kernels. The methodology is illustrated through a simulated recommendation system, as well as an experiment involving pharmacological data coming from a multicentric clinical trial.
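As a rough illustration of what a "mixed effect" kernel can look like, the sketch below combines a component shared by all tasks with a component that is active only when both inputs belong to the same task. The Gaussian base kernel, the mixing weight `alpha`, and all function names are assumptions made for this illustration; the abstract does not specify the kernel class used in the article.

```python
import math

def rbf(x, y, length_scale=1.0):
    """Gaussian (RBF) kernel on scalars; an illustrative base kernel."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))

def mixed_effect_kernel(x, task_x, y, task_y, alpha=0.5):
    """Hypothetical mixed-effect kernel over (input, task) pairs.

    alpha weights the component shared across tasks; (1 - alpha) weights
    the component that is only active within a single task.
    """
    common = rbf(x, y)
    individual = rbf(x, y) if task_x == task_y else 0.0
    return alpha * common + (1.0 - alpha) * individual

same_task = mixed_effect_kernel(0.0, 0, 0.0, 0)   # identical inputs, same task
cross_task = mixed_effect_kernel(0.0, 0, 0.0, 1)  # identical inputs, different tasks
```

With `alpha` close to 1 the tasks are treated as nearly identical, and with `alpha` close to 0 they are learned almost independently.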

DOI [BibTex]

Extraction of functional information from ongoing brain electrical activity: Extraction en temps-réel d’informations fonctionnelles à partir de l’activité électrique cérébrale

Besserve, M., Martinerie, J.

IRBM, 32(1):27-34, February 2011 (article)

Abstract
The modern analysis of multivariate electrical brain signals requires advanced statistical tools to automatically extract and quantify their information content. These tools include machine learning techniques and information theory. They are currently used both in basic neuroscience and in challenging applications such as brain-computer interfaces. We review here how these methods have been used at the Laboratoire d’Électroencéphalographie et de Neurophysiologie Appliquée (LENA) to develop a general tool for the real-time analysis of functional brain signals. We then give some perspectives on how these tools can help us understand the biological mechanisms of information processing.

PDF DOI [BibTex]


Learning Visual Representations for Perception-Action Systems

Piater, J., Jodogne, S., Detry, R., Kraft, D., Krüger, N., Kroemer, O., Peters, J.

International Journal of Robotics Research, 30(3):294-307, February 2011 (article)

Abstract
We discuss vision as a sensory modality for systems that interact flexibly with uncontrolled environments. Instead of trying to build a generic vision system that produces task-independent representations, we argue in favor of task-specific, learnable representations. This concept is illustrated by two examples of our own work. First, our RLVC algorithm performs reinforcement learning directly on the visual input space. To make this very large space manageable, RLVC interleaves the reinforcement learner with a supervised classification algorithm that seeks to split perceptual states so as to reduce perceptual aliasing. This results in an adaptive discretization of the perceptual space based on the presence or absence of visual features. Its extension, RLJC, additionally handles continuous action spaces. In contrast to the minimalistic visual representations produced by RLVC and RLJC, our second method learns structural object models for robust object detection and pose estimation by probabilistic inference. To these models, the method associates grasp experiences autonomously learned by trial and error. These experiences form a non-parametric representation of grasp success likelihoods over gripper poses, which we call a grasp density. Thus, object detection in a novel scene simultaneously produces suitable grasping options.

PDF Web DOI [BibTex]

Multi-way set enumeration in weight tensors

Georgii, E., Tsuda, K., Schölkopf, B.

Machine Learning, 82(2):123-155, February 2011 (article)

Abstract
The analysis of n-ary relations receives attention in many different fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to explore these n-way data is the search for n-set patterns, the n-way equivalent of itemsets. More precisely, an n-set pattern consists of specific subsets of the n instance sets such that all possible associations between the corresponding instances are observed in the data. In contrast, traditional itemset mining approaches consider only two-way data, namely items versus transactions. The n-set patterns provide a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree; that is, we are also interested in n-sets where most (although not all) of the possible associations have been recorded in the data. Second, we take association weights into account. In fact, we propose a method to enumerate all n-sets that satisfy a minimum threshold with respect to the average association weight. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can consider further constraints. We show experimental results on artificial and real-world datasets from different domains.

PDF DOI [BibTex]

A graphical model framework for decoding in the visual ERP-based BCI speller

Martens, S., Mooij, J., Hill, N., Farquhar, J., Schölkopf, B.

Neural Computation, 23(1):160-182, January 2011 (article)

Abstract
We present a graphical model framework for decoding in the visual ERP-based speller system. The proposed framework allows researchers to build generative models from which the decoding rules are obtained in a straightforward manner. We suggest two models for generating brain signals conditioned on the stimulus events. Both models incorporate letter frequency information but assume different dependencies between brain signals and stimulus events. For both models, we derive decoding rules and perform a discriminative training. We show on real visual speller data how decoding performance improves by incorporating letter frequency information and using a more realistic graphical model for the dependencies between the brain signals and the stimulus events. Furthermore, we discuss how the standard approach to decoding can be seen as a special case of the graphical model framework. The letter also gives more insight into the discriminative approach for decoding in the visual speller system.

Web DOI [BibTex]

Robust Control of Teleoperation Systems Interacting with Viscoelastic Soft Tissues

Cho, JH., Son, HI., Bhattacharjee, T., Lee, DG., Lee, DY.

IEEE Transactions on Control Systems Technology, January 2011 (article) In revision

[BibTex]

Towards Motor Skill Learning for Robotics

Peters, J., Mülling, K., Kober, J., Nguyen-Tuong, D., Kroemer, O.

In Robotics Research, pages: 469-482, (Editors: Pradalier, C., R. Siegwart, G. Hirzinger), Springer, Berlin, Germany, 14th International Symposium on Robotics Research (ISRR), January 2011 (inproceedings)

Abstract
Learning robots that can acquire new motor skills and refine existing ones have been a long-standing vision of robotics, artificial intelligence, and the cognitive sciences. Early steps towards this goal in the 1980s made clear that reasoning and human insights will not suffice. Instead, new hope has been offered by the rise of modern machine learning approaches. However, it has become increasingly clear that off-the-shelf machine learning approaches will not suffice for motor skill learning, as these methods often do not scale into the high-dimensional domains of manipulator and humanoid robotics, nor do they fulfill the real-time requirement of our domain. As an alternative, we propose to break the generic skill learning problem into parts that we can understand well from a robotics point of view. After designing appropriate learning approaches for these basic components, these will serve as the ingredients of a general approach to motor skill learning. In this paper, we discuss our recent and current progress in this direction. To do so, we present our work on learning to control, on learning elementary movements, as well as our steps towards learning of complex tasks. We show several evaluations using both real robots and physically realistic simulations.

PDF Web DOI [BibTex]

Effect of Control Parameters and Haptic Cues on Human Perception for Remote Operations

Son, HI., Bhattacharjee, T., Jung, H., Lee, DY.

Experimental Brain Research, January 2011 (article) Submitted

[BibTex]

Learning Visual Representations for Interactive Systems

Piater, J., Jodogne, S., Detry, R., Kraft, D., Krüger, N., Kroemer, O., Peters, J.

In Robotics Research, pages: 399-416, (Editors: Pradalier, C., R. Siegwart, G. Hirzinger), Springer, Berlin, Germany, 14th International Symposium on Robotics Research (ISRR), January 2011 (inproceedings)

Abstract
We describe two quite different methods for associating action parameters to visual percepts. Our RLVC algorithm performs reinforcement learning directly on the visual input space. To make this very large space manageable, RLVC interleaves the reinforcement learner with a supervised classification algorithm that seeks to split perceptual states so as to reduce perceptual aliasing. This results in an adaptive discretization of the perceptual space based on the presence or absence of visual features. Its extension RLJC also handles continuous action spaces. In contrast to the minimalistic visual representations produced by RLVC and RLJC, our second method learns structural object models for robust object detection and pose estimation by probabilistic inference. To these models, the method associates grasp experiences autonomously learned by trial and error. These experiences form a non-parametric representation of grasp success likelihoods over gripper poses, which we call a grasp density. Thus, object detection in a novel scene simultaneously produces suitable grasping options.

PDF Web DOI [BibTex]

Joint Genetic Analysis of Gene Expression Data with Inferred Cellular Phenotypes

Parts, L., Stegle, O., Winn, J., Durbin, R.

PLoS Genetics, 7(1):1-10, January 2011 (article)

Abstract
Even within a defined cell type, the expression level of a gene differs in individual samples. The effects of genotype, measured factors such as environmental conditions, and their interactions have been explored in recent studies. Methods have also been developed to identify unmeasured intermediate factors that coherently influence transcript levels of multiple genes. Here, we show how to bring these two approaches together and analyse genetic effects in the context of inferred determinants of gene expression. We use a sparse factor analysis model to infer hidden factors, which we treat as intermediate cellular phenotypes that in turn affect gene expression in a yeast dataset. We find that the inferred phenotypes are associated with locus genotypes and environmental conditions and can explain genetic associations to genes in trans. For the first time, we consider and find interactions between genotype and intermediate phenotypes inferred from gene expression levels, complementing and extending established results.

Web DOI [BibTex]

Multiple testing, uncertainty and realistic pictures

Langovoy, M., Wittich, O.

(2011-004), EURANDOM, Technische Universiteit Eindhoven, January 2011 (techreport)

Abstract
We study statistical detection of grayscale objects in noisy images. The object of interest is of unknown shape and has an unknown intensity, which can vary over the object and can be negative. No boundary shape constraints are imposed on the object; only a weak bulk condition for the object's interior is required. We propose an algorithm that can be used to detect grayscale objects of unknown shapes in the presence of nonparametric noise of unknown level. Our algorithm is based on a nonparametric multiple testing procedure. We establish the limit of applicability of our method via an explicit, closed-form, non-asymptotic and nonparametric consistency bound. This bound is valid for a wide class of nonparametric noise distributions. We achieve this by proving an uncertainty principle for percolation on finite lattices.

PDF [BibTex]

A Non-Parametric Approach to Dynamic Programming

Kroemer, O., Peters, J.

In Advances in Neural Information Processing Systems 24, pages: 1719-1727, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin’s method. Furthermore, we also present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.

PDF Web [BibTex]

Reinforcement Learning with Bounded Information Loss

Peters, J., Mülling, K., Altun, Y.

AIP Conference Proceedings, 1305(1):365-372, 2011 (article)

Abstract
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant or natural policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest two reinforcement learning methods, i.e., a model-based and a model-free algorithm, that bound the loss in relative entropy while maximizing their return. The resulting methods differ significantly from previous policy gradient approaches and yield an exact update step. They work well on typical reinforcement learning benchmark problems as well as novel evaluations in robotics. We also show a Bayesian bound motivation of this new approach [8].

Web DOI [BibTex]


Transfer Learning with Copulas

Lopez-Paz, D., Hernandez-Lobato, J.

In pages: 2, NIPS, Workshop on Copulas in Machine Learning, 2011 (inproceedings)

PDF [BibTex]

Denoising sparse noise via online dictionary learning

Cherian, A., Sra, S., Papanikolopoulos, N.

In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, pages: 2060-2063, IEEE, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011 (inproceedings)

DOI [BibTex]

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Deisenroth, MP., Rasmussen, CE.

In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pages: 465-472, (Editors: L Getoor and T Scheffer), Omnipress, 2011 (inproceedings)

Abstract
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.

Web [BibTex]

Kernel Bayes’ Rule

Fukumizu, K., Song, L., Gretton, A.

In Advances in Neural Information Processing Systems 24, pages: 1737-1745, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

PDF [BibTex]

Optimal Reinforcement Learning for Gaussian Systems

Hennig, P.

In Advances in Neural Information Processing Systems 24, pages: 325-333, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finite-dimensional projection gives an impression of how this result may be helpful.

PDF Web [BibTex]

Efficient inference in matrix-variate Gaussian models with iid observation noise

Stegle, O., Lippert, C., Mooij, J., Lawrence, N., Borgwardt, K.

In Advances in Neural Information Processing Systems 24, pages: 630-638, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
Inference in matrix-variate Gaussian models has major applications for multi-output prediction and joint learning of row and column covariances from matrix-variate data. Here, we discuss an approach for efficient inference in such models that explicitly account for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility on applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders.

PDF Web [BibTex]

Expectation Propagation for the Estimation of Conditional Bivariate Copulas

Hernandez-Lobato, J., Lopez-Paz, D., Ghahramani, Z.

In pages: 2, NIPS, Workshop on Copulas in Machine Learning, 2011 (inproceedings)

PDF [BibTex]

Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence

Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N.

In IEEE International Conference on Computer Vision, ICCV 2011, pages: 2399-2406, (Editors: DN Metaxas and L Quan and A Sanfeliu and LJ Van Gool), IEEE, 13th International Conference on Computer Vision (ICCV), 2011 (inproceedings)

DOI [BibTex]

Introducing the detection of auditory error responses based on BCI technology for passive interaction

Zander, TO., Klippel, DM., Scherer, R.

In Proceedings of the 5th International Brain–Computer Interface Conference, pages: 252-255, (Editors: GR Müller-Putz and R Scherer and M Billinger and A Kreilinger and V Kaiser and C Neuper), Graz: Verlag der Technischen Universität, 2011 (inproceedings)

[BibTex]

Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval

Sra, S., Cherian, A.

In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, LNCS vol 6913, Part III, pages: 318-332, (Editors: D Gunopulos and T Hofmann and D Malerba and M Vazirgiannis), Springer, 22nd European Conference on Machine Learning (ECML), 2011 (inproceedings)

DOI [BibTex]

Nonconvex proximal splitting: batch and incremental algorithms

Sra, S.

(2), Max Planck Institute for Intelligent Systems, Tübingen, Germany, 2011 (techreport)

Abstract
Within the unmanageably large class of nonconvex optimization, we consider the rich subclass of nonsmooth problems having composite objectives (this includes the extensively studied convex, composite objective problems as a special case). For this subclass, we introduce a powerful, new framework that permits asymptotically non-vanishing perturbations. In particular, we develop perturbation-based batch and incremental (online-like) nonconvex proximal splitting algorithms. To our knowledge, this is the first time that such perturbation-based nonconvex splitting algorithms are being proposed and analyzed. While the main contribution of the paper is the theoretical framework, we complement our results by presenting some empirical results on matrix factorization.

PDF [BibTex]

Restricted Boltzmann machines as useful tool for detecting oscillatory EEG components

Balderas, D., Zander, TO., Bachl, F., Neuper, C., Scherer, R.

In Proceedings of the 5th International Brain–Computer Interface Conference, pages: 68-71, (Editors: GR Müller-Putz and R Scherer and M Billinger and A Kreilinger and V Kaiser and C Neuper), Graz: Verlag der Technischen Universität, 2011 (inproceedings)

[BibTex]

Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation

Görnitz, N., Widmer, C., Zeller, G., Kahles, A., Sonnenburg, S., Rätsch, G.

In Advances in Neural Information Processing Systems 24, pages: 2690-2698, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and FCN Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

PDF [BibTex]

Phase transition in the family of p-resistances

Alamgir, M., von Luxburg, U.

In Advances in Neural Information Processing Systems 24, pages: 379-387, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We study the family of p-resistances on graphs for p ≥ 1. This family generalizes the standard resistance distance. We prove that for any fixed graph, for p=1, the p-resistance coincides with the shortest path distance, for p=2 it coincides with the standard resistance distance, and for p → ∞ it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase-transition takes place. There exist two critical thresholds p^* and p^** such that if p < p^*, then the p-resistance depends on meaningful global properties of the graph, whereas if p > p^**, it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p^* = 1 + 1/(d-1) and p^** = 1 + 1/(d-2), where d is the dimension of the underlying space (we believe that the fact that there is a small gap between p^* and p^** is an artifact of our proofs). We also relate our findings to Laplacian regularization and suggest using q-Laplacians as regularizers, where q satisfies 1/p^* + 1/q = 1.
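The critical values quoted in this abstract are easy to evaluate for a given dimension. The sketch below computes p^*, p^** and the q solving 1/p^* + 1/q = 1 with exact fractions. The function name and the guard on d are assumptions of this illustration (the guard only avoids dividing by zero in the p^** formula). Since p^* = d/(d-1), the conjugate exponent works out to q = d.

```python
from fractions import Fraction

def critical_thresholds(d):
    """Return (p_star, p_double_star, q) for dimension d > 2 as exact fractions."""
    if d <= 2:
        raise ValueError("the formula for p^** requires d > 2")
    p_star = 1 + Fraction(1, d - 1)
    p_double_star = 1 + Fraction(1, d - 2)
    # Solve 1/p_star + 1/q = 1 for q.
    q = 1 / (1 - 1 / p_star)
    return p_star, p_double_star, q

p_star, p_double_star, q = critical_thresholds(3)
```

For d = 3 this gives p^* = 3/2, p^** = 2 and q = 3.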

PDF Web [BibTex]

On Fast Approximate Submodular Minimization

Jegelka, S., Lin, H., Bilmes, J.

In Advances in Neural Information Processing Systems 24, pages: 460-468, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We are motivated by an application to extract a representative subset of machine learning training data and by the poor empirical performance we observe of the popular minimum norm algorithm. In fact, for our application, minimum norm can have a running time of about O(n^7) (O(n^5) oracle calls). We therefore propose a fast approximate method to minimize arbitrary submodular functions. For a large sub-class of submodular functions, the algorithm is exact. Other submodular functions are iteratively approximated by tight submodular upper bounds, and then repeatedly optimized. We show theoretical properties, and empirical results suggest significant speedups over minimum norm while retaining higher accuracies.

PDF Web [BibTex]

PAC-Bayesian Analysis of Contextual Bandits

Seldin, Y., Auer, P., Laviolette, F., Shawe-Taylor, J., Ortner, R.

In Advances in Neural Information Processing Systems 24, pages: 1683-1691, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$. If the algorithm uses all the side information, the regret bound scales as $\sqrt{N \ln K}$, where $K$ is the number of actions (arms). However, if the side information $I_{\rho_t}(S;A)$ is not fully used, the regret bound is significantly tighter. In the extreme case, when $I_{\rho_t}(S;A) = 0$, the dependence on the number of states reduces from linear to logarithmic. Our analysis allows us to provide the algorithm with a large amount of side information, let the algorithm decide which side information is relevant for the task, and penalize the algorithm only for the side information that it is using de facto. We also present an algorithm for multiarmed bandits with side information with computational complexity that is linear in the number of actions.

PDF PDF Web [BibTex]

Fast projections onto L1,q-norm balls for grouped feature selection

Sra, S.

In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, LNCS vol 6913, Part III, pages: 305-317, (Editors: D Gunopulos and T Hofmann and D Malerba and M Vazirgiannis), Springer, 22nd European Conference on Machine Learning (ECML), 2011 (inproceedings)

DOI [BibTex]

Kernel Belief Propagation

Song, L., Gretton, A., Bickson, D., Low, Y., Guestrin, C.

In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Vol. 15, pages: 707-715, (Editors: G Gordon and D Dunson and M Dudík), JMLR, AISTATS, 2011 (inproceedings)

PDF [BibTex]

On Causal Discovery with Cyclic Additive Noise Models

Mooij, J., Janzing, D., Schölkopf, B., Heskes, T.

In Advances in Neural Information Processing Systems 24, pages: 639-647, (Editors: J Shawe-Taylor and RS Zemel and PL Bartlett and FCN Pereira and KQ Weinberger), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We study a particular class of cyclic causal models, where each variable is a (possibly nonlinear) function of its parents and additive noise. We prove that the causal graph of such models is generically identifiable in the bivariate, Gaussian-noise case. We also propose a method to learn such models from observational data. In the acyclic case, the method reduces to ordinary regression, but in the more challenging cyclic case, an additional term arises in the loss function, which makes it a special case of nonlinear independent component analysis. We illustrate the proposed method on synthetic data.

PDF Web [BibTex]

Additive Gaussian Processes

Duvenaud, D., Nickisch, H., Rasmussen, C.

In Advances in Neural Information Processing Systems 24, pages: 226-234, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We introduce a Gaussian process model of functions which are additive. An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks.
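The abstract does not say how all interaction terms are evaluated efficiently. One standard way to sum the products over every size-n subset of D per-dimension kernel values, without enumerating subsets, is the recurrence for elementary symmetric polynomials, sketched below. Whether this matches the paper's parameterization is an assumption, and the function name and the example values are made up for illustration.

```python
def elementary_symmetric(values):
    """Return [e_0, e_1, ..., e_D] for the given values.

    e_n is the sum, over all size-n subsets, of the product of the subset's
    members. The dynamic programme below needs O(D^2) multiplications
    instead of enumerating all 2^D subsets.
    """
    e = [1.0] + [0.0] * len(values)
    for count, z in enumerate(values, start=1):
        # Update high orders first so each value enters a product at most once.
        for n in range(count, 0, -1):
            e[n] += z * e[n - 1]
    return e

# Hypothetical per-dimension base-kernel values for one pair of inputs.
base_values = [1.0, 2.0, 3.0]
orders = elementary_symmetric(base_values)
```

Here `orders[n]` would play the role of the order-n interaction term, and a weighted sum over n would give one kernel value.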

PDF Web [BibTex]

k-NN Regression Adapts to Local Intrinsic Dimension

Kpotufe, S.

In Advances in Neural Information Processing Systems 24, pages: 729-737, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
Many nonparametric regressors were recently shown to converge at rates that depend only on the intrinsic dimension of data. These regressors thus escape the curse of dimension when high-dimensional data has low intrinsic dimension (e.g. a manifold). We show that k-NN regression is also adaptive to intrinsic dimension. In particular our rates are local to a query x and depend only on the way masses of balls centered at x vary with radius. Furthermore, we show a simple way to choose k = k(x) locally at any x so as to nearly achieve the minimax rate at x in terms of the unknown intrinsic dimension in the vicinity of x. We also establish that the minimax rate does not depend on a particular choice of metric space or distribution, but rather that this minimax rate holds for any metric space and doubling measure.
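For readers unfamiliar with the estimator being analysed, here is a minimal sketch of plain k-NN regression on one-dimensional inputs. It uses a fixed k supplied by the caller and does not implement the local choice k = k(x) described in the abstract; the function name and the toy data are assumptions of this illustration.

```python
def knn_regress(train_x, train_y, query, k):
    """Predict at `query` by averaging the labels of its k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by absolute distance to the query (1-D inputs here).
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

xs = [0.0, 1.0, 2.0, 3.0, 10.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]
prediction = knn_regress(xs, ys, query=1.1, k=3)
```

The three nearest neighbours of 1.1 are the points at 1.0, 2.0 and 0.0, so the prediction is the mean of their labels.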

PDF Web [BibTex]

Fast Newton-type Methods for Total-Variation with Applications

Barbero, A., Sra, S.

In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pages: 313-320, (Editors: L Getoor and T Scheffer), Omnipress, 28th International Conference on Machine Learning (ICML), 2011 (inproceedings)

[BibTex]

Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees

Gonzalez, J., Low, Y., Gretton, A., Guestrin, C.

In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Vol. 15, pages: 324-332, (Editors: G Gordon and D Dunson and M Dudík), JMLR, AISTATS, 2011 (inproceedings)

PDF [BibTex]


Access to Unlabeled Data can Speed up Prediction Time

Urner, R., Shalev-Shwartz, S., Ben-David, S.

In Proceedings of the 28th International Conference on Machine Learning, pages: 641-648, ICML, 2011 (inproceedings)

link (url) [BibTex]


Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

Gehler, P., Rother, C., Kiefel, M., Zhang, L., Schölkopf, B.

In Advances in Neural Information Processing Systems 24, pages: 765-773, (Editors: Shawe-Taylor, John and Zemel, Richard S. and Bartlett, Peter L. and Pereira, Fernando C. N. and Weinberger, Kilian Q.), Curran Associates, Inc., Red Hook, NY, USA, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

Abstract
We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that without edge information high-quality results can be achieved, that are on par with methods exploiting this source of information. Finally, we are able to improve on state-of-the-art results by integrating edge information into our model. We believe that our new approach is an excellent starting point for future developments in this field.

website + code pdf poster Project Page [BibTex]


2002


Optimized Support Vector Machines for Nonstationary Signal Classification

Davy, M., Gretton, A., Doucet, A., Rayner, P.

IEEE Signal Processing Letters, 9(12):442-445, December 2002 (article)

Abstract
This letter describes an efficient method to perform nonstationary signal classification. A support vector machine (SVM) algorithm is introduced and its parameters optimised in a principled way. Simulations demonstrate that our low complexity method outperforms state-of-the-art nonstationary signal classification techniques.

PostScript Web DOI [BibTex]


Gender Classification of Human Faces

Graf, A., Wichmann, F.

In Biologically Motivated Computer Vision, pages: 1-18, (Editors: Bülthoff, H. H., S.W. Lee, T. A. Poggio and C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV), November 2002 (inproceedings)

Abstract
This paper addresses the issue of combining pre-processing methods (dimensionality reduction using Principal Component Analysis (PCA) and Locally Linear Embedding (LLE)) with Support Vector Machine (SVM) classification for a behaviorally important task in humans: gender classification. A processed version of the MPI head database is used as the stimulus set. First, summary statistics of the head database are studied. Subsequently, the optimal parameters for LLE and the SVM are sought heuristically. These values are then used to compare the original face database with its processed counterpart and to assess the behavior of an SVM with respect to changes in illumination and perspective of the face images. Overall, PCA was superior in classification performance and allowed linear separability.

PDF PDF DOI [BibTex]


Insect-Inspired Estimation of Self-Motion

Franz, MO., Chahl, JS.

In Biologically Motivated Computer Vision, (2525):171-180, LNCS, (Editors: Bülthoff, H.H. , S.W. Lee, T.A. Poggio, C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV), November 2002 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an optimal linear estimator incorporating prior knowledge about the environment. The optimal estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

PDF PDF DOI [BibTex]


A New Discriminative Kernel from Probabilistic Models

Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.

Neural Computation, 14(10):2397-2414, October 2002 (article)

PDF [BibTex]


Combining Sensory Information to Improve Visualization

Ernst, M., Banks, M., Wichmann, F., Maloney, L., Bülthoff, H.

In Proceedings of the Conference on Visualization '02 (VIS '02), pages: 571-574, (Editors: Moorhead, R., M. Joy), IEEE, Piscataway, NJ, USA, IEEE Conference on Visualization (VIS '02), October 2002 (inproceedings)

Abstract
Seemingly effortlessly the human brain reconstructs the three-dimensional environment surrounding us from the light pattern striking the eyes. This seems to be true across almost all viewing and lighting conditions. One important factor for this apparent easiness is the redundancy of information provided by the sensory organs. For example, perspective distortions, shading, motion parallax, or the disparity between the two eyes' images are all, at least partly, redundant signals which provide us with information about the three-dimensional layout of the visual scene. Our brain uses all these different sensory signals and combines the available information into a coherent percept. In displays visualizing data, however, the information is often highly reduced and abstracted, which may lead to an altered perception and therefore a misinterpretation of the visualized data. In this panel we will discuss mechanisms involved in the combination of sensory information and their implications for simulations using computer displays, as well as problems resulting from current display technology such as cathode-ray tubes.

PDF Web [BibTex]


Sampling Techniques for Kernel Methods

Achlioptas, D., McSherry, F., Schölkopf, B.

In Advances in Neural Information Processing Systems 14, pages: 335-342, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations.

PDF Web [BibTex]
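
To make the idea of sampling the Gram matrix in the entry above concrete, one simple scheme keeps each entry independently with probability p and rescales the survivors by 1/p, so that the sparsified matrix equals the original in expectation. The sketch below illustrates that generic scheme only. It is not claimed to match the sampling probabilities, quantization step, or bounds of the paper, and the function name and `seed` parameter are assumptions made for this example.

```python
import numpy as np


def sparsify_gram(K, p, seed=0):
    """Keep each symmetric pair of entries with probability p, scaled by 1/p."""
    K = np.asarray(K, dtype=float)
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    # Decide on the upper triangle (including the diagonal), then mirror the
    # decision so the sparsified matrix stays symmetric.
    keep_upper = np.triu(rng.random((n, n)) < p)
    keep = keep_upper | keep_upper.T
    return np.where(keep, K / p, 0.0)


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(5, 2))
    K = X @ X.T  # linear-kernel Gram matrix
    print(sparsify_gram(K, p=0.5))
```

Because every retained entry is divided by p, averaging many independently sparsified copies recovers the original Gram matrix, while each individual copy is cheaper to store and multiply.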


The Infinite Hidden Markov Model

Beal, MJ., Ghahramani, Z., Rasmussen, CE.

In Advances in Neural Information Processing Systems 14, pages: 577-584, (Editors: Dietterich, T.G. , S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite - consider, for example, symbols being possible words appearing in English text.

PDF Web [BibTex]


A new discriminative kernel from probabilistic models

Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.

In Advances in Neural Information Processing Systems 14, pages: 977-984, (Editors: Dietterich, T.G. , S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
Recently, Jaakkola and Haussler proposed a method for constructing kernel functions from probabilistic models. Their so-called "Fisher kernel" has been combined with discriminative classifiers such as SVMs and applied successfully in, e.g., DNA and protein analysis. Whereas the Fisher kernel (FK) is calculated from the marginal log-likelihood, we propose the TOP kernel derived from Tangent vectors Of Posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments our new discriminative TOP kernel compares favorably to the Fisher kernel.

PDF Web [BibTex]


Incorporating Invariances in Non-Linear Support Vector Machines

Chapelle, O., Schölkopf, B.

In Advances in Neural Information Processing Systems 14, pages: 609-616, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
The choice of an SVM kernel corresponds to the choice of a representation of the data in a feature space and, to improve performance, it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in nonlinear kernels. We show on a digit recognition task that the proposed approach is superior to the Virtual Support Vector method, which previously had been the method of choice.

PDF Web [BibTex]


Functional Genomics of Osteoarthritis

Aigner, T., Bartnik, E., Zien, A., Zimmer, R.

Pharmacogenomics, 3(5):635-650, September 2002 (article)

Web [BibTex]


Kernel feature spaces and nonlinear blind source separation

Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 14, pages: 761-768, (Editors: Dietterich, T. G., S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
In kernel-based learning, the data is mapped to a kernel feature space whose dimension corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used, e.g., by reduced set techniques for SVMs. We propose a new mathematical construction that makes it possible to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations become much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.

PDF Web [BibTex]


Constructing Boosting algorithms from SVMs: an application to one-class classification

Rätsch, G., Mika, S., Schölkopf, B., Müller, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1184-1199, September 2002 (article)

Abstract
We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm—one-class leveraging—starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.

DOI [BibTex]
