Header logo is ei


2009


no image
Let the Kernel Figure it Out: Principled Learning of Pre-processing for Kernel Classifiers

Gehler, P., Nowozin, S.

In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 2836-2843, IEEE Service Center, Piscataway, NJ, USA, CVPR, June 2009 (inproceedings)

Abstract
Most modern computer vision systems for high-level tasks, such as image classification, object recognition and segmentation, are based on learning algorithms that are able to separate discriminative information from noise. In practice, however, the typical system consists of a long pipeline of pre-processing steps, such as extraction of different kinds of features, various kinds of normalizations, feature selection, and quantization into aggregated representations such as histograms. Along this pipeline, there are many parameters to set and choices to make, and their effect on the overall system performance is a-priori unclear. In this work, we shorten the pipeline in a principled way. We move pre-processing steps into the learning system by means of kernel parameters, letting the learning algorithm decide upon suitable parameter values. Learning to optimize the pre-processing choices becomes learning the kernel parameters. We realize this paradigm by extending the recent Multiple Kernel Learning formulation from the finite case of having a fixed number of kernels which can be combined to the general infinite case where each possible parameter setting induces an associated kernel. We evaluate the new paradigm extensively on image classification and object classification tasks. We show that it is possible to learn optimal discriminative codebooks and optimal spatial pyramid schemes, consistently outperforming all previous state-of-the-art approaches.

PDF Web DOI [BibTex]

2009

PDF Web DOI [BibTex]


no image
Identifying confounders using additive noise models

Janzing, D., Peters, J., Mooij, J., Schölkopf, B.

In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pages: 249-257, (Editors: J Bilmes and AY Ng), AUAI Press, Corvallis, OR, USA, UAI, June 2009 (inproceedings)

Abstract
We propose a method for inferring the existence of a latent common cause ("confounder") of two observed random variables. The method assumes that the two effects of the confounder are (possibly nonlinear) functions of the confounder plus independent, additive noise. We discuss under which conditions the model is identifiable (up to an arbitrary reparameterization of the confounder) from the joint distribution of the effects. We state and prove a theoretical result that provides evidence for the conjecture that the model is generically identifiable under suitable technical conditions. In addition, we propose a practical method to estimate the confounder from a finite i.i.d. sample of the effects and illustrate that the method works well on both simulated and real-world data.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Regression by dependence minimization and its application to causal inference in additive noise models

Mooij, J., Janzing, D., Peters, J., Schölkopf, B.

In Proceedings of the 26th International Conference on Machine Learning, pages: 745-752, (Editors: A Danyluk and L Bottou and M Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract
Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Fitted Q-iteration by Advantage Weighted Regression

Neumann, G., Peters, J.

In Advances in neural information processing systems 21, pages: 1177-1184, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Global Connectivity Potentials for Random Field Models

Nowozin, S., Lampert, C.

In CVPR 2009, pages: 818-825, IEEE Service Center, Piscataway, NJ, USA, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2009 (inproceedings)

Abstract
Markov random field (MRF, CRF) models are popular in computer vision. However, in order to be computationally tractable they are limited to incorporate only local interactions and cannot model global properties, such as connectedness, which is a potentially useful high-level prior for object segmentation. In this work, we overcome this limitation by deriving a potential function that enforces the output labeling to be connected and that can naturally be used in the framework of recent MAP-MRF LP relaxations. Using techniques from polyhedral combinatorics, we show that a provably tight approximation to the MAP solution of the resulting MRF can still be found efficiently by solving a sequence of max-flow problems. The efficiency of the inference procedure also allows us to learn the parameters of a MRF with global connectivity potentials by means of a cutting plane algorithm. We experimentally evaluate our algorithm on both synthetic data and on the challenging segmentation task of the PASCAL VOC 2008 data set. We show that in both cases the addition of a connectedness prior significantly reduces the segmentation error.

PDF PDF Web DOI [BibTex]

PDF PDF Web DOI [BibTex]


no image
Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

Seeger, M., Nickisch, H., Pohmann, R., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 1441-1448, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
We show how improved sequences for magnetic resonance imaging can be found through automated optimization of Bayesian design scores. Combining recent advances in approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first scalable Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires approximate inference for dense, non-Gaussian models on a scale seldom addressed before. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. Our approach is evaluated on a realistic setup with raw data from a 3T MR scanner.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Characteristic Kernels on Groups and Semigroups

Fukumizu, K., Sriperumbudur, B., Gretton, A., Schölkopf, B.

In Advances in neural information processing systems 21, pages: 473-480, (Editors: D Koller and D Schuurmans and Y Bengio and L Bottou), Curran, Red Hook, NY, USA, 22nd Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be characteristic exist for Rn. In the present work, conditions are established for an RKHS to be characteristic on groups and semigroups. Illustrative examples are provided, including characteristic kernels on periodic domains, rotation matrices, and Rn+.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Policy Search for Motor Primitives in Robotics

Kober, J., Peters, J.

In Advances in neural information processing systems 21, pages: 849-856, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results into a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning by assuming a form of exploration that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable in complex motor learning tasks. We compare this algorithm to alternative parametrized policy search methods and show that it outperforms previous methods. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM robot arm.

PDF Web [BibTex]

PDF Web [BibTex]


no image
The graphlet spectrum

Kondor, R., Shervashidze, N., Borgwardt, K.

In Proceedings of the 26th International Conference on Machine Learning (ICML 2009), pages: 529-536, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning (ICML), June 2009 (inproceedings)

Abstract
Current graph kernels suffer from two limitations: graph kernels based on counting particular types of subgraphs ignore the relative position of these subgraphs to each other, while graph kernels based on algebraic methods are limited to graphs without node labels. In this paper we present the graphlet spectrum, a system of graph invariants derived by means of group representation theory that capture information about the number as well as the position of labeled subgraphs in a given graph. In our experimental evaluation the graphlet spectrum outperforms state-of-the-art graph kernels.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Learning Similarity Measure for Multi-Modal 3D Image Registration

Lee, D., Hofmann, M., Steinke, F., Altun, Y., Cahill, N., Schölkopf, B.

In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 186-193, IEEE Service Center, Piscataway, NJ, USA, CVPR, June 2009 (inproceedings)

Abstract
Multi-modal image registration is a challenging problem in medical imaging. The goal is to align anatomically identical structures; however, their appearance in images acquired with different imaging devices, such as CT or MR, may be very different. Registration algorithms generally deform one image, the floating image, such that it matches with a second, the reference image, by maximizing some similarity score between the deformed and the reference image. Instead of using a universal, but a priori fixed similarity criterion such as mutual information, we propose learning a similarity measure in a discriminative manner such that the reference and correctly deformed floating images receive high similarity scores. To this end, we develop an algorithm derived from max-margin structured output learning, and employ the learned similarity measure within a standard rigid registration algorithm. Compared to other approaches, our method adapts to the specific registration problem at hand and exploits correlations between neighboring pixels in the reference and the floating image. Empirical evaluation on CT-MR/PET-MR rigid registration tasks demonstrates that our approach yields robust performance and outperforms the state of the art methods for multi-modal medical image registration.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Learning Complex Motions by Sequencing Simpler Motion Templates

Neumann, G., Maass, W., Peters, J.

In ICML 2009, pages: 753-760, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
Abstraction of complex, longer motor tasks into simpler elemental movements enables humans and animals to exhibit motor skills which have not yet been matched by robots. Humans intuitively decompose complex motions into smaller, simpler segments. For example when describing simple movements like drawing a triangle with a pen, we can easily name the basic steps of this movement. Surprisingly, such abstractions have rarely been used in artificial motor skill learning algorithms. These algorithms typically choose a new action (such as a torque or a force) at a very fast time-scale. As a result, both policy and temporal credit assignment problem become unnecessarily complex - often beyond the reach of current machine learning methods. We introduce a new framework for temporal abstractions in reinforcement learning (RL), i.e. RL with motion templates. We present a new algorithm for this framework which can learn high-quality policies by making only few abstract decisions.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Solution Stability in Linear Programming Relaxations: Graph Partitioning and Unsupervised Learning

Nowozin, S., Jegelka, S.

In ICML 2009, pages: 769-776, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
We propose a new method to quantify the solution stability of a large class of combinatorial optimization problems arising in machine learning. As practical example we apply the method to correlation clustering, clustering aggregation, modularity clustering, and relative performance significance clustering. Our method is extensively motivated by the idea of linear programming relaxations. We prove that when a relaxation is used to solve the original clustering problem, then the solution stability calculated by our method is conservative, that is, it never overestimates the solution stability of the true, unrelaxed problem. We also demonstrate how our method can be used to compute the entire path of optimal solutions as the optimization problem is increasingly perturbed. Experimentally, our method is shown to perform well on a number of benchmark problems.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Kernel Measures of Independence for Non-IID Data

Zhang, X., Song, L., Gretton, A., Smola, A.

In Advances in neural information processing systems 21, pages: 1937-1944, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Many machine learning algorithms can be formulated in the framework of statistical independence such as the Hilbert Schmidt Independence Criterion. In this paper, we extend this criterion to deal with structured and interdependent observations. This is achieved by modeling the structures using undirected graphical models and comparing the Hilbert space embeddings of distributions. We apply this new criterion to independent component analysis and sequence clustering.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Predictive Representations for Policy Gradient in POMDPs

Boularias, A., Chaib-Draa, B.

In ICML 2009, pages: 65-72, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

Abstract
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive State Representations (PSRs). We compare PSR policies to Finite-State Controllers (FSCs), which are considered as a standard model for policy gradient methods in POMDPs. We present a general Actor- Critic algorithm for learning both FSCs and PSR policies. The critic part computes a value function that has as variables the parameters of the policy. These latter parameters are gradually updated to maximize the value function. We show that the value function is polynomial for both FSCs and PSR policies, with a potentially smaller degree in the case of PSR policies. Therefore, the value function of a PSR policy can have less local optima than the equivalent FSC, and consequently, the gradient algorithm is more likely to converge to a global optimal solution.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Passivity Analysis of Haptic Systems Interacting with Viscoelastic Virtual Environment

Son, HI., Bhattacharjee, T., Lee, DY.

In Proceedings of the 14th International Conference on Advanced Robotics (ICAR 2009), pages: 1-6, IEEE, Piscataway, NJ, USA, 14th International Conference on Advanced Robotics (ICAR), June 2009 (inproceedings)

Abstract
Passivity analysis of any haptic system requires the knowledge of the environment impedance, i.e., parameters of the employed environment model. There have been a few models proposed to describe the viscoelastic behavior of soft tissues, including the popular Maxwell and Voigt models. This paper analyzes passivity of haptic systems interacting with virtual viscoelastic soft tissues. The Kelvin model is employed to represent better the behavior of the soft tissues. This passivity analysis reveals a new criterion for design and control of the haptic interface. Simulation results show that this new criterion increases the range of passive environment.

Web [BibTex]

Web [BibTex]


no image
Using Bayesian Dynamical Systems for Motion Template Libraries

Chiappa, S., Kober, J., Peters, J.

In Advances in neural information processing systems 21, pages: 297-304, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
Motor primitives or motion templates have become an important concept for both modeling human motor control as well as generating robot behaviors using imitation learning. Recent impressive results range from humanoid robot movement generation to timing models of human motions. The automatic generation of skill libraries containing multiple motion templates is an important step in robot learning. Such a skill learning system needs to cluster similar movements together and represent each resulting motion template as a generative model which is subsequently used for the execution of the behavior by a robot system. In this paper, we show how human trajectories captured as multidimensional time-series can be clustered using Bayesian mixtures of linear Gaussian state-space models based on the similarity of their dynamics. The appropriate number of templates is automatically determined by enforcing a parsimonious parametrization. As the resulting model is intractable, we introduce a novel approximation method based on variational Bayes, which is especially designed to enable the use of efficient inference algorithms. On recorded human Balero movements, this method is not only capable of finding reasonable motion templates but also yields a generative model which works well in the execution of this complex task on a simulated anthropomorphic SARCOS arm.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Multi-way set enumeration in real-valued tensors

Georgii, E., Tsuda, K., Schölkopf, B.

In Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors (DMMT 2009), pages: 32-41, (Editors: C Ding and T Li), ACM Press, New York, NY, USA, 2nd Workshop on Data Mining using Matrices and Tensors (DMMT/KDD), June 2009 (inproceedings)

Abstract
The analysis of n-ary relations receives attention in many different fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to explore these n-way data is the search for n-set patterns. An n-set pattern consists of specific subsets of the n instance sets such that all possible n- ary associations between the corresponding instances are observed. This provides a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree, that means we are also interested in n-sets where most (although not all) of the possible combinations have been recorded in the data. Second, we take association weights into account. More precisely, we propose a method to enumerate all n- sets that satisfy a minimum threshold with respect to the average association weight. Non-observed associations obtain by default a weight of zero. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can consider further constraints. We show experimental results on artificial and real-world data sets from different domains.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Non-parametric Regression between Riemannian Manifolds

Steinke, F., Hein, M.

In Advances in neural information processing systems 21, pages: 1561-1568, (Editors: Koller, D. , D. Schuurmans, Y. Bengio, L. Bottou), Curran, Red Hook, NY, USA, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), June 2009 (inproceedings)

Abstract
This paper discusses non-parametric regression between Riemannian manifolds. This learning problem arises frequently in many application areas ranging from signal processing, computer vision, over robotics to computer graphics. We present a new algorithmic scheme for the solution of this general learning problem based on regularized empirical risk minimization. The regularization functional takes into account the geometry of input and output manifold, and we show that it implements a prior which is particularly natural. Moreover, we demonstrate that our algorithm performs well in a difficult surface registration problem.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Near-optimal supervised feature selection among frequent subgraphs

Thoma, M., Cheng, H., Gretton, A., Han, J., Kriegel, H., Smola, A., Song, L., Yu, P., Yan, X., Borgwardt, K.

In Proccedings of the 2009 SIAM Conference on Data Mining (SDM 2009), pages: 1076-1087, (Editors: Park, H. , S. Parthasarathy, H. Liu), Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, 9th SIAM Conference on Data Mining (SDM), May 2009 (inproceedings)

Abstract
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: Graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the enormous problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, but only few of them possess enough discriminative power to make them useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that we can yield a near-optimal solution using greedy feature selection. Second, our submodular quality function criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and help to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Link Propagation: A Fast Semi-supervised Learning Algorithm for Link Prediction

Kashima, H., Kato, T., Yamanishi, Y., Sugiyama, M., Tsuda, K.

In Proceedings of the 2009 SIAM International Conference on Data Mining, pages: 1099-1110, (Editors: Park, H. , S. Parthasarathy, H. Liu), Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, SDM, May 2009 (inproceedings)

Abstract
We propose Link Propagation as a new semi-supervised learning method for link prediction problems, where the task is to predict unknown parts of the network structure by using auxiliary information such as node similarities. Since the proposed method can fill in missing parts of tensors, it is applicable to multi-relational domains, allowing us to handle multiple types of links simultaneously. We also give a novel efficient algorithm for Link Propagation based on an accelerated conjugate gradient method.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning motor primitives for robotics

Kober, J., Peters, J.

In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA 2009), pages: 2112-2118, IEEE Service Center, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA '09), May 2009 (inproceedings)

Abstract
The acquisition and self-improvement of novel motor skills is among the most important problems in robotics. Motor primitives offer one of the most promising frameworks for the application of machine learning techniques in this context. Employing an improved form of the dynamic systems motor primitives originally introduced by Ijspeert et al. [2], we show how both discrete and rhythmic tasks can be learned using a concerted approach of both imitation and reinforcement learning. For doing so, we present both learning algorithms and representations targeted for the practical application in robotics. Furthermore, we show that it is possible to include a start-up phase in rhythmic primitives. We show that two new motor skills, i.e., Ball-in-a-Cup and Ball-Paddling, can be learned on a real Barrett WAM robot arm at a pace similar to human learning while achieving a significantly more reliable final performance.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A robust Bayesian two-sample test for detecting intervals of differential gene expression in microarray time series

Stegle, O., Denby, K., Wild, DL., Ghahramani, Z., Borgwardt, KM.

In Research in Computational Molecular Biology, pages: 201-216, (Editors: Batzoglou, S. ), Springer, Berlin, Germany, 13th Annual International Conference on Research in Computational Molecular Biology (RECOMB), May 2009 (inproceedings)

Abstract
Understanding the regulatory mechanisms that are responsible for an organism’s response to environmental changes is an important question in molecular biology. A first and important step towards this goal is to detect genes whose expression levels are affected by altered external conditions. A range of methods to test for differential gene expression, both in static as well as in time-course experiments, have been proposed. While these tests answer the question whether a gene is differentially expressed, they do not explicitly address the question when a gene is differentially expressed, although this information may provide insights into the course and causal structure of regulatory programs. In this article, we propose a two-sample test for identifying intervals of differential gene expression in microarray time series. Our approach is based on Gaussian process regression, can deal with arbitrary numbers of replicates and is robust with respect to outliers. We apply our algorithm to study the response of Arabidopsis thaliana genes to an infection by a fungal pathogen using a microarray time series dataset covering 30,336 gene probes at 24 time points. In classification experiments our test compares favorably with existing methods and provides additional insights into time-dependent differential expression.

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Bayesian Approach to Graph Regression with Relevant Subgraph Selection

Chiappa, S., Saigo, H., Tsuda, K.

In SIAM International Conference on Data Mining, pages: 295-304, (Editors: Park, H. , S. Parthasarathy, H. Liu), Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, SDM, May 2009 (inproceedings)

Abstract
Many real-world applications with graph data require the efficient solution of a given regression task as well as the identification of the subgraphs which are relevant for the task. In these cases graphs are commonly represented as binary vectors of indicators of subgraphs, giving rise to an intractable input dimensionality. An efficient solution to this problem was recently proposed by a Lasso-type method where the objective function optimization over an intractable number of variables is reformulated as a dual mathematical programming problem over a small number of variables but a large number of constraints. The dual problem is then solved by column generation where the subgraphs corresponding to the most violated constraints are found by weighted subgraph mining. This paper proposes an extension of this method to a fully Bayesian approach which defines a prior distribution on the parameters and integrate them out from the model, thus providing a posterior distribution on the target variable as opposed to a single estimate. The advantage of this approach is that the extra information given by the target posterior distribution can be used for improving the model in several ways. In this paper, we use the target posterior variance as a measure of uncertainty in the prediction and show that, by rejecting unconfident predictions, we can improve state-of-the-art performance on several molecular graph datasets.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Efficient data reuse in value function approximation

Hachiya, H., Akiyama, T., Sugiyama, M., Peters, J.

In IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pages: 8-15, IEEE Service Center, Piscataway, NJ, USA, IEEE ADPRL, May 2009 (inproceedings)

Abstract
Off-policy reinforcement learning is aimed at efficiently using data samples gathered from a policy that is different from the currently optimized policy. A common approach is to use importance sampling techniques for compensating for the bias of value function estimators caused by the difference between the data-sampling policy and the target policy. However, existing off-policy methods often do not take the variance of the value function estimators explicitly into account and therefore their performance tends to be unstable. To cope with this problem, we propose using an adaptive importance sampling technique which allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining the trade-off parameter based on a variant of cross-validation. The usefulness of the proposed approach is demonstrated through simulated swing-up inverted-pendulum problem.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Using reward-weighted imitation for robot Reinforcement Learning

Peters, J., Kober, J.

In IEEE ADPRL 2009, pages: 226-232, IEEE Service Center, Piscataway, NJ, USA, 2009 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, May 2009 (inproceedings)

Abstract
Reinforcement Learning is an essential ability for robots to learn new motor skills. Nevertheless, few methods scale into the domain of anthropomorphic robotics. In order to improve in terms of efficiency, the problem is reduced onto reward-weighted imitation. By doing so, we are able to generate a framework for policy learning which both unifies previous reinforcement learning approaches and allows the derivation of novel algorithms. We show our two most relevant applications both for motor primitive learning (e.g., a complex Ball-in-a-Cup task using a real Barrett WAM robot arm) and learning task-space control.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Denoising photographs using dark frames optimized by quadratic programming

Gomez Rodriguez, M., Kober, J., Schölkopf, B.

In Proceedings of the First IEEE International Conference on Computational Photography (ICCP 2009), pages: 1-9, IEEE, Piscataway, NJ, USA, First IEEE International Conference on Computational Photography (ICCP), April 2009 (inproceedings)

Abstract
Photographs taken with long exposure or high ISO setting may contain substantial amounts of noise, drastically reducing the Signal-To-Noise Ratio (SNR). This paper presents a novel optimization approach for denoising. It is based on a library of dark frames previously taken under varying conditions of temperature, ISO setting and exposure time, and a quality measure or prior for the class of images to denoise. The method automatically computes a synthetic dark frame that, when subtracted from an image, optimizes the quality measure. For specific choices of the quality measure, the denoising problem reduces to a quadratic programming (QP) problem that can be solved efficiently. We show experimentally that it is sufficient to consider a limited subsample of pixels when evaluating the quality measure in the optimization, in which case the complexity of the procedure does not depend on the size of the images but only on the number of dark frames. We provide quantitative experimental results showing that our method automatically computes dark frames that are competitive with those taken under idealized conditions (controlled temperature, ISO setting, exposure time, and averaging of multiple exposures). We provide application examples in astronomical image denoising. The method is validated on two CMOS SLRs.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
On Pairwise Kernels: An Efficient Alternative and Generalization Analysis

Kashima, H., Oyama, S., Yamanishi, Y., Tsuda, K.

In Advances in Knowledge Discovery and Data Mining: 13th Pacific-Asia Conference, pages: 1030-1037, (Editors: Theeramunkong, T. , B. Kijsirikul, N. Cercone, T. B. Ho), Springer, Berlin, Germany, PAKDD, April 2009 (inproceedings)

Abstract
Pairwise classification has many applications including network prediction, entity resolution, and collaborative filtering. The pairwise kernel has been proposed for those purposes by several research groups independently, and become successful in various fields. In this paper, we propose an efficient alternative which we call Cartesian kernel. While the existing pairwise kernel (which we refer to as Kronecker kernel) can be interpreted as the weighted adjacency matrix of the Kronecker product graph of two graphs, the Cartesian kernel can be interpreted as that of the Cartesian graph which is more sparse than the Kronecker product graph. Experimental results show the Cartesian kernel is much faster than the existing pairwise kernel, and at the same time, competitive with the existing pairwise kernel in predictive performance.We discuss the generalization bounds by the two pairwise kernels by using eigenvalue analysis of the kernel matrices.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Convex Perturbations for Scalable Semidefinite Programming

Kulis, B., Sra, S., Dhillon, I.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 296-303, (Editors: van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
Many important machine learning problems are modeled and solved via semidefinite programs; examples include metric learning, nonlinear embedding, and certain clustering problems. Often, off-the-shelf software is invoked for the associated optimization, which can be inappropriate due to excessive computational and storage requirements. In this paper, we introduce the use of convex perturbations for solving semidefinite programs (SDPs), and for a specific perturbation we derive an algorithm that has several advantages over existing techniques: a) it is simple, requiring only a few lines of Matlab, b) it is a first-order method, and thereby scalable, and c) it can easily exploit the structure of a given SDP (e.g., when the constraint matrices are low-rank, a situation common to several machine learning SDPs). A pleasant byproduct of our method is a fast, kernelized version of the large-margin nearest neighbor metric learning algorithm. We demonstrate that our algorithm is effective in finding fast approximations to large-scale SDPs arising in some machine learning applications.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Block Jacobi-type methods for non-orthogonal joint diagonalisation

Shen, H., Hüper, K.

In ICASSP09, pages: 3285-3288, IEEE Service Center, Piscataway, NJ, USA, 34th International Conference on Acoustics, Speech, and Signal Processing, April 2009 (inproceedings)

Abstract
In this paper, we study the problem of non-orthogonal joint diagonalisation of a set of real symmetric matrices via simultaneous conjugation. A family of block Jacobi-type methods are proposed to optimise two popular cost functions for the non-orthogonal joint diagonalisation, namely, the off-norm function and the log-likelihood function. By exploiting the appropriate underlying manifold, namely the so-called oblique manifold, rigorous analysis shows that, under the exact non-orthogonal joint diagonalisation setting, the proposed methods converge locally quadratically fast to a joint diagonaliser. Finally, performance of our methods is investigated by numerical experiments for both exact and approximate non-orthogonal joint diagonalisation.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward

Hoffman, M., Freitas, N., Doucet, A., Peters, J.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 232-239, (Editors: van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
We derive a new expectation maximization algorithm for policy optimization in linear Gaussian Markov decision processes, where the reward function is parameterised in terms of a flexible mixture of Gaussians. This approach exploits both analytical tractability and numerical optimization. Consequently, on the one hand, it is more flexible and general than closed-form solutions, such as the widely used linear quadratic Gaussian (LQG) controllers. On the other hand, it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions (though costly) eliminate the need for simulation and, hence, avoid approximation error. The experiments will show that for the same cost of computation, policy optimization methods that rely on analytical tractability have higher value than the ones that rely on simulation.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Efficient Graphlet Kernels for Large Graph Comparison

Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 488-495, (Editors: Van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
State-of-the-art graph kernels do not scale to large graphs with hundreds of nodes and thousands of edges. In this article we propose to compare graphs by counting {it graphlets}, ie subgraphs with $k$ nodes where $k in { 3, 4, 5 }$. Exhaustive enumeration of all graphlets being prohibitively expensive, we introduce two theoretically grounded speedup schemes, one based on sampling and the second one specifically designed for bounded degree graphs. In our experimental evaluation, our novel kernels allow us to efficiently compare large graphs that cannot be tackled by existing graph kernels.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Online blind deconvolution for astronomical imaging

Harmeling, S., Hirsch, M., Sra, S., Schölkopf, B.

In Proceedings of the First IEEE International Conference Computational Photography (ICCP 2009), pages: 1-7, IEEE, Piscataway, NJ, USA, First IEEE International Conference on Computational Photography (ICCP), April 2009 (inproceedings)

Abstract
Atmospheric turbulences blur astronomical images taken by earth-based telescopes. Taking many short-time exposures in such a situation provides noisy images of the same object, where each noisy image has a different blur. Commonly astronomers apply a technique called “Lucky Imaging” that selects a few of the recorded frames that fulfill certain criteria, such as reaching a certain peak intensity (“Strehl ratio”). The selected frames are then averaged to obtain a better image. In this paper we introduce and analyze a new method that exploits all the frames and generates an improved image in an online fashion. Our initial experiments with controlled artificial data and real-world astronomical datasets yields promising results.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A kernel method for unsupervised structured network inference

Lippert, C., Stegle, O., Ghahramani, Z., Borgwardt, KM.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 368-375, (Editors: Van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
Network inference is the problem of inferring edges between a set of real-world objects, for instance, interactions between pairs of proteins in bioinformatics. Current kernel-based approaches to this problem share a set of common features: (i) they are supervised and hence require labeled training data; (ii) edges in the network are treated as mutually independent and hence topological properties are largely ignored; (iii) they lack a statistical interpretation. We argue that these common assumptions are often undesirable for network inference, and propose (i) an unsupervised kernel method (ii) that takes the global structure of the network into account and (iii) is statistically motivated. We show that our approach can explain commonly used heuristics in statistical terms. In experiments on social networks, different variants of our method demonstrate appealing predictive performance.

PDF Web [BibTex]

PDF Web [BibTex]


no image
PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering

Seldin, Y., Tishby, N.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 472-479, MIT Press, Cambridge, MA, USA, 12th International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
We derive a PAC-Bayesian generalization bound for density estimation. Similar to the PAC-Bayesian generalization bound for classification, the result has the appealingly simple form of a tradeoff between empirical performance and the KL-divergence of the posterior from the prior. Moreover, the PAC-Bayesian generalization bound for classification can be derived as a special case of the bound for density estimation. To illustrate a possible application of our bound we derive a generalization bound for co-clustering. The bound provides a criterion to evaluate the ability of co-clustering to predict new co-occurrences, thus introducing a supervised flavor to this traditionally unsupervised task.

PDF Web [BibTex]

PDF Web [BibTex]


no image
ICA with Sparse Connections: Revisited

Zhang, K., Peng, H., Chan, L., Hyvärinen, A.

In Independent Component Analysis and Signal Separation, pages: 195-202, (Editors: Adali, T. , Christian Jutten, J.M. Travassos Romano, A. Kardec Barros), Springer, Berlin, Germany, 8th International Conference on Independent Component Analysis and Signal Separation (ICA), March 2009 (inproceedings)

Abstract
When applying independent component analysis (ICA), sometimes we expect the connections between the observed mixtures and the recovered independent components (or the original sources) to be sparse, to make the interpretation easier or to reduce the random effect in the results. In this paper we propose two methods to tackle this problem. One is based on adaptive Lasso, which exploits the L 1 penalty with data-adaptive weights. We show the relationship between this method and the classic information criteria such as BIC and AIC. The other is based on optimal brain surgeon, and we show how its stopping criterion is related to the information criteria. This method produces the solution path of the transformation matrix, with different number of zero entries. These methods involve low computational loads. Moreover, in each method, the parameter controlling the sparsity level of the transformation matrix has clear interpretations. By setting such parameters to certain values, the results of the proposed methods are consistent with those produced by classic information criteria.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Tech-note: Iterative design and test of a multimodal experience

Reckter, H., Geiger, C., Singer, J., Streuber, S.

In Proceedings of the IEEE Symposium on 3D User Interfaces (3DUI 2009), pages: 99-102, (Editors: Kiyokawa, K. , S. Coquillart, R. Balakrishnan), IEEE Service Center, Piscataway, NJ, USA, IEEE Symposium on 3D User Interfaces (3DUI), March 2009 (inproceedings)

Abstract
The goal of the Turtle surf project described in this tech-note is to design, implement and evaluate a multimodal installation that should provide a good user experience in a virtual 3D world. For this purpose we combine audio-visual media forms and different types of haptic/tactile feedback. For the latter, we focus on the application of vibrational feedback, wind and water spray and heat. We follow a user-centered design approach and try to get user feedback as early as possible during the iterative design process. We present the conceptual idea of the Turtle surf project, and the iterative design and test of prototypes that helped us to refine the final design based on collected user feedback.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Efficient Bregman Range Search

Cayton, L.

In Advances in Neural Information Processing Systems 22, pages: 243-251, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

Sriperumbudur, B., Fukumizu, K., Gretton, A., Lanckriet, G., Schölkopf, B.

In Advances in Neural Information Processing Systems 22, pages: 1750-1758, (Editors: Y Bengio and D Schuurmans and J Lafferty and C Williams and A Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over many classical metrics on distributions, namely easy computability, fast convergence and low bias of finite sample estimates. An important requirement of the embedding RKHS is that it be characteristic: in this case, the MMD between two distributions is zero if and only if the distributions coincide. Three new results on the MMD are introduced in the present study. First, it is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification. An important consequence is that a kernel must be characteristic to guarantee classifiability between distributions in the RKHS. Second, the class of characteristic kernels is broadened to incorporate all strictly positive definite kernels: these include non-translation invariant kernels and kernels on non-compact domains. Third, a generalization of the MMD is proposed for families of kernels, as the supremum over MMDs on a class of kernels (for instance the Gaussian kernels with different bandwidths). This extension is necessary to obtain a single distance measure if a large selection or class of characteristic kernels is potentially appropriate. This generalization is reasonable, given that it corresponds to the problem of learning the kernel by minimizing the risk of the corresponding kernel classifier. The generalized MMD is shown to have consistent finite sample estimates, and its performance is demonstrated on a homogeneity testing example.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Nonlinear directed acyclic structure learning with weakly additive noise models

Tillman, R., Gretton, A., Spirtes, P.

In Advances in Neural Information Processing Systems 22, pages: 1847-1855, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
The recently proposed emph{additive noise model} has advantages over previous structure learning algorithms, when attempting to recover some true data generating mechanism, since it (i) does not assume linearity or Gaussianity and (ii) can recover a unique DAG rather than an equivalence class. However, its original extension to the multivariate case required enumerating all possible DAGs, and for some special distributions, e.g. linear Gaussian, the model is invertible and thus cannot be used for structure learning. We present a new approach which combines a PC style search using recent advances in kernel measures of conditional dependence with local searches for additive noise models in substructures of the equivalence class. This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible. Experiments with synthetic and real data show that this method is more accurate than previous methods when data are nonlinear and/or non-Gaussian.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Graphical models for decoding in BCI visual speller systems

Martens, S., Farquhar, J., Hill, J., Schölkopf, B.

In pages: 470-473, IEEE, 4th International IEEE EMBS Conference on Neural Engineering (NER), 2009 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
A Fast, Consistent Kernel Two-Sample Test

Gretton, A., Fukumizu, K., Harchaoui, Z., Sriperumbudur, B.

In Advances in Neural Information Processing Systems 22, pages: 673-681, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of x2 random variables. Prior finite sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high dimensional multivariate data, and text.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

Blaschko, M., Shelton, J., Bartels, A.

In Advances in Neural Information Processing Systems 22, pages: 126-134, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
Resting state activity is brain activation that arises in the absence of any task, and is usually measured in awake subjects during prolonged fMRI scanning sessions where the only instruction given is to close the eyes and do nothing. It has been recognized in recent years that resting state activity is implicated in a wide variety of brain function. While certain networks of brain areas have different levels of activation at rest and during a task, there is nevertheless significant similarity between activations in the two cases. This suggests that recordings of resting state activity can be used as a source of unlabeled data to augment discriminative regression techniques in a semi-supervised setting. We evaluate this setting empirically yielding three main results: (i) regression tends to be improved by the use of Laplacian regularization even when no additional unlabeled data are available, (ii) resting state data seem to have a similar marginal distribution to that recorded during the execution of a visual processing task implying largely similar types of activation, and (iii) this source of information can be broadly exploited to improve the robustness of empirical inference in fMRI studies, an inherently data poor domain.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Fast subtree kernels on graphs

Shervashidze, N., Borgwardt, K.

In Advances in Neural Information Processing Systems 22, pages: 1660-1668, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & G{\"a}rtner scales as O(n24dh). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime.

PDF Web [BibTex]

PDF Web [BibTex]


Thumb xl screen shot 2012 02 21 at 15.56.00  2
On feature combination for multiclass object classification

Gehler, P., Nowozin, S.

In Proceedings of the Twelfth IEEE International Conference on Computer Vision, pages: 221-228, ICCV, 2009, oral presentation (inproceedings)

project page, code, data GoogleScholar pdf DOI [BibTex]

project page, code, data GoogleScholar pdf DOI [BibTex]

2006


no image
Global Biclustering of Microarray Data

Wolf, T., Brors, B., Hofmann, T., Georgii, E.

In ICDMW 2006, pages: 125-129, (Editors: Tsumoto, S. , C. W. Clifton, N. Zhong, X. Wu, J. Liu, B. W. Wah, Y.-M. Cheung), IEEE Computer Society, Los Alamitos, CA, USA, Sixth IEEE International Conference on Data Mining, December 2006 (inproceedings)

Abstract
We consider the problem of simultaneously clustering genes and conditions of a gene expression data matrix. A bicluster is defined as a subset of genes that show similar behavior within a subset of conditions. Finding biclusters can be useful for revealing groups of genes involved in the same molecular process as well as groups of conditions where this process takes place. Previous work either deals with local, bicluster-based criteria or assumes a very specific structure of the data matrix (e.g. checkerboard or block-diagonal) [11]. In contrast, our goal is to find a set of flexibly arranged biclusters which is optimal in regard to a global objective function. As this is a NP-hard combinatorial problem, we describe several techniques to obtain approximate solutions. We benchmarked our approach successfully on the Alizadeh B-cell lymphoma data set [1].

Web DOI [BibTex]

2006

Web DOI [BibTex]


no image
Conformal Multi-Instance Kernels

Blaschko, M., Hofmann, T.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-6, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
In the multiple instance learning setting, each observation is a bag of feature vectors of which one or more vectors indicates membership in a class. The primary task is to identify if any vectors in the bag indicate class membership while ignoring vectors that do not. We describe here a kernel-based technique that defines a parametric family of kernels via conformal transformations and jointly learns a discriminant function over bags together with the optimal parameter settings of the kernel. Learning a conformal transformation effectively amounts to weighting regions in the feature space according to their contribution to classification accuracy; regions that are discriminative will be weighted higher than regions that are not. This allows the classifier to focus on regions contributing to classification accuracy while ignoring regions that correspond to vectors found both in positive and in negative bags. We show how parameters of this transformation can be learned for support vector machines by posing the problem as a multiple kernel learning problem. The resulting multiple instance classifier gives competitive accuracy for several multi-instance benchmark datasets from different domains.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Information-theoretic Metric Learning

Davis, J., Kulis, B., Sra, S., Dhillon, I.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-5, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a low-rank kernel learning problem. Specifically, we minimize the Burg divergence of a low-rank kernel to an input kernel, subject to pairwise distance constraints. Our approach has several advantages over existing methods. First, we present a natural information-theoretic formulation for the problem. Second, the algorithm utilizes the methods developed by Kulis et al. [6], which do not involve any eigenvector computation; in particular, the running time of our method is faster than most existing techniques. Third, the formulation offers insights into connections between metric learning and kernel learning.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Pattern Mining in Frequent Dynamic Subgraphs

Borgwardt, KM., Kriegel, H-P., Wackersreuther, P.

In pages: 818-822, (Editors: Clifton, C.W.), IEEE Computer Society, Los Alamitos, CA, USA, Sixth International Conference on Data Mining (ICDM), December 2006 (inproceedings)

Abstract
Graph-structured data is becoming increasingly abundant in many application domains. Graph mining aims at finding interesting patterns within this data that represent novel knowledge. While current data mining deals with static graphs that do not change over time, coming years will see the advent of an increasing number of time series of graphs. In this article, we investigate how pattern mining on static graphs can be extended to time series of graphs. In particular, we are considering dynamic graphs with edge insertions and edge deletions over time. We define frequency in this setting and provide algorithmic solutions for finding frequent dynamic subgraph patterns. Existing subgraph mining algorithms can be easily integrated into our framework to make them handle dynamic graphs. Experimental results on real-world data confirm the practical feasibility of our approach.

Web DOI [BibTex]

Web DOI [BibTex]


no image
3DString: a feature string kernel for 3D object classification on voxelized data

Assfalg, J., Borgwardt, KM., Kriegel, H-P.

In pages: 198-207, (Editors: Yu, P.S. , V.J. Tsotras, E.A. Fox, B. Liu), ACM Press, New York, NY, USA, 15th ACM International Conference on Information and Knowledge Management (CIKM), November 2006 (inproceedings)

Abstract
Classification of 3D objects remains an important task in many areas of data management such as engineering, medicine or biology. As a common preprocessing step in current approaches to classification of voxelized 3D objects, voxel representations are transformed into a feature vector description.In this article, we introduce an approach of transforming 3D objects into feature strings which represent the distribution of voxels over the voxel grid. Attractively, this feature string extraction can be performed in linear runtime with respect to the number of voxels. We define a similarity measure on these feature strings that counts common k-mers in two input strings, which is referred to as the spectrum kernel in the field of kernel methods. We prove that on our feature strings, this similarity measure can be computed in time linear to the number of different characters in these strings. This linear runtime behavior makes our kernel attractive even for large datasets that occur in many application domains. Furthermore, we explain that our similarity measure induces a metric which allows to combine it with an M-tree for handling of large volumes of data. Classification experiments on two published benchmark datasets show that our novel approach is competitive with the best state-of-the-art methods for 3D object classification.

DOI [BibTex]

DOI [BibTex]


no image
Adapting Spatial Filter Methods for Nonstationary BCIs

Tomioka, R., Hill, J., Blankertz, B., Aihara, K.

In IBIS 2006, pages: 65-70, 2006 Workshop on Information-Based Induction Sciences, November 2006 (inproceedings)

Abstract
A major challenge in applying machine learning methods to Brain-Computer Interfaces (BCIs) is to overcome the possible nonstationarity in the data from the datablock the method is trained on and that the method is applied to. Assuming the joint distributions of the whitened signal and the class label to be identical in two blocks, where the whitening is done in each block independently, we propose a simple adaptation formula that is applicable to a broad class of spatial filtering methods including ICA, CSP, and logistic regression classifiers. We characterize the class of linear transformations for which the above assumption holds. Experimental results on 60 BCI datasets show improved classification accuracy compared to (a) fixed spatial filter approach (no adaptation) and (b) fixed spatial pattern approach (proposed by Hill et al., 2006 [1]).

PDF [BibTex]

PDF [BibTex]