Header logo is ei


2008


no image
Frequent Subgraph Retrieval in Geometric Graph Databases

Nowozin, S., Tsuda, K.

(180), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2008 (techreport)

Abstract
Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric epsilon-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric epsilon-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.

PDF [BibTex]

2008

PDF [BibTex]


no image
Simultaneous Implicit Surface Reconstruction and Meshing

Giesen, J., Maier, M., Schölkopf, B.

(179), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2008 (techreport)

Abstract
We investigate an implicit method to compute a piecewise linear representation of a surface from a set of sample points. As implicit surface functions we use the weighted sum of piecewise linear kernel functions. For such a function we can partition Rd in such a way that these functions are linear on the subsets of the partition. For each subset in the partition we can then compute the zero level set of the function exactly as the intersection of a hyperplane with the subset.

PDF [BibTex]

PDF [BibTex]


no image
Taxonomy Inference Using Kernel Dependence Measures

Blaschko, M., Gretton, A.

(181), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2008 (techreport)

Abstract
We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.

PDF [BibTex]

PDF [BibTex]


no image
Infinite Kernel Learning

Gehler, P., Nowozin, S.

(178), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, October 2008 (techreport)

Abstract
In this paper we consider the problem of automatically learning the kernel from general kernel classes. Specifically we build upon the Multiple Kernel Learning (MKL) framework and in particular on the work of (Argyriou, Hauser, Micchelli, & Pontil, 2006). We will formulate a Semi-Infinite Program (SIP) to solve the problem and devise a new algorithm to solve it (Infinite Kernel Learning, IKL). The IKL algorithm is applicable to both the finite and infinite case and we find it to be faster and more stable than SimpleMKL (Rakotomamonjy, Bach, Canu, & Grandvalet, 2007) for cases of many kernels. In the second part we present the first large scale comparison of SVMs to MKL on a variety of benchmark datasets, also comparing IKL. The results show two things: a) for many datasets there is no benefit in linearly combining kernels with MKL/IKL instead of the SVM classifier, thus the flexibility of using more than one kernel seems to be of no use, b) on some datasets IKL yields impressive increases in accuracy over SVM/MKL due to the possibility of using a largely increased kernel set. In those cases, IKL remains practical, whereas both cross-validation or standard MKL is infeasible.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Large Scale Variational Inference and Experimental Design for Sparse Generalized Linear Models

Seeger, M., Nickisch, H.

(175), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2008 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Block-Iterative Algorithms for Non-Negative Matrix Approximation

Sra, S.

(176), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2008 (techreport)

Abstract
In this report we present new algorithms for non-negative matrix approximation (NMA), commonly known as the NMF problem. Our methods improve upon the well-known methods of Lee & Seung [19] for both the Frobenius norm as well the Kullback-Leibler divergence versions of the problem. For the latter problem, our results are especially interesting because it seems to have witnessed much lesser algorithmic progress as compared to the Frobenius norm NMA problem. Our algorithms are based on a particular block-iterative acceleration technique for EM, which preserves the multiplicative nature of the updates and also ensures monotonicity. Furthermore, our algorithms also naturally apply to the Bregman-divergence NMA algorithms of Dhillon and Sra [8]. Experimentally, we show that our algorithms outperform the traditional Lee/Seung approach most of the time.

PDF [BibTex]

PDF [BibTex]


no image
Approximation Algorithms for Bregman Clustering Co-clustering and Tensor Clustering

Sra, S., Jegelka, S., Banerjee, A.

(177), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2008 (techreport)

Abstract
The Euclidean K-means problem is fundamental to clustering and over the years it has been intensely investigated. More recently, generalizations such as Bregman k-means [8], co-clustering [10], and tensor (multi-way) clustering [40] have also gained prominence. A well-known computational difficulty encountered by these clustering problems is the NP-Hardness of the associated optimization task, and commonly used methods guarantee at most local optimality. Consequently, approximation algorithms of varying degrees of sophistication have been developed, though largely for the basic Euclidean K-means (or `1-norm K-median) problem. In this paper we present approximation algorithms for several Bregman clustering problems by building upon the recent paper of Arthur and Vassilvitskii [5]. Our algorithms obtain objective values within a factor O(logK) for Bregman k-means, Bregman co-clustering, Bregman tensor clustering, and weighted kernel k-means. To our knowledge, except for some special cases, approximation algorithms have not been considered for these general clustering problems. There are several important implications of our work: (i) under the same assumptions as Ackermann et al. [1] it yields a much faster algorithm (non-exponential in K, unlike [1]) for information-theoretic clustering, (ii) it answers several open problems posed by [4], including generalizations to Bregman co-clustering, and tensor clustering, (iii) it provides practical and easy to implement methods—in contrast to several other common approximation approaches.

PDF [BibTex]

PDF [BibTex]


no image
Combining Appearance and Motion for Human Action Classification in Videos

Dhillon, P., Nowozin, S., Lampert, C.

(174), Max-Planck-Institute for Biological Cybernetics, Tübingen, Germany, August 2008 (techreport)

Abstract
We study the question of activity classification in videos and present a novel approach for recognizing human action categories in videos by combining information from appearance and motion of human body parts. Our approach uses a tracking step which involves Particle Filtering and a local non - parametric clustering step. The motion information is provided by the trajectory of the cluster modes of a local set of particles. The statistical information about the particles of that cluster over a number of frames provides the appearance information. Later we use a “Bag ofWords” model to build one histogram per video sequence from the set of these robust appearance and motion descriptors. These histograms provide us characteristic information which helps us to discriminate among various human actions and thus classify them correctly. We tested our approach on the standard KTH and Weizmann human action datasets and the results were comparable to the state of the art. Additionally our approach is able to distinguish between activities that involve the motion of complete body from those in which only certain body parts move. In other words, our method discriminates well between activities with “gross motion” like running, jogging etc. and “local motion” like waving, boxing etc.

PDF [BibTex]

PDF [BibTex]


no image
Example-based Learning for Single-image Super-resolution and JPEG Artifact Removal

Kim, K., Kwon, Y.

(173), Max-Planck-Institute for Biological Cybernetics, Tübingen, Germany, August 2008 (techreport)

Abstract
This paper proposes a framework for single-image super-resolution and JPEG artifact removal. The underlying idea is to learn a map from input low-quality images (suitably preprocessed low-resolution or JPEG encoded images) to target high-quality images based on example pairs of input and output images. To retain the complexity of the resulting learning problem at a moderate level, a patch-based approach is taken such that kernel ridge regression (KRR) scans the input image with a small window (patch) and produces a patchvalued output for each output pixel location. These constitute a set of candidate images each of which reflects different local information. An image output is then obtained as a convex combination of candidates for each pixel based on estimated confidences of candidates. To reduce the time complexity of training and testing for KRR, a sparse solution is found by combining the ideas of kernel matching pursuit and gradient descent. As a regularized solution, KRR leads to a better generalization than simply storing the examples as it has been done in existing example-based super-resolution algorithms and results in much less noisy images. However, this may introduce blurring and ringing artifacts around major edges as sharp changes are penalized severely. A prior model of a generic image class which takes into account the discontinuity property of images is adopted to resolve this problem. Comparison with existing super-resolution and JPEG artifact removal methods shows the effectiveness of the proposed method. Furthermore, the proposed method is generic in that it has the potential to be applied to many other image enhancement applications.

PDF [BibTex]

PDF [BibTex]


no image
Unsupervised Bayesian Time-series Segmentation based on Linear Gaussian State-space Models

Chiappa, S.

(171), Max-Planck-Institute for Biological Cybernetics, Tübingen, Germany, June 2008 (techreport)

Abstract
Unsupervised time-series segmentation in the general scenario in which the number of segment-types and segment boundaries are a priori unknown is a fundamental problem in many applications and requires an accurate segmentation model as well as a way of determining an appropriate number of segment-types. In most approaches, segmentation and determination of number of segment-types are addressed in two separate steps, since the segmentation model assumes a predefined number of segment-types. The determination of number of segment-types is thus achieved by training and comparing several separate models. In this paper, we take a Bayesian approach to a segmentation model based on linear Gaussian state-space models to achieve structure selection within the model. An appropriate prior distribution on the parameters is used to enforce a sparse parametrization, such that the model automatically selects the smallest number of underlying dynamical systems that explain the data well and a parsimonious structure for each dynamical system. As the resulting model is computationally intractable, we introduce a variational approximation, in which a reformulation of the problem enables to use an efficient inference algorithm.

[BibTex]

[BibTex]


no image
A New Non-monotonic Gradient Projection Method for the Non-negative Least Squares Problem

Kim, D., Sra, S., Dhillon, I.

(TR-08-28), University of Texas, Austin, TX, USA, June 2008 (techreport)

Web [BibTex]

Web [BibTex]


no image
Non-monotonic Poisson Likelihood Maximization

Sra, S., Kim, D., Schölkopf, B.

(170), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2008 (techreport)

Abstract
This report summarizes the theory and some main applications of a new non-monotonic algorithm for maximizing a Poisson Likelihood, which for Positron Emission Tomography (PET) is equivalent to minimizing the associated Kullback-Leibler Divergence, and for Transmission Tomography is similar to maximizing the dual of a maximum entropy problem. We call our method non-monotonic maximum likelihood (NMML) and show its application to different problems such as tomography and image restoration. We discuss some theoretical properties such as convergence for our algorithm. Our experimental results indicate that speedups obtained via our non-monotonic methods are substantial.

PDF [BibTex]

PDF [BibTex]


no image
New Frontiers in Characterizing Structure and Dynamics by NMR

Nilges, M., Markwick, P., Malliavin, TE., Rieping, W., Habeck, M.

In Computational Structural Biology: Methods and Applications, pages: 655-680, (Editors: Schwede, T. , M. C. Peitsch), World Scientific, New Jersey, NJ, USA, May 2008 (inbook)

Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy has emerged as the method of choice for studying both the structure and the dynamics of biological macromolecule in solution. Despite the maturity of the NMR method for structure determination, its application faces a number of challenges. The method is limited to systems of relatively small molecular mass, data collection times are long, data analysis remains a lengthy procedure, and it is difficult to evaluate the quality of the final structures. The last years have seen significant advances in experimental techniques to overcome or reduce some limitations. The function of bio-macromolecules is determined by both their 3D structure and conformational dynamics. These molecules are inherently flexible systems displaying a broad range of dynamics on time–scales from picoseconds to seconds. NMR is unique in its ability to obtain dynamic information on an atomic scale. The experimental information on structure and dynamics is intricately mixed. It is however difficult to unite both structural and dynamical information into one consistent model, and protocols for the determination of structure and dynamics are performed independently. This chapter deals with the challenges posed by the interpretation of NMR data on structure and dynamics. We will first relate the standard structure calculation methods to Bayesian probability theory. We will then briefly describe the advantages of a fully Bayesian treatment of structure calculation. Then, we will illustrate the advantages of using Bayesian reasoning at least partly in standard structure calculations. The final part will be devoted to interpretation of experimental data on dynamics.

Web [BibTex]

Web [BibTex]


no image
A Kernel Method for the Two-sample Problem

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

(157), Max-Planck-Institute for Biological Cybernetics Tübingen, April 2008 (techreport)

Abstract
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (eg.~a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.

PDF [BibTex]

PDF [BibTex]


no image
Energy Functionals for Manifold-valued Mappings and Their Properties

Hein, M., Steinke, F., Schölkopf, B.

(167), Max Planck Institute for Biological Cybernetics, Tübingen, January 2008 (techreport)

Abstract
This technical report is merely an extended version of the appendix of Steinke et.al. "Manifold-valued Thin-Plate Splines with Applications in Computer Graphics" (2008) with complete proofs, which had to be omitted due to space restrictions. This technical report requires a basic knowledge of differential geometry. However, apart from that requirement the technical report is self-contained.

PDF [BibTex]

PDF [BibTex]


no image
A Robot System for Biomimetic Navigation: From Snapshots to Metric Embeddings of View Graphs

Franz, MO., Stürzl, W., Reichardt, W., Mallot, HA.

In Robotics and Cognitive Approaches to Spatial Mapping, pages: 297-314, Springer Tracts in Advanced Robotics ; 38, (Editors: Jefferies, M.E. , W.-K. Yeap), Springer, Berlin, Germany, 2008 (inbook)

Abstract
Complex navigation behaviour (way-finding) involves recognizing several places and encoding a spatial relationship between them. Way-finding skills can be classified into a hierarchy according to the complexity of the tasks that can be performed [8]. The most basic form of way-finding is route navigation, followed by topological navigation where several routes are integrated into a graph-like representation. The highest level, survey navigation, is reached when this graph can be embedded into a common reference frame. In this chapter, we present the building blocks for a biomimetic robot navigation system that encompasses all levels of this hierarchy. As a local navigation method, we use scene-based homing. In this scheme, a goal location is characterized either by a panoramic snapshot of the light intensities as seen from the place, or by a record of the distances to the surrounding objects. The goal is found by moving in the direction that minimizes the discrepancy between the recorded intensities or distances and the current sensory input. For learning routes, the robot selects distinct views during exploration that are close enough to be reached by snapshot-based homing. When it encounters already visited places during route learning, it connects the routes and thus forms a topological representation of its environment termed a view graph. The final stage, survey navigation, is achieved by a graph embedding procedure which complements the topologic information of the view graph with odometric position estimates. Calculation of the graph embedding is done with a modified multidimensional scaling algorithm which makes use of distances and angles between nodes.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]

2006


no image
A New Projected Quasi-Newton Approach for the Nonnegative Least Squares Problem

Kim, D., Sra, S., Dhillon, I.

(TR-06-54), Univ. of Texas, Austin, December 2006 (techreport)

PDF [BibTex]

2006

PDF [BibTex]


no image
Probabilistic inference for solving (PO)MDPs

Toussaint, M., Harmeling, S., Storkey, A.

(934), School of Informatics, University of Edinburgh, December 2006 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Minimal Logical Constraint Covering Sets

Sinz, F., Schölkopf, B.

(155), Max Planck Institute for Biological Cybernetics, Tübingen, December 2006 (techreport)

Abstract
We propose a general framework for computing minimal set covers under class of certain logical constraints. The underlying idea is to transform the problem into a mathematical programm under linear constraints. In this sense it can be seen as a natural extension of the vector quantization algorithm proposed by Tipping and Schoelkopf. We show which class of logical constraints can be cast and relaxed into linear constraints and give an algorithm for the transformation.

PDF [BibTex]

PDF [BibTex]


no image
Prediction of Protein Function from Networks

Shin, H., Tsuda, K.

In Semi-Supervised Learning, pages: 361-376, Adaptive Computation and Machine Learning, (Editors: Chapelle, O. , B. Schölkopf, A. Zien), MIT Press, Cambridge, MA, USA, November 2006 (inbook)

Abstract
In computational biology, it is common to represent domain knowledge using graphs. Frequently there exist multiple graphs for the same set of nodes, representing information from different sources, and no single graph is sufficient to predict class labels of unlabelled nodes reliably. One way to enhance reliability is to integrate multiple graphs, since individual graphs are partly independent and partly complementary to each other for prediction. In this chapter, we describe an algorithm to assign weights to multiple graphs within graph-based semi-supervised learning. Both predicting class labels and searching for weights for combining multiple graphs are formulated into one convex optimization problem. The graph-combining method is applied to functional class prediction of yeast proteins.When compared with individual graphs, the combined graph with optimized weights performs significantly better than any single graph.When compared with the semidefinite programming-based support vector machine (SDP/SVM), it shows comparable accuracy in a remarkably short time. Compared with a combined graph with equal-valued weights, our method could select important graphs without loss of accuracy, which implies the desirable property of integration with selectivity.

Web [BibTex]

Web [BibTex]


no image
Discrete Regularization

Zhou, D., Schölkopf, B.

In Semi-supervised Learning, pages: 237-250, Adaptive computation and machine learning, (Editors: O Chapelle and B Schölkopf and A Zien), MIT Press, Cambridge, MA, USA, November 2006 (inbook)

Abstract
Many real-world machine learning problems are situated on finite discrete sets, including dimensionality reduction, clustering, and transductive inference. A variety of approaches for learning from finite sets has been proposed from different motivations and for different problems. In most of those approaches, a finite set is modeled as a graph, in which the edges encode pairwise relationships among the objects in the set. Consequently many concepts and methods from graph theory are adopted. In particular, the graph Laplacian is widely used. In this chapter we present a systemic framework for learning from a finite set represented as a graph. We develop discrete analogues of a number of differential operators, and then construct a discrete analogue of classical regularization theory based on those discrete differential operators. The graph Laplacian based approaches are special cases of this general discrete regularization framework. An important thing implied in this framework is that we have a wide choices of regularization on graph in addition to the widely-used graph Laplacian based one.

PDF Web [BibTex]

PDF Web [BibTex]


no image
New Methods for the P300 Visual Speller

Biessmann, F.

(1), (Editors: Hill, J. ), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2006 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Geometric Analysis of Hilbert Schmidt Independence criterion based ICA contrast function

Shen, H., Jegelka, S., Gretton, A.

(PA006080), National ICT Australia, Canberra, Australia, October 2006 (techreport)

Web [BibTex]

Web [BibTex]


no image
A tutorial on spectral clustering

von Luxburg, U.

(149), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. Nevertheless, on the first glance spectral clustering looks a bit mysterious, and it is not obvious to see why it works at all and what it really does. This article is a tutorial introduction to spectral clustering. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

PDF [BibTex]

PDF [BibTex]


no image
Towards the Inference of Graphs on Ordered Vertexes

Zien, A., Raetsch, G., Ong, C.

(150), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
We propose novel methods for machine learning of structured output spaces. Specifically, we consider outputs which are graphs with vertices that have a natural order. We consider the usual adjacency matrix representation of graphs, as well as two other representations for such a graph: (a) decomposing the graph into a set of paths, (b) converting the graph into a single sequence of nodes with labeled edges. For each of the three representations, we propose an encoding and decoding scheme. We also propose an evaluation measure for comparing two graphs.

PDF [BibTex]

PDF [BibTex]


no image
Nonnegative Matrix Approximation: Algorithms and Applications

Sra, S., Dhillon, I.

Univ. of Texas, Austin, May 2006 (techreport)

[BibTex]

[BibTex]


no image
An Automated Combination of Sequence Motif Kernels for Predicting Protein Subcellular Localization

Zien, A., Ong, C.

(146), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006 (techreport)

Abstract
Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. We propose an elegant and fully automated approach to building a prediction system for protein subcellular localization. We propose a new class of protein sequence kernels which considers all motifs including motifs with gaps. This class of kernels allows the inclusion of pairwise amino acid distances into their computation. We further propose a multiclass support vector machine method which directly solves protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we generalize our method to optimize over multiple kernels at the same time. We compare our automated approach to four other predictors on three different datasets.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Training a Support Vector Machine in the Primal

Chapelle, O.

(147), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006, The version in the "Large Scale Kernel Machines" book is more up to date. (techreport)

Abstract
Most literature on Support Vector Machines (SVMs) concentrate on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and there is no reason for ignoring it. Moreover, from the primal point of view, new families of algorithms for large scale SVM training can be investigated.

PDF [BibTex]

PDF [BibTex]


no image
Cross-Validation Optimization for Structured Hessian Kernel Methods

Seeger, M., Chapelle, O.

Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, February 2006 (techreport)

Abstract
We address the problem of learning hyperparameters in kernel methods for which the Hessian of the objective is structured. We propose an approximation to the cross-validation log likelihood whose gradient can be computed analytically, solving the hyperparameter learning problem efficiently through nonlinear optimization. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels or to large datasets. When applied to the problem of multi-way classification, our method scales linearly in the number of classes and gives rise to state-of-the-art results on a remote imaging task.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Combining a Filter Method with SVMs

Lal, T., Chapelle, O., Schölkopf, B.

In Feature Extraction: Foundations and Applications, Studies in Fuzziness and Soft Computing, Vol. 207, pages: 439-446, Studies in Fuzziness and Soft Computing ; 207, (Editors: I Guyon and M Nikravesh and S Gunn and LA Zadeh), Springer, Berlin, Germany, 2006 (inbook)

Abstract
Our goal for the competition (feature selection competition NIPS 2003) was to evaluate the usefulness of simple machine learning techniques. We decided to use the correlation criteria as a feature selection method and Support Vector Machines for the classification part. Here we explain how we chose the regularization parameter C of the SVM, how we determined the kernel parameter and how we estimated the number of features used for each data set. All analyzes were carried out on the training sets of the competition data. We choose the data set Arcene as an example to explain the approach step by step. In our view the point of this competition was the construction of a well performing classifier rather than the systematic analysis of a specific approach. This is why our search for the best classifier was only guided by the described methods and that we deviated from the road map at several occasions. All calculations were done with the software Spider [2004].

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Embedded methods

Lal, T., Chapelle, O., Weston, J., Elisseeff, A.

In Feature Extraction: Foundations and Applications, pages: 137-165, Studies in Fuzziness and Soft Computing ; 207, (Editors: Guyon, I. , S. Gunn, M. Nikravesh, L. A. Zadeh), Springer, Berlin, Germany, 2006 (inbook)

Abstract
Embedded methods are a relatively new approach to feature selection. Unlike filter methods, which do not incorporate learning, and wrapper approaches, which can be used with arbitrary classifiers, in embedded methods the features selection part can not be separated from the learning part. Existing embedded methods are reviewed based on a unifying mathematical framework.

PDF Web [BibTex]

PDF Web [BibTex]


Thumb xl screen shot 2012 06 06 at 11.31.38 am
Implicit Wiener Series, Part II: Regularised estimation

Gehler, P., Franz, M.

(148), Max Planck Institute, 2006 (techreport)

pdf [BibTex]

2002


no image
Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

(98), Max Planck Institute for Biological Cybernetics, August 2002 (techreport)

Abstract
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. Output kernels also make it possible to encode prior information and/or invariances in the loss function in an elegant way. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.

PDF [BibTex]

2002

PDF [BibTex]


no image
Global Geometry of SVM Classifiers

Zhou, D., Xiao, B., Zhou, H., Dai, R.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2002 (techreport)

Abstract
We construct an geometry framework for any norm Support Vector Machine (SVM) classifiers. Within this framework, separating hyperplanes, dual descriptions and solutions of SVM classifiers are constructed by a purely geometric fashion. In contrast with the optimization theory used in SVM classifiers, we have no complicated computations any more. Each step in our theory is guided by elegant geometric intuitions.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Computationally Efficient Face Detection

Romdhani, S., Torr, P., Schölkopf, B., Blake, A.

(MSR-TR-2002-69), Microsoft Research, June 2002 (techreport)

Web [BibTex]

Web [BibTex]


no image
Kernel-based nonlinear blind source separation

Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K.

EU-Project BLISS, January 2002 (techreport)

GZIP [BibTex]

GZIP [BibTex]


no image
A compression approach to support vector model selection

von Luxburg, U., Bousquet, O., Schölkopf, B.

(101), Max Planck Institute for Biological Cybernetics, 2002, see more detailed JMLR version (techreport)

Abstract
In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds we construct ``compression coefficients'' for SVMs, which measure the amount by which the training labels can be compressed by some classification hypothesis. The main idea is to relate the coding precision of this hypothesis to the width of the margin of the SVM. The compression coefficients connect well known quantities such as the radius-margin ratio R^2/rho^2, the eigenvalues of the kernel matrix and the number of support vectors. To test whether they are useful in practice we ran model selection experiments on several real world datasets. As a result we found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.

[BibTex]

[BibTex]


no image
Feature Selection and Transduction for Prediction of Molecular Bioactivity for Drug Design

Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A., Schölkopf, B.

Max Planck Institute for Biological Cybernetics / Biowulf Technologies, 2002 (techreport)

Web [BibTex]

Web [BibTex]


no image
Observations on the Nyström Method for Gaussian Process Prediction

Williams, C., Rasmussen, C., Schwaighofer, A., Tresp, V.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2002 (techreport)

Abstract
A number of methods for speeding up Gaussian Process (GP) prediction have been proposed, including the Nystr{\"o}m method of Williams and Seeger (2001). In this paper we focus on two issues (1) the relationship of the Nystr{\"o}m method to the Subset of Regressors method (Poggio and Girosi 1990; Luo and Wahba, 1997) and (2) understanding in what circumstances the Nystr{\"o}m approximation would be expected to provide a good approximation to exact GP regression.

PostScript [BibTex]

PostScript [BibTex]

2000


no image
Robust ensemble learning

Rätsch, G., Schölkopf, B., Smola, A., Mika, S., Onoda, T., Müller, K.

In Advances in Large Margin Classifiers, pages: 207-220, Neural Information Processing Series, (Editors: AJ Smola and PJ Bartlett and B Schölkopf and D. Schuurmans), MIT Press, Cambridge, MA, USA, October 2000 (inbook)

[BibTex]

2000

[BibTex]


no image
Entropy numbers for convex combinations and MLPs

Smola, A., Elisseeff, A., Schölkopf, B., Williamson, R.

In Advances in Large Margin Classifiers, pages: 369-387, Neural Information Processing Series, (Editors: AJ Smola and PL Bartlett and B Schölkopf and D Schuurmans), MIT Press, Cambridge, MA,, October 2000 (inbook)

[BibTex]

[BibTex]


no image
Natural Regularization from Generative Models

Oliver, N., Schölkopf, B., Smola, A.

In Advances in Large Margin Classifiers, pages: 51-60, Neural Information Processing Series, (Editors: AJ Smola and PJ Bartlett and B Schölkopf and D Schuurmans), MIT Press, Cambridge, MA, USA, October 2000 (inbook)

[BibTex]

[BibTex]


no image
Solving Satisfiability Problems with Genetic Algorithms

Harmeling, S.

In Genetic Algorithms and Genetic Programming at Stanford 2000, pages: 206-213, (Editors: Koza, J. R.), Stanford Bookstore, Stanford, CA, USA, June 2000 (inbook)

Abstract
We show how to solve hard 3-SAT problems using genetic algorithms. Furthermore, we explore other genetic operators that may be useful to tackle 3-SAT problems, and discuss their pros and cons.

PDF [BibTex]

PDF [BibTex]


no image
Statistical Learning and Kernel Methods

Schölkopf, B.

In CISM Courses and Lectures, International Centre for Mechanical Sciences Vol.431, CISM Courses and Lectures, International Centre for Mechanical Sciences, 431(23):3-24, (Editors: G Della Riccia and H-J Lenz and R Kruse), Springer, Vienna, Data Fusion and Perception, 2000 (inbook)

[BibTex]

[BibTex]


no image
An Introduction to Kernel-Based Learning Algorithms

Müller, K., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.

In Handbook of Neural Network Signal Processing, 4, (Editors: Yu Hen Hu and Jang-Neng Hwang), CRC Press, 2000 (inbook)

[BibTex]

[BibTex]


no image
The Kernel Trick for Distances

Schölkopf, B.

(MSR-TR-2000-51), Microsoft Research, Redmond, WA, USA, 2000 (techreport)

Abstract
A method is described which, like the kernel trick in support vector machines (SVMs), lets us generalize distance-based algorithms to operate in feature spaces, usually nonlinearly related to the input space. This is done by identifying a class of kernels which can be represented as normbased distances in Hilbert spaces. It turns out that common kernel algorithms, such as SVMs and kernel PCA, are actually really distance based algorithms and can be run with that class of kernels, too. As well as providing a useful new insight into how these algorithms work, the present work can form the basis for conceiving new algorithms.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel method for percentile feature extraction

Schölkopf, B., Platt, J., Smola, A.

(MSR-TR-2000-22), Microsoft Research, 2000 (techreport)

Abstract
A method is proposed which computes a direction in a dataset such that a speci􏰘ed fraction of a particular class of all examples is separated from the overall mean by a maximal margin􏰤 The pro jector onto that direction can be used for class􏰣speci􏰘c feature extraction􏰤 The algorithm is carried out in a feature space associated with a support vector kernel function􏰢 hence it can be used to construct a large class of nonlinear fea􏰣 ture extractors􏰤 In the particular case where there exists only one class􏰢 the method can be thought of as a robust form of principal component analysis􏰢 where instead of variance we maximize percentile thresholds􏰤 Fi􏰣 nally􏰢 we generalize it to also include the possibility of specifying negative examples􏰤

PDF [BibTex]

PDF [BibTex]

1998


no image
Generalization bounds and learning rates for Regularized principal manifolds

Smola, A., Williamson, R., Schölkopf, B.

NeuroCOLT, 1998, NeuroColt2-TR 1998-027 (techreport)

[BibTex]

1998

[BibTex]


no image
Generalization Bounds for Convex Combinations of Kernel Functions

Smola, A., Williamson, R., Schölkopf, B.

Royal Holloway College, 1998 (techreport)

[BibTex]

[BibTex]


no image
Generalization Performance of Regularization Networks and Support Vector Machines via Entropy Numbers of Compact Operators

Williamson, R., Smola, A., Schölkopf, B.

(19), NeuroCOLT, 1998, Accepted for publication in IEEE Transactions on Information Theory (techreport)

[BibTex]

[BibTex]