

2013


Studying large-scale brain networks: electrical stimulation and neural-event-triggered fMRI

Logothetis, N., Eschenko, O., Murayama, Y., Augath, M., Steudel, T., Evrard, H., Besserve, M., Oeltermann, A.

Twenty-Second Annual Computational Neuroscience Meeting (CNS*2013), BMC Neuroscience, 14(Supplement 1):A1, July 2013 (talk)

Web [BibTex]

Animating Samples from Gaussian Distributions

Hennig, P.

(8), Max Planck Institute for Intelligent Systems, Tübingen, Germany, 2013 (techreport)

PDF [BibTex]

Domain Generalization via Invariant Feature Representation

Muandet, K.

30th International Conference on Machine Learning (ICML2013), 2013 (talk)

PDF [BibTex]

Maximizing Kepler science return per telemetered pixel: Detailed models of the focal plane in the two-wheel era

Hogg, D. W., Angus, R., Barclay, T., Dawson, R., Fergus, R., Foreman-Mackey, D., Harmeling, S., Hirsch, M., Lang, D., Montet, B. T., Schiminovich, D., Schölkopf, B.

arXiv:1309.0653, 2013 (techreport)

link (url) [BibTex]

Maximizing Kepler science return per telemetered pixel: Searching the habitable zones of the brightest stars

Montet, B. T., Angus, R., Barclay, T., Dawson, R., Fergus, R., Foreman-Mackey, D., Harmeling, S., Hirsch, M., Hogg, D. W., Lang, D., Schiminovich, D., Schölkopf, B.

arXiv:1309.0654, 2013 (techreport)

link (url) [BibTex]


2006


A Kernel Method for the Two-Sample-Problem

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

20th Annual Conference on Neural Information Processing Systems (NIPS), December 2006 (talk)

Abstract
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. We show that the test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.

PDF [BibTex]
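
A minimal NumPy sketch of the quadratic-time statistic described in the abstract above: the biased MMD^2 estimate with an RBF kernel. The median-heuristic bandwidth and the toy Gaussian data are assumptions for illustration, not part of the paper.

```python
# Sketch: biased quadratic-time MMD^2 between two samples, using an RBF kernel.
# Bandwidth (median heuristic) and toy data are illustrative assumptions.
import numpy as np

def rbf_gram(A, B, sigma):
    """Gram matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma):
    """Biased MMD^2: squared distance between the RKHS mean embeddings of X and Y."""
    return (rbf_gram(X, X, sigma).mean()
            + rbf_gram(Y, Y, sigma).mean()
            - 2 * rbf_gram(X, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))            # shifted distribution
X2 = rng.normal(0.0, 1.0, size=(200, 2))           # second draw from the same law
Z = np.vstack([X, Y])
sigma = np.median(np.linalg.norm(Z[:, None] - Z[None, :], axis=-1))  # median heuristic
print("different distributions:", mmd2_biased(X, Y, sigma))
print("same distribution:      ", mmd2_biased(X, X2, sigma))
```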

Ab-initio gene finding using machine learning

Schweikert, G., Zeller, G., Zien, A., Ong, C., de Bona, F., Sonnenburg, S., Phillips, P., Rätsch, G.

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

Web [BibTex]

Graph boosting for molecular QSAR analysis

Saigo, H., Kadowaki, T., Kudo, T., Tsuda, K.

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

Abstract
We propose a new boosting method that systematically combines graph mining and mathematical programming-based machine learning. Informative and interpretable subgraph features are found greedily by a series of graph mining calls. Thanks to our mathematical programming formulation, subgraph features and pre-calculated real-valued features are seamlessly integrated. We tested our algorithm on a quantitative structure-activity relationship (QSAR) problem, which is essentially a regression problem over a set of chemical compounds. In benchmark experiments, the prediction accuracy of our method compared favorably with the best results reported on each dataset.

Web [BibTex]

Inferring Causal Directions by Evaluating the Complexity of Conditional Distributions

Sun, X., Janzing, D., Schölkopf, B.

NIPS Workshop on Causality and Feature Selection, December 2006 (talk)

Abstract
We propose a new approach to infer the causal structure that has generated the observed statistical dependences among n random variables. The idea is that the factorization of the joint measure of cause and effect into P(cause)P(effect|cause) typically leads to simpler conditionals than non-causal factorizations. To evaluate the complexity of the conditionals we have tried two methods. First, we have compared them to those which maximize the conditional entropy subject to the observed first and second moments, since we consider the latter as the simplest conditionals. Second, we have fitted the data with conditional probability measures that are exponentials of functions in an RKHS and defined the complexity by a Hilbert-space semi-norm. Such a complexity measure has several properties that are useful for our purpose. We describe some encouraging results with both methods applied to real-world data. Moreover, we have combined constraint-based approaches to causal discovery (i.e., methods using only information on conditional statistical dependences) with our method in order to distinguish between causal hypotheses which are equivalent with respect to the imposed independences. Furthermore, we compare the performance to Bayesian approaches to causal inference.

Web [BibTex]


Minimal Logical Constraint Covering Sets

Sinz, F., Schölkopf, B.

(155), Max Planck Institute for Biological Cybernetics, Tübingen, December 2006 (techreport)

Abstract
We propose a general framework for computing minimal set covers under a certain class of logical constraints. The underlying idea is to transform the problem into a mathematical program under linear constraints. In this sense it can be seen as a natural extension of the vector quantization algorithm proposed by Tipping and Schoelkopf. We show which classes of logical constraints can be cast and relaxed into linear constraints and give an algorithm for the transformation.

PDF [BibTex]
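
As a loose illustration of the general idea in the abstract above, casting a covering problem as a mathematical program with linear constraints, here is the standard LP relaxation of minimum set cover in SciPy. The toy incidence matrix and solver choice are assumptions; this is not the report's specific formulation.

```python
# Illustrative LP relaxation of minimum set cover (a generic example of casting a
# covering problem as a mathematical program with linear constraints; not the
# specific formulation of the tech report).
import numpy as np
from scipy.optimize import linprog

# Incidence matrix A[e, s] = 1 if set s contains element e (toy instance).
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
])
n_sets = A.shape[1]

# minimize sum_s x_s  subject to  A x >= 1  (every element covered),  0 <= x <= 1
res = linprog(c=np.ones(n_sets),
              A_ub=-A, b_ub=-np.ones(A.shape[0]),
              bounds=[(0, 1)] * n_sets,
              method="highs")
print("relaxed cover weights:", np.round(res.x, 3))
```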

Learning Optimal EEG Features Across Time, Frequency and Space

Farquhar, J., Hill, J., Schölkopf, B.

NIPS Workshop on Current Trends in Brain-Computer Interfacing, December 2006 (talk)

PDF Web [BibTex]

Semi-Supervised Learning

Zien, A.

Advanced Methods in Sequence Analysis Lectures, November 2006 (talk)

Web [BibTex]

New Methods for the P300 Visual Speller

Biessmann, F.

(1), (Editors: Hill, J.), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, November 2006 (techreport)

PDF [BibTex]

A Machine Learning Approach for Determining the PET Attenuation Map from Magnetic Resonance Images

Hofmann, M., Steinke, F., Judenhofer, M., Claussen, C., Schölkopf, B., Pichler, B.

IEEE Medical Imaging Conference, November 2006 (talk)

Abstract
A promising new combination in multimodality imaging is MR-PET, where the high soft tissue contrast of Magnetic Resonance Imaging (MRI) and the functional information of Positron Emission Tomography (PET) are combined. Although many technical problems have recently been solved, it is still an open problem to determine the attenuation map from the available MR scan, as the MR intensities are not directly related to the attenuation values. One standard approach is an atlas registration where the atlas MR image is aligned with the patient MR thus also yielding an attenuation image for the patient. We also propose another approach, which to our knowledge has not been tried before: Using Support Vector Machines we predict the attenuation value directly from the local image information. We train this well-established machine learning algorithm using small image patches. Although both approaches sometimes yielded acceptable results, they also showed their specific shortcomings: The registration often fails with large deformations whereas the prediction approach is problematic when the local image structure is not characteristic enough. However, the failures often do not coincide and integration of both information sources is promising. We therefore developed a combination method extending Support Vector Machines to use not only local image structure but also atlas registered coordinates. We demonstrate the strength of this combination approach on a number of examples.

[BibTex]

Geometric Analysis of Hilbert Schmidt Independence criterion based ICA contrast function

Shen, H., Jegelka, S., Gretton, A.

(PA006080), National ICT Australia, Canberra, Australia, October 2006 (techreport)

Web [BibTex]

Semi-Supervised Support Vector Machines and Application to Spam Filtering

Zien, A.

ECML Discovery Challenge Workshop, September 2006 (talk)

Abstract
After introducing the semi-supervised support vector machine (aka TSVM for "transductive SVM"), a few popular training strategies are briefly presented. Then the assumptions underlying semi-supervised learning are reviewed. Finally, two modern TSVM optimization techniques are applied to the spam filtering data sets of the workshop; it is shown that they can achieve excellent results, if the problem of the data being non-iid can be handled properly.

PDF Web [BibTex]


Inferential Structure Determination: Probabilistic determination and validation of NMR structures

Habeck, M.

Gordon Research Conference on Computational Aspects of Biomolecular NMR, September 2006 (talk)

Web [BibTex]

A tutorial on spectral clustering

von Luxburg, U.

(149), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. Nevertheless, at first glance spectral clustering looks a bit mysterious, and it is not obvious why it works at all and what it really does. This article is a tutorial introduction to spectral clustering. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

PDF [BibTex]
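
A compact sketch of the normalized spectral clustering recipe surveyed in the tutorial: build an RBF similarity graph, form the symmetric normalized Laplacian, and run k-means on the leading eigenvectors. The graph construction, bandwidth, and toy data are illustrative assumptions.

```python
# Minimal normalized spectral clustering sketch (RBF similarity graph, symmetric
# normalized Laplacian, k-means on the spectral embedding); parameters are illustrative.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # Fully connected similarity graph with Gaussian weights.
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    W = np.exp(-sq / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # k smallest eigenvectors, rows renormalized, then k-means.
    vals, vecs = eigh(L_sym)
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
print(spectral_clustering(X, k=2)[:5])
```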

Machine Learning Algorithms for Polymorphism Detection

Schweikert, G., Zeller, G., Clark, R., Ossowski, S., Warthmann, N., Shinn, P., Frazer, K., Ecker, J., Huson, D., Weigel, D., Schölkopf, B., Rätsch, G.

2nd ISCB Student Council Symposium, August 2006 (talk)

Abstract
Analyzing resequencing array data using machine learning, we obtain a genome-wide inventory of polymorphisms in 20 wild strains of Arabidopsis thaliana, including 750,000 single nucleotide polymorphisms (SNPs) and thousands of highly polymorphic regions and deletions. We thus provide an unprecedented resource for the study of natural variation in plants.

Web [BibTex]

Towards the Inference of Graphs on Ordered Vertexes

Zien, A., Rätsch, G., Ong, C.

(150), Max Planck Institute for Biological Cybernetics, Tübingen, August 2006 (techreport)

Abstract
We propose novel methods for machine learning of structured output spaces. Specifically, we consider outputs which are graphs with vertices that have a natural order. We consider the usual adjacency matrix representation of graphs, as well as two other representations for such a graph: (a) decomposing the graph into a set of paths, (b) converting the graph into a single sequence of nodes with labeled edges. For each of the three representations, we propose an encoding and decoding scheme. We also propose an evaluation measure for comparing two graphs.

PDF [BibTex]

Inferential structure determination: Overview and new developments

Habeck, M.

Sixth CCPN Annual Conference: Efficient and Rapid Structure Determination by NMR, July 2006 (talk)

Web [BibTex]

MCMC inference in (Conditionally) Conjugate Dirichlet Process Gaussian Mixture Models

Rasmussen, C., Görür, D.

ICML Workshop on Learning with Nonparametric Bayesian Methods, June 2006 (talk)

Abstract
We compare the predictive accuracy of the Dirichlet Process Gaussian mixture models using conjugate and conditionally conjugate priors and show that better density models result from using the wider class of priors. We explore several MCMC schemes exploiting conditional conjugacy and show their computational merits on several multidimensional density estimation problems.

Web [BibTex]

Sampling for non-conjugate infinite latent feature models

Görür, D., Rasmussen, C.

(Editors: Bernardo, J. M.), 8th Valencia International Meeting on Bayesian Statistics (ISBA), June 2006 (talk)

Abstract
Latent variable models are powerful tools to model the underlying structure in data. Infinite latent variable models can be defined using Bayesian nonparametrics. Dirichlet process (DP) models constitute an example of infinite latent class models in which each object is assumed to belong to one of the, mutually exclusive, infinitely many classes. Recently, the Indian buffet process (IBP) has been defined as an extension of the DP. The IBP is a distribution over sparse binary matrices with infinitely many columns which can be used as a distribution over non-exclusive features. Inference using Markov chain Monte Carlo (MCMC) in conjugate IBP models has been described previously; however, requiring conjugacy restricts the use of the IBP. We describe an MCMC algorithm for non-conjugate IBP models. Modelling choice behaviour is an important topic in psychology, economics and related fields. Elimination by Aspects (EBA) is a choice model that assumes each alternative has latent features with associated weights that lead to the observed choice outcomes. We formulate a non-parametric version of EBA by using the IBP as the prior over the latent binary features. We infer the features of objects that lead to the choice data by using our sampling scheme for inference.

PDF [BibTex]

An Automated Combination of Sequence Motif Kernels for Predicting Protein Subcellular Localization

Zien, A., Ong, C.

(146), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006 (techreport)

Abstract
Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer. We propose an elegant and fully automated approach to building a prediction system for protein subcellular localization. We propose a new class of protein sequence kernels which considers all motifs including motifs with gaps. This class of kernels allows the inclusion of pairwise amino acid distances into their computation. We further propose a multiclass support vector machine method which directly solves protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we generalize our method to optimize over multiple kernels at the same time. We compare our automated approach to four other predictors on three different datasets.

PDF Web [BibTex]

Training a Support Vector Machine in the Primal

Chapelle, O.

(147), Max Planck Institute for Biological Cybernetics, Tübingen, April 2006, the version in the "Large Scale Kernel Machines" book is more up to date (techreport)

Abstract
Most of the literature on Support Vector Machines (SVMs) concentrates on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, both for linear and non-linear SVMs, and that there is no reason for ignoring it. Moreover, from the primal point of view, new families of algorithms for large scale SVM training can be investigated.

PDF [BibTex]
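
A small sketch in the spirit of the report above: training a linear SVM directly in the primal by gradient descent on the L2-regularized squared hinge loss. The report itself discusses more refined primal solvers (e.g. Newton steps); the optimizer, step size, and toy data here are assumptions.

```python
# Sketch of training a linear SVM in the primal: gradient descent on the
# L2-regularized squared hinge loss (differentiable). Illustrative only.
import numpy as np

def train_primal_svm(X, y, lam=0.1, lr=0.1, n_iter=500):
    """y in {-1, +1}; returns weight vector w and bias b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        slack = np.maximum(0.0, 1.0 - margins)            # hinge part
        # Gradient of  lam * ||w||^2 + (1/n) * sum_i slack_i^2
        grad_w = 2 * lam * w - (2.0 / n) * (slack * y) @ X
        grad_b = -(2.0 / n) * np.sum(slack * y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])
w, b = train_primal_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```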

An Inventory of Sequence Polymorphisms For Arabidopsis

Clark, R., Ossowski, S., Schweikert, G., Rätsch, G., Shinn, P., Zeller, G., Warthmann, N., Fu, G., Hinds, D., Chen, H., Frazer, K., Huson, D., Schölkopf, B., Nordborg, M., Ecker, J., Weigel, D.

17th International Conference on Arabidopsis Research, April 2006 (talk)

Abstract
We have used high-density oligonucleotide arrays to characterize common sequence variation in 20 wild strains of Arabidopsis thaliana that were chosen for maximal genetic diversity. Both strands of each possible SNP of the 119 Mb reference genome were represented on the arrays, which were hybridized with whole genome, isothermally amplified DNA to minimize ascertainment biases. Using two complementary approaches, a model-based algorithm and a newly developed machine learning method, we identified over 550,000 SNPs with a false discovery rate of ~0.03 (an average of 1 SNP for every 216 bp of the genome). A heuristic algorithm additionally predicted ~700 highly polymorphic or deleted regions per accession. Over 700 predicted polymorphisms with major functional effects (e.g., premature stop codons, or deletions of coding sequence) were validated by dideoxy sequencing. Using this data set, we provide the first systematic description of the types of genes that harbor major effect polymorphisms in natural populations at moderate allele frequencies. The data also provide an unprecedented resource for the study of genetic variation in an experimentally tractable, multicellular model organism.

[BibTex]

Cross-Validation Optimization for Structured Hessian Kernel Methods

Seeger, M., Chapelle, O.

Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, February 2006 (techreport)

Abstract
We address the problem of learning hyperparameters in kernel methods for which the Hessian of the objective is structured. We propose an approximation to the cross-validation log likelihood whose gradient can be computed analytically, solving the hyperparameter learning problem efficiently through nonlinear optimization. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels or to large datasets. When applied to the problem of multi-way classification, our method scales linearly in the number of classes and gives rise to state-of-the-art results on a remote imaging task.

PDF Web [BibTex]


2004


Joint Kernel Maps

Weston, J., Schölkopf, B., Bousquet, O., Mann, .., Noble, W.

(131), Max-Planck-Institute for Biological Cybernetics, Tübingen, November 2004 (techreport)

PDF [BibTex]

Discrete vs. Continuous: Two Sides of Machine Learning

Zhou, D.

October 2004 (talk)

Abstract
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data. This talk is mainly based on the following contributions: (1) D. Zhou and B. Schölkopf: Transductive Inference with Graphs, MPI Technical Report, August 2004; (2) D. Zhou, B. Schölkopf and T. Hofmann: Semi-supervised Learning on Directed Graphs, NIPS 2004; (3) D. Zhou, O. Bousquet, T. N. Lal, J. Weston and B. Schölkopf: Learning with Local and Global Consistency, NIPS 2003.

PDF [BibTex]


Grundlagen von Support Vector Maschinen und Anwendungen in der Bildverarbeitung

Eichhorn, J.

September 2004 (talk)

Abstract
Invited talk at the workshop "Numerical, Statistical and Discrete Methods in Image Processing" at the TU München (in German)

PDF [BibTex]


Semi-Supervised Induction

Yu, K., Tresp, V., Zhou, D.

(141), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, August 2004 (techreport)

Abstract
Considerable progress was recently achieved on semi-supervised learning, which differs from traditional supervised learning by additionally exploiting the information in the unlabelled examples. However, a disadvantage of many existing methods is that they do not generalize to unseen inputs. This paper investigates learning methods that effectively make use of both labelled and unlabelled data to build predictive functions, which are defined not just on the seen inputs but on the whole space. As a nice property, the proposed method allows efficient training and can easily handle new test points. We validate the method on both toy data and real-world data sets.

PDF PDF [BibTex]

Object categorization with SVM: kernels for local features

Eichhorn, J., Chapelle, O.

(137), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004 (techreport)

Abstract
In this paper, we propose to combine an efficient image representation based on local descriptors with a Support Vector Machine classifier in order to perform object categorization. For this purpose, we apply kernels defined on sets of vectors. After testing different combinations of kernels and local descriptors, we were able to identify a particularly well-performing one.

PDF [BibTex]
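
One simple member of the family of kernels on sets of vectors that the report compares, sketched under illustrative assumptions (random descriptors, RBF minor kernel, mean aggregation, cosine-style normalization); it is not claimed to be the report's best-performing combination. The resulting Gram matrix over images could then be passed to any SVM implementation.

```python
# Sketch of a kernel between sets of local descriptors: the normalized sum-match
# kernel with an RBF minor kernel. Descriptors and parameters are illustrative.
import numpy as np

def set_kernel(A, B, sigma=1.0):
    """A, B: arrays of shape (n_a, d), (n_b, d) holding local descriptors."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    K = np.exp(-sq / (2 * sigma**2))
    return K.mean()              # average pairwise similarity between the two sets

def normalized_set_kernel(A, B, sigma=1.0):
    return set_kernel(A, B, sigma) / np.sqrt(set_kernel(A, A, sigma) * set_kernel(B, B, sigma))

rng = np.random.default_rng(0)
img1 = rng.normal(0, 1, (40, 8))    # 40 local descriptors extracted from one image
img2 = rng.normal(0, 1, (55, 8))    # 55 descriptors from another image
print(normalized_set_kernel(img1, img2))
```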

Hilbertian Metrics and Positive Definite Kernels on Probability Measures

Hein, M., Bousquet, O.

(126), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004 (techreport)

Abstract
We investigate the problem of defining Hilbertian metrics resp. positive definite kernels on probability measures, continuing previous work. This type of kernels has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsoe such that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second we investigate further our approach to incorporate similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and gives in some cases a more efficient way to compute them. Finally we compare all proposed kernels in two text and one image classification problem.

PDF [BibTex]
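
A concrete instance of the kernels studied in this report, sketched under assumptions: the Jensen-Shannon divergence between histograms is a squared Hilbertian metric, so exponentiating its negative yields a positive definite kernel; the bandwidth and the random histograms below are illustrative.

```python
# Sketch: a positive definite kernel on discrete probability distributions via the
# Jensen-Shannon divergence, one instance of Hilbertian-metric-based kernels.
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_kernel(P, lam=1.0):
    """Gram matrix exp(-lam * JS(p_i, p_j)) over an array of histograms."""
    n = len(P)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = np.exp(-lam * js_divergence(P[i], P[j]))
    return K

rng = np.random.default_rng(0)
hists = rng.dirichlet(np.ones(10), size=5)      # five random 10-bin histograms
K = js_kernel(hists)
print(np.round(K, 3))
print("min eigenvalue:", np.linalg.eigvalsh(K).min())  # non-negative up to numerical error
```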

Kernels, Associated Structures and Generalizations

Hein, M., Bousquet, O.

(127), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, July 2004 (techreport)

Abstract
This paper gives a survey of results in the mathematical literature on positive definite kernels and their associated structures. We concentrate on properties which seem potentially relevant for Machine Learning and try to clarify some results that have been misused in the literature. Moreover we consider different lines of generalizations of positive definite kernels. Namely we deal with operator-valued kernels and present the general framework of Hilbertian subspaces of Schwartz which we use to introduce kernels which are distributions. Finally indefinite kernels and their associated reproducing kernel spaces are considered.

PDF [BibTex]

Riemannian Geometry on Graphs and its Application to Ranking and Classification

Zhou, D.

June 2004 (talk)

Abstract
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data.

PDF [BibTex]


Kamerakalibrierung und Tiefenschätzung: Ein Vergleich von klassischer Bündelblockausgleichung und statistischen Lernalgorithmen

Sinz, FH.

Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Tübingen, Germany, March 2004 (techreport)

Abstract
The thesis compares two approaches to the problem of estimating the spatial position of a point from its image coordinates in two different cameras. The classical method of bundle adjustment models the two individual cameras and estimates their exterior and interior orientation with an iterative calibration procedure whose convergence depends strongly on good initial values. The depth of a point is then estimated by inverting three of the four projection equations of the single-camera models. The second method uses kernel ridge regression and support vector regression to learn a direct mapping from image coordinates to world coordinates. The results show that the machine learning approach, besides considerably simplifying the calibration process, can lead to higher positional accuracy.

PDF [BibTex]
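
A toy sketch of the learning-based alternative described in the abstract above: kernel ridge regression from the stacked image coordinates of two cameras to the 3D point position. The synthetic cameras, kernel, and regularizer are assumptions for illustration only.

```python
# Toy sketch: kernel ridge regression mapping stacked 2D image coordinates from two
# hypothetical cameras to 3D point positions. Synthetic data, not the thesis setup.
import numpy as np

rng = np.random.default_rng(0)
P3d = rng.uniform(-1, 1, (300, 3))                    # 3D calibration points

def project(P, offset):
    """Crude pinhole-style projection, used only to generate toy training data."""
    Q = P + offset
    return Q[:, :2] / (Q[:, 2:3] + 4.0)

# Stacked 2D image coordinates from the two cameras.
X = np.hstack([project(P3d, np.array([0.0, 0.0, 0.0])),
               project(P3d, np.array([0.5, 0.0, 0.0]))])
Y = P3d

def rbf_gram(A, B, sigma):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

sigma, lam = 0.5, 1e-4
X_tr, Y_tr, X_te, Y_te = X[:250], Y[:250], X[250:], Y[250:]
alpha = np.linalg.solve(rbf_gram(X_tr, X_tr, sigma) + lam * np.eye(len(X_tr)), Y_tr)
Y_hat = rbf_gram(X_te, X_tr, sigma) @ alpha           # kernel ridge regression prediction
print("mean 3D error:", np.mean(np.linalg.norm(Y_hat - Y_te, axis=1)))
```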

Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking

Zhou, D.

January 2004 (talk)

Abstract
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.

PDF [BibTex]
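
A brief sketch of the kind of smooth semi-supervised solution the abstract refers to: propagate the few known labels over a similarity graph via the closed form F = (I - alpha*S)^{-1} Y, with S the symmetrically normalized affinity matrix. The graph construction and alpha are illustrative assumptions.

```python
# Sketch of a simple semi-supervised label-propagation scheme: spread known labels
# over a similarity graph via F = (I - alpha * S)^{-1} Y. Illustrative parameters.
import numpy as np

def propagate_labels(X, y, alpha=0.9, sigma=0.5):
    """y: class index for labeled points, -1 for unlabeled points."""
    n = len(X)
    classes = np.unique(y[y >= 0])
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    W = np.exp(-sq / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # normalized affinity matrix
    Y = np.zeros((n, len(classes)))
    for k, c in enumerate(classes):
        Y[y == c, k] = 1.0
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)       # closed-form propagation
    return classes[np.argmax(F, axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (60, 2)), rng.normal(2, 0.3, (60, 2))])
y = -np.ones(120, dtype=int)
y[0], y[60] = 0, 1                      # one labeled example per class
print(np.bincount(propagate_labels(X, y)))
```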


Introduction to Category Theory

Bousquet, O.

Internal Seminar, January 2004 (talk)

Abstract
A brief introduction to the general idea behind category theory with some basic definitions and examples. A perspective on higher dimensional categories is given.

PDF [BibTex]

Multivariate Regression with Stiefel Constraints

Bakir, G., Gretton, A., Franz, M., Schölkopf, B.

(128), MPI for Biological Cybernetics, Spemannstr 38, 72076, Tuebingen, 2004 (techreport)

Abstract
We introduce a new framework for regression between multi-dimensional spaces. Standard methods for solving this problem typically reduce the problem to one-dimensional regression by choosing features in the input and/or output spaces. These methods, which include PLS (partial least squares), KDE (kernel dependency estimation), and PCR (principal component regression), select features based on different a-priori judgments as to their relevance. Moreover, loss function and constraints are chosen not primarily on statistical grounds, but to simplify the resulting optimisation. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective. Our approach also allows for the possibility of using a regularizer in the optimization. Finally, by processing the observations sequentially, our algorithm is able to work on large scale problems.

PDF [BibTex]

Learning from Labeled and Unlabeled Data Using Random Walks

Zhou, D., Schölkopf, B.

Max Planck Institute for Biological Cybernetics, 2004 (techreport)

Abstract
We consider the general problem of learning from labeled and unlabeled data. Given a set of points, some of them are labeled, and the remaining points are unlabeled. The goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The problem of our interest is if we can implement a classifier which uses the unlabeled data information in some way and has higher accuracy than the classifiers which use the labeled data only. Recently we proposed a simple algorithm, which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority to supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on the key steps in this algorithm.

PDF PostScript [BibTex]

Behaviour and Convergence of the Constrained Covariance

Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Schölkopf, B., Logothetis, N.

(130), MPI for Biological Cybernetics, 2004 (techreport)

Abstract
We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth, which can make dependence hard to detect empirically. All current kernel-based independence tests share this behaviour. Finally, we demonstrate exponential convergence between the population and empirical COCO, which implies that COCO does not suffer from slow learning rates when used as a dependence test.

PDF [BibTex]

Confidence Sets for Ratios: A Purely Geometric Approach To Fieller’s Theorem

von Luxburg, U., Franz, V.

(133), Max Planck Institute for Biological Cybernetics, 2004 (techreport)

Abstract
We present a simple, geometric method to construct Fieller's exact confidence sets for ratios of jointly normally distributed random variables. Contrary to previous geometric approaches in the literature, our method is valid in the general case where both sample mean and covariance are unknown. Moreover, not only the construction but also its proof are purely geometric and elementary, thus giving intuition into the nature of the confidence sets.

PDF [BibTex]
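
A numerical sketch of the object the report constructs geometrically: Fieller's confidence set for a ratio of means, obtained by solving the defining quadratic inequality. The paired-sample setting, the 95% level, and the t quantile are assumptions, and degenerate (unbounded or disconnected) sets are only flagged, not handled.

```python
# Numerical sketch of Fieller's confidence set for a ratio of two means.
# Paired normal samples and a t quantile are assumed; degenerate cases are flagged only.
import numpy as np
from scipy import stats

def fieller_interval(x, y, level=0.95):
    """Confidence set for E[x]/E[y] from paired samples x, y."""
    n = len(x)
    a, b = x.mean(), y.mean()
    V = np.cov(np.vstack([x, y])) / n           # covariance matrix of the two sample means
    v11, v12, v22 = V[0, 0], V[0, 1], V[1, 1]
    t2 = stats.t.ppf(0.5 + level / 2, df=n - 1) ** 2
    # (b^2 - t^2 v22) r^2 - 2 (a b - t^2 v12) r + (a^2 - t^2 v11) <= 0
    A = b**2 - t2 * v22
    B = -2 * (a * b - t2 * v12)
    C = a**2 - t2 * v11
    if A <= 0 or B**2 - 4 * A * C < 0:
        return None                              # unbounded or disconnected confidence set
    return tuple(np.sort(np.roots([A, B, C])))

rng = np.random.default_rng(0)
x = rng.normal(2.0, 0.5, 50)
y = rng.normal(1.0, 0.5, 50)
print("point estimate:", x.mean() / y.mean())
print("95% Fieller set:", fieller_interval(x, y))
```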

Transductive Inference with Graphs

Zhou, D., Schölkopf, B.

Max Planck Institute for Biological Cybernetics, 2004, see the improved version "Regularization on Discrete Spaces" (techreport)

Abstract
We propose a general regularization framework for transductive inference. The given data are thought of as a graph, where the edges encode the pairwise relationships among data. We develop discrete analysis and geometry on graphs, and then naturally adapt the classical regularization in the continuous case to the graph situation. A new and effective algorithm is derived from this general framework, as well as an approach we developed before.

[BibTex]

Advanced Statistical Learning Theory

Bousquet, O.

Machine Learning Summer School, 2004 (talk)

PDF [BibTex]


2002


Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

(98), Max Planck Institute for Biological Cybernetics, August 2002 (techreport)

Abstract
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. Output kernels also make it possible to encode prior information and/or invariances in the loss function in an elegant way. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.

PDF [BibTex]
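
A condensed sketch of the pipeline the abstract describes, with simplifications that are not in the paper: no centering of the output kernel, RBF kernels on plain vectors, and a pre-image step that searches only over the training outputs.

```python
# Condensed kernel-dependency-estimation-style sketch: kernel PCA coordinates of the
# output kernel, kernel ridge regression onto them, pre-image by candidate search.
# Simplified (no centering, training-set pre-images); not the authors' implementation.
import numpy as np

def rbf_gram(A, B, sigma):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def fit_kde(X, Y, r=5, sigma_x=1.0, sigma_y=1.0, lam=1e-3):
    K, L = rbf_gram(X, X, sigma_x), rbf_gram(Y, Y, sigma_y)
    vals, vecs = np.linalg.eigh(L)
    V = vecs[:, -r:] * np.sqrt(vals[-r:])            # r leading output-space coordinates
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), V)   # kernel ridge regression
    return alpha, V

def predict_kde(x_new, X, Y, alpha, V, sigma_x=1.0):
    z = rbf_gram(x_new[None, :], X, sigma_x) @ alpha  # predicted output coordinates
    idx = np.argmin(np.sum((V - z)**2, axis=1))       # pre-image: nearest training output
    return Y[idx]

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (200, 1))
Y = np.hstack([np.sin(X), np.cos(X)])                 # structured (2D) outputs
alpha, V = fit_kde(X, Y)
print(predict_kde(np.array([1.0]), X, Y, alpha, V), np.sin(1.0), np.cos(1.0))
```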

A compression approach to support vector model selection

von Luxburg, U., Bousquet, O., Schölkopf, B.

(101), Max Planck Institute for Biological Cybernetics, 2002, see the more detailed JMLR version (techreport)

Abstract
In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds we construct "compression coefficients" for SVMs, which measure the amount by which the training labels can be compressed by some classification hypothesis. The main idea is to relate the coding precision of this hypothesis to the width of the margin of the SVM. The compression coefficients connect well known quantities such as the radius-margin ratio R^2/rho^2, the eigenvalues of the kernel matrix and the number of support vectors. To test whether they are useful in practice we ran model selection experiments on several real world datasets. As a result we found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.

[BibTex]
