

2009


Covariate shift and local learning by distribution matching

Gretton, A., Smola, A., Huang, J., Schmittfull, M., Borgwardt, K., Schölkopf, B.

In Dataset Shift in Machine Learning, pages: 131-160, (Editors: Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A. and Lawrence, N. D.), MIT Press, Cambridge, MA, USA, 2009 (inbook)

Abstract
Given sets of observations of training and test data, we consider the problem of re-weighting the training data such that its distribution more closely matches that of the test data. We achieve this goal by matching covariate distributions between training and test sets in a high-dimensional feature space (specifically, a reproducing kernel Hilbert space). This approach does not require distribution estimation. Instead, the sample weights are obtained by a simple quadratic programming procedure. We provide a uniform convergence bound on the distance between the reweighted training feature mean and the test feature mean, a transductive bound on the expected loss of an algorithm trained on the reweighted data, and a connection to single class SVMs. While our method is designed to deal with the case of simple covariate shift (in the sense of Chapter ??), we have also found benefits for sample selection bias on the labels. Our correction procedure yields its greatest and most consistent advantages when the learning algorithm returns a classifier/regressor that is "simpler" than the data might suggest.

PDF Web [BibTex]
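
The abstract above describes re-weighting the training sample by matching kernel means via a quadratic program. Below is a minimal sketch of that idea (kernel mean matching); the RBF kernel width, the weight bound B, and the use of SciPy's SLSQP solver are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kmm_weights(X_tr, X_te, sigma=1.0, B=10.0):
    # Minimize || (1/n_tr) sum_i b_i phi(x_i) - (1/n_te) sum_j phi(x'_j) ||^2
    # over weights b, i.e. the QP  0.5 b'Kb - kappa'b  with box constraints.
    n_tr, n_te = len(X_tr), len(X_te)
    K = rbf_kernel(X_tr, X_tr, sigma)
    kappa = (n_tr / n_te) * rbf_kernel(X_tr, X_te, sigma).sum(axis=1)

    def objective(b):
        return 0.5 * b @ K @ b - kappa @ b

    constraints = ({'type': 'eq', 'fun': lambda b: b.mean() - 1.0},)  # weights average to 1
    bounds = [(0.0, B)] * n_tr                                        # bounded importance weights
    res = minimize(objective, np.ones(n_tr), method='SLSQP',
                   bounds=bounds, constraints=constraints)
    return res.x

# Toy usage: training data drawn to the left of the test data; points lying
# where the test density is high should receive larger weights.
rng = np.random.default_rng(0)
X_tr = rng.normal(-1.0, 1.0, size=(50, 1))
X_te = rng.normal(+1.0, 1.0, size=(30, 1))
weights = kmm_weights(X_tr, X_te)
```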



An introduction to Kernel Learning Algorithms

Gehler, P., Schölkopf, B.

In Kernel Methods for Remote Sensing Data Analysis, pages: 25-48, 2, (Editors: Gustavo Camps-Valls and Lorenzo Bruzzone), Wiley, New York, NY, USA, 2009 (inbook)

Abstract
Kernel learning algorithms are currently becoming a standard tool in the area of machine learning and pattern recognition. In this chapter we review the fundamental theory of kernel learning. As the basic building block we introduce the kernel function, which provides an elegant and general way to compare possibly very complex objects. We then review the concept of a reproducing kernel Hilbert space and state the representer theorem. Finally, we give an overview of the most prominent algorithms, namely support vector classification and regression, Gaussian processes and kernel principal component analysis. With multiple kernel learning and structured output prediction we also introduce some more recent advancements in the field.

link (url) DOI [BibTex]
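
As a small companion to the chapter summary above, here is a minimal sketch of the representer theorem in action, using kernel ridge regression as the example algorithm; the RBF kernel and the regularization value are illustrative assumptions (the chapter itself covers SVMs, Gaussian processes and kernel PCA).

```python
import numpy as np

def rbf(X1, X2, sigma=0.5):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Training data: noisy samples of a nonlinear function.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

# Representer theorem: the regularized-risk minimizer in the RKHS has the form
# f(x) = sum_i alpha_i k(x_i, x); for the squared loss the coefficients have
# the closed form alpha = (K + lambda I)^{-1} y.
lam = 1e-2
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Predicting at new points needs only kernel evaluations against training data.
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
f_new = rbf(X_new, X) @ alpha
```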


2002


Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

(98), Max Planck Institute for Biological Cybernetics, August 2002 (techreport)

Abstract
We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be, for example, vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. Output kernels also make it possible to encode prior information and/or invariances in the loss function in an elegant way. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.

PDF [BibTex]
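
A minimal sketch of the idea summarized above: kernels on both inputs and outputs embed the objects into vector spaces, a regression is learned between the two embeddings, and a pre-image step maps back to an output object. The specific choices below (RBF kernels, kernel PCA on the output kernel, kernel ridge regression, candidate-set decoding) are illustrative assumptions, not necessarily the report's exact procedure.

```python
import numpy as np

def rbf(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def center(K):
    # Center a kernel matrix in feature space.
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))               # input objects (here: plain vectors)
Y = np.tanh(X @ rng.normal(size=(3, 2)))   # output objects depending on the inputs

# 1. Kernel PCA on the *output* kernel gives a low-dimensional embedding of Y.
L = center(rbf(Y, Y))
vals, vecs = np.linalg.eigh(L)
Z = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 1e-12))   # top-2 kPCA coordinates

# 2. Kernel ridge regression from the inputs to each output-embedding coordinate.
K = rbf(X, X)
A = np.linalg.solve(K + 1e-2 * np.eye(len(X)), Z)

# 3. Pre-image step: map a new input into the output embedding, then return the
#    known output whose embedding is closest (candidate-set decoding).
x_new = rng.normal(size=(1, 3))
z_new = rbf(x_new, X) @ A
y_pred = Y[np.argmin(((Z - z_new) ** 2).sum(axis=1))]
```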



Global Geometry of SVM Classifiers

Zhou, D., Xiao, B., Zhou, H., Dai, R.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, June 2002 (techreport)

Abstract
We construct a geometric framework for arbitrary-norm Support Vector Machine (SVM) classifiers. Within this framework, separating hyperplanes, dual descriptions and solutions of SVM classifiers are constructed in a purely geometric fashion. In contrast with the optimization theory usually used for SVM classifiers, no complicated computations are required, and each step of the theory is guided by geometric intuition.

PDF PostScript [BibTex]
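
One well-known piece of this geometric picture is that the (hard-margin, Euclidean-norm) maximum-margin hyperplane bisects the segment joining the closest points of the two classes' convex hulls. The sketch below illustrates only that special case with a generic solver; the toy data and solver choice are illustrative, and the report itself develops the geometry for arbitrary norms.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X_pos = rng.normal(+2.0, 0.5, size=(20, 2))   # positive class
X_neg = rng.normal(-2.0, 0.5, size=(20, 2))   # negative class
n_pos, n_neg = len(X_pos), len(X_neg)

def hull_distance(theta):
    # theta holds convex-combination weights for one point in each hull.
    a, b = theta[:n_pos], theta[n_pos:]
    return np.sum((X_pos.T @ a - X_neg.T @ b) ** 2)

x0 = np.concatenate([np.full(n_pos, 1.0 / n_pos), np.full(n_neg, 1.0 / n_neg)])
constraints = ({'type': 'eq', 'fun': lambda t: t[:n_pos].sum() - 1.0},
               {'type': 'eq', 'fun': lambda t: t[n_pos:].sum() - 1.0})
res = minimize(hull_distance, x0, method='SLSQP',
               bounds=[(0.0, 1.0)] * (n_pos + n_neg), constraints=constraints)

a, b = res.x[:n_pos], res.x[n_pos:]
p, q = X_pos.T @ a, X_neg.T @ b      # closest points of the two convex hulls

# Maximum-margin separating hyperplane: normal along p - q, through the midpoint.
w = p - q
b0 = -w @ ((p + q) / 2.0)
```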



Computationally Efficient Face Detection

Romdhani, S., Torr, P., Schölkopf, B., Blake, A.

(MSR-TR-2002-69), Microsoft Research, June 2002 (techreport)

Web [BibTex]



Kernel-based nonlinear blind source separation

Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K.

EU-Project BLISS, January 2002 (techreport)

GZIP [BibTex]



A compression approach to support vector model selection

von Luxburg, U., Bousquet, O., Schölkopf, B.

(101), Max Planck Institute for Biological Cybernetics, 2002, see the more detailed JMLR version (techreport)

Abstract
In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds we construct "compression coefficients" for SVMs, which measure the amount by which the training labels can be compressed by some classification hypothesis. The main idea is to relate the coding precision of this hypothesis to the width of the margin of the SVM. The compression coefficients connect well-known quantities such as the radius-margin ratio R^2/rho^2, the eigenvalues of the kernel matrix and the number of support vectors. To test whether they are useful in practice we ran model selection experiments on several real-world datasets. As a result we found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.

[BibTex]
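
A minimal sketch of one of the quantities mentioned in the abstract, the radius-margin ratio R^2/rho^2, computed for an RBF-kernel SVM. Estimating R as the largest feature-space distance to the kernel mean (instead of solving the exact minimum enclosing ball problem) and the use of scikit-learn's SVC are illustrative shortcuts, not the paper's construction of compression coefficients.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

gamma = 0.5
clf = SVC(kernel='rbf', gamma=gamma, C=10.0).fit(X, y)

# Margin: rho = 1 / ||w||, where ||w||^2 = sum_ij (y_i a_i)(y_j a_j) k(x_i, x_j);
# sklearn stores the products y_i a_i for the support vectors in dual_coef_.
sv = clf.support_vectors_
K_sv = np.exp(-gamma * ((sv[:, None, :] - sv[None, :, :]) ** 2).sum(-1))
w_norm2 = (clf.dual_coef_ @ K_sv @ clf.dual_coef_.T)[0, 0]
rho2 = 1.0 / w_norm2

# Radius: feature-space distance of the farthest point from the kernel mean,
# used here as a simple stand-in for the exact enclosing-ball radius.
K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
R2 = np.max(np.diag(K) - 2.0 * K.mean(axis=1) + K.mean())

radius_margin = R2 / rho2   # larger values indicate a more "complex" fit
```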



Feature Selection and Transduction for Prediction of Molecular Bioactivity for Drug Design

Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A., Schölkopf, B.

Max Planck Institute for Biological Cybernetics / Biowulf Technologies, 2002 (techreport)

Web [BibTex]



Observations on the Nyström Method for Gaussian Process Prediction

Williams, C., Rasmussen, C., Schwaighofer, A., Tresp, V.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2002 (techreport)

Abstract
A number of methods for speeding up Gaussian Process (GP) prediction have been proposed, including the Nyström method of Williams and Seeger (2001). In this paper we focus on two issues: (1) the relationship of the Nyström method to the Subset of Regressors method (Poggio and Girosi, 1990; Luo and Wahba, 1997), and (2) understanding in what circumstances the Nyström approximation would be expected to provide a good approximation to exact GP regression.

PostScript [BibTex]
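
A minimal sketch of the Nyström approximation discussed above: the full kernel matrix is approximated from a random subset of columns, K ≈ K_nm K_mm^{-1} K_mn, and the Subset of Regressors predictive mean uses only those same columns. The RBF kernel, subset size and jitter term are illustrative assumptions.

```python
import numpy as np

def rbf(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

m = 20                                       # size of the random Nystroem subset
idx = rng.choice(len(X), size=m, replace=False)
K_nm = rbf(X, X[idx])                        # n x m block
K_mm = rbf(X[idx], X[idx])                   # m x m block

# Nystroem approximation of the full kernel matrix from the m columns;
# the relative error shrinks as m grows.
K_approx = K_nm @ np.linalg.solve(K_mm + 1e-8 * np.eye(m), K_nm.T)
rel_err = np.linalg.norm(rbf(X, X) - K_approx) / np.linalg.norm(rbf(X, X))

# Subset of Regressors predictive mean, which also uses only the m columns:
# f(x*) ~ k_{*m} (K_mn K_nm + noise^2 K_mm)^{-1} K_mn y
noise2 = 0.01
A = K_nm.T @ K_nm + noise2 * K_mm
X_star = np.linspace(-3, 3, 5).reshape(-1, 1)
f_star = rbf(X_star, X[idx]) @ np.linalg.solve(A, K_nm.T @ y)
```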
