2003


Concentration Inequalities for Sub-Additive Functions Using the Entropy Method

Bousquet, O.

Stochastic Inequalities and Applications, 56, pages: 213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003 (article)

Abstract
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, like the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.

PostScript [BibTex]

Statistical Learning Theory, Capacity and Complexity

Schölkopf, B.

Complexity, 8(4):87-94, July 2003 (article)

Abstract
We give an exposition of the ideas of statistical learning theory, followed by a discussion of how a reinterpretation of the insights of learning theory could potentially also benefit our understanding of a certain notion of complexity.

Web DOI [BibTex]


Dealing with large Diagonals in Kernel Matrices

Weston, J., Schölkopf, B., Eskin, E., Leslie, C., Noble, W.

Annals of the Institute of Statistical Mathematics, 55(2):391-408, June 2003 (article)

Abstract
In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, which arises for many types of kernels, then kernel methods do not perform well. We propose and test several methods for dealing with this problem by reducing the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that one has to solve when training a Support Vector Machine, a common kernel approach for pattern recognition.

PDF DOI [BibTex]

The em Algorithm for Kernel Matrix Completion with Auxiliary Data

Tsuda, K., Akaho, S., Asai, K.

Journal of Machine Learning Research, 4, pages: 67-81, May 2003 (article)

PDF [BibTex]

Constructing Descriptive and Discriminative Non-linear Features: Rayleigh Coefficients in Kernel Feature Spaces

Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A., Müller, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):623-628, May 2003 (article)

Abstract
We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinearized variant of the Rayleigh coefficient, we propose nonlinear generalizations of Fisher's discriminant and oriented PCA using support vector kernel functions. Extensive simulations show the utility of our approach.

DOI [BibTex]

Tractable Inference for Probabilistic Data Models

Csato, L., Opper, M., Winther, O.

Complexity, 8(4):64-68, April 2003 (article)

Abstract
We present an approximation technique for probabilistic data models with a large number of hidden variables, based on ideas from statistical physics. We give examples for two nontrivial applications. © 2003 Wiley Periodicals, Inc.

PDF GZIP Web [BibTex]

Feature selection and transduction for prediction of molecular bioactivity for drug design

Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A., Schölkopf, B.

Bioinformatics, 19(6):764-771, April 2003 (article)

Abstract
Motivation: In drug discovery a key task is to identify characteristics that separate active (binding) compounds from inactive (non-binding) ones. An automated prediction system can help reduce resources necessary to carry out this task. Results: Two methods for prediction of molecular bioactivity for drug design are introduced and shown to perform well in a data set previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001. The data is characterized by very few positive examples, a very large number of features (describing three-dimensional properties of the molecules) and rather different distributions between training and test data. Two techniques are introduced specifically to tackle these problems: a feature selection method for unbalanced data and a classifier which adapts to the distribution of the unlabeled test data (a so-called transductive method). We show both techniques improve identification performance and in conjunction provide an improvement over using only one of the techniques. Our results suggest the importance of taking into account the characteristics in this data which may also be relevant in other problems of a similar type.

Web [BibTex]


Use of the Zero-Norm with Linear Models and Kernel Methods

Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.

Journal of Machine Learning Research, 3, pages: 1439-1461, March 2003 (article)

Abstract
We explore the use of the so-called zero-norm of the parameters of linear models in learning. Minimization of such a quantity has many uses in a machine learning context: for variable or feature selection, minimizing training error and ensuring sparsity in solutions. We derive a simple but practical method for achieving these goals and discuss its relationship to existing techniques of minimizing the zero-norm. The method boils down to implementing a simple modification of vanilla SVM, namely via an iterative multiplicative rescaling of the training data. Applications we investigate which aid our discussion include variable and feature selection on biological microarray data, and multicategory classification.

PDF PostScript PDF [BibTex]

An Introduction to Variable and Feature Selection

Guyon, I., Elisseeff, A.

Journal of Machine Learning Research, 3, pages: 1157-1182, 2003 (article)

[BibTex]

New Approaches to Statistical Learning Theory

Bousquet, O.

Annals of the Institute of Statistical Mathematics, 55(2):371-389, 2003 (article)

Abstract
We present new tools from probability theory that can be applied to the analysis of learning algorithms. These tools allow one to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.

PostScript [BibTex]


2001


Regularized principal manifolds

Smola, A., Mika, S., Schölkopf, B., Williamson, R.

Journal of Machine Learning Research, 1, pages: 179-209, June 2001 (article)

Abstract
Many settings of unsupervised learning can be viewed as quantization problems: the minimization of the expected quantization error subject to some restrictions. This allows the use of tools such as regularization from the theory of (supervised) risk minimization for unsupervised learning. This setting turns out to be closely related to principal curves, the generative topographic map, and robust coding. We explore this connection in two ways: (1) we propose an algorithm for finding principal manifolds that can be regularized in a variety of ways; and (2) we derive uniform convergence bounds and hence bounds on the learning rates of the algorithm. In particular, we give bounds on the covering numbers which allow us to obtain nearly optimal learning rates for certain types of regularization operators. Experimental results demonstrate the feasibility of the approach.

PDF [BibTex]

The psychometric function: II. Bootstrap-based confidence intervals and sampling

Wichmann, F., Hill, N.

Perception and Psychophysics, 63 (8), pages: 1314-1329, 2001 (article)

PDF [BibTex]

The psychometric function: I. Fitting, sampling and goodness-of-fit

Wichmann, F., Hill, N.

Perception and Psychophysics, 63 (8), pages: 1293-1313, 2001 (article)

Abstract
The psychometric function relates an observer's performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function's parameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses). We show that failure to account for this can lead to serious biases in estimates of the psychometric function's parameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional χ² methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods.
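
The abstract does not spell out the model, so the following is only a minimal sketch of how a lapse term can enter a constrained maximum-likelihood fit. It assumes the common parameterization psi(x) = gamma + (1 - gamma - lambda) * F(x; alpha, beta) with a logistic F. The function names, the guess rate gamma = 0.5, and the lapse cap `max_lapse` are illustrative assumptions, not values taken from this listing or from the authors' software.

```python
import math


def psychometric(x, alpha, beta, gamma, lam):
    """Assumed form: psi(x) = gamma + (1 - gamma - lam) * F(x; alpha, beta).

    F is taken to be a logistic function here (an assumption);
    gamma is the guess rate and lam the lapse rate.
    """
    f = 1.0 / (1.0 + math.exp(-(x - alpha) / beta))
    return gamma + (1.0 - gamma - lam) * f


def neg_log_likelihood(params, xs, n_correct, n_trials,
                       gamma=0.5, max_lapse=0.06):
    """Binomial negative log-likelihood with a hard constraint on the lapse rate.

    params = (alpha, beta, lam). Parameter values outside the allowed region
    get an infinite cost, which is one simple way to impose constraints.
    max_lapse is a hypothetical cap chosen for illustration.
    """
    alpha, beta, lam = params
    if beta <= 0 or not (0.0 <= lam <= max_lapse):
        return math.inf
    total = 0.0
    for x, k, n in zip(xs, n_correct, n_trials):
        p = psychometric(x, alpha, beta, gamma, lam)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total
```

Because lam lowers the upper asymptote to 1 - lam, a single miss at a high stimulus level no longer forces the fitted slope or threshold to shift. Any general-purpose minimizer could be applied to `neg_log_likelihood` to obtain the estimates.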

PDF [BibTex]

Markovian domain fingerprinting: statistical segmentation of protein sequences

Bejerano, G., Seldin, Y., Margalit, H., Tishby, N.

Bioinformatics, 17(10):927-934, 2001 (article)

PDF Web [BibTex]
