Header logo is ei


2005


no image
Invariance of Neighborhood Relation under Input Space to Feature Space Mapping

Shin, H., Cho, S.

Pattern Recognition Letters, 26(6):707-718, 2005 (article)

Abstract
If the training pattern set is large, it takes a large memory and a long time to train support vector machine (SVM). Recently, we proposed neighborhood property based pattern selection algorithm (NPPS) which selects only the patterns that are likely to be near the decision boundary ahead of SVM training [Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence (LNAI 2637), Seoul, Korea, pp. 376–387]. NPPS tries to identify those patterns that are likely to become support vectors in feature space. Preliminary reports show its effectiveness: SVM training time was reduced by two orders of magnitude with almost no loss in accuracy for various datasets. It has to be noted, however, that decision boundary of SVM and support vectors are all defined in feature space while NPPS described above operates in input space. If neighborhood relation in input space is not preserved in feature space, NPPS may not always be effective. In this paper, we sh ow that the neighborhood relation is invariant under input to feature space mapping. The result assures that the patterns selected by NPPS in input space are likely to be located near decision boundary in feature space.

PDF PDF [BibTex]

2005

PDF PDF [BibTex]


no image
Intrinsic Dimensionality Estimation of Submanifolds in Euclidean space

Hein, M., Audibert, Y.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 289 , (Editors: De Raedt, L. , S. Wrobel), ICML Bonn, 2005 (inproceedings)

Abstract
We present a new method to estimate the intrinsic dimensionality of a submanifold M in Euclidean space from random samples. The method is based on the convergence rates of a certain U-statistic on the manifold. We solve at least partially the question of the choice of the scale of the data. Moreover the proposed method is easy to implement, can handle large data sets and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets.

PDF [BibTex]

PDF [BibTex]


no image
Large Scale Genomic Sequence SVM Classifiers

Sonnenburg, S., Rätsch, G., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 849-856, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
In genomic sequence analysis tasks like splice site recognition or promoter identification, large amounts of training sequences are available, and indeed needed to achieve sufficiently high classification performances. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree kernel (WD). In particular, we suggest several extensions using Suffix Trees and modi cations of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the spectrum kernel and WD kernel, large scale SVM training can be accelerated by factors of 20 and 4 times, respectively, while using much less memory (e.g. no kernel caching). The evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of Support Vectors). Our method allows us to train on sets as large as one million sequences.

PDF [BibTex]

PDF [BibTex]


no image
Joint Kernel Maps

Weston, J., Schölkopf, B., Bousquet, O.

In Proceedings of the 8th InternationalWork-Conference on Artificial Neural Networks, LNCS 3512, pages: 176-191, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
We develop a methodology for solving high dimensional dependency estimation problems between pairs of data types, which is viable in the case where the output of interest has very high dimension, e.g., thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces, using joint kernels. Known correlations between input and output can be defined by such kernels, some of which can maintain linearity in the outputs to provide simple (closed form) pre-images. We provide examples of such kernels and empirical results.

PostScript DOI [BibTex]

PostScript DOI [BibTex]


no image
Analysis of Some Methods for Reduced Rank Gaussian Process Regression

Quinonero Candela, J., Rasmussen, C.

In Switching and Learning in Feedback Systems, pages: 98-127, (Editors: Murray Smith, R. , R. Shorten), Springer, Berlin, Germany, European Summer School on Multi-Agent Control, 2005 (inproceedings)

Abstract
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning the covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which is a way of learning the support set for given hyperparameters based on approximating the posterior. We propose an alternative method to the SGGP that has better generalization capabilities. Finally we make experiments to compare the different ways of training a RRGP. We provide some Matlab code for learning RRGPs.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Approximate Inference for Robust Gaussian Process Regression

Kuss, M., Pfingsten, T., Csato, L., Rasmussen, C.

(136), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
Gaussian process (GP) priors have been successfully used in non-parametric Bayesian regression and classification models. Inference can be performed analytically only for the regression model with Gaussian noise. For all other likelihood models inference is intractable and various approximation techniques have been proposed. In recent years expectation-propagation (EP) has been developed as a general method for approximate inference. This article provides a general summary of how expectation-propagation can be used for approximate inference in Gaussian process models. Furthermore we present a case study describing its implementation for a new robust variant of Gaussian process regression. To gain further insights into the quality of the EP approximation we present experiments in which we compare to results obtained by Markov chain Monte Carlo (MCMC) sampling.

PDF [BibTex]

PDF [BibTex]


no image
Global image statistics of natural scenes

Drewes, J., Wichmann, F., Gegenfurtner, K.

Bioinspired Information Processing, 08, pages: 1, 2005 (poster)

[BibTex]

[BibTex]


no image
Graph Kernels for Chemical Informatics

Ralaivola, L., Swamidass, J., Saigo, H., Baldi, P.

Neural Networks, 18(8):1093-1110, 2005 (article)

Abstract
Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures with variable size. Here we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and counting labeled paths of depth up to d using depthfirst search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Extended Gaussianization Method for Blind Separation of Post-Nonlinear Mixtures

Zhang, K., Chan, L.

Neural Computation, 17(2):425-452, 2005 (article)

Abstract
The linear mixture model has been investigated in most articles tackling the problem of blind source separation. Recently, several articles have addressed a more complex model: blind source separation (BSS) of post-nonlinear (PNL) mixtures. These mixtures are assumed to be generated by applying an unknown invertible nonlinear distortion to linear instantaneous mixtures of some independent sources. The gaussianization technique for BSS of PNL mixtures emerged based on the assumption that the distribution of the linear mixture of independent sources is gaussian. In this letter, we review the gaussianization method and then extend it to apply to PNL mixture in which the linear mixture is close to gaussian. Our proposed method approximates the linear mixture using the Cornish-Fisher expansion. We choose the mutual information as the independence measurement to develop a learning algorithm to separate PNL mixtures. This method provides better applicability and accuracy. We then discuss the sufficient condition for the method to be valid. The characteristics of the nonlinearity do not affect the performance of this method. With only a few parameters to tune, our algorithm has a comparatively low computation. Finally, we present experiments to illustrate the efficiency of our method.

Web DOI [BibTex]


no image
Theory of Classification: A Survey of Some Recent Advances

Boucheron, S., Bousquet, O., Lugosi, G.

ESAIM: Probability and Statistics, 9, pages: 323 , 2005 (article)

Abstract
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have lead to these important recent developments.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
From Graphs to Manifolds - Weak and Strong Pointwise Consistency of Graph Laplacians

Hein, M., Audibert, J., von Luxburg, U.

In Proceedings of the 18th Conference on Learning Theory (COLT), pages: 470-485, Conference on Learning Theory, 2005, Student Paper Award (inproceedings)

Abstract
In the machine learning community it is generally believed that graph Laplacians corresponding to a finite sample of data points converge to a continuous Laplace operator if the sample size increases. Even though this assertion serves as a justification for many Laplacian-based algorithms, so far only some aspects of this claim have been rigorously proved. In this paper we close this gap by establishing the strong pointwise consistency of a family of graph Laplacians with data-dependent weights to some weighted Laplace operator. Our investigation also includes the important case where the data lies on a submanifold of $R^d$.

PDF [BibTex]

PDF [BibTex]


no image
Propagating Distributions on a Hypergraph by Dual Information Regularization

Tsuda, K.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 921 , (Editors: De Raedt, L. , S. Wrobel), ICML Bonn, 2005 (inproceedings)

Abstract
In the information regularization framework by Corduneanu and Jaakkola (2005), the distributions of labels are propagated on a hypergraph for semi-supervised learning. The learning is efficiently done by a Blahut-Arimoto-like two step algorithm, but, unfortunately, one of the steps cannot be solved in a closed form. In this paper, we propose a dual version of information regularization, which is considered as more natural in terms of information geometry. Our learning algorithm has two steps, each of which can be solved in a closed form. Also it can be naturally applied to exponential family distributions such as Gaussians. In experiments, our algorithm is applied to protein classification based on a metabolic network and known functional categories.

[BibTex]

[BibTex]


no image
Support Vector Machines and Kernel Algorithms

Schölkopf, B., Smola, A.

In Encyclopedia of Biostatistics (2nd edition), Vol. 8, 8, pages: 5328-5335, (Editors: P Armitage and T Colton), John Wiley & Sons, NY USA, 2005 (inbook)

[BibTex]

[BibTex]


no image
Moment Inequalities for Functions of Independent Random Variables

Boucheron, S., Bousquet, O., Lugosi, G., Massart, P.

To appear in Annals of Probability, 33, pages: 514-560, 2005 (article)

Abstract
A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method which has been used to derive concentration inequalities for such functions cite{BoLuMa01}, and is based on a generalized tensorization inequality due to Lata{l}a and Oleszkiewicz cite{LaOl00}. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes, and moment inequalities for Rademacher chaos and $U$-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrands exponential inequality for Rademacher chaos of order two to any order. We also discuss applications for other complex functions of independent random variables, such as suprema of boolean polynomials which include, as special cases, subgraph counting problems in random graphs.

PDF [BibTex]

PDF [BibTex]


no image
A Brain Computer Interface with Online Feedback based on Magnetoencephalography

Lal, T., Schröder, M., Hill, J., Preissl, H., Hinterberger, T., Mellinger, J., Bogdan, M., Rosenstiel, W., Hofmann, T., Birbaumer, N., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 465-472, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
The aim of this paper is to show that machine learning techniques can be used to derive a classifying function for human brain signal data measured by magnetoencephalography (MEG), for the use in a brain computer interface (BCI). This is especially helpful for evaluating quickly whether a BCI approach based on electroencephalography, on which training may be slower due to lower signalto- noise ratio, is likely to succeed. We apply recursive channel elimination and regularized SVMs to the experimental data of ten healthy subjects performing a motor imagery task. Four subjects were able to use a trained classifier together with a decision tree interface to write a short name. Further analysis gives evidence that the proposed imagination task is suboptimal for the possible extension to a multiclass interface. To the best of our knowledge this paper is the first working online BCI based on MEG recordings and is therefore a “proof of concept”.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Healing the Relevance Vector Machine through Augmentation

Rasmussen, CE., Candela, JQ.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 689 , (Editors: De Raedt, L. , S. Wrobel), ICML, 2005 (inproceedings)

Abstract
The Relevance Vector Machine (RVM) is a sparse approximate Bayesian kernel method. It provides full predictive distributions for test cases. However, the predictive uncertainties have the unintuitive property, that emph{they get smaller the further you move away from the training cases}. We give a thorough analysis. Inspired by the analogy to non-degenerate Gaussian Processes, we suggest augmentation to solve the problem. The purpose of the resulting model, RVM*, is primarily to corroborate the theoretical and experimental analysis. Although RVM* could be used in practical applications, it is no longer a truly sparse model. Experiments show that sparsity comes at the expense of worse predictive distributions.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Visual perception I: Basic principles

Wagemans, J., Wichmann, F., de Beeck, H.

In Handbook of Cognition, pages: 3-47, (Editors: Lamberts, K. , R. Goldstone), Sage, London, 2005 (inbook)

[BibTex]

[BibTex]


no image
Maximum-Margin Feature Combination for Detection and Categorization

BakIr, G., Wu, M., Eichhorn, J.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
In this paper we are concerned with the optimal combination of features of possibly different types for detection and estimation tasks in machine vision. We propose to combine features such that the resulting classifier maximizes the margin between classes. In contrast to existing approaches which are non-convex and/or generative we propose to use a discriminative model leading to convex problem formulation and complexity control. Furthermore we assert that decision functions should not compare apples and oranges by comparing features of different types directly. Instead we propose to combine different similarity measures for each different feature type. Furthermore we argue that the question: ”Which feature type is more discriminative for task X?” is ill-posed and show empirically that the answer to this question might depend on the complexity of the decision function.

PDF [BibTex]

PDF [BibTex]


no image
Kernel-Methods, Similarity, and Exemplar Theories of Categorization

Jäkel, F., Wichmann, F.

ASIC, 4, 2005 (poster)

Abstract
Kernel-methods are popular tools in machine learning and statistics that can be implemented in a simple feed-forward neural network. They have strong connections to several psychological theories. For example, Shepard‘s universal law of generalization can be given a kernel interpretation. This leads to an inner product and a metric on the psychological space that is different from the usual Minkowski norm. The metric has psychologically interesting properties: It is bounded from above and does not have additive segments. As categorization models often rely on Shepard‘s law as a model for psychological similarity some of them can be recast as kernel-methods. In particular, ALCOVE is shown to be closely related to kernel logistic regression. The relationship to the Generalized Context Model is also discussed. It is argued that functional analysis which is routinely used in machine learning provides valuable insights also for psychology.

Web [BibTex]


no image
Rapid animal detection in natural scenes: critical features are local

Wichmann, F., Rosas, P., Gegenfurtner, K.

Experimentelle Psychologie. Beitr{\"a}ge zur 47. Tagung experimentell arbeitender Psychologen, 47, pages: 225, 2005 (poster)

[BibTex]

[BibTex]


no image
A novel representation of protein sequences for prediction of subcellular location using support vector machines

Matsuda, S., Vert, J., Saigo, H., Ueda, N., Toh, H., Akutsu, T.

Protein Science, 14, pages: 2804-2813, 2005 (article)

Abstract
As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location. Keywords: subcellular location; signal sequence; amino acid composition; distance frequency; support vector machine; predictive accuracy

Web DOI [BibTex]

Web DOI [BibTex]


no image
Long Term Prediction of Product Quality in a Glass Manufacturing Process Using a Kernel Based Approach

Jung, T., Herrera, L., Schölkopf, B.

In Proceedings of the 8th International Work-Conferenceon Artificial Neural Networks (Computational Intelligence and Bioinspired Systems), Lecture Notes in Computer Science, Vol. 3512, LNCS 3512, pages: 960-967, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
In this paper we report the results obtained using a kernel-based approach to predict the temporal development of four response signals in the process control of a glass melting tank with 16 input parameters. The data set is a revised version1 from the modelling challenge in EUNITE-2003. The central difficulties are: large time-delays between changes in the inputs and the outputs, large number of data, and a general lack of knowledge about the relevant variables that intervene in the process. The methodology proposed here comprises Support Vector Machines (SVM) and Regularization Networks (RN). We use the idea of sparse approximation both as a means of regularization and as a means of reducing the computational complexity. Furthermore, we will use an incremental approach to add new training examples to the kernel-based method and efficiently update the current solution. This allows us to use a sophisticated learning scheme, where we iterate between prediction and training, with good computational efficiency and satisfactory results.

DOI [BibTex]

DOI [BibTex]


no image
Object correspondence as a machine learning problem

Schölkopf, B., Steinke, F., Blanz, V.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 777-784, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We propose machine learning methods for the estimation of deformation fields that transform two given objects into each other, thereby establishing a dense point to point correspondence. The fields are computed using a modified support vector machine containing a penalty enforcing that points of one object will be mapped to ``similar‘‘ points on the other one. Our system, which contains little engineering or domain knowledge, delivers state of the art performance. We present application results including close to photorealistic morphs of 3D head models.

PDF [BibTex]

PDF [BibTex]


no image
Towards a Statistical Theory of Clustering. Presented at the PASCAL workshop on clustering, London

von Luxburg, U., Ben-David, S.

Presented at the PASCAL workshop on clustering, London, 2005 (techreport)

Abstract
The goal of this paper is to discuss statistical aspects of clustering in a framework where the data to be clustered has been sampled from some unknown probability distribution. Firstly, the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process. Secondly, the more sample points we have, the more reliable the clustering should be. We discuss which methods can and cannot be used to tackle those problems. In particular we argue that generalization bounds as they are used in statistical learning theory of classification are unsuitable in a general clustering framework. We suggest that the main replacements of generalization bounds should be convergence proofs and stability considerations. This paper should be considered as a road map paper which identifies important questions and potentially fruitful directions for future research about statistical clustering. We do not attempt to present a complete statistical theory of clustering.

PDF [BibTex]

PDF [BibTex]


no image
The human brain as large margin classifier

Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B.

Proceedings of the Computational & Systems Neuroscience Meeting (COSYNE), 2, pages: 1, 2005 (poster)

[BibTex]

[BibTex]


no image
A tutorial on v-support vector machines

Chen, P., Lin, C., Schölkopf, B.

Applied Stochastic Models in Business and Industry, 21(2):111-136, 2005 (article)

Abstract
We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the so-called -SVM, including details of the algorithm and its implementation, theoretical results, and practical applications. Copyright © 2005 John Wiley & Sons, Ltd.

PDF [BibTex]

PDF [BibTex]


no image
Robust EEG Channel Selection Across Subjects for Brain Computer Interfaces

Schröder, M., Lal, T., Hinterberger, T., Bogdan, M., Hill, J., Birbaumer, N., Rosenstiel, W., Schölkopf, B.

EURASIP Journal on Applied Signal Processing, 2005(19, Special Issue: Trends in Brain Computer Interfaces):3103-3112, (Editors: Vesin, J. M., T. Ebrahimi), 2005 (article)

Abstract
Most EEG-based Brain Computer Interface (BCI) paradigms come along with specific electrode positions, e.g.~for a visual based BCI electrode positions close to the primary visual cortex are used. For new BCI paradigms it is usually not known where task relevant activity can be measured from the scalp. For individual subjects Lal et.~al showed that recording positions can be found without the use of prior knowledge about the paradigm used. However it remains unclear to what extend their method of Recursive Channel Elimination (RCE) can be generalized across subjects. In this paper we transfer channel rankings from a group of subjects to a new subject. For motor imagery tasks the results are promising, although cross-subject channel selection does not quite achieve the performance of channel selection on data of single subjects. Although the RCE method was not provided with prior knowledge about the mental task, channels that are well known to be important (from a physiological point of view) were consistently selected whereas task-irrelevant channels were reliably disregarded.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Implicit Surface Modelling as an Eigenvalue Problem

Walder, C., Chapelle, O., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 937-944, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We discuss the problem of fitting an implicit shape model to a set of points sampled from a co-dimension one manifold of arbitrary topology. The method solves a non-convex optimisation problem in the embedding function that defines the implicit by way of its zero level set. By assuming that the solution is a mixture of radial basis functions of varying widths we attain the globally optimal solution by way of an equivalent eigenvalue problem, without using or constructing as an intermediate step the normal vectors of the manifold at each data point. We demonstrate the system on two and three dimensional data, with examples of missing data interpolation and set operations on the resultant shapes.

PDF [BibTex]

PDF [BibTex]


no image
Approximate Bayesian Inference for Psychometric Functions using MCMC Sampling

Kuss, M., Jäkel, F., Wichmann, F.

(135), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
In psychophysical studies the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. In this report we propose the use of Bayesian inference to extract the information contained in experimental data estimate the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition we discuss the parameterisation of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generate d data and in a case study for real experimental data. Furthermore, we compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation. The appendix provides a description of an implementation for the R environment for statistical computing and provides the code for reproducing the results discussed in the experiment section.

PDF [BibTex]

PDF [BibTex]


no image
Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

In Proceedings of the 16th European Conference on Machine Learning, 3720, pages: 280-291, (Editors: Gama, J.;Camacho, R.;Brazdil, P.;Jorge, A.;Torgo, L.), Springer, ECML, 2005, clmc (inproceedings)

Abstract
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing AmariÕs natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regres- sion. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and BradtkeÕs Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Em- pirical evaluations illustrate the effectiveness of our techniques in com- parison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Comparative experiments on task space control with redundancy resolution

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3901-3908, Edmonton, Alberta, Canada, Aug. 2-6, IROS, 2005, clmc (inproceedings)

Abstract
Understanding the principles of motor coordination with redundant degrees of freedom still remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundacy resolution still remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm which is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, despite, from a theoretical point, the algorithm does not share the same beauty as some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.

link (url) DOI [BibTex]

link (url) DOI [BibTex]

1999


no image
Some Aspects of Modelling Human Spatial Vision: Contrast Discrimination

Wichmann, F.

University of Oxford, University of Oxford, October 1999 (phdthesis)

[BibTex]

1999

[BibTex]


no image
Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites in DNA

Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lemmen, C., Smola, A., Lengauer, T., Müller, K.

In German Conference on Bioinformatics (GCB 1999), October 1999 (inproceedings)

Abstract
In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points from which regions encoding pro­ teins start, the so­called translation initiation sites (TIS). This can be modeled as a classification prob­ lem. We demonstrate the power of support vector machines (SVMs) for this task, and show how to suc­ cessfully incorporate biological prior knowledge by engineering an appropriate kernel function.

Web [BibTex]

Web [BibTex]


no image
Unexpected and anticipated pain: identification of specific brain activations by correlation with reference functions derived form conditioning theory

Ploghaus, A., Clare, S., Wichmann, F., Tracey, I.

29, 29th Annual Meeting of the Society for Neuroscience (Neuroscience), October 1999 (poster)

[BibTex]

[BibTex]


no image
Lernen mit Kernen: Support-Vektor-Methoden zur Analyse hochdimensionaler Daten

Schölkopf, B., Müller, K., Smola, A.

Informatik - Forschung und Entwicklung, 14(3):154-163, September 1999 (article)

Abstract
We describe recent developments and results of statistical learning theory. In the framework of learning from examples, two factors control generalization ability: explaining the training data by a learning machine of a suitable complexity. We describe kernel algorithms in feature spaces as elegant and efficient methods of realizing such machines. Examples thereof are Support Vector Machines (SVM) and Kernel PCA (Principal Component Analysis). More important than any individual example of a kernel algorithm, however, is the insight that any algorithm that can be cast in terms of dot products can be generalized to a nonlinear setting using kernels. Finally, we illustrate the significance of kernel algorithms by briefly describing industrial and academic applications, including ones where we obtained benchmark record results.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Input space versus feature space in kernel-based methods

Schölkopf, B., Mika, S., Burges, C., Knirsch, P., Müller, K., Rätsch, G., Smola, A.

IEEE Transactions On Neural Networks, 10(5):1000-1017, September 1999 (article)

Abstract
This paper collects some ideas targeted at advancing our understanding of the feature spaces associated with support vector (SV) kernel functions. We first discuss the geometry of feature space. In particular, we review what is known about the shape of the image of input space under the feature space map, and how this influences the capacity of SV methods. Following this, we describe how the metric governing the intrinsic geometry of the mapped surface can be computed in terms of the kernel, using the example of the class of inhomogeneous polynomial kernels, which are often used in SV pattern recognition. We then discuss the connection between feature space and input space by dealing with the question of how one can, given some vector in feature space, find a preimage (exact or approximate) in input space. We describe algorithms to tackle this issue, and show their utility in two applications of kernel methods. First, we use it to reduce the computational complexity of SV decision functions; second, we combine it with the kernel PCA algorithm, thereby constructing a nonlinear statistical denoising technique which is shown to perform well on real-world data.

Web DOI [BibTex]

Web DOI [BibTex]


no image
p73 and p63 are homotetramers capable of weak heterotypic interactions with each other but not with p53.

Davison, T., Vagner, C., Kaghad, M., Ayed, A., Caput, D., CH, ..

Journal of Biological Chemistry, 274(26):18709-18714, June 1999 (article)

Abstract
Mutations in the p53 tumor suppressor gene are the most frequent genetic alterations found in human cancers. Recent identification of two human homologues of p53 has raised the prospect of functional interactions between family members via a conserved oligomerization domain. Here we report in vitro and in vivo analysis of homo- and hetero-oligomerization of p53 and its homologues, p63 and p73. The oligomerization domains of p63 and p73 can independently fold into stable homotetramers, as previously observed for p53. However, the oligomerization domain of p53 does not associate with that of either p73 or p63, even when p53 is in 15-fold excess. On the other hand, the oligomerization domains of p63 and p73 are able to weakly associate with one another in vitro. In vivo co-transfection assays of the ability of p53 and its homologues to activate reporter genes showed that a DNA-binding mutant of p53 was not able to act in a dominant negative manner over wild-type p73 or p63 but that a p73 mutant could inhibit the activity of wild-type p63. These data suggest that mutant p53 in cancer cells will not interact with endogenous or exogenous p63 or p73 via their respective oligomerization domains. It also establishes that the multiple isoforms of p63 as well as those of p73 are capable of interacting via their common oligomerization domain.

Web [BibTex]

Web [BibTex]


no image
Shrinking the tube: a new support vector regression algorithm

Schölkopf, B., Bartlett, P., Smola, A., Williamson, R.

In Advances in Neural Information Processing Systems 11, pages: 330-336 , (Editors: MS Kearns and SA Solla and DA Cohn), MIT Press, Cambridge, MA, USA, 12th Annual Conference on Neural Information Processing Systems (NIPS), June 1999 (inproceedings)

PDF Web [BibTex]

PDF Web [BibTex]


no image
Semiparametric support vector and linear programming machines

Smola, A., Friess, T., Schölkopf, B.

In Advances in Neural Information Processing Systems 11, pages: 585-591 , (Editors: MS Kearns and SA Solla and DA Cohn), MIT Press, Cambridge, MA, USA, Twelfth Annual Conference on Neural Information Processing Systems (NIPS), June 1999 (inproceedings)

Abstract
Semiparametric models are useful tools in the case where domain knowledge exists about the function to be estimated or emphasis is put onto understandability of the model. We extend two learning algorithms - Support Vector machines and Linear Programming machines to this case and give experimental results for SV machines.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel PCA and De-noising in feature spaces

Mika, S., Schölkopf, B., Smola, A., Müller, K., Scholz, M., Rätsch, G.

In Advances in Neural Information Processing Systems 11, pages: 536-542 , (Editors: MS Kearns and SA Solla and DA Cohn), MIT Press, Cambridge, MA, USA, 12th Annual Conference on Neural Information Processing Systems (NIPS), June 1999 (inproceedings)

Abstract
Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered as a natural generalization of linear principal component analysis. This gives rise to the question how to use nonlinear features for data compression, reconstruction, and de-noising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real world data.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel principal component analysis.

Schölkopf, B., Smola, A., Müller, K.

In Advances in Kernel Methods—Support Vector Learning, pages: 327-352, (Editors: B Schölkopf and CJC Burges and AJ Smola), MIT Press, Cambridge, MA, 1999 (inbook)

[BibTex]

[BibTex]


no image
Estimating the support of a high-dimensional distribution

Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A., Williamson, R.

(MSR-TR-99-87), Microsoft Research, 1999 (techreport)

Web [BibTex]

Web [BibTex]


no image
Single-class Support Vector Machines

Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J.

Dagstuhl-Seminar on Unsupervised Learning, pages: 19-20, (Editors: J. Buhmann, W. Maass, H. Ritter and N. Tishby), 1999 (poster)

[BibTex]

[BibTex]


no image
Spatial Learning and Localization in Animals: A Computational Model and Its Implications for Mobile Robots

Balakrishnan, K., Bousquet, O., Honavar, V.

Adaptive Behavior, 7(2):173-216, 1999 (article)

[BibTex]


no image
SVMs for Histogram Based Image Classification

Chapelle, O., Haffner, P., Vapnik, V.

IEEE Transactions on Neural Networks, (9), 1999 (article)

Abstract
Traditional classification approaches generalize poorly on image classification tasks, because of the high dimensionality of the feature space. This paper shows that Support Vector Machines (SVM) can generalize well on difficult image classification problems where the only features are high dimensional histograms. Heavy-tailed RBF kernels of the form $K(mathbf{x},mathbf{y})=e^{-rhosum_i |x_i^a-y_i^a|^{b}}$ with $aleq 1$ and $b leq 2$ are evaluated on the classification of images extracted from the Corel Stock Photo Collection and shown to far outperform traditional polynomial or Gaussian RBF kernels. Moreover, we observed that a simple remapping of the input $x_i rightarrow x_i^a$ improves the performance of linear SVMs to such an extend that it makes them, for this problem, a valid alternative to RBF kernels.

GZIP [BibTex]

GZIP [BibTex]


no image
Classifying LEP data with support vector algorithms.

Vannerem, P., Müller, K., Smola, A., Schölkopf, B., Söldner-Rembold, S.

In Artificial Intelligence in High Energy Nuclear Physics 99, Artificial Intelligence in High Energy Nuclear Physics 99, 1999 (inproceedings)

[BibTex]

[BibTex]


no image
Generalization Bounds via Eigenvalues of the Gram matrix

Schölkopf, B., Shawe-Taylor, J., Smola, A., Williamson, R.

(99-035), NeuroCOLT, 1999 (techreport)

[BibTex]

[BibTex]


no image
Pedestal effects with periodic pulse trains

Henning, G., Wichmann, F.

Perception, 28, pages: S137, 1999 (poster)

Abstract
It is important to know for theoretical reasons how performance varies with stimulus contrast. But, for objects on CRT displays, retinal contrast is limited by the linear range of the display and the modulation transfer function of the eye. For example, with an 8 c/deg sinusoidal grating at 90% contrast, the contrast of the retinal image is barely 45%; more retinal contrast is required, however, to discriminate among theories of contrast discrimination (Wichmann, Henning and Ploghaus, 1998). The stimulus with the greatest contrast at any spatial-frequency component is a periodic pulse train which has 200% contrast at every harmonic. Such a waveform cannot, of course, be produced; the best we can do with our Mitsubishi display provides a contrast of 150% at an 8-c/deg fundamental thus producing a retinal image with about 75% contrast. The penalty of using this stimulus is that the 2nd harmonic of the retinal image also has high contrast (with an emmetropic eye, more than 60% of the contrast of the 8-c/deg fundamental ) and the mean luminance is not large (24.5 cd/m2 on our display). We have used standard 2-AFC experiments to measure the detectability of an 8-c/deg pulse train against the background of an identical pulse train of different contrasts. An unusually large improvement in detetectability was measured, the pedestal effect or "dipper," and the dipper was unusually broad. The implications of these results will be discussed.

[BibTex]

[BibTex]


no image
Apprentissage Automatique et Simplicite

Bousquet, O.

Biologische Kybernetik, 1999, In french (diplomathesis)

PostScript [BibTex]

PostScript [BibTex]


no image
Classification on proximity data with LP-machines

Graepel, T., Herbrich, R., Schölkopf, B., Smola, A., Bartlett, P., Müller, K., Obermayer, K., Williamson, R.

In Artificial Neural Networks, 1999. ICANN 99, 470, pages: 304-309, Conference Publications , IEEE, 9th International Conference on Artificial Neural Networks, 1999 (inproceedings)

DOI [BibTex]

DOI [BibTex]