2003


The Kernel Mutual Information

Gretton, A., Herbrich, R., Smola, A.

In IEEE ICASSP Vol. 4, pages: 880-883, IEEE ICASSP, April 2003 (inproceedings)

Abstract
We introduce a new contrast function, the kernel mutual information (KMI), to measure the degree of independence of continuous random variables. This contrast function provides an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estimate of the mutual information between a discretised approximation of the continuous random variables. We show that Bach and Jordan's kernel generalised variance (KGV) is also an upper bound on the same kernel density estimate, but is looser. Finally, we suggest that the addition of a regularising term in the KGV causes it to approach the KMI, which motivates the introduction of this regularisation.
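
The KMI itself is a determinant-based quantity derived in the paper; as a loosely related illustration of kernel independence measures in general (not the KMI), the following numpy sketch computes an HSIC-style contrast from centred Gram matrices, which is likewise close to zero near independence. The kernel width and toy data are arbitrary assumptions.

    import numpy as np

    def rbf_gram(z, sigma=1.0):
        # Gaussian (RBF) Gram matrix built from pairwise squared distances
        sq = np.sum(z**2, 1)[:, None] + np.sum(z**2, 1)[None, :] - 2 * z @ z.T
        return np.exp(-sq / (2 * sigma**2))

    def kernel_independence_stat(x, y, sigma=1.0):
        # HSIC-style contrast: trace(K H L H) / (n-1)^2, near zero when x and y are independent
        n = x.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
        return np.trace(rbf_gram(x, sigma) @ H @ rbf_gram(y, sigma) @ H) / (n - 1) ** 2

    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 1))
    print(kernel_independence_stat(x, x**2))                       # dependent pair: clearly positive
    print(kernel_independence_stat(x, rng.normal(size=(200, 1))))  # independent pair: near zero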

PostScript [BibTex]

Analysing ICA components by injecting noise

Harmeling, S., Meinecke, F., Müller, K.

In ICA 2003, pages: 149-154, (Editors: Amari, S.-I. , A. Cichocki, S. Makino, N. Murata), 4th International Symposium on Independent Component Analysis and Blind Signal Separation, April 2003 (inproceedings)

Abstract
Usually, noise is considered to be destructive. We present a new method that constructively injects noise to assess the reliability and the group structure of empirical ICA components. Simulations show that the true root-mean squared angle distances between the real sources and some source estimates can be approximated by our method. In a toy experiment, we see that we are also able to reveal the underlying group structure of extracted ICA components. Furthermore, an experiment with fetal ECG data demonstrates that our approach is useful for exploratory data analysis of real-world data.
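
A rough sketch of the noise-injection idea on toy data (all settings hypothetical; scikit-learn's FastICA stands in for whichever ICA algorithm is being assessed): re-estimate the mixing matrix on noise-perturbed copies of the data and record the angles by which the estimated columns drift.

    import numpy as np
    from sklearn.decomposition import FastICA

    def angle_deg(a, b):
        c = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

    rng = np.random.default_rng(0)
    s = np.c_[np.sign(rng.normal(size=2000)), rng.laplace(size=2000)]   # two toy sources
    x = s @ rng.normal(size=(2, 2)).T                                   # mixed observations

    baseline = FastICA(n_components=2, random_state=0).fit(x).mixing_
    drifts = []
    for seed in range(20):
        noisy = x + 0.1 * x.std() * np.random.default_rng(seed).normal(size=x.shape)
        mix = FastICA(n_components=2, random_state=0).fit(noisy).mixing_
        # match each baseline mixing column to the closest perturbed column
        drifts.append([min(angle_deg(baseline[:, i], mix[:, j]) for j in range(2))
                       for i in range(2)])
    print(np.mean(drifts, axis=0))   # small mean angles indicate reliable components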

PDF Web [BibTex]

Hierarchical Spatio-Temporal Morphable Models for Representation of Complex Movements for Imitation Learning

Ilg, W., Bakir, GH., Franz, MO., Giese, M.

In 11th International Conference on Advanced Robotics, (2):453-458, (Editors: Nunes, U., A. de Almeida, A. Bejczy, K. Kosuge and J.A.T. Machado), 11th International Conference on Advanced Robotics, January 2003 (inproceedings)

PDF [BibTex]

Hyperkernels

Ong, CS., Smola, AJ., Williamson, RC.

In pages: 495-502, 2003 (inproceedings)

PDF [BibTex]

Feature Selection for Support Vector Machines by Means of Genetic Algorithms

Fröhlich, H., Chapelle, O., Schölkopf, B.

In 15th IEEE International Conference on Tools with AI, pages: 142-148, 15th IEEE International Conference on Tools with AI, 2003 (inproceedings)

[BibTex]

Propagation of Uncertainty in Bayesian Kernel Models - Application to Multiple-Step Ahead Forecasting

Quiñonero-Candela, J., Girard, A., Larsen, J., Rasmussen, CE.

In IEEE International Conference on Acoustics, Speech and Signal Processing, 2, pages: 701-704, IEEE International Conference on Acoustics, Speech and Signal Processing, 2003 (inproceedings)

Abstract
The object of Bayesian modelling is the predictive distribution, which in a forecasting scenario enables improved estimates of forecasted values and their uncertainties. In this paper we focus on reliably estimating the predictive mean and variance of forecasted values using Bayesian kernel based models such as the Gaussian Process and the Relevance Vector Machine. We derive novel analytic expressions for the predictive mean and variance for Gaussian kernel shapes under the assumption of a Gaussian input distribution in the static case, and of a recursive Gaussian predictive density in iterative forecasting. The capability of the method is demonstrated for forecasting of time-series and compared to approximate methods.
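
The paper derives analytic, moment-matched expressions; the sketch below instead uses plain Monte Carlo propagation with scikit-learn's GaussianProcessRegressor on a toy one-step model, only to show how predictive uncertainty can be fed back in during iterated forecasting. The kernel choice and toy series are assumptions.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    y = np.sin(0.2 * np.arange(300)) + 0.05 * rng.normal(size=300)   # toy series
    gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.01)).fit(y[:-1, None], y[1:])

    def mc_forecast(gp, last, horizon=20, n_particles=500):
        # iterated k-step-ahead forecasting: sample the predictive density at every
        # step instead of feeding back only the mean, so uncertainty accumulates
        particles = np.full(n_particles, last)
        means, stds = [], []
        for _ in range(horizon):
            mu, sd = gp.predict(particles[:, None], return_std=True)
            particles = rng.normal(mu, sd)
            means.append(particles.mean())
            stds.append(particles.std())
        return np.array(means), np.array(stds)

    m, s = mc_forecast(gp, y[-1])
    print(s[0], s[-1])   # the predictive spread typically grows with the horizon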

PDF PostScript [BibTex]

Unsupervised Clustering of Images using their Joint Segmentation

Seldin, Y., Starik, S., Werman, M.

In The 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV 2003), pages: 1-24, 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV), 2003 (inproceedings)

PDF Web [BibTex]

Support Vector Machines

Schölkopf, B., Smola, A.

In Handbook of Brain Theory and Neural Networks (2nd edition), pages: 1119-1125, (Editors: MA Arbib), MIT Press, Cambridge, MA, USA, 2003 (inbook)

[BibTex]

Kernel Methods and Their Applications to Signal Processing

Bousquet, O., Perez-Cruz, F.

In Proceedings (ICASSP '03), Special Session on Kernel Methods, pages: 860, ICASSP, 2003 (inproceedings)

Abstract
Recently introduced in Machine Learning, the notion of kernels has drawn a lot of interest as it allows one to obtain non-linear algorithms from linear ones in a simple and elegant manner. This, in conjunction with the introduction of new linear classification methods such as the Support Vector Machines, has produced significant progress. The success of such algorithms is now spreading as they are applied to more and more domains. Many Signal Processing problems, by their non-linear and high-dimensional nature, may benefit from such techniques. We give an overview of kernel methods and their recent applications.

PDF PostScript [BibTex]

Predictive control with Gaussian process models

Kocijan, J., Murray-Smith, R., Rasmussen, CE., Likar, B.

In Proceedings of IEEE Region 8 Eurocon 2003: Computer as a Tool, pages: 352-356, (Editors: Zajc, B. and M. Tkal), Proceedings of IEEE Region 8 Eurocon: Computer as a Tool, 2003 (inproceedings)

Abstract
This paper describes model-based predictive control based on Gaussian processes. Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identification of non-linear dynamic systems. They offer more insight into the variance of the obtained model response, as well as fewer parameters to determine than other models. Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. This property is used in predictive control, where the optimisation of the control signal takes the variance information into account. The predictive control principle is demonstrated on a simulated example of a nonlinear system.
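
A hedged, one-step stand-in for the receding-horizon controller described above (toy plant, grid search over a scalar input instead of proper optimisation; all names and settings are assumptions): the GP's predictive variance enters the cost, steering the controller away from regions where the model is uncertain.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    g = lambda x, u: 0.9 * x + 0.5 * np.tanh(u)        # hypothetical black-box plant
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(200, 2))              # columns: state, input
    Y = g(X[:, 0], X[:, 1]) + 0.01 * rng.normal(size=200)
    gp = GaussianProcessRegressor(RBF([1.0, 1.0]) + WhiteKernel(1e-4)).fit(X, Y)

    def one_step_control(x, target, kappa=2.0):
        u = np.linspace(-2, 2, 81)                     # candidate control signals
        mu, sd = gp.predict(np.c_[np.full_like(u, x), u], return_std=True)
        cost = (mu - target) ** 2 + kappa * sd ** 2    # tracking error plus variance penalty
        return u[np.argmin(cost)]

    print(one_step_control(x=0.0, target=1.0))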

PDF PostScript [BibTex]

Extension of the nu-SVM range for classification

Perez-Cruz, F., Weston, J., Herrmann, D., Schölkopf, B.

In Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, Vol. 190, pages: 179-196, (Editors: J Suykens and G Horvath and S Basu and C Micchelli and J Vandewalle), IOS Press, Amsterdam, 2003 (inbook)

[BibTex]

Distance-based classification with Lipschitz functions

von Luxburg, U., Bousquet, O.

In Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, pages: 314-328, (Editors: Schölkopf, B. and M.K. Warmuth), Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, 2003 (inproceedings)

Abstract
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that, using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. Our approach leads to a general large margin algorithm for classification in metric spaces. To analyze this algorithm, we first prove a representer theorem. It states that there exists a solution which can be expressed as a linear combination of distances to sets of training points. Then we analyze the Rademacher complexity of some Lipschitz function classes. The generality of the Lipschitz approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz algorithm, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.
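
Spelled out in symbols (a paraphrase of the abstract, not a formula quoted from the paper): for training points $(x_i, y_i)$ in a metric space $(\mathcal{X}, d)$, one seeks

    f^{*} = \operatorname{arg\,min} \{\, L(f) : y_i f(x_i) \ge 1 \ \text{for all } i \,\}, \qquad L(f) = \sup_{x \neq x'} \frac{|f(x) - f(x')|}{d(x, x')},

so that $1/L(f^{*})$ plays the role of the margin; the special case $f(x) = \min_{i: y_i = -1} d(x, x_i) - \min_{i: y_i = +1} d(x, x_i)$ recovers the 1-nearest neighbor rule mentioned at the end of the abstract.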

PDF PostScript [BibTex]

An Introduction to Support Vector Machines

Schölkopf, B.

In Recent Advances and Trends in Nonparametric Statistics, pages: 3-17, (Editors: MG Akritas and DN Politis), Elsevier, Amsterdam, The Netherlands, 2003 (inbook)

Web DOI [BibTex]

Statistical Learning and Kernel Methods in Bioinformatics

Schölkopf, B., Guyon, I., Weston, J.

In Artificial Intelligence and Heuristic Methods in Bioinformatics, 183, pages: 1-21, 3, (Editors: P Frasconi and R Shamir), IOS Press, Amsterdam, The Netherlands, 2003 (inbook)

[BibTex]

Semi-Supervised Learning through Principal Directions Estimation

Chapelle, O., Schölkopf, B., Weston, J.

In ICML Workshop, The Continuum from Labeled to Unlabeled Data in Machine Learning & Data Mining, pages: 7, ICML Workshop: The Continuum from Labeled to Unlabeled Data in Machine Learning & Data Mining, 2003 (inproceedings)

Abstract
We describe methods for taking into account unlabeled data in the training of a kernel-based classifier, such as the Support Vector Machine (SVM). We propose two approaches utilizing unlabeled points in the vicinity of labeled ones. Both of the approaches effectively modify the metric of the pattern space, either by using non-spherical Gaussian density estimates which are determined using EM, or by modifying the kernel function using displacement vectors computed from pairs of unlabeled and labeled points. The latter is linked to techniques for training invariant SVMs. We present experimental results indicating that the proposed technique can lead to substantial improvements of classification accuracy.

PostScript [BibTex]

Statistical Learning and Kernel Methods

Navia-Vázquez, A., Schölkopf, B.

In Adaptivity and Learning—An Interdisciplinary Debate, pages: 161-186, (Editors: R Kühn and R Menzel and W Menzel and U Ratsch and MM Richter and I-O Stamatescu), Springer, Berlin, Heidelberg, Germany, 2003 (inbook)

[BibTex]

A Short Introduction to Learning with Kernels

Schölkopf, B., Smola, A.

In Proceedings of the Machine Learning Summer School, Lecture Notes in Artificial Intelligence, Vol. 2600, pages: 41-64, LNAI 2600, (Editors: S Mendelson and AJ Smola), Springer, Berlin, Heidelberg, Germany, 2003 (inbook)

[BibTex]

Bayesian Kernel Methods

Smola, A., Schölkopf, B.

In Advanced Lectures on Machine Learning, Machine Learning Summer School 2002, Lecture Notes in Computer Science, Vol. 2600, LNAI 2600, pages: 65-117, (Editors: S Mendelson and AJ Smola), Springer, Berlin, Germany, 2003 (inbook)

DOI [BibTex]

Machine Learning with Hyperkernels

Ong, CS., Smola, AJ.

In pages: 568-575, 2003 (inproceedings)

PDF [BibTex]

Gaussian Processes to Speed up Hybrid Monte Carlo for Expensive Bayesian Integrals

Rasmussen, CE.

In Bayesian Statistics 7, pages: 651-659, (Editors: J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West), Bayesian Statistics 7, 2003 (inproceedings)

Abstract
Hybrid Monte Carlo (HMC) is often the method of choice for computing Bayesian integrals that are not analytically tractable. However the success of this method may require a very large number of evaluations of the (un-normalized) posterior and its partial derivatives. In situations where the posterior is computationally costly to evaluate, this may lead to an unacceptable computational load for HMC. I propose to use a Gaussian Process model of the (log of the) posterior for most of the computations required by HMC. Within this scheme only occasional evaluation of the actual posterior is required to guarantee that the samples generated have exactly the desired distribution, even if the GP model is somewhat inaccurate. The method is demonstrated on a 10 dimensional problem, where 200 evaluations suffice for the generation of 100 roughly independent points from the posterior. Thus, the proposed scheme allows Bayesian treatment of models with posteriors that are computationally demanding, such as models involving computer simulation.

PDF PostScript Web [BibTex]

Dimension Reduction Based on Orthogonality — a Decorrelation Method in ICA

Zhang, K., Chan, L.

In Artificial Neural Networks and Neural Information Processing - ICANN/ICONIP 2003, pages: 132-139, (Editors: O Kaynak and E Alpaydin and E Oja and L Xu), Springer, Berlin, Germany, International Conference on Artificial Neural Networks and International Conference on Neural Information Processing, ICANN/ICONIP, 2003, Lecture Notes in Computer Science, Volume 2714 (inproceedings)

Web DOI [BibTex]

Stability of ensembles of kernel machines

Elisseeff, A., Pontil, M.

In Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, Vol. 190, pages: 111-124, (Editors: Suykens, J., G. Horvath, S. Basu, C. Micchelli and J. Vandewalle), IOS Press, Netherlands, 2003 (inbook)

[BibTex]


1999


Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites in DNA

Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lemmen, C., Smola, A., Lengauer, T., Müller, K.

In German Conference on Bioinformatics (GCB 1999), October 1999 (inproceedings)

Abstract
In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points from which regions encoding proteins start, the so-called translation initiation sites (TIS). This can be modeled as a classification problem. We demonstrate the power of support vector machines (SVMs) for this task, and show how to successfully incorporate biological prior knowledge by engineering an appropriate kernel function.
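
A minimal sketch of the classification setup (hypothetical toy windows and labels; a plain polynomial kernel on a sparse bit encoding stands in for the engineered kernel described in the paper):

    import numpy as np
    from sklearn.svm import SVC

    def one_hot(seq, alphabet="ACGT"):
        # sparse bit encoding of a fixed-length window of nucleotides
        idx = {c: i for i, c in enumerate(alphabet)}
        v = np.zeros(len(seq) * len(alphabet))
        for p, c in enumerate(seq):
            if c in idx:
                v[p * len(alphabet) + idx[c]] = 1.0
        return v

    # hypothetical toy windows centred on a candidate ATG, labelled true / false TIS
    windows = ["GCCATGGCG", "TTTATGAAA", "CGAATGCGT", "ATAATGTTT"]
    labels = [1, 1, 0, 0]
    X = np.stack([one_hot(w) for w in windows])

    clf = SVC(kernel="poly", degree=2, C=1.0).fit(X, labels)
    print(clf.predict(one_hot("GCCATGAAA")[None, :]))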

Web [BibTex]

Shrinking the tube: a new support vector regression algorithm

Schölkopf, B., Bartlett, P., Smola, A., Williamson, R.

In Advances in Neural Information Processing Systems 11, pages: 330-336, (Editors: MS Kearns and SA Solla and DA Cohn), MIT Press, Cambridge, MA, USA, 12th Annual Conference on Neural Information Processing Systems (NIPS), June 1999 (inproceedings)

PDF Web [BibTex]

Semiparametric support vector and linear programming machines

Smola, A., Friess, T., Schölkopf, B.

In Advances in Neural Information Processing Systems 11, pages: 585-591, (Editors: MS Kearns and SA Solla and DA Cohn), MIT Press, Cambridge, MA, USA, Twelfth Annual Conference on Neural Information Processing Systems (NIPS), June 1999 (inproceedings)

Abstract
Semiparametric models are useful tools in the case where domain knowledge exists about the function to be estimated or emphasis is put on the understandability of the model. We extend two learning algorithms, Support Vector machines and Linear Programming machines, to this case and give experimental results for SV machines.
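
In symbols (a hedged paraphrase; $\psi_j$ denotes basis functions supplied by domain knowledge, possibly including a constant), the semiparametric ansatz replaces the usual support vector expansion by

    f(x) = \sum_i \alpha_i \, k(x_i, x) + \sum_j \beta_j \, \psi_j(x),

so the kernel part only has to model what the parametric terms leave unexplained.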

PDF Web [BibTex]

Kernel PCA and De-noising in feature spaces

Mika, S., Schölkopf, B., Smola, A., Müller, K., Scholz, M., Rätsch, G.

In Advances in Neural Information Processing Systems 11, pages: 536-542, (Editors: MS Kearns and SA Solla and DA Cohn), MIT Press, Cambridge, MA, USA, 12th Annual Conference on Neural Information Processing Systems (NIPS), June 1999 (inproceedings)

Abstract
Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered as a natural generalization of linear principal component analysis. This gives rise to the question of how to use nonlinear features for data compression, reconstruction, and de-noising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real world data.
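
A hedged de-noising demo with scikit-learn's KernelPCA (its inverse_transform learns pre-images by kernel ridge regression, a later technique, rather than the fixed-point iteration developed in this paper, but the workflow of projecting onto a few leading kernel principal components and mapping back to input space is the same; the toy data and parameters are assumptions):

    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(0)
    t = rng.uniform(0, 2 * np.pi, 300)
    clean = np.c_[np.cos(t), np.sin(t)]                       # points on a circle
    noisy = clean + 0.15 * rng.normal(size=clean.shape)

    kpca = KernelPCA(n_components=4, kernel="rbf", gamma=2.0,
                     fit_inverse_transform=True, alpha=0.1)
    denoised = kpca.inverse_transform(kpca.fit_transform(noisy))

    print(np.linalg.norm(noisy - clean, axis=1).mean(),       # error before
          np.linalg.norm(denoised - clean, axis=1).mean())    # error after (typically smaller)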

PDF Web [BibTex]

Kernel principal component analysis.

Schölkopf, B., Smola, A., Müller, K.

In Advances in Kernel Methods—Support Vector Learning, pages: 327-352, (Editors: B Schölkopf and CJC Burges and AJ Smola), MIT Press, Cambridge, MA, 1999 (inbook)

[BibTex]

Classifying LEP data with support vector algorithms.

Vannerem, P., Müller, K., Smola, A., Schölkopf, B., Söldner-Rembold, S.

In Artificial Intelligence in High Energy Nuclear Physics 99, Artificial Intelligence in High Energy Nuclear Physics 99, 1999 (inproceedings)

[BibTex]

Classification on proximity data with LP-machines

Graepel, T., Herbrich, R., Schölkopf, B., Smola, A., Bartlett, P., Müller, K., Obermayer, K., Williamson, R.

In Artificial Neural Networks, 1999. ICANN 99, 470, pages: 304-309, Conference Publications, IEEE, 9th International Conference on Artificial Neural Networks, 1999 (inproceedings)

DOI [BibTex]

Kernel-dependent support vector error bounds

Schölkopf, B., Shawe-Taylor, J., Smola, A., Williamson, R.

In Artificial Neural Networks, 1999. ICANN 99, 470, pages: 103-108, Conference Publications, IEEE, 9th International Conference on Artificial Neural Networks, 1999 (inproceedings)

DOI [BibTex]

Linear programs for automatic accuracy control in regression

Smola, A., Schölkopf, B., Rätsch, G.

In Artificial Neural Networks, 1999. ICANN 99, 470, pages: 575-580, Conference Publications, IEEE, 9th International Conference on Artificial Neural Networks, 1999 (inproceedings)

DOI [BibTex]

Regularized principal manifolds.

Smola, A., Williamson, R., Mika, S., Schölkopf, B.

In Lecture Notes in Artificial Intelligence, Vol. 1572, pages: 214-229, (Editors: P Fischer and H-U Simon), Springer, Berlin, Germany, Computational Learning Theory: 4th European Conference, 1999 (inproceedings)

[BibTex]

Entropy numbers, operators and support vector kernels.

Williamson, R., Smola, A., Schölkopf, B.

In Lecture Notes in Artificial Intelligence, Vol. 1572, pages: 285-299, (Editors: P Fischer and H-U Simon), Springer, Berlin, Germany, Computational Learning Theory: 4th European Conference, 1999 (inproceedings)

[BibTex]

Is the Hippocampus a Kalman Filter?

Bousquet, O., Balakrishnan, K., Honavar, V.

In Proceedings of the Pacific Symposium on Biocomputing, 3, pages: 619-630, Proceedings of the Pacific Symposium on Biocomputing, 1999 (inproceedings)

[BibTex]

A Comparison of Artificial Neural Networks and Cluster Analysis for Typing Biometrics Authentication

Maisuria, K., Ong, CS., Lai, .

In International Joint Conference on Neural Networks, 1999 (inproceedings)

PDF [BibTex]

Entropy numbers, operators and support vector kernels.

Williamson, R., Smola, A., Schölkopf, B.

In Advances in Kernel Methods - Support Vector Learning, pages: 127-144, (Editors: B Schölkopf and CJC Burges and AJ Smola), MIT Press, Cambridge, MA, 1999 (inbook)

[BibTex]

Fisher discriminant analysis with kernels

Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Müller, K.

In Proceedings of the 1999 IEEE Signal Processing Society Workshop, 9, pages: 41-48, (Editors: Y-H Hu and J Larsen and E Wilson and S Douglas), IEEE, Neural Networks for Signal Processing IX, 1999 (inproceedings)

DOI [BibTex]
