Header logo is ei


2005


no image
Semi-supervised Learning on Directed Graphs

Zhou, D., Schölkopf, B., Hofmann, T.

In Advances in Neural Information Processing Systems 17, pages: 1633-1640, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
Given a directed graph in which some of the nodes are labeled, we investigate the question of how to exploit the link structure of the graph to infer the labels of the remaining unlabeled nodes. To that extent we propose a regularization framework for functions defined over nodes of a directed graph that forces the classification function to change slowly on densely linked subgraphs. A powerful, yet computationally simple classification algorithm is derived within the proposed framework. The experimental evaluation on real-world Web classification problems demonstrates encouraging results that validate our approach.

PDF Web [BibTex]

2005

PDF Web [BibTex]


no image
Splines with non positive kernels

Canu, S., Ong, CS., Mary, X.

In 5th International ISAAC Congress, pages: 1-10, (Editors: Begehr, H. G.W., F. Nicolosi), World Scientific, Singapore, 5th International ISAAC Congress, July 2005 (inproceedings)

Abstract
Non parametric regressions methods can be presented in two main clusters. The one of smoothing splines methods requiring positive kernels and the other one known as Nonparametric Kernel Regression allowing the use of non positive kernels such as the Epanechnikov kernel. We propose a generalization of the smoothing spline method to include kernels which are still symmetric but not positive semi definite (they are called indefinite). The general relationship between smoothing spline, Reproducing Kernel Hilbert Spaces and positive kernels no longer exists with indefinite kernel. Instead they are associated with functional spaces called Reproducing Kernel Krein Spaces (RKKS) embedded with an indefinite inner product and thus not directly associated with a norm. Smothing splines in RKKS have many of the interesting properties of splines in RKHS, such as orthogon ality, projection, representer theorem and generalization bounds. We show that smoothing splines can be defined in RKKS as the regularized solution of the interpolation problem. Since no norm is available in a RKKS, Tikhonov regularization cannot be defined. Instead, we proposed to use iterative methods of conjugate gradient type with early stopping as regularization mechanism. Several iterative algorithms were collected which can be used to solve the optimization problems associated with learning in indefinite spaces. Some preliminary experiments with indefinite kernels for spline smoothing are reported revealing the computational efficiency of the approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Methods for Implicit Surface Modeling

Schölkopf, B., Giesen, J., Spalinger, S.

In Advances in Neural Information Processing Systems 17, pages: 1193-1200, (Editors: LK Saul and Y Weiss and L Bottou), MIT Press, Cambridge, MA, USA, 18th Annual Conference on Neural Information Processing Systems (NIPS), July 2005 (inproceedings)

Abstract
We describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Machine-Learning Approaches to BCI in Tübingen

Bensch, M., Bogdan, M., Hill, N., Lal, T., Rosenstiel, W., Schölkopf, B., Schröder, M.

Brain-Computer Interface Technology, June 2005, Talk given by NJH. (talk)

[BibTex]

[BibTex]


no image
Learning Motor Primitives with Reinforcement Learning

Peters, J., Schaal, S.

ROBOTICS Workshop on Modular Foundations for Control and Perception, June 2005 (talk)

Web [BibTex]

Web [BibTex]


no image
Combining Local and Global Image Features for Object Class Recognition

Lisin, DA., Mattar, MA., Blaschko, MB., Benfield, MC., Learned-Miller, EG.

In CVPR, pages: 47-47, CVPR, June 2005 (inproceedings)

[BibTex]

[BibTex]


no image
Motor Skill Learning for Humanoid Robots

Peters, J.

First Conference Undergraduate Computer Sciences and Informations Sciences (CS/IS), May 2005 (talk)

[BibTex]

[BibTex]


no image
To apply score function difference based ICA algorithms to high-dimensional data

Zhang, K., Chan, L.

In Proceedings of the 13th European Symposium on Artificial Neural Networks (ESANN 2005), pages: 291-297, 13th European Symposium on Artificial Neural Networks (ESANN), April 2005 (inproceedings)

PDF Web [BibTex]

PDF Web [BibTex]


no image
Joint Regularization

Borgwardt, KM., Guttman, O., Vishwanathan, SVN., Smola, AJ.

In pages: 455-460, (Editors: Verleysen, M.), d-side, Evere, Belgium, 13th European Symposium on Artificial Neural Networks (ESANN), April 2005 (inproceedings)

Abstract
We present a principled method to combine kernels under joint regularization constraints. Central to our method is an extension of the representer theorem for handling multiple joint regularization constraints. Experimental evidence shows the feasibility of our approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
EEG Source Localization for Brain-Computer-Interfaces

Grosse-Wentrup, M.

In 2nd International IEEE EMBS Conference on Neural Engineering, pages: 128-131, IEEE, 2nd International IEEE EMBS Conference on Neural Engineering, March 2005 (inproceedings)

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Active Learning for Parzen Window Classifier

Chapelle, O.

In AISTATS 2005, pages: 49-56, (Editors: Cowell, R. , Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
The problem of active learning is approached in this paper by minimizing directly an estimate of the expected test error. The main difficulty in this ``optimal'' strategy is that output probabilities need to be estimated accurately. We suggest here different methods for estimating those efficiently. In this context, the Parzen window classifier is considered because it is both simple and probabilistic. The analysis of experimental results highlights that regularization is a key ingredient for this strategy.

Web [BibTex]

Web [BibTex]


no image
Semi-Supervised Classification by Low Density Separation

Chapelle, O., Zien, A.

In AISTATS 2005, pages: 57-64, (Editors: Cowell, R. , Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
We believe that the cluster assumption is key to successful semi-supervised learning. Based on this, we propose three semi-supervised algorithms: 1. deriving graph-based distances that emphazise low density regions between clusters, followed by training a standard SVM; 2. optimizing the Transductive SVM objective function, which places the decision boundary in low density regions, by gradient descent; 3. combining the first two to make maximum use of the cluster assumption. We compare with state of the art algorithms and demonstrate superior accuracy for the latter two methods.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Automatic In Situ Identification of Plankton

Blaschko, MB., Holness, G., Mattar, MA., Lisin, D., Utgoff, PE., Hanson, AR., Schultz, H., Riseman, EM., Sieracki, ME., Balch, WM., Tupper, B.

In WACV, pages: 79 , WACV, January 2005 (inproceedings)

[BibTex]

[BibTex]


no image
Kernel Constrained Covariance for Dependence Measurement

Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Belitski, A., Augath, M., Murayama, Y., Pauls, J., Schölkopf, B., Logothetis, N.

In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pages: 112-119, (Editors: R Cowell, R and Z Ghahramani), AISTATS, January 2005 (inproceedings)

Abstract
We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth. All current kernel-based independence tests share this behaviour. We demonstrate exponential convergence between the population and empirical COCO. Finally, we use COCO as a measure of joint neural activity between voxels in MRI recordings of the macaque monkey, and compare the results to the mutual information and the correlation. We also show the effect of removing breathing artefacts from the MRI recording.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Hilbertian Metrics and Positive Definite Kernels on Probability Measures

Hein, M., Bousquet, O.

In AISTATS 2005, pages: 136-143, (Editors: Cowell, R. , Z. Ghahramani), Tenth International Workshop on Artificial Intelligence and Statistics (AI & Statistics), January 2005 (inproceedings)

Abstract
We investigate the problem of defining Hilbertian metrics resp. positive definite kernels on probability measures, continuing previous work. This type of kernels has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsoe such that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second we investigate further our approach to incorporate similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and gives in some cases a more efficient way to compute them. Finally we compare all proposed kernels in two text and two image classification problems.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Constrained Covariance for Dependence Measurement

Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Belitski, A., Augath, M., Murayama, Y., Schölkopf, B., Logothetis, N.

AISTATS, January 2005 (talk)

Abstract
We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth. All current kernel-based independence tests share this behaviour. We demonstrate exponential convergence between the population and empirical COCO. Finally, we use COCO as a measure of joint neural activity between voxels in MRI recordings of the macaque monkey, and compare the results to the mutual information and the correlation. We also show the effect of removing breathing artefacts from the MRI recording.

PostScript [BibTex]

PostScript [BibTex]


no image
Intrinsic Dimensionality Estimation of Submanifolds in Euclidean space

Hein, M., Audibert, Y.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 289 , (Editors: De Raedt, L. , S. Wrobel), ICML Bonn, 2005 (inproceedings)

Abstract
We present a new method to estimate the intrinsic dimensionality of a submanifold M in Euclidean space from random samples. The method is based on the convergence rates of a certain U-statistic on the manifold. We solve at least partially the question of the choice of the scale of the data. Moreover the proposed method is easy to implement, can handle large data sets and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets.

PDF [BibTex]

PDF [BibTex]


no image
Large Scale Genomic Sequence SVM Classifiers

Sonnenburg, S., Rätsch, G., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 849-856, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
In genomic sequence analysis tasks like splice site recognition or promoter identification, large amounts of training sequences are available, and indeed needed to achieve sufficiently high classification performances. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree kernel (WD). In particular, we suggest several extensions using Suffix Trees and modi cations of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the spectrum kernel and WD kernel, large scale SVM training can be accelerated by factors of 20 and 4 times, respectively, while using much less memory (e.g. no kernel caching). The evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of Support Vectors). Our method allows us to train on sets as large as one million sequences.

PDF [BibTex]

PDF [BibTex]


no image
Joint Kernel Maps

Weston, J., Schölkopf, B., Bousquet, O.

In Proceedings of the 8th InternationalWork-Conference on Artificial Neural Networks, LNCS 3512, pages: 176-191, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
We develop a methodology for solving high dimensional dependency estimation problems between pairs of data types, which is viable in the case where the output of interest has very high dimension, e.g., thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces, using joint kernels. Known correlations between input and output can be defined by such kernels, some of which can maintain linearity in the outputs to provide simple (closed form) pre-images. We provide examples of such kernels and empirical results.

PostScript DOI [BibTex]

PostScript DOI [BibTex]


no image
Analysis of Some Methods for Reduced Rank Gaussian Process Regression

Quinonero Candela, J., Rasmussen, C.

In Switching and Learning in Feedback Systems, pages: 98-127, (Editors: Murray Smith, R. , R. Shorten), Springer, Berlin, Germany, European Summer School on Multi-Agent Control, 2005 (inproceedings)

Abstract
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning the covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which is a way of learning the support set for given hyperparameters based on approximating the posterior. We propose an alternative method to the SGGP that has better generalization capabilities. Finally we make experiments to compare the different ways of training a RRGP. We provide some Matlab code for learning RRGPs.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
From Graphs to Manifolds - Weak and Strong Pointwise Consistency of Graph Laplacians

Hein, M., Audibert, J., von Luxburg, U.

In Proceedings of the 18th Conference on Learning Theory (COLT), pages: 470-485, Conference on Learning Theory, 2005, Student Paper Award (inproceedings)

Abstract
In the machine learning community it is generally believed that graph Laplacians corresponding to a finite sample of data points converge to a continuous Laplace operator if the sample size increases. Even though this assertion serves as a justification for many Laplacian-based algorithms, so far only some aspects of this claim have been rigorously proved. In this paper we close this gap by establishing the strong pointwise consistency of a family of graph Laplacians with data-dependent weights to some weighted Laplace operator. Our investigation also includes the important case where the data lies on a submanifold of $R^d$.

PDF [BibTex]

PDF [BibTex]


no image
Propagating Distributions on a Hypergraph by Dual Information Regularization

Tsuda, K.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 921 , (Editors: De Raedt, L. , S. Wrobel), ICML Bonn, 2005 (inproceedings)

Abstract
In the information regularization framework by Corduneanu and Jaakkola (2005), the distributions of labels are propagated on a hypergraph for semi-supervised learning. The learning is efficiently done by a Blahut-Arimoto-like two step algorithm, but, unfortunately, one of the steps cannot be solved in a closed form. In this paper, we propose a dual version of information regularization, which is considered as more natural in terms of information geometry. Our learning algorithm has two steps, each of which can be solved in a closed form. Also it can be naturally applied to exponential family distributions such as Gaussians. In experiments, our algorithm is applied to protein classification based on a metabolic network and known functional categories.

[BibTex]

[BibTex]


no image
A Brain Computer Interface with Online Feedback based on Magnetoencephalography

Lal, T., Schröder, M., Hill, J., Preissl, H., Hinterberger, T., Mellinger, J., Bogdan, M., Rosenstiel, W., Hofmann, T., Birbaumer, N., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 465-472, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
The aim of this paper is to show that machine learning techniques can be used to derive a classifying function for human brain signal data measured by magnetoencephalography (MEG), for the use in a brain computer interface (BCI). This is especially helpful for evaluating quickly whether a BCI approach based on electroencephalography, on which training may be slower due to lower signalto- noise ratio, is likely to succeed. We apply recursive channel elimination and regularized SVMs to the experimental data of ten healthy subjects performing a motor imagery task. Four subjects were able to use a trained classifier together with a decision tree interface to write a short name. Further analysis gives evidence that the proposed imagination task is suboptimal for the possible extension to a multiclass interface. To the best of our knowledge this paper is the first working online BCI based on MEG recordings and is therefore a “proof of concept”.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Healing the Relevance Vector Machine through Augmentation

Rasmussen, CE., Candela, JQ.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 689 , (Editors: De Raedt, L. , S. Wrobel), ICML, 2005 (inproceedings)

Abstract
The Relevance Vector Machine (RVM) is a sparse approximate Bayesian kernel method. It provides full predictive distributions for test cases. However, the predictive uncertainties have the unintuitive property, that emph{they get smaller the further you move away from the training cases}. We give a thorough analysis. Inspired by the analogy to non-degenerate Gaussian Processes, we suggest augmentation to solve the problem. The purpose of the resulting model, RVM*, is primarily to corroborate the theoretical and experimental analysis. Although RVM* could be used in practical applications, it is no longer a truly sparse model. Experiments show that sparsity comes at the expense of worse predictive distributions.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Long Term Prediction of Product Quality in a Glass Manufacturing Process Using a Kernel Based Approach

Jung, T., Herrera, L., Schölkopf, B.

In Proceedings of the 8th International Work-Conferenceon Artificial Neural Networks (Computational Intelligence and Bioinspired Systems), Lecture Notes in Computer Science, Vol. 3512, LNCS 3512, pages: 960-967, (Editors: J Cabestany and A Prieto and F Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
In this paper we report the results obtained using a kernel-based approach to predict the temporal development of four response signals in the process control of a glass melting tank with 16 input parameters. The data set is a revised version1 from the modelling challenge in EUNITE-2003. The central difficulties are: large time-delays between changes in the inputs and the outputs, large number of data, and a general lack of knowledge about the relevant variables that intervene in the process. The methodology proposed here comprises Support Vector Machines (SVM) and Regularization Networks (RN). We use the idea of sparse approximation both as a means of regularization and as a means of reducing the computational complexity. Furthermore, we will use an incremental approach to add new training examples to the kernel-based method and efficiently update the current solution. This allows us to use a sophisticated learning scheme, where we iterate between prediction and training, with good computational efficiency and satisfactory results.

DOI [BibTex]

DOI [BibTex]


no image
Object correspondence as a machine learning problem

Schölkopf, B., Steinke, F., Blanz, V.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 777-784, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We propose machine learning methods for the estimation of deformation fields that transform two given objects into each other, thereby establishing a dense point to point correspondence. The fields are computed using a modified support vector machine containing a penalty enforcing that points of one object will be mapped to ``similar‘‘ points on the other one. Our system, which contains little engineering or domain knowledge, delivers state of the art performance. We present application results including close to photorealistic morphs of 3D head models.

PDF [BibTex]

PDF [BibTex]


no image
Implicit Surface Modelling as an Eigenvalue Problem

Walder, C., Chapelle, O., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 937-944, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We discuss the problem of fitting an implicit shape model to a set of points sampled from a co-dimension one manifold of arbitrary topology. The method solves a non-convex optimisation problem in the embedding function that defines the implicit by way of its zero level set. By assuming that the solution is a mixture of radial basis functions of varying widths we attain the globally optimal solution by way of an equivalent eigenvalue problem, without using or constructing as an intermediate step the normal vectors of the manifold at each data point. We demonstrate the system on two and three dimensional data, with examples of missing data interpolation and set operations on the resultant shapes.

PDF [BibTex]

PDF [BibTex]


no image
Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

In Proceedings of the 16th European Conference on Machine Learning, 3720, pages: 280-291, (Editors: Gama, J.;Camacho, R.;Brazdil, P.;Jorge, A.;Torgo, L.), Springer, ECML, 2005, clmc (inproceedings)

Abstract
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing AmariÕs natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regres- sion. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and BradtkeÕs Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Em- pirical evaluations illustrate the effectiveness of our techniques in com- parison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Comparative experiments on task space control with redundancy resolution

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3901-3908, Edmonton, Alberta, Canada, Aug. 2-6, IROS, 2005, clmc (inproceedings)

Abstract
Understanding the principles of motor coordination with redundant degrees of freedom still remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundacy resolution still remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm which is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, despite, from a theoretical point, the algorithm does not share the same beauty as some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2004


no image
Attentional Modulation of Auditory Event-Related Potentials in a Brain-Computer Interface

Hill, J., Lal, T., Bierig, K., Birbaumer, N., Schölkopf, B.

In BioCAS04, (S3/5/INV- S3/17-20):4, IEEE Computer Society, Los Alamitos, CA, USA, 2004 IEEE International Workshop on Biomedical Circuits and Systems, December 2004 (inproceedings)

Abstract
Motivated by the particular problems involved in communicating with "locked-in" paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user‘s EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.

PDF Web DOI [BibTex]

2004

PDF Web DOI [BibTex]


no image
Discrete vs. Continuous: Two Sides of Machine Learning

Zhou, D.

October 2004 (talk)

Abstract
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data. This talk is mainly based on the followiing contribution: (1) D. Zhou and B. Sch{\"o}lkopf: Transductive Inference with Graphs, MPI Technical report, August, 2004; (2) D. Zhou, B. Sch{\"o}lkopf and T. Hofmann. Semi-supervised Learning on Directed Graphs. NIPS 2004; (3) D. Zhou, O. Bousquet, T.N. Lal, J. Weston and B. Sch{\"o}lkopf. Learning with Local and Global Consistency. NIPS 2003.

PDF [BibTex]


no image
Using kernel PCA for Initialisation of Variational Bayesian Nonlinear Blind Source Separation Method

Honkela, A., Harmeling, S., Lundqvist, L., Valpola, H.

In ICA 2004, pages: 790-797, (Editors: Puntonet, C. G., A. Prieto), Springer, Berlin, Germany, Fifth International Conference on Independent Component Analysis and Blind Signal Separation, October 2004 (inproceedings)

Abstract
The variational Bayesian nonlinear blind source separation method introduced by Lappalainen and Honkela in 2000 is initialised with linear principal component analysis (PCA). Because of the multilayer perceptron (MLP) network used to model the nonlinearity, the method is susceptible to local minima and therefore sensitive to the initialisation used. As the method is used for nonlinear separation, the linear initialisation may in some cases lead it astray. In this paper we study the use of kernel PCA (KPCA) in the initialisation. KPCA is a rather straightforward generalisation of linear PCA and it is much faster to compute than the variational Bayesian method. The experiments show that it can produce significantly better initialisations than linear PCA. Additionally, the model comparison methods provided by the variational Bayesian framework can be easily applied to compare different kernels.

DOI [BibTex]

DOI [BibTex]


no image
Robust ICA for Super-Gaussian Sources

Meinecke, F., Harmeling, S., Müller, K.

In ICA 2004, pages: 217-224, (Editors: Puntonet, C. G., A. Prieto), Springer, Berlin, Germany, Fifth International Conference on Independent Component Analysis and Blind Signal Separation, October 2004 (inproceedings)

Abstract
Most ICA algorithms are sensitive to outliers. Instead of robustifying existing algorithms by outlier rejection techniques, we show how a simple outlier index can be used directly to solve the ICA problem for super-Gaussian source signals. This ICA method is outlier-robust by construction and can be used for standard ICA as well as for over-complete ICA (i.e. more source signals than observed signals (mixtures)).

DOI [BibTex]

DOI [BibTex]


no image
Modelling Spikes with Mixtures of Factor Analysers

Görür, D., Rasmussen, C., Tolias, A., Sinz, F., Logothetis, N.

In Pattern Recognition, pages: 391-398, LNCS 3175, (Editors: Rasmussen, C. E. , H.H. Bülthoff, B. Schölkopf, M.A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004 (inproceedings)

Abstract
Identifying the action potentials of individual neurons from extracellular recordings, known as spike sorting, is a challenging problem. We consider the spike sorting problem using a generative model,mixtures of factor analysers, which concurrently performs clustering and feature extraction. The most important advantage of this method is that it quantifies the certainty with which the spikes are classified. This can be used as a means for evaluating the quality of clustering and therefore spike isolation. Using this method, nearly simultaneously occurring spikes can also be modelled which is a hard task for many of the spike sorting methods. Furthermore, modelling the data with a generative model allows us to generate simulated data.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Learning Depth From Stereo

Sinz, F., Candela, J., BakIr, G., Rasmussen, C., Franz, M.

In 26th DAGM Symposium, pages: 245-252, LNCS 3175, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004 (inproceedings)

Abstract
We compare two approaches to the problem of estimating the depth of a point in space from observing its image position in two different cameras: 1.~The classical photogrammetric approach explicitly models the two cameras and estimates their intrinsic and extrinsic parameters using a tedious calibration procedure; 2.~A generic machine learning approach where the mapping from image to spatial coordinates is directly approximated by a Gaussian Process regression. Our results show that the generic learning approach, in addition to simplifying the procedure of calibration, can lead to higher depth accuracies than classical calibration although no specific domain knowledge is used.

PDF PostScript Web [BibTex]

PDF PostScript Web [BibTex]


no image
Grundlagen von Support Vector Maschinen und Anwendungen in der Bildverarbeitung

Eichhorn, J.

September 2004 (talk)

Abstract
Invited talk at the workshop "Numerical, Statistical and Discrete Methods in Image Processing" at the TU M{\"u}nchen (in GERMAN)

PDF [BibTex]


no image
Stability of Hausdorff-based Distance Measures

Shapiro, MD., Blaschko, MB.

In VIIP, pages: 1-6, VIIP, September 2004 (inproceedings)

[BibTex]

[BibTex]


no image
Learning to Find Graph Pre-Images

BakIr, G., Zien, A., Tsuda, K.

In Pattern Recognition, pages: 253-261, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, August 2004 (inproceedings)

Abstract
The recent development of graph kernel functions has made it possible to apply well-established machine learning methods to graphs. However, to allow for analyses that yield a graph as a result, it is necessary to solve the so-called pre-image problem: to reconstruct a graph from its feature space representation induced by the kernel. Here, we suggest a practical solution to this problem.

PostScript PDF DOI [BibTex]

PostScript PDF DOI [BibTex]


no image
The benefit of liquid Helium cooling for Cryo-Electron Tomography: A quantitative comparative study

Schweikert, G., Luecken, U., Pfeifer, G., Baumeister, W., Plitzko, J.

The thirteenth European Microscopy Congress, August 2004 (talk)

[BibTex]

[BibTex]


no image
Exponential Families for Conditional Random Fields

Altun, Y., Smola, A., Hofmann, T.

In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), pages: 2-9, (Editors: Chickering, D.M. , J.Y. Halpern), Morgan Kaufmann, San Francisco, CA, USA, 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI), July 2004 (inproceedings)

Abstract
In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Gaussian Process Classification for Segmenting and Annotating Sequences

Altun, Y., Hofmann, T., Smola, A.

In Proceedings of the 21st International Conference on Machine Learning (ICML 2004), pages: 25-32, (Editors: Greiner, R. , D. Schuurmans), ACM Press, New York, USA, 21st International Conference on Machine Learning (ICML), July 2004 (inproceedings)

Abstract
Many real-world classification tasks involve the prediction of multiple, inter-dependent class labels. A prototypical case of this sort deals with prediction of a sequence of labels for a sequence of observations. Such problems arise naturally in the context of annotating and segmenting observation sequences. This paper generalizes Gaussian Process classification to predict multiple labels by taking dependencies between neighboring labels into account. Our approach is motivated by the desire to retain rigorous probabilistic semantics, while overcoming limitations of parametric methods like Conditional Random Fields, which exhibit conceptual and computational difficulties in high-dimensional input spaces. Experiments on named entity recognition and pitch accent prediction tasks demonstrate the competitiveness of our approach.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Learning with Non-Positive Kernels

Ong, CS., Mary, X., Canu, S., Smola, AJ.

In ICML 2004, pages: 81-81, ACM Press, New York, NY, USA, Twenty-First International Conference on Machine Learning, July 2004 (inproceedings)

Abstract
n this paper we show that many kernel methods can be adapted to deal with indefinite kernels, that is, kernels which are not positive semidefinite. They do not satisfy Mercer‘s condition and they induce associated functional spaces called Reproducing Kernel Kre&icaron;n Spaces (RKKS), a generalization of Reproducing Kernel Hilbert Spaces (RKHS).Machine learning in RKKS shares many "nice" properties of learning in RKHS, such as orthogonality and projection. However, since the kernels are indefinite, we can no longer minimize the loss, instead we stabilize it. We show a general representer theorem for constrained stabilization and prove generalization bounds by computing the Rademacher averages of the kernel class. We list several examples of indefinite kernels and investigate regularization methods to solve spline interpolation. Some preliminary experiments with indefinite kernels for spline smoothing are reported for truncated spectral factorization, Landweber-Fridman iterations, and MR-II.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Using Conditional Random Fields to Predict Pitch Accent in Conversational Speech

Gregory, M., Altun, Y.

In pages: 677-684, (Editors: Scott, D. , W. Daelemans, M. Walker), ACL, East Stroudsburg, PA, USA, 42nd Annual Meeting of the Association for Computational Linguistics (ACL), July 2004 (inproceedings)

Abstract
The detection of prosodic characteristics is an important aspect of both speech synthesis and speech recognition. Correct placement of pitch accents aids in more natural sounding speech, while automatic detection of accents can contribute to better wordlevel recognition and better textual understanding. In this paper we investigate probabilistic, contextual, and phonological factors that influence pitch accent placement in natural, conversational speech in a sequence labeling setting. We introduce Conditional Random Fields (CRFs) to pitch accent prediction task in order to incorporate these factors efficiently in a sequence model. We demonstrate the usefulness and the incremental effect of these factors in a sequence model by performing experiments on hand labeled data from the Switchboard Corpus. Our model outperforms the baseline and previous models of pitch accent prediction on the Switchboard Corpus.

Web [BibTex]

Web [BibTex]


no image
Support vector machine learning for interdependent and structured output spaces

Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.

In pages: 1-8, (Editors: Greiner, R. , D. Schuurmans), AAAI Press, Menlo Park, CA, USA, Twenty-first International Conference on Machine Learning (ICML), July 2004 (inproceedings)

Web DOI [BibTex]

Web DOI [BibTex]


no image
Riemannian Geometry on Graphs and its Application to Ranking and Classification

Zhou, D.

June 2004 (talk)

Abstract
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data.

PDF [BibTex]


no image
PAC-Bayesian Generic Chaining

Audibert, J., Bousquet, O.

In Advances in Neural Information Processing Systems 16, pages: 1125-1132 , (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
There exist many different generalization error bounds for classification. Each of these bounds contains an improvement over the others for certain situations. Our goal is to combine these different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester, which is interesting for averaging classifiers, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand. This combination is quite natural since the generic chaining is based on the notion of majorizing measures, which can be considered as priors on the set of classifiers, and such priors also arise in the PAC-bayesian setting.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Prediction on Spike Data Using Kernel Algorithms

Eichhorn, J., Tolias, A., Zien, A., Kuss, M., Rasmussen, C., Weston, J., Logothetis, N., Schölkopf, B.

In Advances in Neural Information Processing Systems 16, pages: 1367-1374, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Warped Gaussian Processes

Snelson, E., Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 16, pages: 337-344, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We generalise the Gaussian process (GP) framework for regression by learning a nonlinear transformation of the GP outputs. This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.

PDF Web [BibTex]

PDF Web [BibTex]