116 results (BibTeX)

2003


An Introduction to Variable and Feature Selection.

Guyon, I., Elisseeff, A.

Journal of Machine Learning Research, 3, pages: 1157-1182, 2003 (article)

[BibTex]


On the Complexity of Learning the Kernel Matrix

Bousquet, O., Herrmann, D.

In Advances in Neural Information Processing Systems 15, pages: 399-406, (Editors: Becker, S. , S. Thrun, K. Obermayer), The MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate data-based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow the spectrum to vary and keep the eigenvectors fixed. This bound is only a logarithmic factor bigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.

PDF Web [BibTex]


Feature Selection for Support Vector Machines by Means of Genetic Algorithms

Fröhlich, H., Chapelle, O., Schölkopf, B.

In 15th IEEE International Conference on Tools with AI, pages: 142-148, 15th IEEE International Conference on Tools with AI, 2003 (inproceedings)

[BibTex]


A case based comparison of identification with neural network and Gaussian process models.

Kocijan, J., Banko, B., Likar, B., Girard, A., Murray-Smith, R., Rasmussen, CE.

In Proceedings of the International Conference on Intelligent Control Systems and Signal Processing ICONS 2003, 1, pages: 137-142, (Editors: Ruano, E.A.), Proceedings of the International Conference on Intelligent Control Systems and Signal Processing ICONS, April 2003 (inproceedings)

Abstract
In this paper an alternative approach to black-box identification of non-linear dynamic systems is compared with the more established approach of using artificial neural networks. The Gaussian process prior approach is a representative of non-parametric modelling approaches. It was compared on a pH process modelling case study. The purpose of modelling was to use the model for control design. The comparison revealed that even though Gaussian process models can be effectively used for modelling dynamic systems, caution has to be exercised when signals are selected.

PDF [BibTex]


Propagation of Uncertainty in Bayesian Kernel Models - Application to Multiple-Step Ahead Forecasting

Quiñonero-Candela, J., Girard, A., Larsen, J., Rasmussen, CE.

In IEEE International Conference on Acoustics, Speech and Signal Processing, 2, pages: 701-704, IEEE International Conference on Acoustics, Speech and Signal Processing, 2003 (inproceedings)

Abstract
The object of Bayesian modelling is the predictive distribution, which in a forecasting scenario enables improved estimates of forecasted values and their uncertainties. In this paper we focus on reliably estimating the predictive mean and variance of forecasted values using Bayesian kernel based models such as the Gaussian Process and the Relevance Vector Machine. We derive novel analytic expressions for the predictive mean and variance for Gaussian kernel shapes under the assumption of a Gaussian input distribution in the static case, and of a recursive Gaussian predictive density in iterative forecasting. The capability of the method is demonstrated for forecasting of time-series and compared to approximate methods.

PDF PostScript [BibTex]


Unsupervised Clustering of Images using their Joint Segmentation

Seldin, Y., Starik, S., Werman, M.

In The 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV 2003), pages: 1-24, 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV), 2003 (inproceedings)

PDF Web [BibTex]


Image Reconstruction by Linear Programming

Tsuda, K., Rätsch, G.

(118), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, October 2003 (techreport)

PDF [BibTex]


Ranking on Data Manifolds

Zhou, D., Weston, J., Gretton, A., Bousquet, O., Schölkopf, B.

(113), Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany, June 2003 (techreport)

Abstract
The Google search engine has had a huge success with its PageRank web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the World Wide Web using a random walk. This algorithm can only be used for graph data, however. Here we propose a simple universal ranking algorithm for vectorial data, based on the exploration of the intrinsic global geometric structure revealed by a huge amount of data. Experimental results from image and text to bioinformatics illustrate the validity of our algorithm.

PDF [BibTex]


A Note on Parameter Tuning for On-Line Shifting Algorithms

Bousquet, O.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003 (techreport)

Abstract
In this short note, building on ideas of M. Herbster [2] we propose a method for automatically tuning the parameter of the FIXED-SHARE algorithm proposed by Herbster and Warmuth [3] in the context of on-line learning with shifting experts. We show that this can be done with a memory requirement of $O(nT)$ and that the additional loss incurred by the tuning is the same as the loss incurred for estimating the parameter of a Bernoulli random variable.

PDF PostScript [BibTex]


Phase Information and the Recognition of Natural Images

Braun, D., Wichmann, F., Gegenfurtner, K.

6, pages: 138, (Editors: H.H. Bülthoff, K.R. Gegenfurtner, H.A. Mallot, R. Ulrich, F.A. Wichmann), 6. Tübinger Wahrnehmungskonferenz (TWK), February 2003 (poster)

Abstract
Fourier phase plays an important role in determining image structure. For example, when the phase spectrum of an image showing a flower is swapped with the phase spectrum of an image showing a tank, then we will usually perceive a tank in the resulting image, even though the amplitude spectrum is still that of the flower. Also, when the phases of an image are randomly swapped across frequencies, the resulting image becomes impossible to recognize. Our goal was to evaluate the effect of phase manipulations in a more quantitative manner. On each trial subjects viewed two images of natural scenes. The subject had to indicate which one of the two images contained an animal. The spectra of the images were manipulated by adding random phase noise at each frequency. The phase noise was uniformly distributed in the interval [−φ, +φ], where φ was varied between 0 degrees and 180 degrees. Image pairs were displayed for 100 msec. Subjects were remarkably resistant to the addition of phase noise. Even with [−120, 120] degree noise, subjects still were at a level of 75% correct. The introduction of phase noise leads to a reduction of image contrast. Subjects were slightly better than a simple prediction based on this contrast reduction. However, when contrast response functions were measured in the same experimental paradigm, we found that performance in the phase noise experiment was significantly lower than that predicted by the corresponding contrast reduction.

Web [BibTex]


Kernel Methods for Classification and Signal Separation

Gretton, A.

pages: 226, Biologische Kybernetik, University of Cambridge, Cambridge, April 2003 (phdthesis)

PostScript [BibTex]


Hyperkernels

Ong, CS., Smola, AJ., Williamson, RC.

In pages: 495-502, 2003 (inproceedings)

PDF [BibTex]


Bayesian Monte Carlo

Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 15, pages: 489-496, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

PDF Web [BibTex]


How Many Neighbors To Consider in Pattern Pre-selection for Support Vector Classifiers?

Shin, H., Cho, S.

In Proc. of INNS-IEEE International Joint Conference on Neural Networks (IJCNN 2003), pages: 565-570, IJCNN, July 2003 (inproceedings)

Abstract
Training support vector classifiers (SVC) requires large memory and long CPU time when the pattern set is large. To alleviate the computational burden in SVC training, we previously proposed a preprocessing algorithm which selects only the patterns in the overlap region around the decision boundary, based on neighborhood properties [8], [9], [10]. The k-nearest neighbors' class label entropy for each pattern was used to estimate the pattern's proximity to the decision boundary. The value of parameter k is critical, yet has been determined in a rather ad hoc fashion. We propose in this paper a systematic procedure to determine k and show its effectiveness through experiments.

PDF [BibTex]


Technical report on Separation methods for nonlinear mixtures

Jutten, C., Karhunen, J., Almeida, L., Harmeling, S.

(D29), EU-Project BLISS, October 2003 (techreport)

PDF [BibTex]


Large Margin Methods for Label Sequence Learning

Altun, Y., Hofmann, T.

In pages: 993-996, International Speech Communication Association, Bonn, Germany, 8th European Conference on Speech Communication and Technology (EuroSpeech), September 2003 (inproceedings)

Web [BibTex]


Dynamics of a rigid body in a Stokes fluid

Gonzalez, O., Graf, ABA., Maddocks, JH.

Journal of Fluid Mechanics, 2003 (article) Accepted

[BibTex]


A novel transient heater-foil technique for liquid crystal experiments on film cooled surfaces

Vogel, G., Graf, ABA., von Wolfersdorf, J., Weigand, B.

ASME Journal of Turbomachinery, 125, pages: 529-537, 2003 (article)

PDF [BibTex]


Blind separation of post-nonlinear mixtures using linearizing transformations and temporal decorrelation

Ziehe, A., Kawanabe, M., Harmeling, S., Müller, K.

Journal of Machine Learning Research, 4(7-8):1319-1338, November 2003 (article)

Abstract
We propose two methods that reduce the post-nonlinear blind source separation problem (PNL-BSS) to a linear BSS problem. The first method is based on the concept of maximal correlation: we apply the alternating conditional expectation (ACE) algorithm, a powerful technique from non-parametric statistics, to approximately invert the componentwise nonlinear functions. The second method is a Gaussianizing transformation, which is motivated by the fact that linearly mixed signals before nonlinear transformation are approximately Gaussian distributed. This heuristic but simple and efficient procedure works as well as the ACE method. Using the framework provided by ACE, convergence can be proven. The optimal transformations obtained by ACE coincide with the sought-after inverse functions of the nonlinearities. After equalizing the nonlinearities, temporal decorrelation separation (TDSEP) allows us to recover the source signals. Numerical simulations testing "ACE-TD" and "Gauss-TD" on realistic examples are performed with excellent results.

PDF PDF DOI [BibTex]


Control, Planning, Learning, and Imitation with Dynamic Movement Primitives

Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.

In IROS 2003, pages: 1-21, Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE International Conference on Intelligent Robots and Systems, October 2003 (inproceedings)

PDF [BibTex]


Fast Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In PAKDD 2003, pages: 376-387, (Editors: Whang, K.-Y. , J. Jeon, K. Shim, J. Srivastava), Springer, Berlin, Germany, 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, May 2003 (inproceedings)

Abstract
Training SVMs requires large memory and long CPU time when the pattern set is large. To alleviate the computational burden in SVM training, we propose a fast preprocessing algorithm which selects only the patterns near the decision boundary. Preliminary simulation results were promising: a training time reduction of up to two orders of magnitude was achieved, including the preprocessing, without any loss in classification accuracy.

PDF Web DOI [BibTex]


Support Vector Machines

Schölkopf, B., Smola, A.

In Handbook of Brain Theory and Neural Networks (2nd edition), pages: 1119-1125, (Editors: MA Arbib), MIT Press, Cambridge, MA, USA, 2003 (inbook)

[BibTex]


Technical report on implementation of linear methods and validation on acoustic sources

Harmeling, S., Bünau, P., Ziehe, A., Pham, D.

EU-Project BLISS, September 2003 (techreport)

PDF [BibTex]


Prediction at an Uncertain Input for Gaussian Processes and Relevance Vector Machines - Application to Multiple-Step Ahead Time-Series Forecasting

Quiñonero-Candela, J., Girard, A., Rasmussen, C.

(IMM-2003-18), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2003 (techreport)

PDF PostScript [BibTex]


Discriminative Learning for Label Sequences via Boosting

Altun, Y., Hofmann, T., Johnson, M.

In Advances in Neural Information Processing Systems 15, pages: 977-984, (Editors: Becker, S. , S. Thrun, K. Obermayer ), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function.

PDF Web [BibTex]


Large margin Methods in Label Sequence Learning

Altun, Y.

Brown University, Providence, RI, USA, 2003 (mastersthesis)

[BibTex]


The em Algorithm for Kernel Matrix Completion with Auxiliary Data

Tsuda, K., Akaho, S., Asai, K.

Journal of Machine Learning Research, 4, pages: 67-81, May 2003 (article)

PDF [BibTex]


Kernel Methods and Their Applications to Signal Processing

Bousquet, O., Perez-Cruz, F.

In Proceedings. (ICASSP '03), Special Session on Kernel Methods, pages: 860, ICASSP, 2003 (inproceedings)

Abstract
Recently introduced in Machine Learning, the notion of kernels has drawn a lot of interest as it allows one to obtain non-linear algorithms from linear ones in a simple and elegant manner. This, in conjunction with the introduction of new linear classification methods such as the Support Vector Machines, has produced significant progress. The success of such algorithms is now spreading as they are applied to more and more domains. Many Signal Processing problems, by their non-linear and high-dimensional nature, may benefit from such techniques. We give an overview of kernel methods and their recent applications.

PDF PostScript [BibTex]


On-Line One-Class Support Vector Machines. An Application to Signal Segmentation

Gretton, A., Desobry, .

In IEEE ICASSP Vol. 2, pages: 709-712, IEEE ICASSP, April 2003 (inproceedings)

Abstract
In this paper, we describe an efficient algorithm to sequentially update a density support estimate obtained using one-class support vector machines. The solution provided is an exact solution, which proves to be far more computationally attractive than a batch approach. This deterministic technique is applied to the problem of audio signal segmentation, with simulations demonstrating the computational performance gain on toy data sets, and the accuracy of the segmentation on audio signals.

PostScript [BibTex]


Marginalized Kernels between Labeled Graphs

Kashima, H., Tsuda, K., Inokuchi, A.

In 20th International Conference on Machine Learning, pages: 321-328, (Editors: Faucett, T. and N. Mishra), 20th International Conference on Machine Learning, August 2003 (inproceedings)

PDF [BibTex]


Predictive control with Gaussian process models

Kocijan, J., Murray-Smith, R., Rasmussen, CE., Likar, B.

In Proceedings of IEEE Region 8 Eurocon 2003: Computer as a Tool, pages: 352-356, (Editors: Zajc, B. and M. Tkal), Proceedings of IEEE Region 8 Eurocon: Computer as a Tool, 2003 (inproceedings)

Abstract
This paper describes model-based predictive control based on Gaussian processes. Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identification of non-linear dynamic systems. This approach offers more insight into the variance of the obtained model response, as well as fewer parameters to determine than other models. Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. This property is used in predictive control, where optimisation of the control signal takes the variance information into account. The predictive control principle is demonstrated on a simulated example of a nonlinear system.

PDF PostScript [BibTex]


Extension of the nu-SVM range for classification

Perez-Cruz, F., Weston, J., Herrmann, D., Schölkopf, B.

In Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, Vol. 190, 190, pages: 179-196, NATO Science Series III: Computer and Systems Sciences, (Editors: J Suykens and G Horvath and S Basu and C Micchelli and J Vandewalle), IOS Press, Amsterdam, 2003 (inbook)

[BibTex]


Kernel Hebbian Algorithm for Iterative Kernel Principal Component Analysis

Kim, K., Franz, M., Schölkopf, B.

(109), MPI f. biologische Kybernetik, Tuebingen, June 2003 (techreport)

Abstract
A new method for performing a kernel principal component analysis is proposed. By kernelizing the generalized Hebbian algorithm, one can iteratively estimate the principal components in a reproducing kernel Hilbert space with only linear order memory complexity. The derivation of the method, a convergence proof, and preliminary applications in image hyperresolution are presented. In addition, we discuss the extension of the method to the online learning of kernel principal components.

PDF [BibTex]


Learning with Local and Global Consistency

Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B.

(112), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, June 2003 (techreport)

Abstract
We consider the learning problem in the transductive setting. Given a set of points of which only some are labeled, the goal is to predict the label of the unlabeled points. A principled clue to solve such a learning problem is the consistency assumption that a classifying function should be sufficiently smooth with respect to the structure revealed by these known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.

[BibTex]


The Kernel Mutual Information

Gretton, A., Herbrich, R., Smola, A.

Max Planck Institute for Biological Cybernetics, April 2003 (techreport)

Abstract
We introduce two new functions, the kernel covariance (KC) and the kernel mutual information (KMI), to measure the degree of independence of several continuous random variables. The former is guaranteed to be zero if and only if the random variables are pairwise independent; the latter shares this property, and is in addition an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estimate. We show that Bach and Jordan's kernel generalised variance (KGV) is also an upper bound on the same kernel density estimate, but is looser. Finally, we suggest that the addition of a regularising term in the KGV causes it to approach the KMI, which motivates the introduction of this regularisation. The performance of the KC and KMI is verified in the context of instantaneous independent component analysis (ICA), by recovering both artificial and real (musical) signals following linear mixing.

PostScript [BibTex]


Introduction: Robots with Cognition?

Franz, MO.

6, pages: 38, (Editors: H.H. Bülthoff, K.R. Gegenfurtner, H.A. Mallot, R. Ulrich, F.A. Wichmann), 6. Tübinger Wahrnehmungskonferenz (TWK), February 2003 (talk)

Abstract
Using robots as models of cognitive behaviour has a long tradition in robotics. Parallel to the historical development in cognitive science, one observes two major, subsequent waves in cognitive robotics. The first is based on ideas of classical, cognitivist Artificial Intelligence (AI). According to the AI view of cognition as rule-based symbol manipulation, these robots typically try to extract symbolic descriptions of the environment from their sensors that are used to update a common, global world representation from which, in turn, the next action of the robot is derived. The AI approach has been successful in strongly restricted and controlled environments requiring well-defined tasks, e.g. in industrial assembly lines. AI-based robots mostly failed, however, in the unpredictable and unstructured environments that have to be faced by mobile robots. This has provoked the second wave in cognitive robotics which tries to achieve cognitive behaviour as an emergent property from the interaction of simple, low-level modules. Robots of the second wave are called animats as their architecture is designed to closely model aspects of real animals. Using only simple reactive mechanisms and Hebbian-type or evolutionary learning, the resulting animats often outperformed the highly complex AI-based robots in tasks such as obstacle avoidance, corridor following etc. While successful in generating robust, insect-like behaviour, typical animats are limited to stereotyped, fixed stimulus-response associations. If one adopts the view that cognition requires a flexible, goal-dependent choice of behaviours and planning capabilities (H.A. Mallot, Kognitionswissenschaft, 1999, 40-48) then it appears that cognitive behaviour cannot emerge from a collection of purely reactive modules. It rather requires environmentally decoupled structures that work without directly engaging the actions that it is concerned with. 
This poses the current challenge to cognitive robotics: How can we build cognitive robots that show the robustness and the learning capabilities of animats without falling back into the representational paradigm of AI? The speakers of the symposium present their approaches to this question in the context of robot navigation and sensorimotor learning. In the first talk, Prof. Helge Ritter introduces a robot system for imitation learning capable of exploring various alternatives in simulation before actually performing a task. The second speaker, Angelo Arleo, develops a model of spatial memory in rat navigation based on his electrophysiological experiments. He validates the model on a mobile robot which, in some navigation tasks, shows a performance comparable to that of the real rat. A similar model of spatial memory is used to investigate the mechanisms of territory formation in a series of robot experiments presented by Prof. Hanspeter Mallot. In the last talk, we return to the domain of sensorimotor learning where Ralf Möller introduces his approach to generate anticipatory behaviour by learning forward models of sensorimotor relationships.

Web [BibTex]


m-Alternative Forced Choice—Improving the Efficiency of the Method of Constant Stimuli

Jäkel, F.

Biologische Kybernetik, Graduate School for Neural and Behavioural Sciences, Tübingen, 2003 (diplomathesis)

[BibTex]


Rademacher and Gaussian averages in Learning Theory

Bousquet, O.

Universite de Marne-la-Vallee, March 2003 (talk)

PDF [BibTex]


New Approaches to Statistical Learning Theory

Bousquet, O.

Annals of the Institute of Statistical Mathematics, 55(2):371-389, 2003 (article)

Abstract
We present new tools from probability theory that can be applied to the analysis of learning algorithms. These tools allow one to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.

PostScript [BibTex]


Constructing Descriptive and Discriminative Non-linear Features: Rayleigh Coefficients in Kernel Feature Spaces

Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A., Müller, K.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):623-628, May 2003 (article)

Abstract
We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinearized variant of the Rayleigh coefficient, we propose nonlinear generalizations of Fisher's discriminant and oriented PCA using support vector kernel functions. Extensive simulations show the utility of our approach.

DOI [BibTex]


Cluster Kernels for Semi-Supervised Learning

Chapelle, O., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 15, pages: 585-592, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We propose a framework to incorporate unlabeled data in a kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach.

PDF Web [BibTex]


The Kernel Mutual Information

Gretton, A., Herbrich, R., Smola, A.

In IEEE ICASSP Vol. 4, pages: 880-883, IEEE ICASSP, April 2003 (inproceedings)

Abstract
We introduce a new contrast function, the kernel mutual information (KMI), to measure the degree of independence of continuous random variables. This contrast function provides an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estimate of the mutual information between a discretised approximation of the continuous random variables. We show that Bach and Jordan's kernel generalised variance (KGV) is also an upper bound on the same kernel density estimate, but is looser. Finally, we suggest that the addition of a regularising term in the KGV causes it to approach the KMI, which motivates the introduction of this regularisation.

PostScript [BibTex]


On the Representation, Learning and Transfer of Spatio-Temporal Movement Characteristics

Ilg, W., Bakir, GH., Mezger, J., Giese, MA.

In Humanoids Proceedings, pages: 0-0, Humanoids Proceedings, July 2003, electronic version (inproceedings)

Abstract
In this paper we present a learning-based approach for the modelling of complex movement sequences. Based on the method of Spatio-Temporal Morphable Models (STMMs), we derive a hierarchical algorithm that, in a first step, automatically identifies movement elements in movement sequences based on a coarse spatio-temporal description, and in a second step models these movement primitives by approximation through linear combinations of learned example movement trajectories. We describe the different steps of the algorithm and show how it can be applied for modelling and synthesis of complex sequences of human movements that contain movement elements with variable style. The proposed method is demonstrated on different applications of movement representation relevant for imitation learning of movement styles in humanoid robotics.

PDF [BibTex]


Mismatch String Kernels for SVM Protein Classification

Leslie, C., Eskin, E., Weston, J., Noble, W.

In Advances in Neural Information Processing Systems 15, pages: 1417-1424, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.

PDF Web [BibTex]


Distance-based classification with Lipschitz functions

von Luxburg, U., Bousquet, O.

In Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, pages: 314-328, (Editors: Schölkopf, B. and M.K. Warmuth), Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, 2003 (inproceedings)

Abstract
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. Our approach leads to a general large margin algorithm for classification in metric spaces. To analyze this algorithm, we first prove a representer theorem. It states that there exists a solution which can be expressed as linear combination of distances to sets of training points. Then we analyze the Rademacher complexity of some Lipschitz function classes. The generality of the Lipschitz approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz algorithm, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.

PDF PostScript [BibTex]


An Introduction to Support Vector Machines

Schölkopf, B.

In Recent Advances and Trends in Nonparametric Statistics , pages: 3-17, (Editors: MG Akritas and DN Politis), Elsevier, Amsterdam, The Netherlands, 2003 (inbook)

Web DOI [BibTex]


Statistical Learning and Kernel Methods in Bioinformatics

Schölkopf, B., Guyon, I., Weston, J.

In Artificial Intelligence and Heuristic Methods in Bioinformatics, 183, pages: 1-21, 3, (Editors: P Frasconi and R Shamir), IOS Press, Amsterdam, The Netherlands, 2003 (inbook)

[BibTex]
