Header logo is ei


2003


no image
On the Representation, Learning and Transfer of Spatio-Temporal Movement Characteristics

Ilg, W., Bakir, GH., Mezger, J., Giese, MA.

In Humanoids Proceedings, pages: 0-0, Humanoids Proceedings, July 2003, electronical version (inproceedings)

Abstract
In this paper we present a learning-based approach for the modelling of complex movement sequences. Based on the method of Spatio-Temporal Morphable Models (STMMS. We derive a hierarchical algorithm that, in a first step, identifies automatically movement elements in movement sequences based on a coarse spatio-temporal description, and in a second step models these movement primitives by approximation through linear combinations of learned example movement trajectories. We describe the different steps of the algorithm and show how it can be applied for modelling and synthesis of complex sequences of human movements that contain movement elements with variable style. The proposed method is demonstrated on different applications of movement representation relevant for imitation learning of movement styles in humanoid robotics.

PDF [BibTex]

2003

PDF [BibTex]


no image
Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences

Altun, Y., Johnson, M., Hofmann, T.

In pages: 145-152, (Editors: Collins, M. , M. Steedman), ACL, East Stroudsburg, PA, USA, Conference on Empirical Methods in Natural Language Processing (EMNLP) , July 2003 (inproceedings)

Abstract
Discriminative models have been of interest in the NLP community in recent years. Previous research has shown that they are advantageous over generative models. In this paper, we investigate how different objective functions and optimization methods affect the performance of the classifiers in the discriminative learning framework. We focus on the sequence labelling problem, particularly POS tagging and NER tasks. Our experiments show that changing the objective function is not as effective as changing the features included in the model.

Web [BibTex]

Web [BibTex]


no image
Time Complexity Analysis of Fast Pattern Selection Algorithm for SVM

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 221-231, Korean Data Mining Conference, June 2003 (inproceedings)

[BibTex]

[BibTex]


no image
Fast Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In PAKDD 2003, pages: 376-387, (Editors: Whang, K.-Y. , J. Jeon, K. Shim, J. Srivastava), Springer, Berlin, Germany, 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, May 2003 (inproceedings)

Abstract
Training SVM requires large memory and long cpu time when the pattern set is large. To alleviate the computational burden in SVM training, we propose a fast preprocessing algorithm which selects only the patterns near the decision boundary. Preliminary simulation results were promising: Up to two orders of magnitude, training time reduction was achieved including the preprocessing, without any loss in classification accuracies.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Scaling Reinforcement Learning Paradigms for Motor Control

Peters, J., Vijayakumar, S., Schaal, S.

In JSNC 2003, 10, pages: 1-7, 10th Joint Symposium on Neural Computation (JSNC), May 2003 (inproceedings)

Abstract
Reinforcement learning offers a general framework to explain reward related learning in artificial and biological motor control. However, current reinforcement learning methods rarely scale to high dimensional movement systems and mainly operate in discrete, low dimensional domains like game-playing, artificial toy problems, etc. This drawback makes them unsuitable for application to human or bio-mimetic motor control. In this poster, we look at promising approaches that can potentially scale and suggest a novel formulation of the actor-critic algorithm which takes steps towards alleviating the current shortcomings. We argue that methods based on greedy policies are not likely to scale into high-dimensional domains as they are problematic when used with function approximation – a must when dealing with continuous domains. We adopt the path of direct policy gradient based policy improvements since they avoid the problems of unstabilizing dynamics encountered in traditional value iteration based updates. While regular policy gradient methods have demonstrated promising results in the domain of humanoid notor control, we demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. Based on this, it is proved that Kakade’s ‘average natural policy gradient’ is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges with probability one to the nearest local minimum in Riemannian space of the cost function. The algorithm outperforms nonnatural policy gradients by far in a cart-pole balancing evaluation, and offers a promising route for the development of reinforcement learning for truly high-dimensionally continuous state-action systems. Keywords: Reinforcement learning, neurodynamic programming, actorcritic methods, policy gradient methods, natural policy gradient

PDF Web [BibTex]

PDF Web [BibTex]


no image
A case based comparison of identification with neural network and Gaussian process models.

Kocijan, J., Banko, B., Likar, B., Girard, A., Murray-Smith, R., Rasmussen, CE.

In Proceedings of the International Conference on Intelligent Control Systems and Signal Processing ICONS 2003, 1, pages: 137-142, (Editors: Ruano, E.A.), Proceedings of the International Conference on Intelligent Control Systems and Signal Processing ICONS, April 2003 (inproceedings)

Abstract
In this paper an alternative approach to black-box identification of non-linear dynamic systems is compared with the more established approach of using artificial neural networks. The Gaussian process prior approach is a representative of non-parametric modelling approaches. It was compared on a pH process modelling case study. The purpose of modelling was to use the model for control design. The comparison revealed that even though Gaussian process models can be effectively used for modelling dynamic systems caution has to be axercised when signals are selected.

PDF [BibTex]

PDF [BibTex]


no image
On-Line One-Class Support Vector Machines. An Application to Signal Segmentation

Gretton, A., Desobry, ..

In IEEE ICASSP Vol. 2, pages: 709-712, IEEE ICASSP, April 2003 (inproceedings)

Abstract
In this paper, we describe an efficient algorithm to sequentially update a density support estimate obtained using one-class support vector machines. The solution provided is an exact solution, which proves to be far more computationally attractive than a batch approach. This deterministic technique is applied to the problem of audio signal segmentation, with simulations demonstrating the computational performance gain on toy data sets, and the accuracy of the segmentation on audio signals.

PostScript [BibTex]

PostScript [BibTex]


no image
Blind separation of post-nonlinear mixtures using gaussianizing transformations and temporal decorrelation

Ziehe, A., Kawanabe, M., Harmeling, S., Müller, K.

In ICA 2003, pages: 269-274, (Editors: Amari, S.-I. , A. Cichocki, S. Makino, N. Murata), 4th International Symposium on Independent Component Analysis and Blind Signal Separation, April 2003 (inproceedings)

Abstract
At the previous workshop (ICA2001) we proposed the ACE-TD method that reduces the post-nonlinear blind source separation problem (PNL BSS) to a linear BSS problem. The method utilizes the Alternating Conditional Expectation (ACE) algorithm to approximately invert the (post-){non-linear} functions. In this contribution, we propose an alternative procedure called Gaussianizing transformation, which is motivated by the fact that linearly mixed signals before nonlinear transformation are approximately Gaussian distributed. This heuristic, but simple and efficient procedure yields similar results as the ACE method and can thus be used as a fast and effective equalization method. After equalizing the nonlinearities, temporal decorrelation separation (TDSEP) allows us to recover the source signals. Numerical simulations on realistic examples are performed to compare "Gauss-TD" with "ACE-TD".

PDF Web [BibTex]

PDF Web [BibTex]


no image
The Kernel Mutual Information

Gretton, A., Herbrich, R., Smola, A.

In IEEE ICASSP Vol. 4, pages: 880-883, IEEE ICASSP, April 2003 (inproceedings)

Abstract
We introduce a new contrast function, the kernel mutual information (KMI), to measure the degree of independence of continuous random variables. This contrast function provides an approximate upper bound on the mutual information, as measured near independence, and is based on a kernel density estimate of the mutual information between a discretised approximation of the continuous random variables. We show that Bach and Jordan‘s kernel generalised variance (KGV) is also an upper bound on the same kernel density estimate, but is looser. Finally, we suggest that the addition of a regularising term in the KGV causes it to approach the KMI, which motivates the introduction of this regularisation.

PostScript [BibTex]

PostScript [BibTex]


no image
Analysing ICA component by injection noise

Harmeling, S., Meinecke, F., Müller, K.

In ICA 2003, pages: 149-154, (Editors: Amari, S.-I. , A. Cichocki, S. Makino, N. Murata), 4th International Symposium on Independent Component Analysis and Blind Signal Separation, April 2003 (inproceedings)

Abstract
Usually, noise is considered to be destructive. We present a new method that constructively injects noise to assess the reliability and the group structure of empirical ICA components. Simulations show that the true root-mean squared angle distances between the real sources and some source estimates can be approximated by our method. In a toy experiment, we see that we are also able to reveal the underlying group structure of extracted ICA components. Furthermore, an experiment with fetal ECG data demonstrates that our approach is useful for exploratory data analysis of real-world data.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Hierarchical Spatio-Temporal Morphable Models for Representation of complex movements for Imitation Learning

Ilg, W., Bakir, GH., Franz, MO., Giese, M.

In 11th International Conference on Advanced Robotics, (2):453-458, (Editors: Nunes, U., A. de Almeida, A. Bejczy, K. Kosuge and J.A.T. Machado), 11th International Conference on Advanced Robotics, January 2003 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Hyperkernels

Ong, CS., Smola, AJ., Williamson, RC.

In pages: 495-502, 2003 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Feature Selection for Support Vector Machines by Means of Genetic Algorithms

Fröhlich, H., Chapelle, O., Schölkopf, B.

In 15th IEEE International Conference on Tools with AI, pages: 142-148, 15th IEEE International Conference on Tools with AI, 2003 (inproceedings)

[BibTex]

[BibTex]


no image
Propagation of Uncertainty in Bayesian Kernel Models - Application to Multiple-Step Ahead Forecasting

Quiñonero-Candela, J., Girard, A., Larsen, J., Rasmussen, CE.

In IEEE International Conference on Acoustics, Speech and Signal Processing, 2, pages: 701-704, IEEE International Conference on Acoustics, Speech and Signal Processing, 2003 (inproceedings)

Abstract
The object of Bayesian modelling is the predictive distribution, which in a forecasting scenario enables improved estimates of forecasted values and their uncertainties. In this paper we focus on reliably estimating the predictive mean and variance of forecasted values using Bayesian kernel based models such as the Gaussian Process and the Relevance Vector Machine. We derive novel analytic expressions for the predictive mean and variance for Gaussian kernel shapes under the assumption of a Gaussian input distribution in the static case, and of a recursive Gaussian predictive density in iterative forecasting. The capability of the method is demonstrated for forecasting of time-series and compared to approximate methods.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Unsupervised Clustering of Images using their Joint Segmentation

Seldin, Y., Starik, S., Werman, M.

In The 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV 2003), pages: 1-24, 3rd International Workshop on Statistical and Computational Theories of Vision (SCTV), 2003 (inproceedings)

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Methods and Their Applications to Signal Processing

Bousquet, O., Perez-Cruz, F.

In Proceedings. (ICASSP ‘03), Special Session on Kernel Methods, pages: 860 , ICASSP, 2003 (inproceedings)

Abstract
Recently introduced in Machine Learning, the notion of kernels has drawn a lot of interest as it allows to obtain non-linear algorithms from linear ones in a simple and elegant manner. This, in conjunction with the introduction of new linear classification methods such as the Support Vector Machines has produced significant progress. The successes of such algorithms is now spreading as they are applied to more and more domains. Many Signal Processing problems, by their non-linear and high-dimensional nature may benefit from such techniques. We give an overview of kernel methods and their recent applications.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Predictive control with Gaussian process models

Kocijan, J., Murray-Smith, R., Rasmussen, CE., Likar, B.

In Proceedings of IEEE Region 8 Eurocon 2003: Computer as a Tool, pages: 352-356, (Editors: Zajc, B. and M. Tkal), Proceedings of IEEE Region 8 Eurocon: Computer as a Tool, 2003 (inproceedings)

Abstract
This paper describes model-based predictive control based on Gaussian processes.Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identification of non-linear dynamic systems. It offers more insight in variance of obtained model response, as well as fewer parameters to determine than other models. The Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. This property is used in predictive control, where optimisation of control signal takes the variance information into account. The predictive control principle is demonstrated on a simulated example of nonlinear system.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Distance-based classification with Lipschitz functions

von Luxburg, U., Bousquet, O.

In Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, pages: 314-328, (Editors: Schölkopf, B. and M.K. Warmuth), Learning Theory and Kernel Machines, Proceedings of the 16th Annual Conference on Computational Learning Theory, 2003 (inproceedings)

Abstract
The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. Our approach leads to a general large margin algorithm for classification in metric spaces. To analyze this algorithm, we first prove a representer theorem. It states that there exists a solution which can be expressed as linear combination of distances to sets of training points. Then we analyze the Rademacher complexity of some Lipschitz function classes. The generality of the Lipschitz approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz algorithm, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Semi-Supervised Learning through Principal Directions Estimation

Chapelle, O., Schölkopf, B., Weston, J.

In ICML Workshop, The Continuum from Labeled to Unlabeled Data in Machine Learning & Data Mining, pages: 7, ICML Workshop: The Continuum from Labeled to Unlabeled Data in Machine Learning & Data Mining, 2003 (inproceedings)

Abstract
We describe methods for taking into account unlabeled data in the training of a kernel-based classifier, such as a Support Vector Machines (SVM). We propose two approaches utilizing unlabeled points in the vicinity of labeled ones. Both of the approaches effectively modify the metric of the pattern space, either by using non-spherical Gaussian density estimates which are determined using EM, or by modifying the kernel function using displacement vectors computed from pairs of unlabeled and labeled points. The latter is linked to techniques for training invariant SVMs. We present experimental results indicating that the proposed technique can lead to substantial improvements of classification accuracy.

PostScript [BibTex]

PostScript [BibTex]


no image
Machine Learning with Hyperkernels

Ong, CS., Smola, AJ.

In pages: 568-575, 2003 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Gaussian Processes to Speed up Hybrid Monte Carlo for Expensive Bayesian Integrals

Rasmussen, CE.

In Bayesian Statistics 7, pages: 651-659, (Editors: J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West), Bayesian Statistics 7, 2003 (inproceedings)

Abstract
Hybrid Monte Carlo (HMC) is often the method of choice for computing Bayesian integrals that are not analytically tractable. However the success of this method may require a very large number of evaluations of the (un-normalized) posterior and its partial derivatives. In situations where the posterior is computationally costly to evaluate, this may lead to an unacceptable computational load for HMC. I propose to use a Gaussian Process model of the (log of the) posterior for most of the computations required by HMC. Within this scheme only occasional evaluation of the actual posterior is required to guarantee that the samples generated have exactly the desired distribution, even if the GP model is somewhat inaccurate. The method is demonstrated on a 10 dimensional problem, where 200 evaluations suffice for the generation of 100 roughly independent points from the posterior. Thus, the proposed scheme allows Bayesian treatment of models with posteriors that are computationally demanding, such as models involving computer simulation.

PDF PostScript Web [BibTex]

PDF PostScript Web [BibTex]


no image
Dimension Reduction Based on Orthogonality — a Decorrelation Method in ICA

Zhang, K., Chan, L.

In Artificial Neural Networks and Neural Information Processing - ICANN/ICONIP 2003, pages: 132-139, (Editors: O Kaynak and E Alpaydin and E Oja and L Xu), Springer, Berlin, Germany, International Conference on Artificial Neural Networks and International Conference on Neural Information Processing, ICANN/ICONIP, 2003, Lecture Notes in Computer Science, Volume 2714 (inproceedings)

Web DOI [BibTex]

Web DOI [BibTex]

2001


no image
Pattern Selection Using the Bias and Variance of Ensemble

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 56-67, Korean Data Mining Conference, December 2001 (inproceedings)

[BibTex]

2001

[BibTex]


no image
Separation of post-nonlinear mixtures using ACE and temporal decorrelation

Ziehe, A., Kawanabe, M., Harmeling, S., Müller, K.

In ICA 2001, pages: 433-438, (Editors: Lee, T.-W. , T.P. Jung, S. Makeig, T. J. Sejnowski), Third International Workshop on Independent Component Analysis and Blind Signal Separation, December 2001 (inproceedings)

Abstract
We propose an efficient method based on the concept of maximal correlation that reduces the post-nonlinear blind source separation problem (PNL BSS) to a linear BSS problem. For this we apply the Alternating Conditional Expectation (ACE) algorithm – a powerful technique from nonparametric statistics – to approximately invert the (post-)nonlinear functions. Interestingly, in the framework of the ACE method convergence can be proven and in the PNL BSS scenario the optimal transformation found by ACE will coincide with the desired inverse functions. After the nonlinearities have been removed by ACE, temporal decorrelation (TD) allows us to recover the source signals. An excellent performance underlines the validity of our approach and demonstrates the ACE-TD method on realistic examples.

PDF [BibTex]

PDF [BibTex]


no image
Nonlinear blind source separation using kernel feature spaces

Harmeling, S., Ziehe, A., Kawanabe, M., Blankertz, B., Müller, K.

In ICA 2001, pages: 102-107, (Editors: Lee, T.-W. , T.P. Jung, S. Makeig, T. J. Sejnowski), Third International Workshop on Independent Component Analysis and Blind Signal Separation, December 2001 (inproceedings)

Abstract
In this work we propose a kernel-based blind source separation (BSS) algorithm that can perform nonlinear BSS for general invertible nonlinearities. For our kTDSEP algorithm we have to go through four steps: (i) adapting to the intrinsic dimension of the data mapped to feature space F, (ii) finding an orthonormal basis of this submanifold, (iii) mapping the data into the subspace of F spanned by this orthonormal basis, and (iv) applying temporal decorrelation BSS (TDSEP) to the mapped data. After demixing we get a number of irrelevant components and the original sources. To find out which ones are the components of interest, we propose a criterion that allows to identify the original sources. The excellent performance of kTDSEP is demonstrated in experiments on nonlinearly mixed speech data.

PDF [BibTex]

PDF [BibTex]


no image
Pattern Selection for ‘Regression’ using the Bias and Variance of Ensemble Network

Shin, H., Cho, S.

In Proc. of the Korean Institute of Industrial Engineers Conference, pages: 10-19, Korean Industrial Engineers Conference, November 2001 (inproceedings)

[BibTex]

[BibTex]


no image
Pattern Selection for ‘Classification’ using the Bias and Variance of Ensemble Neural Network

Shin, H., Cho, S.

In Proc. of the Korea Information Science Conference, pages: 307-309, Korea Information Science Conference, October 2001, Best Paper Award (inproceedings)

[BibTex]

[BibTex]


no image
Hybrid IDM/Impedance learning in human movements

Burdet, E., Teng, K., Chew, C., Peters, J., , B.

In ISHF 2001, 1, pages: 1-9, 1st International Symposium on Measurement, Analysis and Modeling of Human Functions (ISHF2001), September 2001 (inproceedings)

Abstract
In spite of motor output variability and the delay in the sensori-motor, humans routinely perform intrinsically un- stable tasks. The hybrid IDM/impedance learning con- troller presented in this paper enables skilful performance in strong stable and unstable environments. It consid- ers motor output variability identified from experimen- tal data, and contains two modules concurrently learning the endpoint force and impedance adapted to the envi- ronment. The simulations suggest how humans learn to skillfully perform intrinsically unstable tasks. Testable predictions are proposed.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Combining Off- and On-line Calibration of a Digital Camera

Urbanek, M., Horaud, R., Sturm, P.

In In Proceedings of Third International Conference on 3-D Digital Imaging and Modeling, pages: 99-106, In Proceedings of Third International Conference on 3-D Digital Imaging and Modeling, June 2001 (inproceedings)

Abstract
We introduce a novel outlook on the self­calibration task, by considering images taken by a camera in motion, allowing for zooming and focusing. Apart from the complex relationship between the lens control settings and the intrinsic camera parameters, a prior off­line calibration allows to neglect the setting of focus, and to fix the principal point and aspect ratio throughout distinct views. Thus, the calibration matrix is dependent only on the zoom position. Given a fully calibrated reference view, one has only one parameter to estimate for any other view of the same scene, in order to calibrate it and to be able to perform metric reconstructions. We provide a close­form solution, and validate the reliability of the algorithm with experiments on real images. An important advantage of our method is a reduced ­ to one ­ number of critical camera configurations, associated with it. Moreover, we propose a method for computing the epipolar geometry of two views, taken from different positions and with different (spatial) resolutions; the idea is to take an appropriate third view, that is "easy" to match with the other two.

ZIP [BibTex]

ZIP [BibTex]


no image
Support vector novelty detection applied to jet engine vibration spectra

Hayton, P., Schölkopf, B., Tarassenko, L., Anuzis, P.

In Advances in Neural Information Processing Systems 13, pages: 946-952, (Editors: TK Leen and TG Dietterich and V Tresp), MIT Press, Cambridge, MA, USA, 14th Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
A system has been developed to extract diagnostic information from jet engine carcass vibration data. Support Vector Machines applied to novelty detection provide a measure of how unusual the shape of a vibration signature is, by learning a representation of normality. We describe a novel method for Support Vector Machines of including information from a second class for novelty detection and give results from the application to Jet Engine vibration analysis.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Four-legged Walking Gait Control Using a Neuromorphic Chip Interfaced to a Support Vector Learning Algorithm

Still, S., Schölkopf, B., Hepp, K., Douglas, R.

In Advances in Neural Information Processing Systems 13, pages: 741-747, (Editors: TK Leen and TG Dietterich and V Tresp), MIT Press, Cambridge, MA, USA, 14th Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
To control the walking gaits of a four-legged robot we present a novel neuromorphic VLSI chip that coordinates the relative phasing of the robot's legs similar to how spinal Central Pattern Generators are believed to control vertebrate locomotion [3]. The chip controls the leg movements by driving motors with time varying voltages which are the outputs of a small network of coupled oscillators. The characteristics of the chip's output voltages depend on a set of input parameters. The relationship between input parameters and output voltages can be computed analytically for an idealized system. In practice, however, this ideal relationship is only approximately true due to transistor mismatch and offsets.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Algorithmic Stability and Generalization Performance

Bousquet, O., Elisseeff, A.

In Advances in Neural Information Processing Systems 13, pages: 196-202, (Editors: Leen, T.K. , T.G. Dietterich, V. Tresp), MIT Press, Cambridge, MA, USA, Fourteenth Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
We present a novel way of obtaining PAC-style bounds on the generalization error of learning algorithms, explicitly using their stability properties. A {\em stable} learner being one for which the learned solution does not change much for small changes in the training set. The bounds we obtain do not depend on any measure of the complexity of the hypothesis space (e.g. VC dimension) but rather depend on how the learning algorithm searches this space, and can thus be applied even when the VC dimension in infinite. We demonstrate that regularization networks possess the required stability property and apply our method to obtain new bounds on their generalization performance.

PDF Web [BibTex]

PDF Web [BibTex]


no image
The Kernel Trick for Distances

Schölkopf, B.

In Advances in Neural Information Processing Systems 13, pages: 301-307, (Editors: TK Leen and TG Dietterich and V Tresp), MIT Press, Cambridge, MA, USA, 14th Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
A method is described which, like the kernel trick in support vector machines (SVMs), lets us generalize distance-based algorithms to operate in feature spaces, usually nonlinearly related to the input space. This is done by identifying a class of kernels which can be represented as norm-based distances in Hilbert spaces. It turns out that the common kernel algorithms, such as SVMs and kernel PCA, are actually really distance based algorithms and can be run with that class of kernels, too. As well as providing a useful new insight into how these algorithms work, the present work can form the basis for conceiving new algorithms.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Vicinal Risk Minimization

Chapelle, O., Weston, J., Bottou, L., Vapnik, V.

In Advances in Neural Information Processing Systems 13, pages: 416-422, (Editors: Leen, T.K. , T.G. Dietterich, V. Tresp), MIT Press, Cambridge, MA, USA, Fourteenth Annual Neural Information Processing Systems Conference (NIPS) , April 2001 (inproceedings)

Abstract
The Vicinal Risk Minimization principle establishes a bridge between generative models and methods derived from the Structural Risk Minimization Principle such as Support Vector Machines or Statistical Regularization. We explain how VRM provides a framework which integrates a number of existing algorithms, such as Parzen windows, Support Vector Machines, Ridge Regression, Constrained Logistic Classifiers and Tangent-Prop. We then show how the approach implies new algorithms for solving problems usually associated with generative models. New algorithms are described for dealing with pattern recognition problems with very different pattern distributions and dealing with unlabeled data. Preliminary empirical results are presented.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Feature Selection for SVMs

Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., Vapnik, V.

In Advances in Neural Information Processing Systems 13, pages: 668-674, (Editors: Leen, T.K. , T.G. Dietterich, V. Tresp), MIT Press, Cambridge, MA, USA, Fourteenth Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
We introduce a method of feature selection for Support Vector Machines. The method is based upon finding those features which minimize bounds on the leave-one-out error. This search can be efficiently performed via gradient descent. The resulting algorithms are shown to be superior to some standard feature selection algorithms on both toy data and real-life problems of face recognition, pedestrian detection and analyzing DNA microarray data.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Occam’s Razor

Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 13, pages: 294-300, (Editors: Leen, T.K. , T.G. Dietterich, V. Tresp), MIT Press, Cambridge, MA, USA, Fourteenth Annual Neural Information Processing Systems Conference (NIPS), April 2001 (inproceedings)

Abstract
The Bayesian paradigm apparently only sometimes gives rise to Occam's Razor; at other times very large models perform well. We give simple examples of both kinds of behaviour. The two views are reconciled when measuring complexity of functions, rather than of the machinery used to implement them. We analyze the complexity of functions for some linear in the parameter models that are equivalent to Gaussian Processes, and always find Occam's Razor at work.

PDF Web [BibTex]

PDF Web [BibTex]


no image
An Improved Training Algorithm for Kernel Fisher Discriminants

Mika, S., Schölkopf, B., Smola, A.

In Proceedings AISTATS, pages: 98-104, (Editors: T Jaakkola and T Richardson), Morgan Kaufman, San Francisco, CA, Artificial Intelligence and Statistics (AISTATS), January 2001 (inproceedings)

Web [BibTex]

Web [BibTex]


no image
Nonstationary Signal Classification using Support Vector Machines

Gretton, A., Davy, M., Doucet, A., Rayner, P.

In 11th IEEE Workshop on Statistical Signal Processing, pages: 305-305, 11th IEEE Workshop on Statistical Signal Processing, 2001 (inproceedings)

Abstract
In this paper, we demonstrate the use of support vector (SV) techniques for the binary classification of nonstationary sinusoidal signals with quadratic phase. We briefly describe the theory underpinning SV classification, and introduce the Cohen's group time-frequency representation, which is used to process the non-stationary signals so as to define the classifier input space. We show that the SV classifier outperforms alternative classification methods on this processed data.

PostScript [BibTex]

PostScript [BibTex]


no image
Enhanced User Authentication through Typing Biometrics with Artificial Neural Networks and K-Nearest Neighbor Algorithm

Wong, FWMH., Supian, ASM., Ismail, AF., Lai, WK., Ong, CS.

In 2001 (inproceedings)

[BibTex]

[BibTex]


no image
Predicting the Nonlinear Dynamics of Biological Neurons using Support Vector Machines with Different Kernels

Frontzek, T., Lal, TN., Eckmiller, R.

In Proceedings of the International Joint Conference on Neural Networks (IJCNN'2001) Washington DC, 2, pages: 1492-1497, Proceedings of the International Joint Conference on Neural Networks (IJCNN'2001) Washington DC, 2001 (inproceedings)

Abstract
Based on biological data we examine the ability of Support Vector Machines (SVMs) with gaussian, polynomial and tanh-kernels to learn and predict the nonlinear dynamics of single biological neurons. We show that SVMs for regression learn the dynamics of the pyloric dilator neuron of the australian crayfish, and we determine the optimal SVM parameters with regard to the test error. Compared to conventional RBF networks and MLPs, SVMs with gaussian kernels learned faster and performed a better iterated one-step-ahead prediction with regard to training and test error. From a biological point of view SVMs are especially better in predicting the most important part of the dynamics, where the membranpotential is driven by superimposed synaptic inputs to the threshold for the oscillatory peak.

PDF [BibTex]

PDF [BibTex]


no image
Computationally Efficient Face Detection

Romdhani, S., Torr, P., Schölkopf, B., Blake, A.

In Computer Vision, ICCV 2001, vol. 2, (73):695-700, IEEE, 8th International Conference on Computer Vision, 2001 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
Design and Verification of Supervisory Controller of High-Speed Train

Yoo, SP., Lee, DY., Son, HI.

In IEEE International Symposium on Industrial Electronics, pages: 1290-1295, IEEE Operations Center, Piscataway, NJ, USA, IEEE International Symposium on Industrial Electronics (ISIE), 2001 (inproceedings)

Abstract
A high-level controller, supervisory controller, is required to monitor, control, and diagnose the low-level controllers of the high-speed train. The supervisory controller controls low-level controllers by monitoring input and output signals, events, and the high-speed train can be modeled as a discrete event system (DES). The high-speed train is modeled with automata, and the high-level control specification is defined. The supervisory controller is designed using the high-speed train model and the control specification. The designed supervisory controller is verified and evaluated with simulation using a computer-aided software engineering (CASE) tool, Object GEODE

Web DOI [BibTex]

Web DOI [BibTex]


no image
Towards Learning Path Planning for Solving Complex Robot Tasks

Frontzek, T., Lal, TN., Eckmiller, R.

In Proceedings of the International Conference on Artificial Neural Networks (ICANN'2001) Vienna, pages: 943-950, Proceedings of the International Conference on Artificial Neural Networks (ICANN'2001) Vienna, 2001 (inproceedings)

Abstract
For solving complex robot tasks it is necessary to incorporate path planning methods that are able to operate within different high-dimensional configuration spaces containing an unknown number of obstacles. Based on Advanced A*-algorithm (AA*) using expansion matrices instead of a simple expansion logic we propose a further improvement of AA* enabling the capability to learn directly from sample planning tasks. This is done by inserting weights into the expansion matrix which are modified according to a special learning rule. For an examplary planning task we show that Adaptive AA* learns movement vectors which allow larger movements than the initial ones into well-defined directions of the configuration space. Compared to standard approaches planning times are clearly reduced.

PDF [BibTex]

PDF [BibTex]


no image
Learning to predict the leave-one-out error of kernel based classifiers

Tsuda, K., Rätsch, G., Mika, S., Müller, K.

In International Conference on Artificial Neural Networks, ICANN'01, (LNCS 2130):331-338, (Editors: G. Dorffner, H. Bischof and K. Hornik), International Conference on Artificial Neural Networks, ICANN'01, 2001 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
A kernel approach for vector quantization with guaranteed distortion bounds

Tipping, M., Schölkopf, B.

In Artificial Intelligence and Statistics, pages: 129-134, (Editors: T Jaakkola and T Richardson), Morgan Kaufmann, San Francisco, CA, USA, 8th International Conference on Artificial Intelligence and Statistics (AI and STATISTICS), 2001 (inproceedings)

[BibTex]

[BibTex]


no image
Tracking a Small Set of Experts by Mixing Past Posteriors

Bousquet, O., Warmuth, M.

In Proceedings of the 14th Annual Conference on Computational Learning Theory, Lecture Notes in Computer Science, 2111, pages: 31-47, Proceedings of the 14th Annual Conference on Computational Learning Theory, Lecture Notes in Computer Science, 2001 (inproceedings)

Abstract
In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of $n$ experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partitioning the training sequence into $k+1$ sections and then choosing the best expert for each section. We build on methods developed by Herbster and Warmuth and consider an open problem posed by Freund where the experts in the best partition are from a small pool of size $m$. Since $k>>m$ the best expert shifts back and forth between the experts of the small pool. We propose algorithms that solve this open problem by mixing the past posteriors maintained by the master algorithm. We relate the number of bits needed for encoding the best partition to the loss bounds of the algorithms. Instead of paying $\log n$ for choosing the best expert in each section we first pay $\log {n\choose m}$ bits in the bounds for identifying the pool of $m$ experts and then $\log m$ bits per new section. In the bounds we also pay twice for encoding the boundaries of the sections.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Learning and Prediction of the Nonlinear Dynamics of Biological Neurons with Support Vector Machines

Frontzek, T., Lal, TN., Eckmiller, R.

In Proceedings of the International Conference on Artificial Neural Networks (ICANN'2001), pages: 390-398, Proceedings of the International Conference on Artificial Neural Networks (ICANN'2001), 2001 (inproceedings)

Abstract
Based on biological data we examine the ability of Support Vector Machines (SVMs) with gaussian kernels to learn and predict the nonlinear dynamics of single biological neurons. We show that SVMs for regression learn the dynamics of the pyloric dilator neuron of the australian crayfish, and we determine the optimal SVM parameters with regard to the test error. Compared to conventional RBF networks, SVMs learned faster and performed a better iterated one-step-ahead prediction with regard to training and test error. From a biological point of view SVMs are especially better in predicting the most important part of the dynamics, where the membranpotential is driven by superimposed synaptic inputs to the threshold for the oscillatory peak.

PDF [BibTex]

PDF [BibTex]


no image
Estimating a Kernel Fisher Discriminant in the Presence of Label Noise

Lawrence, N., Schölkopf, B.

In 18th International Conference on Machine Learning, pages: 306-313, (Editors: CE Brodley and A Pohoreckyj Danyluk), Morgan Kaufmann , San Fransisco, CA, USA, 18th International Conference on Machine Learning (ICML), 2001 (inproceedings)

Web [BibTex]

Web [BibTex]


no image
A Generalized Representer Theorem

Schölkopf, B., Herbrich, R., Smola, A.

In Lecture Notes in Computer Science, Vol. 2111, (2111):416-426, LNCS, (Editors: D Helmbold and R Williamson), Springer, Berlin, Germany, Annual Conference on Computational Learning Theory (COLT/EuroCOLT), 2001 (inproceedings)

[BibTex]

[BibTex]


no image
Unsupervised Segmentation and Classification of Mixtures of Markovian Sources

Seldin, Y., Bejerano, G., Tishby, N.

In The 33rd Symposium on the Interface of Computing Science and Statistics (Interface 2001 - Frontiers in Data Mining and Bioinformatics), pages: 1-15, 33rd Symposium on the Interface of Computing Science and Statistics (Interface - Frontiers in Data Mining and Bioinformatics), 2001 (inproceedings)

Abstract
We describe a novel algorithm for unsupervised segmentation of sequences into alternating Variable Memory Markov sources, first presented in [SBT01]. The algorithm is based on competitive learning between Markov models, when implemented as Prediction Suffix Trees [RST96] using the MDL principle. By applying a model clustering procedure, based on rate distortion theory combined with deterministic annealing, we obtain a hierarchical segmentation of sequences between alternating Markov sources. The method is applied successfully to unsupervised segmentation of multilingual texts into languages where it is able to infer correctly both the number of languages and the language switching points. When applied to protein sequence families (results of the [BSMT01] work), we demonstrate the method‘s ability to identify biologically meaningful sub-sequences within the proteins, which correspond to signatures of important functional sub-units called domains. Our approach to proteins classification (through the obtained signatures) is shown to have both conceptual and practical advantages over the currently used methods.

PDF Web [BibTex]

PDF Web [BibTex]