Header logo is ei


2007


no image
A Bayesian Approach to Nonlinear Parameter Identification for Rigid Body Dynamics

Ting, J., Mistry, M., Peters, J., Schaal, S., Nakanishi, J.

In RSS 2006, pages: 247-254, (Editors: Sukhatme, G. S., S. Schaal, W. Burgard, D. Fox), MIT Press, Cambridge, MA, USA, Robotics: Science and Systems II (RSS ), April 2007 (inproceedings)

Abstract
For robots of increasing complexity such as humanoid robots, conventional identification of rigid body dynamics models based on CAD data and actuator models becomes difficult and inaccurate due to the large number of additional nonlinear effects in these systems, e.g., stemming from stiff wires, hydraulic hoses, protective shells, skin, etc. Data driven parameter estimation offers an alternative model identification method, but it is often burdened by various other problems, such as significant noise in all measured or inferred variables of the robot. The danger of physically inconsistent results also exists due to unmodeled nonlinearities or insufficiently rich data. In this paper, we address all these problems by developing a Bayesian parameter identification method that can automatically detect noise in both input and output data for the regression algorithm that performs system identification. A post-processing step ensures physically consistent rigid body parameters by nonlinearly projecting the result of the Bayesian estimation onto constraints given by positive definite inertia matrices and the parallel axis theorem. We demonstrate on synthetic and actual robot data that our technique performs parameter identification with 5 to 20% higher accuracy than traditional methods. Due to the resulting physically consistent parameters, our algorithm enables us to apply advanced control methods that algebraically require physical consistency on robotic platforms.

PDF Web [BibTex]

2007

PDF Web [BibTex]


no image
Learning causality by identifying common effects with kernel-based dependence measures

Sun, X., Janzing, D.

In ESANN 2007, pages: 453-458, D-Side, Evere, Belgium, 15th European Symposium on Artificial Neural Networks, April 2007 (inproceedings)

Abstract
We describe a method for causal inference that measures the strength of statistical dependence by the Hilbert-Schmidt norm of kernel-based conditional cross-covariance operators. We consider the increase of the dependence of two variables X and Y by conditioning on a third variable Z as a hint for Z being a common effect of X and Y. Based on this assumption, we collect "votes" for hypothetical causal directions and orient the edges according to the majority vote. For most of our experiments with artificial and real-world data our method has outperformed the conventional constraint-based inductive causation (IC) algorithm.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Exploring the causal order of binary variables via exponential hierarchies of Markov kernels

Sun, X., Janzing, D.

In ESANN 2007, pages: 465-470, D-Side, Evere, Belgium, 15th European Symposium on Artificial Neural Networks, April 2007 (inproceedings)

Abstract
We propose a new algorithm for estimating the causal structure that underlies the observed dependence among n (n>=4) binary variables X_1,...,X_n. Our inference principle states that the factorization of the joint probability into conditional probabilities for X_j given X_1,...,X_{j-1} often leads to simpler terms if the order of variables is compatible with the directed acyclic graph representing the causal structure. We study joint measures of OR/AND gates and show that the complexity of the conditional probabilities (the so-called Markov kernels), defined by a hierarchy of exponential models, depends on the order of the variables. Some toy and real-data experiments support our inference rule.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning

Peters, J., Schaal, S.

In Proceedings of the 15th European Symposium on Artificial Neural Networks (ESANN 2007), pages: 295-300, D-Side, Evere, Belgium, 15th European Symposium on Artificial Neural Networks (ESANN), April 2007 (inproceedings)

Abstract
In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists out of actor updates which are achieved using natural stochastic policy gradients while the critic obtains the natural policy gradient by linear regression. We show that this architecture can be used to learn the “building blocks of movement generation”, called motor primitives. Motor primitives are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. We show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem

Kim, D., Sra, S., Dhillon, I.

In SDM 2007, pages: 343-354, (Editors: Apte, C. ), Society for Industrial and Applied Mathematics, Pittsburgh, PA, USA, SIAM International Conference on Data Mining, April 2007 (inproceedings)

Abstract
Nonnegative Matrix Approximation is an effective matrix decomposition technique that has proven to be useful for a wide variety of applications ranging from document analysis and image processing to bioinformatics. There exist a few algorithms for nonnegative matrix approximation (NNMA), for example, Lee & Seung’s multiplicative updates, alternating least squares, and certain gradient descent based procedures. All of these procedures suffer from either slow convergence, numerical instabilities, or at worst, theoretical unsoundness. In this paper we present new and improved algorithms for the least-squares NNMA problem, which are not only theoretically well-founded, but also overcome many of the deficiencies of other methods. In particular, we use non-diagonal gradient scaling to obtain rapid convergence. Our methods provide numerical results superior to both Lee & Seung’s method as well to the alternating least squares (ALS) heuristic, which is known to work well in some situations but has no theoretical guarantees (Berry et al. 2006). Our approach extends naturally to include regularization and box-constraints, without sacrificing convergence guarantees. We present experimental results on both synthetic and realworld datasets to demonstrate the superiority of our methods, in terms of better approximations as well as efficiency.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Distinguishing Between Cause and Effect via Kernel-Based Complexity Measures for Conditional Distributions

Sun, X., Janzing, D., Schölkopf, B.

In Proceedings of the 15th European Symposium on Artificial Neural Networks , pages: 441-446, (Editors: M Verleysen), D-Side Publications, Evere, Belgium, ESANN, April 2007 (inproceedings)

Abstract
We propose a method to evaluate the complexity of probability measures from data that is based on a reproducing kernel Hilbert space seminorm of the logarithm of conditional probability densities. The motivation is to provide a tool for a causal inference method which assumes that conditional probabilities for effects given their causes are typically simpler and smoother than vice-versa. We present experiments with toy data where the quantitative results are consistent with our intuitive understanding of complexity and smoothness. Also in some examples with real-world data the probability measure corresponding to the true causal direction turned out to be less complex than those of the reversed order.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Deterministic Annealing for Multiple-Instance Learning

Gehler, P., Chapelle, O.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 123-130, (Editors: Meila, M. , X. Shen), MIT Press, Cambridge, MA, USA, 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
In this paper we demonstrate how deterministic annealing can be applied to different SVM formulations of the multiple-instance learning (MIL) problem. Our results show that we find better local minima compared to the heuristic methods those problems are usually solved with. However this does not always translate into a better test error suggesting an inadequacy of the objective function. Based on this finding we propose a new objective function which together with the deterministic annealing algorithm finds better local minima and achieves better performance on a set of benchmark datasets. Furthermore the results also show how the structure of MIL datasets influence the performance of MIL algorithms and we discuss how future benchmark datasets for the MIL problem should be designed.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Bayesian Inference and Optimal Design in the Sparse Linear Model

Seeger, M., Steinke, F., Tsuda, K.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 444-451, (Editors: Meila, M. , X. Shen), JMLR, Cambridge, MA, USA, 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
The sparse linear model has seen many successful applications in Statistics, Machine Learning, and Computational Biology, such as identification of gene regulatory networks from micro-array expression data. Prior work has either approximated Bayesian inference by expensive Markov chain Monte Carlo, or replaced it by point estimation. We show how to obtain a good approximation to Bayesian analysis efficiently, using the Expectation Propagation method. We also address the problems of optimal design and hyperparameter estimation. We demonstrate our framework on a gene network identification task.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Stick-breaking Construction for the Indian Buffet Process

Teh, Y., Görür, D., Ghahramani, Z.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 556-563, (Editors: Meila, M. , X. Shen), MIT Press, Cambridge, MA, USA, 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stick-breaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are efficient, easy to implement and are more generally applicable than the currently available Gibbs sampler. This representation, along with the work of Thibaux and Jordan [17], also illuminates interesting theoretical connections between the IBP, Chinese restaurant processes, Beta processes and Dirichlet processes.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Fast Kernel ICA using an Approximate Newton Method

Shen, H., Jegelka, S., Gretton, A.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 476-483, (Editors: Meila, M. , X. Shen), MIT Press, Cambridge, MA, USA, 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
Recent approaches to independent component analysis (ICA) have used kernel independence measures to obtain very good performance, particularly where classical methods experience difficulty (for instance, sources with near-zero kurtosis). We present Fast Kernel ICA (FastKICA), a novel optimisation technique for one such kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). Our search procedure uses an approximate Newton method on the special orthogonal group, where we estimate the Hessian locally about independence. We employ incomplete Cholesky decomposition to efficiently compute the gradient and approximate Hessian. FastKICA results in more accurate solutions at a given cost compared with gradient descent, and is relatively insensitive to local minima when initialised far from independence. These properties allow kernel approaches to be extended to problems with larger numbers of sources and observations. Our method is competitive with other modern and classical ICA approaches in both speed and accuracy.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Transductive Classification via Local Learning Regularization

Wu, M., Schölkopf, B.

In JMLR Workshop and Conference Proceedings Volume 2: AISTATS 2007, pages: 628-635, (Editors: M Meila and X Shen), 11th International Conference on Artificial Intelligence and Statistics, March 2007 (inproceedings)

Abstract
The idea of local learning, classifying a particular point based on its neighbors, has been successfully applied to supervised learning problems. In this paper, we adapt it for Transductive Classification (TC) problems. Specifically, we formulate a Local Learning Regularizer (LL-Reg) which leads to a solution with the property that the label of each data point can be well predicted based on its neighbors and their labels. For model selection, an efficient way to compute the leave-one-out classification error is provided for the proposed and related algorithms. Experimental results using several benchmark datasets illustrate the effectiveness of the proposed approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
The Independent Components of Natural Images are Perceptually Dependent

Bethge, M., Wiecki, T., Wichmann, F.

In Human Vision and Electronic Imaging XII, pages: 1-12, (Editors: Rogowitz, B. E.), SPIE, Bellingham, WA, USA, SPIE Human Vision and Electronic Imaging Conference, February 2007 (inproceedings)

Abstract
The independent components of natural images are a set of linear filters which are optimized for statistical independence. With such a set of filters images can be represented without loss of information. Intriguingly, the filter shapes are localized, oriented, and bandpass, resembling important properties of V1 simple cell receptive fields. Here we address the question of whether the independent components of natural images are also perceptually less dependent than other image components. We compared the pixel basis, the ICA basis and the discrete cosine basis by asking subjects to interactively predict missing pixels (for the pixel basis) or to predict the coefficients of ICA and DCT basis functions in patches of natural images. Like Kersten (1987) we find the pixel basis to be perceptually highly redundant but perhaps surprisingly, the ICA basis showed significantly higher perceptual dependencies than the DCT basis. This shows a dissociation between statistical and perceptual dependence measures.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Unsupervised learning of a steerable basis for invariant image representations

Bethge, M., Gerwinn, S., Macke, J.

In Human Vision and Electronic Imaging XII, pages: 1-12, (Editors: Rogowitz, B. E.), SPIE, Bellingham, WA, USA, SPIE Human Vision and Electronic Imaging Conference, February 2007 (inproceedings)

Abstract
There are two aspects to unsupervised learning of invariant representations of images: First, we can reduce the dimensionality of the representation by finding an optimal trade-off between temporal stability and informativeness. We show that the answer to this optimization problem is generally not unique so that there is still considerable freedom in choosing a suitable basis. Which of the many optimal representations should be selected? Here, we focus on this second aspect, and seek to find representations that are invariant under geometrical transformations occuring in sequences of natural images. We utilize ideas of steerability and Lie groups, which have been developed in the context of filter design. In particular, we show how an anti-symmetric version of canonical correlation analysis can be used to learn a full-rank image basis which is steerable with respect to rotations. We provide a geometric interpretation of this algorithm by showing that it finds the two-dimensional eigensubspaces of the avera ge bivector. For data which exhibits a variety of transformations, we develop a bivector clustering algorithm, which we use to learn a basis of generalized quadrature pairs (i.e. complex cells) from sequences of natural images.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Subspace Kernel for Nonlinear Feature Extraction

Wu, M., Farquhar, J.

In IJCAI-07, pages: 1125-1130, (Editors: Veloso, M. M.), AAAI Press, Menlo Park, CA, USA, International Joint Conference on Artificial Intelligence, January 2007 (inproceedings)

Abstract
Kernel based nonlinear Feature Extraction (KFE) or dimensionality reduction is a widely used pre-processing step in pattern classification and data mining tasks. Given a positive definite kernel function, it is well known that the input data are implicitly mapped to a feature space with usually very high dimensionality. The goal of KFE is to find a low dimensional subspace of this feature space, which retains most of the information needed for classification or data analysis. In this paper, we propose a subspace kernel based on which the feature extraction problem is transformed to a kernel parameter learning problem. The key observation is that when projecting data into a low dimensional subspace of the feature space, the parameters that are used for describing this subspace can be regarded as the parameters of the kernel function between the projected data. Therefore current kernel parameter learning methods can be adapted to optimize this parameterized kernel function. Experimental results are provided to validate the effectiveness of the proposed approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Graph kernels for disease outcome prediction from protein-protein interaction networks

Borgwardt, KM., Vishwanathan, SVN., Schraudolph, N., Kriegel, H-P.

In pages: 4-15, (Editors: Altman, R.B. A.K. Dunker, L. Hunter, T. Murray, T.E. Klein), World Scientific, Hackensack, NJ, USA, Pacific Symposium on Biocomputing (PSB), January 2007 (inproceedings)

Abstract
It is widely believed that comparing discrepancies in the protein-protein interaction (PPI) networks of individuals will become an important tool in understanding and preventing diseases. Currently PPI networks for individuals are not available, but gene expression data is becoming easier to obtain and allows us to represent individuals by a co-integrated gene expression/protein interaction network. Two major problems hamper the application of graph kernels – state-of-the-art methods for whole-graph comparison – to compare PPI networks. First, these methods do not scale to graphs of the size of a PPI network. Second, missing edges in these interaction networks are biologically relevant for detecting discrepancies, yet, these methods do not take this into account. In this article we present graph kernels for biological network comparison that are fast to compute and take into account missing interactions. We evaluate their practical performance on two datasets of co-integrated gene expression/PPI networks.

PDF [BibTex]

PDF [BibTex]


no image
Independent Factor Reinforcement Learning for Portfolio Management

Li, J., Zhang, K., Chan, L.

In Proceedings of the 8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2007), pages: 1020-1031, (Editors: H Yin and P Tiño and E Corchado and W Byrne and X Yao), Springer, Berlin, Germany, 8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 2007 (inproceedings)

Web [BibTex]

Web [BibTex]


no image
Kernel-Based Nonlinear Independent Component Analysis

Zhang, K., Chan, L.

In Independent Component Analysis and Signal Separation, 7th International Conference, ICA 2007, pages: 301-308, (Editors: M E Davies and C J James and S A Abdallah and M D Plumbley), Springer, 7th International Conference on Independent Component Analysis and Signal Separation (ICA), 2007, Lecture Notes in Computer Science, Vol. 4666 (inproceedings)

Web DOI [BibTex]

Web DOI [BibTex]


no image
Towards Machine Learning of Motor Skills

Peters, J., Schaal, S., Schölkopf, B.

In Proceedings of Autonome Mobile Systeme (AMS), pages: 138-144, (Editors: K Berns and T Luksch), 2007, clmc (inproceedings)

Abstract
Autonomous robots that can adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two ma jor components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Reinforcement Learning for Optimal Control of Arm Movements

Theodorou, E., Peters, J., Schaal, S.

In Abstracts of the 37st Meeting of the Society of Neuroscience., Neuroscience, 2007, clmc (inproceedings)

Abstract
Every day motor behavior consists of a plethora of challenging motor skills from discrete movements such as reaching and throwing to rhythmic movements such as walking, drumming and running. How this plethora of motor skills can be learned remains an open question. In particular, is there any unifying computa-tional framework that could model the learning process of this variety of motor behaviors and at the same time be biologically plausible? In this work we aim to give an answer to these questions by providing a computational framework that unifies the learning mechanism of both rhythmic and discrete movements under optimization criteria, i.e., in a non-supervised trial-and-error fashion. Our suggested framework is based on Reinforcement Learning, which is mostly considered as too costly to be a plausible mechanism for learning com-plex limb movement. However, recent work on reinforcement learning with pol-icy gradients combined with parameterized movement primitives allows novel and more efficient algorithms. By using the representational power of such mo-tor primitives we show how rhythmic motor behaviors such as walking, squash-ing and drumming as well as discrete behaviors like reaching and grasping can be learned with biologically plausible algorithms. Using extensive simulations and by using different reward functions we provide results that support the hy-pothesis that Reinforcement Learning could be a viable candidate for motor learning of human motor behavior when other learning methods like supervised learning are not feasible.

[BibTex]

[BibTex]


no image
Reinforcement learning by reward-weighted regression for operational space control

Peters, J., Schaal, S.

In Proceedings of the 24th Annual International Conference on Machine Learning, pages: 745-750, ICML, 2007, clmc (inproceedings)

Abstract
Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-base reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Policy gradient methods for machine learning

Peters, J., Theodorou, E., Schaal, S.

In Proceedings of the 14th INFORMS Conference of the Applied Probability Society, pages: 97-98, Eindhoven, Netherlands, July 9-11, 2007, 2007, clmc (inproceedings)

Abstract
We present an in-depth survey of policy gradient methods as they are used in the machine learning community for optimizing parameterized, stochastic control policies in Markovian systems with respect to the expected reward. Despite having been developed separately in the reinforcement learning literature, policy gradient methods employ likelihood ratio gradient estimators as also suggested in the stochastic simulation optimization community. It is well-known that this approach to policy gradient estimation traditionally suffers from three drawbacks, i.e., large variance, a strong dependence on baseline functions and a inefficient gradient descent. In this talk, we will present a series of recent results which tackles each of these problems. The variance of the gradient estimation can be reduced significantly through recently introduced techniques such as optimal baselines, compatible function approximations and all-action gradients. However, as even the analytically obtainable policy gradients perform unnaturally slow, it required the step from ÔvanillaÕ policy gradient methods towards natural policy gradients in order to overcome the inefficiency of the gradient descent. This development resulted into the Natural Actor-Critic architecture which can be shown to be very efficient in application to motor primitive learning for robotics.

[BibTex]

[BibTex]


no image
Policy Learning for Motor Skills

Peters, J., Schaal, S.

In Proceedings of 14th International Conference on Neural Information Processing (ICONIP), pages: 233-242, (Editors: Ishikawa, M. , K. Doya, H. Miyamoto, T. Yamakawa), 2007, clmc (inproceedings)

Abstract
Policy learning which allows autonomous robots to adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, we study policy learning algorithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structures for task representation and execution.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Reinforcement learning for operational space control

Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, pages: 2111-2116, IEEE Computer Society, ICRA, 2007, clmc (inproceedings)

Abstract
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting supervised learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-convexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. The important insight that many operational space control algorithms can be reformulated as optimal control problems, however, allows addressing this inverse learning problem in the framework of reinforcement learning. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Using reward-weighted regression for reinforcement learning of task space control

Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages: 262-267, Honolulu, Hawaii, April 1-5, 2007, 2007, clmc (inproceedings)

Abstract
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark

Riedmiller, M., Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages: 254-261, ADPRL, 2007, clmc (inproceedings)

Abstract
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.

PDF [BibTex]

PDF [BibTex]

2003


no image
How to Deal with Large Dataset, Class Imbalance and Binary Output in SVM based Response Model

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 93-107, Korean Data Mining Conference, December 2003, Best Paper Award (inproceedings)

Abstract
[Abstract]: Various machine learning methods have made a rapid transition to response modeling in search of improved performance. And support vector machine (SVM) has also been attracting much attention lately. This paper presents an SVM response model. We are specifically focusing on the how-to’s to circumvent practical obstacles, such as how to face with class imbalance problem, how to produce the scores from an SVM classifier for lift chart analysis, and how to evaluate the models on accuracy and profit. Besides coping with the intractability problem of SVM training caused by large marketing dataset, a previously proposed pattern selection algorithm is introduced. SVM training accompanies time complexity of the cube of training set size. The pattern selection algorithm picks up important training patterns before SVM response modeling. We made comparison on SVM training results between the pattern selection algorithm and random sampling. Three aspects of SVM response models were evaluated: accuracies, lift chart analysis, and computational efficiency. The SVM trained with selected patterns showed a high accuracy, a high uplift in profit and in response rate, and a high computational efficiency.

PDF [BibTex]

2003

PDF [BibTex]


no image
Bayesian Monte Carlo

Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 15, pages: 489-496, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

PDF Web [BibTex]

PDF Web [BibTex]


no image
On the Complexity of Learning the Kernel Matrix

Bousquet, O., Herrmann, D.

In Advances in Neural Information Processing Systems 15, pages: 399-406, (Editors: Becker, S. , S. Thrun, K. Obermayer), The MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate data based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow to vary the spectrum and keep the eigenvectors fix. This bound is only a logarithmic factor bigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Control, Planning, Learning, and Imitation with Dynamic Movement Primitives

Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.

In IROS 2003, pages: 1-21, Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE International Conference on Intelligent Robots and Systems, October 2003 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Discriminative Learning for Label Sequences via Boosting

Altun, Y., Hofmann, T., Johnson, M.

In Advances in Neural Information Processing Systems 15, pages: 977-984, (Editors: Becker, S. , S. Thrun, K. Obermayer ), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Multiple-step ahead prediction for non linear dynamic systems: A Gaussian Process treatment with propagation of the uncertainty

Girard, A., Rasmussen, CE., Quiñonero-Candela, J., Murray-Smith, R.

In Advances in Neural Information Processing Systems 15, pages: 529-536, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. k-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model of the form y_t = f(y_{t-1},...,y_{t-L}), the prediction of y at time t + k is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Cluster Kernels for Semi-Supervised Learning

Chapelle, O., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 15, pages: 585-592, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Mismatch String Kernels for SVM Protein Classification

Leslie, C., Eskin, E., Weston, J., Noble, W.

In Advances in Neural Information Processing Systems 15, pages: 1417-1424, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Incremental Gaussian Processes

Quinonero Candela, J., Winther, O.

In Advances in Neural Information Processing Systems 15, pages: 1001-1008, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
In this paper, we consider Tipping‘s relevance vector machine (RVM) and formalize an incremental training strategy as a variant of the expectation-maximization (EM) algorithm that we call subspace EM. Working with a subset of active basis functions, the sparsity of the RVM solution will ensure that the number of basis functions and thereby the computational complexity is kept low. We also introduce a mean field approach to the intractable classification model that is expected to give a very good approximation to exact Bayesian inference and contains the Laplace approximation as a special case. We test the algorithms on two large data sets with O(10^3-10^4) examples. The results indicate that Bayesian learning of large data sets, e.g. the MNIST database is realistic.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

In Advances in Neural Information Processing Systems 15, pages: 873-880, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

PDF Web [BibTex]

PDF Web [BibTex]


no image
Derivative observations in Gaussian Process models of dynamic systems

Solak, E., Murray-Smith, R., Leithead, WE., Leith, D., Rasmussen, CE.

In Advances in Neural Information Processing Systems 15, pages: 1033-1040, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and associated uncertainty with normal function observations into the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It improves dramatically the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size - traditionally a problem for Gaussian process models.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Linear Combinations of Optic Flow Vectors for Estimating Self-Motion: a Real-World Test of a Neural Model

Franz, MO., Chahl, JS.

In Advances in Neural Information Processing Systems 15, pages: 1319-1326, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and self-motion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Clustering with the Fisher score

Tsuda, K., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 15, pages: 729-736, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).

PDF Web [BibTex]

PDF Web [BibTex]


no image
Large Margin Methods for Label Sequence Learning

Altun, Y., Hofmann, T.

In pages: 993-996, International Speech Communication Association, Bonn, Germany, 8th European Conference on Speech Communication and Technology (EuroSpeech), September 2003 (inproceedings)

Web [BibTex]

Web [BibTex]


no image
Fast Pattern Selection Algorithm for Support Vector Classifiers: "Time Complexity Analysis"

Shin, H., Cho, S.

In Lecture Notes in Computer Science (LNCS 2690), LNCS 2690, pages: 1008-1015, Springer-Verlag, Heidelberg, The 4th International Conference on Intelligent Data Engineering (IDEAL), September 2003 (inproceedings)

Abstract
Training SVM requires large memory and long cpu time when the pattern set is large. To alleviate the computational burden in SVM training, we propose a fast preprocessing algorithm which selects only the patterns near the decision boundary. The time complexity of the proposed algorithm is much smaller than that of the naive M^2 algorithm

PDF [BibTex]

PDF [BibTex]


no image
Marginalized Kernels between Labeled Graphs

Kashima, H., Tsuda, K., Inokuchi, A.

In 20th International Conference on Machine Learning, pages: 321-328, (Editors: Faucett, T. and N. Mishra), 20th International Conference on Machine Learning, August 2003 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Sparse Gaussian Processes: inference, subspace identification and model selection

Csato, L., Opper, M.

In Proceedings, pages: 1-6, (Editors: Van der Hof, , Wahlberg), The Netherlands, 13th IFAC Symposium on System Identifiaction, August 2003, electronical version; Index ThA02-2 (inproceedings)

Abstract
Gaussian Process (GP) inference is a probabilistic kernel method where the GP is treated as a latent function. The inference is carried out using the Bayesian online learning and its extension to the more general iterative approach which we call TAP/EP learning. Sparsity is introduced in this context to make the TAP/EP method applicable to large datasets. We address the prohibitive scaling of the number of parameters by defining a subset of the training data that is used as the support the GP, thus the number of required parameters is independent of the training set, similar to the case of ``Support--‘‘ or ``Relevance--Vectors‘‘. An advantage of the full probabilistic treatment is that allows the computation of the marginal data likelihood or evidence, leading to hyper-parameter estimation within the GP inference. An EM algorithm to choose the hyper-parameters is proposed. The TAP/EP learning is the E-step and the M-step then updates the hyper-parameters. Due to the sparse E-step the resulting algorithm does not involve manipulation of large matrices. The presented algorithm is applicable to a wide variety of likelihood functions. We present results of applying the algorithm on classification and nonstandard regression problems for artificial and real datasets.

PDF GZIP [BibTex]

PDF GZIP [BibTex]


no image
Adaptive, Cautious, Predictive control with Gaussian Process Priors

Murray-Smith, R., Sbarbaro, D., Rasmussen, CE., Girard, A.

In Proceedings of the 13th IFAC Symposium on System Identification, pages: 1195-1200, (Editors: Van den Hof, P., B. Wahlberg and S. Weiland), Proceedings of the 13th IFAC Symposium on System Identification, August 2003 (inproceedings)

Abstract
Nonparametric Gaussian Process models, a Bayesian statistics approach, are used to implement a nonlinear adaptive control law. Predictions, including propagation of the state uncertainty are made over a k-step horizon. The expected value of a quadratic cost function is minimised, over this prediction horizon, without ignoring the variance of the model predictions. The general method and its main features are illustrated on a simulation example.

PDF [BibTex]

PDF [BibTex]


no image
Generative Model-based Clustering of Directional Data

Banerjee, A., Dhillon, I., Ghosh, J., Sra, S.

In Proc. ACK SIGKDD, pages: 00-00, KDD, August 2003 (inproceedings)

GZIP [BibTex]

GZIP [BibTex]


no image
Hidden Markov Support Vector Machines

Altun, Y., Tsochantaridis, I., Hofmann, T.

In pages: 4-11, (Editors: Fawcett, T. , N. Mishra), AAAI Press, Menlo Park, CA, USA, Twentieth International Conference on Machine Learning (ICML), August 2003 (inproceedings)

Web [BibTex]

Web [BibTex]


no image
How Many Neighbors To Consider in Pattern Pre-selection for Support Vector Classifiers?

Shin, H., Cho, S.

In Proc. of INNS-IEEE International Joint Conference on Neural Networks (IJCNN 2003), pages: 565-570, IJCNN, July 2003 (inproceedings)

Abstract
Training support vector classifiers (SVC) requires large memory and long cpu time when the pattern set is large. To alleviate the computational burden in SVC training, we previously proposed a preprocessing algorithm which selects only the patterns in the overlap region around the decision boundary, based on neighborhood properties [8], [9], [10]. The k-nearest neighbors’ class label entropy for each pattern was used to estimate the pattern’s proximity to the decision boundary. The value of parameter k is critical, yet has been determined by a rather ad-hoc fashion. We propose in this paper a systematic procedure to determine k and show its effectiveness through experiments.

PDF [BibTex]

PDF [BibTex]


no image
On the Representation, Learning and Transfer of Spatio-Temporal Movement Characteristics

Ilg, W., Bakir, GH., Mezger, J., Giese, MA.

In Humanoids Proceedings, pages: 0-0, Humanoids Proceedings, July 2003, electronical version (inproceedings)

Abstract
In this paper we present a learning-based approach for the modelling of complex movement sequences. Based on the method of Spatio-Temporal Morphable Models (STMMS. We derive a hierarchical algorithm that, in a first step, identifies automatically movement elements in movement sequences based on a coarse spatio-temporal description, and in a second step models these movement primitives by approximation through linear combinations of learned example movement trajectories. We describe the different steps of the algorithm and show how it can be applied for modelling and synthesis of complex sequences of human movements that contain movement elements with variable style. The proposed method is demonstrated on different applications of movement representation relevant for imitation learning of movement styles in humanoid robotics.

PDF [BibTex]

PDF [BibTex]


no image
Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences

Altun, Y., Johnson, M., Hofmann, T.

In pages: 145-152, (Editors: Collins, M. , M. Steedman), ACL, East Stroudsburg, PA, USA, Conference on Empirical Methods in Natural Language Processing (EMNLP) , July 2003 (inproceedings)

Abstract
Discriminative models have been of interest in the NLP community in recent years. Previous research has shown that they are advantageous over generative models. In this paper, we investigate how different objective functions and optimization methods affect the performance of the classifiers in the discriminative learning framework. We focus on the sequence labelling problem, particularly POS tagging and NER tasks. Our experiments show that changing the objective function is not as effective as changing the features included in the model.

Web [BibTex]

Web [BibTex]


no image
Time Complexity Analysis of Fast Pattern Selection Algorithm for SVM

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 221-231, Korean Data Mining Conference, June 2003 (inproceedings)

[BibTex]

[BibTex]


no image
Fast Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In PAKDD 2003, pages: 376-387, (Editors: Whang, K.-Y. , J. Jeon, K. Shim, J. Srivastava), Springer, Berlin, Germany, 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, May 2003 (inproceedings)

Abstract
Training SVM requires large memory and long cpu time when the pattern set is large. To alleviate the computational burden in SVM training, we propose a fast preprocessing algorithm which selects only the patterns near the decision boundary. Preliminary simulation results were promising: Up to two orders of magnitude, training time reduction was achieved including the preprocessing, without any loss in classification accuracies.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]