2007


Nonlinear Receptive Field Analysis: Making Kernel Methods Interpretable

Kienzle, W., Macke, J., Wichmann, F., Schölkopf, B., Franz, M.

Computational and Systems Neuroscience Meeting 2007 (COSYNE 2007), 4, pages: 16, February 2007 (poster)

PDF Web [BibTex]

Statistical Consistency of Kernel Canonical Correlation Analysis

Fukumizu, K., Bach, F., Gretton, A.

Journal of Machine Learning Research, 8, pages: 361-383, February 2007 (article)

Abstract
While kernel canonical correlation analysis (CCA) has been applied in many contexts, the convergence of finite sample estimates of the associated functions to their population counterparts has not yet been established. This paper gives a mathematical proof of the statistical convergence of kernel CCA, providing a theoretical justification for the method. The proof uses covariance operators defined on reproducing kernel Hilbert spaces, and analyzes the convergence of their empirical estimates of finite rank to their population counterparts, which can have infinite rank. The result also gives a sufficient condition for convergence on the regularization coefficient involved in kernel CCA: this should decrease as n^{-1/3}, where n is the number of data.

PDF [BibTex]

Unsupervised learning of a steerable basis for invariant image representations

Bethge, M., Gerwinn, S., Macke, J.

In Human Vision and Electronic Imaging XII, pages: 1-12, (Editors: Rogowitz, B. E.), SPIE, Bellingham, WA, USA, SPIE Human Vision and Electronic Imaging Conference, February 2007 (inproceedings)

Abstract
There are two aspects to unsupervised learning of invariant representations of images: First, we can reduce the dimensionality of the representation by finding an optimal trade-off between temporal stability and informativeness. We show that the answer to this optimization problem is generally not unique so that there is still considerable freedom in choosing a suitable basis. Which of the many optimal representations should be selected? Here, we focus on this second aspect, and seek to find representations that are invariant under geometrical transformations occurring in sequences of natural images. We utilize ideas of steerability and Lie groups, which have been developed in the context of filter design. In particular, we show how an anti-symmetric version of canonical correlation analysis can be used to learn a full-rank image basis which is steerable with respect to rotations. We provide a geometric interpretation of this algorithm by showing that it finds the two-dimensional eigensubspaces of the average bivector. For data which exhibits a variety of transformations, we develop a bivector clustering algorithm, which we use to learn a basis of generalized quadrature pairs (i.e. complex cells) from sequences of natural images.

PDF Web DOI [BibTex]

Estimating Population Receptive Fields in Space and Time

Macke, J., Zeck, G., Bethge, M.

Computational and Systems Neuroscience Meeting 2007 (COSYNE 2007), 4, pages: 44, February 2007 (poster)

PDF Web [BibTex]

Machine Learning for Mass Production and Industrial Engineering

Pfingsten, T.

Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, February 2007 (phdthesis)

PDF [BibTex]

New Margin- and Evidence-Based Approaches for EEG Signal Classification

Hill, N., Farquhar, J.

Invited talk at the FaSor Jahressymposium, February 2007 (talk)

PDF [BibTex]

On the Pre-Image Problem in Kernel Methods

Bakır, G., Schölkopf, B., Weston, J.

In Kernel Methods in Bioengineering, Signal and Image Processing, pages: 284-302, (Editors: G Camps-Valls and JL Rojo-Álvarez and M Martínez-Ramón), Idea Group Publishing, Hershey, PA, USA, January 2007 (inbook)

Abstract
In this chapter we are concerned with the problem of reconstructing patterns from their representation in feature space, known as the pre-image problem. We review existing algorithms and propose a learning based approach. All algorithms are discussed regarding their usability and complexity and evaluated on an image denoising application.

DOI [BibTex]

A Subspace Kernel for Nonlinear Feature Extraction

Wu, M., Farquhar, J.

In IJCAI-07, pages: 1125-1130, (Editors: Veloso, M. M.), AAAI Press, Menlo Park, CA, USA, International Joint Conference on Artificial Intelligence, January 2007 (inproceedings)

Abstract
Kernel based nonlinear Feature Extraction (KFE) or dimensionality reduction is a widely used pre-processing step in pattern classification and data mining tasks. Given a positive definite kernel function, it is well known that the input data are implicitly mapped to a feature space with usually very high dimensionality. The goal of KFE is to find a low dimensional subspace of this feature space, which retains most of the information needed for classification or data analysis. In this paper, we propose a subspace kernel based on which the feature extraction problem is transformed to a kernel parameter learning problem. The key observation is that when projecting data into a low dimensional subspace of the feature space, the parameters that are used for describing this subspace can be regarded as the parameters of the kernel function between the projected data. Therefore current kernel parameter learning methods can be adapted to optimize this parameterized kernel function. Experimental results are provided to validate the effectiveness of the proposed approach.

PDF Web [BibTex]

Some observations on the pedestal effect

Henning, G., Wichmann, F.

Journal of Vision, 7(1:3):1-15, January 2007 (article)

Abstract
The pedestal or dipper effect is the large improvement in the detectability of a sinusoidal grating observed when it is added to a masking or pedestal grating of the same spatial frequency, orientation, and phase. We measured the pedestal effect in both broadband and notched noise, that is, noise from which a 1.5-octave band centered on the signal frequency had been removed. Although the pedestal effect persists in broadband noise, it almost disappears in the notched noise. Furthermore, the pedestal effect is substantial when either high- or low-pass masking noise is used. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies different from that of the signal and the pedestal. We speculate that the spatial-frequency components of the notched noise above and below the spatial frequency of the signal and the pedestal prevent "off-frequency looking," that is, prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and the pedestal. Thus, the pedestal or dipper effect measured without notched noise appears not to be a characteristic of individual spatial-frequency-tuned channels.

PDF Web DOI [BibTex]

Development of a Brain-Computer Interface Approach Based on Covert Attention to Tactile Stimuli

Raths, C.

University of Tübingen, Germany, January 2007 (diplomathesis)

[BibTex]

Cue Combination and the Effect of Horizontal Disparity and Perspective on Stereoacuity

Zalevski, AM., Henning, GB., Hill, NJ.

Spatial Vision, 20(1):107-138, January 2007 (article)

Abstract
Relative depth judgments of vertical lines based on horizontal disparity deteriorate enormously when the lines form part of closed configurations (Westheimer, 1979). In studies showing this effect, perspective was not manipulated and thus produced inconsistency between horizontal disparity and perspective. We show that stereoacuity improves dramatically when perspective and horizontal disparity are made consistent. Observers appear to use unhelpful perspective cues in judging the relative depth of the vertical sides of rectangles in a way not incompatible with a form of cue weighting. However, 95% confidence intervals for the weights derived for cues usually exceed the a-priori [0-1] range.

PDF PDF DOI [BibTex]

A Machine Learning Approach for Estimating the Attenuation Map for a Combined PET/MR Scanner

Hofmann, M.

Biologische Kybernetik, Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, 2007 (diplomathesis)

[BibTex]

Mathematik der Wahrnehmung: Wendepunkte

Wichmann, F., Ernst, MO.

Akademische Mitteilungen zwölf: Fünf Sinne, pages: 32-37, 2007 (misc)

[BibTex]

Towards Machine Learning of Motor Skills

Peters, J., Schaal, S., Schölkopf, B.

In Proceedings of Autonome Mobile Systeme (AMS), pages: 138-144, (Editors: K Berns and T Luksch), 2007, clmc (inproceedings)

Abstract
Autonomous robots that can adapt to novel situations have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heyday of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.

PDF DOI [BibTex]

Reinforcement Learning for Optimal Control of Arm Movements

Theodorou, E., Peters, J., Schaal, S.

In Abstracts of the 37th Meeting of the Society of Neuroscience, Neuroscience, 2007, clmc (inproceedings)

Abstract
Everyday motor behavior consists of a plethora of challenging motor skills from discrete movements such as reaching and throwing to rhythmic movements such as walking, drumming and running. How this plethora of motor skills can be learned remains an open question. In particular, is there any unifying computational framework that could model the learning process of this variety of motor behaviors and at the same time be biologically plausible? In this work we aim to give an answer to these questions by providing a computational framework that unifies the learning mechanism of both rhythmic and discrete movements under optimization criteria, i.e., in a non-supervised trial-and-error fashion. Our suggested framework is based on Reinforcement Learning, which is mostly considered as too costly to be a plausible mechanism for learning complex limb movement. However, recent work on reinforcement learning with policy gradients combined with parameterized movement primitives allows novel and more efficient algorithms. By using the representational power of such motor primitives we show how rhythmic motor behaviors such as walking, squashing and drumming as well as discrete behaviors like reaching and grasping can be learned with biologically plausible algorithms. Using extensive simulations and by using different reward functions we provide results that support the hypothesis that Reinforcement Learning could be a viable candidate for motor learning of human motor behavior when other learning methods like supervised learning are not feasible.

[BibTex]

Reinforcement learning by reward-weighted regression for operational space control

Peters, J., Schaal, S.

In Proceedings of the 24th Annual International Conference on Machine Learning, pages: 745-750, ICML, 2007, clmc (inproceedings)

Abstract
Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
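
Illustration (not from the paper): the reduction to reward-weighted regression mentioned in the abstract can be pictured as a weighted least-squares fit in which better-rewarded samples count more. The sketch below assumes a scalar linear policy u = theta * x and a fixed exponential reward transformation with a hypothetical temperature `tau`; the adaptive transformation described in the abstract is not reproduced here.

```python
import math

def reward_weighted_fit(states, actions, rewards, tau=1.0):
    """Fit u = theta * x by least squares, weighting each sample by exp(tau * reward).

    Assumes scalar states and actions and at least one nonzero state.
    The exponential weighting is an illustrative choice, not the paper's
    adaptive reward transformation.
    """
    r_max = max(rewards)  # subtract the maximum reward for numerical stability
    weights = [math.exp(tau * (r - r_max)) for r in rewards]
    num = sum(w * x * u for w, x, u in zip(weights, states, actions))
    den = sum(w * x * x for w, x in zip(weights, states))
    return num / den

# Two candidate gains (1 and 3) appear in the data; the rewards favour gain 1.
states = [1.0, 1.0, 2.0, 2.0]
actions = [1.0, 3.0, 2.0, 6.0]
rewards = [0.0, -4.0, 0.0, -4.0]
theta = reward_weighted_fit(states, actions, rewards, tau=5.0)
```

With equal rewards the fit reduces to ordinary least squares; as `tau` grows, the estimate moves toward the gain that produced the highest rewards.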

link (url) DOI [BibTex]

Policy gradient methods for machine learning

Peters, J., Theodorou, E., Schaal, S.

In Proceedings of the 14th INFORMS Conference of the Applied Probability Society, pages: 97-98, Eindhoven, Netherlands, July 9-11, 2007, clmc (inproceedings)

Abstract
We present an in-depth survey of policy gradient methods as they are used in the machine learning community for optimizing parameterized, stochastic control policies in Markovian systems with respect to the expected reward. Despite having been developed separately in the reinforcement learning literature, policy gradient methods employ likelihood ratio gradient estimators as also suggested in the stochastic simulation optimization community. It is well-known that this approach to policy gradient estimation traditionally suffers from three drawbacks, i.e., large variance, a strong dependence on baseline functions and an inefficient gradient descent. In this talk, we will present a series of recent results which tackle each of these problems. The variance of the gradient estimation can be reduced significantly through recently introduced techniques such as optimal baselines, compatible function approximations and all-action gradients. However, as even the analytically obtainable policy gradients perform unnaturally slowly, it required the step from 'vanilla' policy gradient methods towards natural policy gradients in order to overcome the inefficiency of the gradient descent. This development resulted in the Natural Actor-Critic architecture which can be shown to be very efficient in application to motor primitive learning for robotics.

[BibTex]

Policy Learning for Motor Skills

Peters, J., Schaal, S.

In Proceedings of 14th International Conference on Neural Information Processing (ICONIP), pages: 233-242, (Editors: Ishikawa, M., K. Doya, H. Miyamoto, T. Yamakawa), 2007, clmc (inproceedings)

Abstract
Policy learning which allows autonomous robots to adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, we study policy learning algorithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structures for task representation and execution.

PDF DOI [BibTex]

Reinforcement learning for operational space control

Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Conference on Robotics and Automation, pages: 2111-2116, IEEE Computer Society, ICRA, 2007, clmc (inproceedings)

Abstract
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in the face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting supervised learning problem is ill-defined as it requires learning an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-convexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. The important insight that many operational space control algorithms can be reformulated as optimal control problems, however, allows addressing this inverse learning problem in the framework of reinforcement learning. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.

link (url) DOI [BibTex]

Relative Entropy Policy Search

Peters, J.

CLMC Technical Report: TR-CLMC-2007-2, Computational Learning and Motor Control Lab, Los Angeles, CA, 2007, clmc (techreport)

Abstract
This technical report describes a cute idea of how to create new policy search approaches. It directly relates to the Natural Actor-Critic methods but allows the derivation of one shot solutions. Future work may include the application to interesting problems.

PDF link (url) [BibTex]

Using reward-weighted regression for reinforcement learning of task space control

Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages: 262-267, Honolulu, Hawaii, April 1-5, 2007, clmc (inproceedings)

link (url) DOI [BibTex]

Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark

Riedmiller, M., Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages: 254-261, ADPRL, 2007, clmc (inproceedings)

Abstract
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.
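
Illustration (not from the paper): the simplest of the three families named in the abstract, finite difference gradients, can be sketched with central differences on the policy parameters. The return function `J` below is a hypothetical smooth stand-in; in the paper's setting it would be estimated from rollouts on the cart pole, and the authors' portable C++ code is not reproduced here.

```python
def finite_difference_gradient(J, theta, eps=1e-4):
    """Estimate dJ/dtheta by central differences, one parameter at a time."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((J(up) - J(down)) / (2.0 * eps))
    return grad

def toy_return(theta):
    """Hypothetical expected return with its maximum at theta = (1.0, -0.5)."""
    return -(theta[0] - 1.0) ** 2 - 2.0 * (theta[1] + 0.5) ** 2

# Plain gradient ascent on the policy parameters using the estimated gradient.
theta = [0.0, 0.0]
for _ in range(200):
    g = finite_difference_gradient(toy_return, theta)
    theta = [t + 0.1 * gi for t, gi in zip(theta, g)]
```

Each gradient estimate costs two evaluations of the return per parameter, which is one reason such estimators are usually treated as a baseline when returns come from noisy rollouts.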

PDF [BibTex]

2005


Kernel Methods for Measuring Independence

Gretton, A., Herbrich, R., Smola, A., Bousquet, O., Schölkopf, B.

Journal of Machine Learning Research, 6, pages: 2075-2129, December 2005 (article)

Abstract
We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.

PDF PostScript PDF [BibTex]

Kernel ICA for Large Scale Problems

Jegelka, S., Gretton, A., Achlioptas, D.

In NIPS Workshop on Large Scale Kernel Machines, December 2005 (inproceedings)

Web [BibTex]

Some thoughts about Gaussian Processes

Chapelle, O.

NIPS Workshop on Open Problems in Gaussian Processes for Machine Learning, December 2005 (talk)

PDF Web [BibTex]

A Unifying View of Sparse Approximate Gaussian Process Regression

Quiñonero-Candela, J., Rasmussen, C.

Journal of Machine Learning Research, 6, pages: 1935-1959, December 2005 (article)

Abstract
We provide a new unifying view, including all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which the methods are using. This allows new insights to be gained, and highlights the relationship between existing methods. It also allows for a clear theoretically justified ranking of the closeness of the known approximations to the corresponding full GPs. Finally we point directly to designs of new better sparse approximations, combining the best of the existing strategies, within attractive computational constraints.

PDF [BibTex]

Popper, Falsification and the VC-dimension

Corfield, D., Schölkopf, B., Vapnik, V.

(145), Max Planck Institute for Biological Cybernetics, November 2005 (techreport)

PDF [BibTex]

Extension to Kernel Dependency Estimation with Applications to Robotics

Bakır, G.

Biologische Kybernetik, Technische Universität Berlin, Berlin, November 2005 (phdthesis)

Abstract
Kernel Dependency Estimation (KDE) is a novel technique which was designed to learn mappings between sets without making assumptions on the type of the involved input and output data. It learns the mapping in two stages. In a first step, it tries to estimate coordinates of a feature space representation of elements of the set by solving a high dimensional multivariate regression problem in feature space. Following this, it tries to reconstruct the original representation given the estimated coordinates. This thesis introduces various algorithmic extensions to both stages in KDE. One of the contributions of this thesis is to propose a novel linear regression algorithm that explores low-dimensional subspaces during learning. Furthermore various existing strategies for reconstructing patterns from feature maps involved in KDE are discussed and novel pre-image techniques are introduced. In particular, pre-image techniques for data-types that are of discrete nature such as graphs and strings are investigated. KDE is then explored in the context of robot pose imitation where the input is an image with a human operator and the output is the robot articulated variables. Thus, using KDE, robot pose imitation is formulated as a regression problem.

PDF PDF [BibTex]

Kernel methods for dependence testing in LFP-MUA

Gretton, A., Belitski, A., Murayama, Y., Schölkopf, B., Logothetis, N.

35(689.17), 35th Annual Meeting of the Society for Neuroscience (Neuroscience), November 2005 (poster)

Abstract
A fundamental problem in neuroscience is determining whether or not particular neural signals are dependent. The correlation is the most straightforward basis for such tests, but considerable work also focuses on the mutual information (MI), which is capable of revealing dependence of higher orders that the correlation cannot detect. That said, there are other measures of dependence that share with the MI an ability to detect dependence of any order, but which can be easier to compute in practice. We focus in particular on tests based on the functional covariance, which derive from work originally accomplished in 1959 by Renyi. Conceptually, our dependence tests work by computing the covariance between (infinite dimensional) vectors of nonlinear mappings of the observations being tested, and then determining whether this covariance is zero - we call this measure the constrained covariance (COCO). When these vectors are members of universal reproducing kernel Hilbert spaces, we can prove this covariance to be zero only when the variables being tested are independent. The greatest advantage of these tests, compared with the mutual information, is their simplicity – when comparing two signals, we need only take the largest eigenvalue (or the trace) of a product of two matrices of nonlinearities, where these matrices are generally much smaller than the number of observations (and are very simple to construct). We compare the mutual information, the COCO, and the correlation in the context of finding changes in dependence between the LFP and MUA signals in the primary visual cortex of the anaesthetized macaque, during the presentation of dynamic natural stimuli. We demonstrate that the MI and COCO reveal dependence which is not detected by the correlation alone (which we prove by artificially removing all correlation between the signals, and then testing their dependence with COCO and the MI); and that COCO and the MI give results consistent with each other on our data.

Web [BibTex]

Training Support Vector Machines with Multiple Equality Constraints

Kienzle, W., Schölkopf, B.

In Proceedings of the 16th European Conference on Machine Learning, Lecture Notes in Computer Science, Vol. 3720, pages: 182-193, (Editors: JG Carbonell and J Siekmann), Springer, Berlin, Germany, ECML, November 2005 (inproceedings)

Abstract
In this paper we present a primal-dual decomposition algorithm for support vector machine training. As with existing methods that use very small working sets (such as Sequential Minimal Optimization (SMO), Successive Over-Relaxation (SOR) or the Kernel Adatron (KA)), our method scales well, is straightforward to implement, and does not require an external QP solver. Unlike SMO, SOR and KA, the method is applicable to a large number of SVM formulations regardless of the number of equality constraints involved. The effectiveness of our algorithm is demonstrated on a more difficult SVM variant in this respect, namely semi-parametric support vector regression.

PDF DOI [BibTex]

Geometrical aspects of statistical learning theory

Hein, M.

Biologische Kybernetik, Darmstadt, Darmstadt, November 2005 (phdthesis)

PDF [BibTex]

Measuring Statistical Dependence with Hilbert-Schmidt Norms

Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.

In Algorithmic Learning Theory, Lecture Notes in Computer Science, Vol. 3734, pages: 63-78, (Editors: S Jain and H-U Simon and E Tomita), Springer, Berlin, Germany, 16th International Conference ALT, October 2005 (inproceedings)

Abstract
We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
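
Illustration (not from the paper): the empirical estimate referred to in the abstract is commonly written as trace(K H L H) / (m - 1)^2, where K and L are Gram matrices of the two samples and H = I - (1/m) 1 1^T is the centering matrix. The sketch below assumes scalar samples and a Gaussian kernel with a hypothetical bandwidth `sigma`; the kernel choice and function names are illustrative.

```python
import math

def gaussian_gram(xs, sigma=1.0):
    """Gram matrix with entries exp(-(x_i - x_j)^2 / (2 sigma^2)) for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]

def center(K):
    """Return H K H with H = I - (1/m) 1 1^T (double centering)."""
    m = len(K)
    row = [sum(r) / m for r in K]
    col = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total = sum(row) / m
    return [[K[i][j] - row[i] - col[j] + total for j in range(m)] for i in range(m)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / (m - 1)^2."""
    m = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # trace(K H L H) equals trace((H K H) L) by the cyclic property of the trace
    tr = sum(Kc[i][j] * L[j][i] for i in range(m) for j in range(m))
    return tr / (m - 1) ** 2

xs = [i / 10.0 for i in range(20)]
dependent = hsic(xs, xs)             # y is a copy of x
constant = hsic(xs, [1.0] * 20)      # y carries no information about x
```

No regularisation parameter appears in the estimate, which matches the first advantage listed in the abstract; only the kernel and its bandwidth have to be chosen.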

PDF DOI [BibTex]

Maximal Margin Classification for Metric Spaces

Hein, M., Bousquet, O., Schölkopf, B.

Journal of Computer and System Sciences, 71(3):333-359, October 2005 (article)

Abstract
In order to apply the maximum margin method in arbitrary metric spaces, we suggest to embed the metric space into a Banach or Hilbert space and to perform linear classification in this space. We propose several embeddings and recall that an isometric embedding in a Banach space is always possible while an isometric embedding in a Hilbert space is only possible for certain metric spaces. As a result, we obtain a general maximum margin classification algorithm for arbitrary metric spaces (whose solution is approximated by an algorithm of Graepel). Interestingly enough, the embedding approach, when applied to a metric which can be embedded into a Hilbert space, yields the SVM algorithm, which emphasizes the fact that its solution depends on the metric and not on the kernel. Furthermore we give upper bounds of the capacity of the function classes corresponding to both embeddings in terms of Rademacher averages. Finally we compare the capacities of these function classes directly.

PDF PDF DOI [BibTex]

An Analysis of the Anti-Learning Phenomenon for the Class Symmetric Polyhedron

Kowalczyk, A., Chapelle, O.

In Algorithmic Learning Theory: 16th International Conference, pages: 78-92, Algorithmic Learning Theory, October 2005 (inproceedings)

Abstract
This paper deals with an unusual phenomenon where most machine learning algorithms yield good performance on the training set but systematically worse than random performance on the test set. This has been observed so far for some natural data sets and demonstrated for some synthetic data sets when the classification rule is learned from a small set of training samples drawn from some high dimensional space. The initial analysis presented in this paper shows that anti-learning is a property of data sets and is quite distinct from overfitting of the training data. Moreover, the analysis leads to a specification of some machine learning procedures which can overcome anti-learning and generate machines able to classify training and test data consistently.

PDF [BibTex]


Selective integration of multiple biological data for supervised network inference

Kato, T., Tsuda, K., Asai, K.

Bioinformatics, 21(10):2488, October 2005 (article)

PDF [BibTex]


Assessing Approximate Inference for Binary Gaussian Process Classification

Kuss, M., Rasmussen, C.

Journal of Machine Learning Research, 6, pages: 1679, October 2005 (article)

Abstract
Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately, exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace's method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace's method.

PDF PDF [BibTex]


Implicit Surfaces For Modelling Human Heads

Steinke, F.

Biologische Kybernetik, Eberhard-Karls-Universität, Tübingen, September 2005 (diplomathesis)

[BibTex]


Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

Banerjee, A., Dhillon, I., Ghosh, J., Sra, S.

Journal of Machine Learning Research, 6, pages: 1345-1382, September 2005 (article)

Abstract
Several large-scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multivariate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentration parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants. Empirical results on clustering of high-dimensional text and gene-expression data based on a mixture of vMF distributions show that the ability to estimate the concentration parameter for each vMF component, which is not present in existing approaches, yields superior results, especially for difficult clustering tasks in high-dimensional spaces.
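The special case mentioned in the abstract, spherical k-means (k-means with cosine similarity), can be illustrated with a minimal sketch. This is not the paper's vMF-EM algorithm: it estimates mean directions only and no concentration parameters. The function name, the farthest-point initialisation, and the iteration count are illustrative assumptions.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20):
    """Cluster the rows of X by cosine similarity (illustrative sketch)."""
    # L2-normalise so that every point lies on the unit hypersphere.
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Assumed initialisation: start from the first point, then repeatedly
    # add the point that is least similar to the centres chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        sim = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[np.argmin(sim)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment: on the unit sphere, cosine similarity is a dot product.
        labels = np.argmax(X @ centers.T, axis=1)
        # Update: each centre becomes the normalised sum of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                s = members.sum(axis=0)
                centers[j] = s / np.linalg.norm(s)
    labels = np.argmax(X @ centers.T, axis=1)
    return labels, centers
```

Calling `spherical_kmeans(X, 2)` on a small array whose rows point in two clearly different directions returns one label per row together with the unit-norm mean directions.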

PDF [BibTex]


Support Vector Machines for 3D Shape Processing

Steinke, F., Schölkopf, B., Blanz, V.

Computer Graphics Forum, 24(3, EUROGRAPHICS 2005):285-294, September 2005 (article)

Abstract
We propose statistical learning methods for approximating implicit surfaces and computing dense 3D deformation fields. Our approach is based on Support Vector (SV) Machines, which are state of the art in machine learning. It is straightforward to implement and computationally competitive; its parameters can be automatically set using standard machine learning methods. The surface approximation is based on a modified Support Vector regression. We present applications to 3D head reconstruction, including automatic removal of outliers and hole filling. In a second step, we build on our SV representation to compute dense 3D deformation fields between two objects. The fields are computed using a generalized SV Machine enforcing correspondence between the previously learned implicit SV object representations, as well as correspondences between feature points if such points are available. We apply the method to the morphing of 3D heads and other objects.

PDF [BibTex]


Rapid animal detection in natural scenes: Critical features are local

Wichmann, F., Rosas, P., Gegenfurtner, K.

Journal of Vision, 5(8):376, Fifth Annual Meeting of the Vision Sciences Society (VSS), September 2005 (poster)

Abstract
Thorpe et al. (Nature 381, 1996) first showed how rapidly human observers are able to classify natural images as to whether they contain an animal or not. Whilst the basic result has been replicated using different response paradigms (yes-no versus forced-choice), modalities (eye movements versus button presses) as well as while measuring neurophysiological correlates (ERPs), it is still unclear which image features support this rapid categorisation. Recently Torralba and Oliva (Network: Computation in Neural Systems, 14, 2003) suggested that simple global image statistics can be used to predict seemingly complex decisions about the absence and/or presence of objects in natural scenes. They show that the information contained in a small number (N=16) of spectral principal components (SPC), obtained by applying principal component analysis (PCA) to the normalised power spectra of the images, is sufficient to achieve approximately 80% correct animal detection in natural scenes. Our goal was to test whether human observers make use of the power spectrum when rapidly classifying natural scenes. We measured our subjects' ability to detect animals in natural scenes as a function of presentation time (13 to 167 msec); images were immediately followed by a noise mask. In one condition we used the original images, in the other images whose power spectra were equalised (each power spectrum was set to the mean power spectrum over our ensemble of 1476 images). Thresholds for 75% correct animal detection were in the region of 20–30 msec for all observers, independent of the power spectrum of the images: this result makes it very unlikely that human observers make use of the global power spectrum. Taken together with the results of Gegenfurtner, Braun & Wichmann (Journal of Vision [abstract], 2003), showing the robustness of animal detection to global phase noise, we conclude that humans use local features, like edges and contours, in rapid animal detection.
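The equalisation step described in the abstract (setting each power spectrum to the ensemble mean) can be sketched as follows. The abstract does not say how phase was treated; keeping each image's own Fourier phase is an assumption here, as are the function name and the absence of any windowing or intensity rescaling.

```python
import numpy as np

def equalise_power_spectra(images):
    """Give every image the ensemble-mean power spectrum (illustrative sketch).

    images: array of shape (n_images, height, width).
    Assumption: each image keeps its own Fourier phase.
    """
    images = np.asarray(images, dtype=float)
    spectra = np.fft.fft2(images, axes=(-2, -1))
    # Mean power spectrum over the ensemble, converted back to an amplitude.
    mean_amplitude = np.sqrt((np.abs(spectra) ** 2).mean(axis=0))
    # Combine the shared amplitude with each image's individual phase.
    equalised = mean_amplitude * np.exp(1j * np.angle(spectra))
    # The result is real up to rounding error, so drop the imaginary part.
    return np.real(np.fft.ifft2(equalised, axes=(-2, -1)))
```

After this transformation all images share one power spectrum, so any remaining difference between them lies in the phase structure.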

Web DOI [BibTex]


Fast Protein Classification with Multiple Networks

Tsuda, K., Shin, H., Schölkopf, B.

Bioinformatics, 21(Suppl. 2):59-65, September 2005 (article)

Abstract
Support vector machines (SVM) have been successfully used to classify proteins into functional categories. Recently, to integrate multiple data sources, a semidefinite programming (SDP) based SVM method was introduced (Lanckriet et al., 2004). In SDP/SVM, multiple kernel matrices corresponding to each of the data sources are combined with weights obtained by solving an SDP. However, when trying to apply SDP/SVM to large problems, the computational cost can become prohibitive, since both converting the data to a kernel matrix for the SVM and solving the SDP are time and memory demanding. Another application-specific drawback arises when some of the data sources are protein networks. A common method of converting the network to a kernel matrix is the diffusion kernel method, which has time complexity of O(n^3), and produces a dense matrix of size n x n. We propose an efficient method of protein classification using multiple protein networks. Available protein networks, such as a physical interaction network or a metabolic network, can be directly incorporated. Vectorial data can also be incorporated after conversion into a network by means of neighbor point connection. Similarly to the SDP/SVM method, the combination weights are obtained by convex optimization. Due to the sparsity of network edges, the computation time is nearly linear in the number of edges of the combined network. Additionally, the combination weights provide information useful for discarding noisy or irrelevant networks. Experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method.
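To make the cost argument concrete, the following sketch computes a diffusion kernel K = exp(-beta L) from an adjacency matrix by eigendecomposition. It illustrates the baseline the abstract criticises, not the proposed method. The function name and the value of `beta` are illustrative assumptions.

```python
import numpy as np

def diffusion_kernel(adjacency, beta=0.5):
    """Return K = exp(-beta * L) for the graph Laplacian L = D - A (sketch)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # The symmetric eigendecomposition is the O(n^3) step.
    eigvals, eigvecs = np.linalg.eigh(L)
    # exp(-beta * L) = V diag(exp(-beta * lambda)) V^T
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T
```

For a connected graph every entry of K is positive, so even a very sparse network yields a dense n x n kernel matrix, which is the memory problem described above.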

PDF Web DOI [BibTex]


Iterative Kernel Principal Component Analysis for Image Modeling

Kim, K., Franz, M., Schölkopf, B.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1351-1366, September 2005 (article)

Abstract
In recent years, Kernel Principal Component Analysis (KPCA) has been suggested for various image processing tasks requiring an image model, such as denoising or compression. The original form of KPCA, however, can only be applied to strongly restricted image classes due to the limited number of training examples that can be processed. We therefore propose a new iterative method for performing KPCA, the Kernel Hebbian Algorithm, which iteratively estimates the Kernel Principal Components with only linear order memory complexity. In our experiments, we compute models for complex image classes such as faces and natural images which require a large number of training examples. The resulting image models are tested in single-frame super-resolution and denoising applications. The KPCA model is not specifically tailored to these tasks; in fact, the same model can be used in super-resolution with variable input resolution, or denoising with unknown noise characteristics. In spite of this, both super-resolution and denoising performance are comparable to existing methods.

Web DOI [BibTex]


Learning an Interest Operator from Eye Movements

Kienzle, W., Franz, M., Wichmann, F., Schölkopf, B.

International Workshop on Bioinspired Information Processing (BIP 2005), 2005, pages: 1, September 2005 (poster)

PDF Web [BibTex]


Machine Learning Methods for Brain-Computer Interfaces

Lal, TN.

Biologische Kybernetik, University of Darmstadt, September 2005 (phdthesis)

Web [BibTex]


Classification of natural scenes using global image statistics

Drewes, J., Wichmann, F., Gegenfurtner, K.

Journal of Vision, 5(8):602, Fifth Annual Meeting of the Vision Sciences Society (VSS), September 2005 (poster)

Abstract
The algorithmic classification of complex, natural scenes is generally considered a difficult task due to the large amount of information conveyed by natural images. Work by Simon Thorpe and colleagues showed that humans are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. This suggests that the relevant information for classification can be extracted at comparatively limited computational cost. One hypothesis is that global image statistics such as the amplitude spectrum could underlie fast image classification (Johnson & Olshausen, Journal of Vision, 2003; Torralba & Oliva, Network: Comput. Neural Syst., 2003). We used linear discriminant analysis to classify a set of 11,000 images into animal and non-animal images. After applying a DFT to the image, we put the Fourier spectrum into bins (8 orientations with 6 frequency bands each). Using all bins, classification performance on the Fourier spectrum reached 70%. However, performance was similar (67%) when only the high spatial frequency information was used and decreased steadily at lower spatial frequencies, reaching a minimum (50%) for the low spatial frequency information. Similar results were obtained when all bins were used on spatially filtered images. A detailed analysis of the classification weights showed that a relatively high level of performance (67%) could also be obtained when only 2 bins were used, namely the vertical and horizontal orientation at the highest spatial frequency band. Our results show that in the absence of sophisticated machine learning techniques, animal detection in natural scenes is limited to rather modest levels of performance, far below those of human observers. If one is limited to global image statistics such as the DFT, then mostly information at the highest spatial frequencies is useful for the task. This is analogous to the results obtained with human observers on filtered images (Kirchner et al., VSS 2004).
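The binning of the Fourier spectrum into 8 orientations with 6 frequency bands each might look roughly like the sketch below. Only the bin counts come from the abstract. The octave-spaced band edges, the orientation bins centred on multiples of 22.5 degrees, the use of the mean amplitude per bin, and the function name are all assumptions made for illustration.

```python
import numpy as np

def spectrum_bins(image, n_orient=8, n_bands=6):
    """Mean Fourier amplitude in n_orient x n_bands polar bins (sketch)."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    amplitude = np.abs(np.fft.fft2(image))
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequency, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequency, cycles per pixel
    radius = np.hypot(fx, fy)
    # Orientation modulo 180 degrees: the amplitude spectrum of a real
    # image is point-symmetric about the origin.
    theta = np.mod(np.arctan2(fy, fx), np.pi)
    # Assumed band layout: octave-spaced edges up to the Nyquist frequency.
    edges = 0.5 / 2.0 ** np.arange(n_bands, -1, -1)
    band = np.digitize(radius, edges, right=True) - 1  # -1 or n_bands: unused
    # Assumed orientation layout: bins centred on 0, 22.5, 45, ... degrees.
    width = np.pi / n_orient
    orient = np.floor((theta + width / 2) / width).astype(int) % n_orient
    features = np.zeros((n_orient, n_bands))
    for o in range(n_orient):
        for b in range(n_bands):
            mask = (orient == o) & (band == b)
            if mask.any():
                features[o, b] = amplitude[mask].mean()
    return features
```

The resulting 8 x 6 array can be flattened into a 48-dimensional feature vector for a linear classifier; frequencies outside the assumed bands (the DC term and the corners beyond 0.5 cycles per pixel) are ignored.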

Web DOI [BibTex]


A Combinatorial View of Graph Laplacians

Huang, J.

(144), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, August 2005 (techreport)

Abstract
Discussions about different graph Laplacians, mainly the normalized and unnormalized versions, have been ardent with respect to various methods in clustering and graph-based semi-supervised learning. Previous research on graph Laplacians investigated their convergence properties to Laplacian operators on continuous manifolds. There is still no strong proof of convergence for the normalized Laplacian. In this paper, we analyze different variants of graph Laplacians directly in terms of how they solve the original graph partitioning problem. The graph partitioning problem is a well-known NP-hard combinatorial optimization problem. The spectral solutions provide evidence that the normalized Laplacian encodes more reasonable considerations for graph partitioning. We also provide some examples to show their differences.
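For readers unfamiliar with the two variants, the sketch below builds the unnormalized Laplacian L = D - W and the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}, and splits a graph by the sign of the eigenvector belonging to the second-smallest eigenvalue. This is the standard spectral relaxation of graph partitioning; whether it matches the report's exact formulation is an assumption, and the function names are illustrative.

```python
import numpy as np

def laplacians(W):
    """Return the unnormalized and symmetric normalized graph Laplacians."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)  # vertex degrees; assumes no isolated vertices
    L = np.diag(d) - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return L, L_sym

def spectral_bipartition(laplacian):
    """Split vertices by the sign of the second-smallest eigenvector."""
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return vecs[:, 1] >= 0
```

On graphs with balanced, well-separated groups both variants tend to give the same split; differences show up when vertex degrees vary strongly, because the normalized version weights each vertex by its degree.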

[BibTex]


Phenotypic characterization of chondrosarcoma-derived cell lines

Schorle, C., Finger, F., Zien, A., Block, J., Gebhard, P., Aigner, T.

Cancer Letters, 226(2):143-154, August 2005 (article)

Abstract
Gene expression profiling of three chondrosarcoma-derived cell lines (AD, SM, 105KC) showed an increased proliferative activity and a reduced expression of chondrocyte-typical matrix products compared to primary chondrocytes. The inability to maintain adequate matrix synthesis and notable proliferative activity at the same time is comparable to neoplastic chondrosarcoma cells in vivo, which largely cease cartilage matrix formation as soon as their proliferative activity increases. Thus, the investigated cell lines are of limited value as a substitute for primary chondrocytes but might have a much higher potential for investigating the behavior of neoplastic chondrocytes, i.e. chondrosarcoma biology.

Web [BibTex]


Beyond Pairwise Classification and Clustering Using Hypergraphs

Zhou, D., Huang, J., Schölkopf, B.

(143), Max Planck Institute for Biological Cybernetics, August 2005 (techreport)

Abstract
In many applications, relationships among objects of interest are more complex than pairwise. Simply approximating complex relationships as pairwise ones can lead to loss of information. An alternative for these applications is to analyze complex relationships among data directly, without the need to first reduce the complex relationships to pairwise ones. A natural way to describe complex relationships is to use hypergraphs. A hypergraph is a graph in which edges can connect more than two vertices. Thus we consider learning from a hypergraph, and develop a general framework which is applicable to classification and clustering for complex relational data. We have applied our framework to real-world web classification problems and obtained encouraging results.
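A hypergraph can be stored as a vertex-by-hyperedge incidence matrix H. The sketch below builds H and a normalized hypergraph Laplacian of the form I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}, a form used in the hypergraph-learning literature. That the report uses exactly this definition is an assumption, and the function names and unit default weights are illustrative.

```python
import numpy as np

def incidence_matrix(n_vertices, hyperedges):
    """H[v, e] = 1 if vertex v belongs to hyperedge e, else 0."""
    H = np.zeros((n_vertices, len(hyperedges)))
    for e, members in enumerate(hyperedges):
        H[list(members), e] = 1.0
    return H

def hypergraph_laplacian(H, weights=None):
    """I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (assumed definition, sketch)."""
    n, m = H.shape
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    d_v = H @ w          # vertex degree: total weight of incident hyperedges
    d_e = H.sum(axis=0)  # hyperedge degree: number of vertices it contains
    S = H / np.sqrt(d_v)[:, None]       # Dv^{-1/2} H
    theta = (S * (w / d_e)) @ S.T       # column e scaled by w_e / d_e
    return np.eye(n) - theta
```

For example, `incidence_matrix(6, [(0, 1, 2), (2, 3), (3, 4, 5)])` encodes two three-way relationships and one pairwise link without first expanding the three-way relationships into pairs.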

PDF [BibTex]
