2005


Propagating Distributions on a Hypergraph by Dual Information Regularization

Tsuda, K.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 921, (Editors: L De Raedt and S Wrobel), ICML, Bonn, 2005 (inproceedings)

Abstract
In the information regularization framework by Corduneanu and Jaakkola (2005), the distributions of labels are propagated on a hypergraph for semi-supervised learning. The learning is efficiently done by a Blahut-Arimoto-like two-step algorithm, but, unfortunately, one of the steps cannot be solved in closed form. In this paper, we propose a dual version of information regularization, which is considered more natural in terms of information geometry. Our learning algorithm has two steps, each of which can be solved in closed form. It can also be naturally applied to exponential family distributions such as Gaussians. In experiments, our algorithm is applied to protein classification based on a metabolic network and known functional categories.
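The contrast between the original and the dual formulation turns on which direction of the KL divergence each step minimizes. As a hedged illustration of a standard information-geometry fact (not the paper's algorithm or notation): for distributions p_i with weights w_i, the q minimizing the weighted sum of KL(p_i || q) is the weighted arithmetic mean, while the q minimizing the weighted sum of KL(q || p_i) is the normalized weighted geometric mean. Both are closed-form. The function names and toy distributions are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def arithmetic_mean(dists, weights):
    """Closed-form argmin over q of sum_i w_i * KL(p_i || q)."""
    total = sum(weights)
    return [sum(w * p[j] for p, w in zip(dists, weights)) / total
            for j in range(len(dists[0]))]


def geometric_mean(dists, weights):
    """Closed-form argmin over q of sum_i w_i * KL(q || p_i).

    All p_i must be strictly positive (logarithms are taken).
    """
    total = sum(weights)
    unnorm = [math.exp(sum(w * math.log(p[j]) for p, w in zip(dists, weights)) / total)
              for j in range(len(dists[0]))]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Label distributions of two nodes sharing one hyperedge (toy values).
nodes = [[0.9, 0.1], [0.5, 0.5]]
print(arithmetic_mean(nodes, [1.0, 1.0]))  # approximately [0.7, 0.3]
print(geometric_mean(nodes, [1.0, 1.0]))   # approximately [0.75, 0.25]
```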

[BibTex]

Support Vector Machines and Kernel Algorithms

Schölkopf, B., Smola, A.

In Encyclopedia of Biostatistics (2nd edition), Vol. 8, pages: 5328-5335, (Editors: P. Armitage and T. Colton), John Wiley & Sons, NY, USA, 2005 (inbook)

[BibTex]

Moment Inequalities for Functions of Independent Random Variables

Boucheron, S., Bousquet, O., Lugosi, G., Massart, P.

Annals of Probability, 33, pages: 514-560, 2005 (article)

Abstract
A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method, which has been used to derive concentration inequalities for such functions [BoLuMa01], and is based on a generalized tensorization inequality due to Latała and Oleszkiewicz [LaOl00]. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities, including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes, and moment inequalities for Rademacher chaos and U-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrand's exponential inequality for Rademacher chaos of order two to any order. We also discuss applications to other complex functions of independent random variables, such as suprema of Boolean polynomials, which include, as special cases, subgraph counting problems in random graphs.

PDF [BibTex]

A Brain Computer Interface with Online Feedback based on Magnetoencephalography

Lal, T., Schröder, M., Hill, J., Preissl, H., Hinterberger, T., Mellinger, J., Bogdan, M., Rosenstiel, W., Hofmann, T., Birbaumer, N., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 465-472, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
The aim of this paper is to show that machine learning techniques can be used to derive a classifying function for human brain signal data measured by magnetoencephalography (MEG), for use in a brain computer interface (BCI). This is especially helpful for quickly evaluating whether a BCI approach based on electroencephalography, on which training may be slower due to a lower signal-to-noise ratio, is likely to succeed. We apply recursive channel elimination and regularized SVMs to the experimental data of ten healthy subjects performing a motor imagery task. Four subjects were able to use a trained classifier together with a decision tree interface to write a short name. Further analysis gives evidence that the proposed imagination task is suboptimal for a possible extension to a multiclass interface. To the best of our knowledge, this paper presents the first working online BCI based on MEG recordings and is therefore a “proof of concept”.

PDF PDF [BibTex]

Healing the Relevance Vector Machine through Augmentation

Rasmussen, C. E., Quiñonero-Candela, J.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 689, (Editors: L De Raedt and S Wrobel), ICML, 2005 (inproceedings)

Abstract
The Relevance Vector Machine (RVM) is a sparse approximate Bayesian kernel method. It provides full predictive distributions for test cases. However, the predictive uncertainties have the unintuitive property that they get smaller the further you move away from the training cases. We give a thorough analysis. Inspired by the analogy to non-degenerate Gaussian Processes, we suggest augmentation to solve the problem. The purpose of the resulting model, RVM*, is primarily to corroborate the theoretical and experimental analysis. Although RVM* could be used in practical applications, it is no longer a truly sparse model. Experiments show that sparsity comes at the expense of worse predictive distributions.
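The shrinking-variance effect can be seen in the smallest possible degenerate model. A minimal sketch, assuming one Gaussian basis function centred on a single training input, prior weight precision alpha = 1 and noise precision beta = 10 (all hypothetical values, not the paper's setup): the predictive variance is 1/beta plus phi(x)^2 times the posterior weight variance, so it decays back to the bare noise level 1/beta as phi(x) vanishes far from the data.

```python
import math


def rbf(x, c, width=1.0):
    """Gaussian basis function centred on c."""
    return math.exp(-((x - c) ** 2) / (2 * width ** 2))


def rvm_style_predictive_variance(x, centre=0.0, alpha=1.0, beta=10.0):
    """Predictive variance of y = w * phi(x) + noise with one basis function.

    Prior w ~ N(0, 1/alpha), noise precision beta, one training input at
    `centre`. Posterior variance of w is 1 / (alpha + beta * phi(centre)^2).
    """
    phi_train = rbf(centre, centre)          # equals 1.0 at the training input
    sigma_w = 1.0 / (alpha + beta * phi_train ** 2)
    return 1.0 / beta + rbf(x, centre) ** 2 * sigma_w


for x in (0.0, 1.0, 3.0, 10.0):
    print(x, rvm_style_predictive_variance(x))
```

The variance is largest at the training input and smallest far away from it, which is exactly the unintuitive behaviour the abstract describes; a non-degenerate Gaussian process prior would instead revert to its prior variance away from the data.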

PDF PostScript [BibTex]

Visual perception I: Basic principles

Wagemans, J., Wichmann, F., Op de Beeck, H.

In Handbook of Cognition, pages: 3-47, (Editors: K. Lamberts and R. Goldstone), Sage, London, 2005 (inbook)

[BibTex]

Maximum-Margin Feature Combination for Detection and Categorization

Bakır, G., Wu, M., Eichhorn, J.

Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
In this paper we are concerned with the optimal combination of features of possibly different types for detection and estimation tasks in machine vision. We propose to combine features such that the resulting classifier maximizes the margin between classes. In contrast to existing approaches, which are non-convex and/or generative, we propose to use a discriminative model, leading to a convex problem formulation and complexity control. Furthermore, we assert that decision functions should not compare apples and oranges by comparing features of different types directly. Instead, we propose to combine different similarity measures, one for each feature type. Finally, we argue that the question "Which feature type is more discriminative for task X?" is ill-posed, and we show empirically that the answer to this question might depend on the complexity of the decision function.
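The idea of one similarity measure per feature type can be sketched as a nonnegative weighted sum of per-type kernels; a nonnegative combination of valid kernels is again a valid kernel. This is a sketch under stated assumptions: the feature types ("shape" vectors and "hist" histograms), the kernel choices and the fixed weights are hypothetical, and the weights are set by hand here rather than chosen by margin maximization as in the report.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Similarity for real-valued descriptors of one feature type."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def intersection_kernel(h1, h2):
    """Similarity for normalized histograms of another feature type."""
    return sum(min(x, y) for x, y in zip(h1, h2))


def combined_kernel(obj1, obj2, betas=(0.5, 0.5)):
    """Nonnegative weighted sum of one similarity per feature type.

    A "shape" vector is only ever compared with another "shape" vector, and
    a "hist" histogram only with another "hist" histogram.
    """
    return (betas[0] * rbf_kernel(obj1["shape"], obj2["shape"])
            + betas[1] * intersection_kernel(obj1["hist"], obj2["hist"]))


a = {"shape": [0.0, 0.0], "hist": [0.5, 0.5]}
b = {"shape": [1.0, 0.0], "hist": [1.0, 0.0]}
print(combined_kernel(a, b))
```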

PDF [BibTex]

Kernel-Methods, Similarity, and Exemplar Theories of Categorization

Jäkel, F., Wichmann, F.

ASIC, 4, 2005 (poster)

Abstract
Kernel methods are popular tools in machine learning and statistics that can be implemented in a simple feed-forward neural network. They have strong connections to several psychological theories. For example, Shepard's universal law of generalization can be given a kernel interpretation. This leads to an inner product and a metric on the psychological space that differ from the usual Minkowski norm. The metric has psychologically interesting properties: it is bounded from above and does not have additive segments. As categorization models often rely on Shepard's law as a model for psychological similarity, some of them can be recast as kernel methods. In particular, ALCOVE is shown to be closely related to kernel logistic regression. The relationship to the Generalized Context Model is also discussed. It is argued that functional analysis, which is routinely used in machine learning, also provides valuable insights for psychology.
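Both properties of the kernel-induced metric can be checked numerically. A minimal sketch, assuming a one-dimensional psychological space and the exponential form exp(-c|x - y|) of Shepard's law (a positive definite kernel, also known as the Laplacian kernel): the induced distance sqrt(k(x,x) + k(y,y) - 2k(x,y)) never exceeds sqrt(2), and for collinear points the triangle inequality is strict, so there are no additive segments.

```python
import math


def shepard_kernel(x, y, c=1.0):
    """Exponential generalization gradient exp(-c * |x - y|)."""
    return math.exp(-c * abs(x - y))


def kernel_metric(x, y, k=shepard_kernel):
    """Distance induced by the kernel's inner product: ||phi(x) - phi(y)||."""
    return math.sqrt(max(k(x, x) + k(y, y) - 2 * k(x, y), 0.0))


d01, d12, d02 = kernel_metric(0.0, 1.0), kernel_metric(1.0, 2.0), kernel_metric(0.0, 2.0)
print(d02, "<", d01 + d12)                       # no additive segments
print(kernel_metric(0.0, 100.0), "<=", math.sqrt(2.0))  # bounded from above
```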

Web [BibTex]


Rapid animal detection in natural scenes: critical features are local

Wichmann, F., Rosas, P., Gegenfurtner, K.

Experimentelle Psychologie. Beiträge zur 47. Tagung experimentell arbeitender Psychologen, 47, pages: 225, 2005 (poster)

[BibTex]

A novel representation of protein sequences for prediction of subcellular location using support vector machines

Matsuda, S., Vert, J., Saigo, H., Ueda, N., Toh, H., Akutsu, T.

Protein Science, 14, pages: 2804-2813, 2005 (article)

Abstract
As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to aid their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distances between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location.
Keywords: subcellular location; signal sequence; amino acid composition; distance frequency; support vector machine; predictive accuracy
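The three-part splitting step can be sketched as follows. This covers only plain amino acid composition per part; the twin amino acid compositions, distance frequencies and the four-way subdivision of the N-terminal part are omitted, and the split lengths n_len and c_len are hypothetical values, not the paper's.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues


def composition(segment):
    """Relative frequency of each standard amino acid in a sequence segment."""
    counts = Counter(segment)
    return [counts[a] / len(segment) for a in AMINO_ACIDS]


def three_part_features(seq, n_len=25, c_len=25):
    """Concatenated compositions of the N-terminal, middle and C-terminal parts.

    n_len and c_len are hypothetical split lengths chosen for illustration.
    """
    if len(seq) <= n_len + c_len:
        raise ValueError("sequence too short for the chosen split")
    n_part = seq[:n_len]
    middle = seq[n_len:len(seq) - c_len]
    c_part = seq[len(seq) - c_len:]
    return composition(n_part) + composition(middle) + composition(c_part)


features = three_part_features("M" + "A" * 30 + "L" * 30 + "K" * 30)
print(len(features))   # 60: three blocks of 20 compositions
```

A vector like this would then be the input to an SVM classifier.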

Web DOI [BibTex]

Long Term Prediction of Product Quality in a Glass Manufacturing Process Using a Kernel Based Approach

Jung, T., Herrera, L., Schölkopf, B.

In Proceedings of the 8th International Work-Conference on Artificial Neural Networks (Computational Intelligence and Bioinspired Systems), Lecture Notes in Computer Science, Vol. 3512, pages: 960-967, (Editors: J. Cabestany, A. Prieto and F. Sandoval), Springer, Berlin Heidelberg, Germany, IWANN, 2005 (inproceedings)

Abstract
In this paper we report the results obtained using a kernel-based approach to predict the temporal development of four response signals in the process control of a glass melting tank with 16 input parameters. The data set is a revised version of the one from the modelling challenge in EUNITE-2003. The central difficulties are large time delays between changes in the inputs and the outputs, a large amount of data, and a general lack of knowledge about the relevant variables that intervene in the process. The methodology proposed here comprises Support Vector Machines (SVM) and Regularization Networks (RN). We use the idea of sparse approximation both as a means of regularization and as a means of reducing the computational complexity. Furthermore, we use an incremental approach to add new training examples to the kernel-based method and efficiently update the current solution. This allows us to use a sophisticated learning scheme, where we iterate between prediction and training, with good computational efficiency and satisfactory results.

DOI [BibTex]

Object correspondence as a machine learning problem

Schölkopf, B., Steinke, F., Blanz, V.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 777-784, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We propose machine learning methods for the estimation of deformation fields that transform two given objects into each other, thereby establishing a dense point-to-point correspondence. The fields are computed using a modified support vector machine containing a penalty enforcing that points of one object will be mapped to "similar" points on the other one. Our system, which contains little engineering or domain knowledge, delivers state-of-the-art performance. We present application results including close to photorealistic morphs of 3D head models.

PDF [BibTex]

Towards a Statistical Theory of Clustering

von Luxburg, U., Ben-David, S.

Presented at the PASCAL workshop on clustering, London, 2005 (techreport)

Abstract
The goal of this paper is to discuss statistical aspects of clustering in a framework where the data to be clustered have been sampled from some unknown probability distribution. First, the clustering of the data set should reveal some structure of the underlying data rather than model artifacts due to the random sampling process. Second, the more sample points we have, the more reliable the clustering should be. We discuss which methods can and cannot be used to tackle those problems. In particular, we argue that generalization bounds as they are used in the statistical learning theory of classification are unsuitable in a general clustering framework. We suggest that the main replacements for generalization bounds should be convergence proofs and stability considerations. This paper should be considered a road map paper that identifies important questions and potentially fruitful directions for future research on statistical clustering. We do not attempt to present a complete statistical theory of clustering.

PDF [BibTex]

The human brain as large margin classifier

Graf, A., Wichmann, F., Bülthoff, H., Schölkopf, B.

Proceedings of the Computational & Systems Neuroscience Meeting (COSYNE), 2, pages: 1, 2005 (poster)

[BibTex]

A tutorial on ν-support vector machines

Chen, P., Lin, C., Schölkopf, B.

Applied Stochastic Models in Business and Industry, 21(2):111-136, 2005 (article)

Abstract
We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the so-called ν-SVM, including details of the algorithm and its implementation, theoretical results, and practical applications.
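For reference, the standard ν-SVM classification primal is stated below, with m training pairs (x_i, y_i), feature map Φ and ν ∈ (0, 1]; the notation is ours and not necessarily the tutorial's. If the solution has ρ > 0, ν is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors, which is what makes ν easier to interpret than the usual C parameter.

```latex
\min_{w,\,b,\,\xi,\,\rho}\; \frac{1}{2}\lVert w\rVert^{2} - \nu\rho + \frac{1}{m}\sum_{i=1}^{m}\xi_i
\quad\text{subject to}\quad
y_i\bigl(\langle w, \Phi(x_i)\rangle + b\bigr) \ge \rho - \xi_i,\qquad \xi_i \ge 0,\qquad \rho \ge 0 .
```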

PDF [BibTex]

Robust EEG Channel Selection Across Subjects for Brain Computer Interfaces

Schröder, M., Lal, T., Hinterberger, T., Bogdan, M., Hill, J., Birbaumer, N., Rosenstiel, W., Schölkopf, B.

EURASIP Journal on Applied Signal Processing, 2005(19), Special Issue: Trends in Brain Computer Interfaces, pages: 3103-3112, (Editors: J. M. Vesin and T. Ebrahimi), 2005 (article)

Abstract
Most EEG-based Brain Computer Interface (BCI) paradigms come with specific electrode positions; e.g., for a visually based BCI, electrode positions close to the primary visual cortex are used. For new BCI paradigms, it is usually not known where task-relevant activity can be measured from the scalp. For individual subjects, Lal et al. showed that recording positions can be found without the use of prior knowledge about the paradigm used. However, it remains unclear to what extent their method of Recursive Channel Elimination (RCE) can be generalized across subjects. In this paper we transfer channel rankings from a group of subjects to a new subject. For motor imagery tasks the results are promising, although cross-subject channel selection does not quite achieve the performance of channel selection on data of single subjects. Although the RCE method was not provided with prior knowledge about the mental task, channels that are well known to be important (from a physiological point of view) were consistently selected, whereas task-irrelevant channels were reliably disregarded.

Web DOI [BibTex]

Implicit Surface Modelling as an Eigenvalue Problem

Walder, C., Chapelle, O., Schölkopf, B.

In Proceedings of the 22nd International Conference on Machine Learning, pages: 937-944, (Editors: L De Raedt and S Wrobel), ACM, New York, NY, USA, ICML, 2005 (inproceedings)

Abstract
We discuss the problem of fitting an implicit shape model to a set of points sampled from a co-dimension one manifold of arbitrary topology. The method solves a non-convex optimisation problem in the embedding function that defines the implicit surface by way of its zero level set. By assuming that the solution is a mixture of radial basis functions of varying widths, we attain the globally optimal solution by way of an equivalent eigenvalue problem, without using or constructing as an intermediate step the normal vectors of the manifold at each data point. We demonstrate the system on two- and three-dimensional data, with examples of missing data interpolation and set operations on the resultant shapes.
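The step from a non-convex problem to a global optimum rests on a standard fact: minimizing a quadratic form c^T A c under the constraint ||c|| = 1 is solved by the eigenvector of A's smallest eigenvalue. A dependency-free sketch of that fact only; the matrices are toy stand-ins, not the paper's RBF-based objective, and power iteration on a shifted matrix is just one simple way to find that eigenvector.

```python
def mat_vec(a, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def smallest_eigenvector(a, iters=500):
    """Minimise c^T A c subject to ||c|| = 1 for a symmetric matrix A.

    Power iteration on s*I - A, where s (a Gershgorin bound) is at least the
    largest eigenvalue, converges to the eigenvector of A's smallest eigenvalue.
    """
    n = len(a)
    s = max(sum(abs(x) for x in row) for row in a)
    shifted = [[(s if i == j else 0.0) - a[i][j] for j in range(n)] for i in range(n)]
    v = [1.0 / (k + 1) for k in range(n)]            # arbitrary start vector
    for _ in range(iters):
        w = mat_vec(shifted, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    rayleigh = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # minimal value of c^T A c
    return v, rayleigh


print(smallest_eigenvector([[2.0, 1.0], [1.0, 2.0]]))
```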

PDF [BibTex]

Approximate Bayesian Inference for Psychometric Functions using MCMC Sampling

Kuss, M., Jäkel, F., Wichmann, F.

(135), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005 (techreport)

Abstract
In psychophysical studies the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. In this report we propose the use of Bayesian inference to extract the information contained in experimental data and to estimate the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically, we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition, we discuss the parameterisation of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generated data and in a case study for real experimental data. Furthermore, we compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation. The appendix describes an implementation for the R environment for statistical computing and provides the code for reproducing the results discussed in the experiment section.
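The report's own R implementation is not reproduced here. The following is a minimal random-walk Metropolis sketch in Python under stated assumptions: a two-parameter logistic psychometric function psi(x) = 1 / (1 + exp(-(x - alpha) / beta)) with no guess or lapse rate, a flat prior on a bounded box, toy yes/no data invented for illustration, and a hypothetical step size, sample count and burn-in.

```python
import math
import random

# Hypothetical yes/no data: stimulus levels, trials per level, "yes" counts.
LEVELS = [-2.0, -1.0, 0.0, 1.0, 2.0]
TRIALS = [40, 40, 40, 40, 40]
YES = [4, 10, 20, 30, 36]


def psychometric(x, alpha, beta):
    """Logistic psychometric function with threshold alpha and width beta > 0."""
    z = (x - alpha) / beta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)                      # numerically stable for large |z|
    return ez / (1.0 + ez)


def log_posterior(alpha, beta):
    """Binomial log-likelihood plus a flat prior on a bounded box."""
    if not (-10.0 < alpha < 10.0 and 0.0 < beta < 10.0):
        return -math.inf
    lp = 0.0
    for x, n, k in zip(LEVELS, TRIALS, YES):
        p = min(max(psychometric(x, alpha, beta), 1e-12), 1.0 - 1e-12)
        lp += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return lp


def metropolis(n_samples=20000, step=0.2, seed=0):
    """Random-walk Metropolis sampler over (alpha, beta)."""
    rng = random.Random(seed)
    alpha, beta = 0.0, 1.0
    current = log_posterior(alpha, beta)
    samples = []
    for _ in range(n_samples):
        a_new = alpha + rng.gauss(0.0, step)
        b_new = beta + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            alpha, beta, current = a_new, b_new, proposed
        samples.append((alpha, beta))
    return samples


draws = metropolis()[2000:]               # discard burn-in
alphas = sorted(a for a, _ in draws)
print("posterior mean threshold:", sum(alphas) / len(alphas))
print("central 95% interval:",
      alphas[int(0.025 * len(alphas))], alphas[int(0.975 * len(alphas))])
```

The sorted samples give interval estimates directly, which is the role the abstract assigns to the MCMC samples in place of bootstrap resampling.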

PDF [BibTex]

Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

In Proceedings of the 16th European Conference on Machine Learning, Vol. 3720, pages: 280-291, (Editors: J. Gama, R. Camacho, P. Brazdil, A. Jorge and L. Torgo), Springer, ECML, 2005 (inproceedings)

Abstract
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
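The coordinate-frame independence of natural gradients can be seen analytically in one dimension. This sketch is not the Natural Actor-Critic estimator, which obtains the natural gradient from samples by linear regression; it computes exact gradients for a hypothetical Gaussian policy a ~ N(mu, sigma^2) with mu = c * theta and a hypothetical quadratic reward. Rescaling theta by c changes the vanilla step in mu by a factor of c^2, while the natural step (gradient premultiplied by the inverse Fisher information) is unchanged.

```python
def vanilla_and_natural_step(theta, c, sigma=0.5, mu_star=1.0, lr=0.1):
    """Change in the policy mean mu = c * theta after one gradient step on theta.

    Policy: a ~ N(mu, sigma^2); reward r(a) = -(a - mu_star)^2, so the expected
    reward is J(mu) = -(mu - mu_star)^2 - sigma^2.
    """
    mu = c * theta
    dJ_dmu = -2.0 * (mu - mu_star)
    grad_theta = c * dJ_dmu                 # vanilla gradient (chain rule)
    fisher_theta = c ** 2 / sigma ** 2      # Fisher information w.r.t. theta
    natural_theta = grad_theta / fisher_theta
    # Map each parameter step back to a step in mu.
    return c * lr * grad_theta, c * lr * natural_theta


# Same policy (mu = 0.2) under two coordinate scalings of theta.
for c in (1.0, 10.0):
    print(c, vanilla_and_natural_step(0.2 / c, c))
```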

link (url) DOI [BibTex]

Comparative experiments on task space control with redundancy resolution

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3901-3908, Edmonton, Alberta, Canada, Aug. 2-6, IROS, 2005 (inproceedings)

Abstract
Understanding the principles of motor coordination with redundant degrees of freedom remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundancy resolution remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm that is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, even though, from a theoretical point of view, it lacks the elegance of some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.
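As a point of reference for the velocity-level family, the classic pseudo-inverse scheme with null-space projection is qdot = J+ xdot + (I - J+ J) qdot0. A minimal sketch for a one-dimensional task and three joints; the Jacobian and secondary velocity are hypothetical, and there is no claim that this is the scheme the paper found best (the paper also covers acceleration- and torque-level schemes).

```python
def pinv_row(j):
    """Pseudo-inverse of a 1 x n Jacobian: J^T / (J J^T)."""
    jj = sum(x * x for x in j)
    return [x / jj for x in j]


def resolve_redundancy(j, xdot, qdot0):
    """qdot = J+ xdot + (I - J+ J) qdot0 for a 1-D task and n joints.

    The null-space term lets a secondary joint velocity qdot0 act without
    disturbing the task-space velocity xdot.
    """
    jp = pinv_row(j)
    n = len(j)
    null = [[(1.0 if r == c else 0.0) - jp[r] * j[c] for c in range(n)]
            for r in range(n)]
    return [jp[r] * xdot + sum(null[r][c] * qdot0[c] for c in range(n))
            for r in range(n)]


j = [1.0, 0.5, 0.25]                      # hypothetical 3-joint task Jacobian
qdot = resolve_redundancy(j, 0.3, [0.0, -1.0, 2.0])
print(qdot, sum(a * b for a, b in zip(j, qdot)))  # task velocity is 0.3 up to rounding
```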

link (url) DOI [BibTex]

2003


Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

NIPS Workshop "Planning for the Real World: The promises and challenges of dealing with uncertainty", December 2003 (poster)

PDF Web [BibTex]

Learning Control and Planning from the View of Control Theory and Imitation

Peters, J., Schaal, S.

NIPS Workshop "Planning for the Real World: The promises and challenges of dealing with uncertainty", December 2003 (talk)

Abstract
Learning control and planning in high-dimensional continuous state-action systems, e.g., as needed in a humanoid robot, has so far been a domain beyond the applicability of generic planning techniques like reinforcement learning and dynamic programming. This talk describes an approach we have taken in order to enable complex robotic systems to learn to accomplish control tasks. Adaptive learning controllers equipped with statistical learning techniques can be used to learn tracking controllers; missing state information and uncertainty in the state estimates are usually addressed by observers or direct adaptive control methods. Imitation learning is used as an ingredient to seed initial control policies whose output is a desired trajectory suitable to accomplish the task at hand. Reinforcement learning with stochastic policy gradients using a natural gradient forms the third component, which allows refining the initial control policy until the task is accomplished. In comparison to general learning control, this approach is highly prestructured and thus more domain-specific. However, it seems to be a theoretically clean and feasible strategy for control systems of the complexity that we need to address.

Web [BibTex]

Molecular phenotyping of human chondrocyte cell lines T/C-28a2, T/C-28a4, and C-28/I2

Finger, F., Schorle, C., Zien, A., Gebhard, P., Goldring, M., Aigner, T.

Arthritis & Rheumatism, 48(12):3395-3403, December 2003 (article)

[BibTex]

A Study on Rainfall-Runoff Models for Improving Ensemble Streamflow Prediction: 1. Rainfall-Runoff Models Using Artificial Neural Networks

Jeong, D., Kim, Y., Cho, S., Shin, H.

Journal of the Korean Society of Civil Engineers, 23(6B):521-530, December 2003 (article)

Abstract
Previous ESP (Ensemble Streamflow Prediction) studies conducted in Korea reported that modeling error is a major source of ESP forecast error in winter and spring (i.e., the dry seasons), and thus suggested that improving the rainfall-runoff model would be critical to obtaining more accurate probabilistic forecasts with ESP. This study used two types of Artificial Neural Networks (ANN), a Single Neural Network (SNN) and an Ensemble Neural Network (ENN), to improve the simulation capability of the rainfall-runoff model of the ESP forecasting system for the monthly inflow to the Daecheong dam. Applied for the first time to Korean hydrology, ENN combines the outputs of member models so that it can control the generalization error better than SNN. Because the dry and flood seasons in Korea show considerably different streamflow characteristics, this study calibrated the rainfall-runoff model separately for each season. Therefore, four rainfall-runoff models were developed according to the ANN types and the seasons. This study compared the ANN models with a conceptual rainfall-runoff model called TANK and verified that the ANN models were superior to TANK. Among the ANN models, ENN was more accurate than SNN. The ANN model performance improved when the model was calibrated separately for the dry and flood seasons. The best ANN model developed in this article will be incorporated into the ESP system to increase the forecast capability of ESP for the monthly inflow to the Daecheong dam.

[BibTex]

Quantitative Cerebral Blood Flow Measurements in the Rat Using a Beta-Probe and H₂¹⁵O

Weber, B., Spaeth, N., Wyss, M., Wild, D., Burger, C., Stanley, R., Buck, A.

Journal of Cerebral Blood Flow and Metabolism, 23(12):1455-1460, December 2003 (article)

Abstract
Beta-probes are a relatively new tool for tracer kinetic studies in animals. They are highly suited to evaluate new positron emission tomography tracers or to measure physiologic parameters at rest and after some kind of stimulation or intervention. In many of these experiments, knowledge of cerebral blood flow (CBF) is highly important. Thus, the purpose of this study was to evaluate the method of CBF measurements using a beta-probe and H₂¹⁵O. CBF was measured in the barrel cortex of eight rats at baseline and after acetazolamide challenge. Trigeminal nerve stimulation was additionally performed in five animals. In each category, three injections of 250 to 300 MBq H₂¹⁵O were performed at 10-minute intervals. Data were analyzed using a standard one-tissue compartment model (K₁ = CBF, k₂ = CBF/p, where p is the partition coefficient). Values for K₁ were 0.35 ± 0.09, 0.58 ± 0.16, and 0.49 ± 0.03 mL·min⁻¹·mL⁻¹ at rest, after acetazolamide challenge, and during trigeminal nerve stimulation, respectively. The corresponding values for k₂ were 0.55 ± 0.12, 0.94 ± 0.16, and 0.85 ± 0.12 min⁻¹, and for p were 0.64 ± 0.05, 0.61 ± 0.07, and 0.59 ± 0.06. The standard deviation of the difference between two successive experiments, a measure of the reproducibility of the method, was 10.1%, 13.0%, and 5.7% for K₁, k₂, and p, respectively. In summary, beta-probes in conjunction with H₂¹⁵O allow the reproducible quantitative measurement of CBF, although some systematic underestimation seems to occur, probably because of partial volume effects.
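The one-tissue model dC/dt = K₁·Ca(t) − k₂·C(t) makes the relation p = K₁/k₂ concrete: under a constant arterial input Ca, the tissue concentration settles at (K₁/k₂)·Ca. A minimal Euler sketch; the constant input function is a hypothetical simplification (a real H₂¹⁵O arterial curve is a bolus), and the rates are the resting means from the abstract. Their ratio, about 0.64, matches the reported resting p; for the other conditions a ratio of averaged K₁ and k₂ only approximates an average of per-experiment ratios.

```python
def simulate_one_tissue(K1, k2, ca, dt=0.01, t_end=60.0):
    """Euler integration of the one-tissue model dC/dt = K1 * Ca(t) - k2 * C(t).

    K1 in mL/min/mL, k2 in 1/min, time in minutes; ca(t) is the arterial input.
    """
    c, t = 0.0, 0.0
    while t < t_end:
        c += dt * (K1 * ca(t) - k2 * c)
        t += dt
    return c


K1, k2 = 0.35, 0.55                        # resting means from the abstract
print("K1 / k2:", K1 / k2)
print("C after 60 min, constant Ca = 1:", simulate_one_tissue(K1, k2, lambda t: 1.0))
```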

PDF DOI [BibTex]

Recurrent neural networks from learning attractor dynamics

Schaal, S., Peters, J.

NIPS Workshop on RNNaissance: Recurrent Neural Networks, December 2003 (talk)

Abstract
Many forms of recurrent neural networks can be understood in terms of the dynamic systems theory of difference equations or differential equations. Learning in such systems corresponds to adjusting some internal parameters to obtain a desired time evolution of the network, which can usually be characterized in terms of point attractor dynamics, limit cycle dynamics, or, in some rarer cases, strange attractor or chaotic dynamics. Finding a stable learning process to adjust the open parameters of the network towards shaping the desired attractor type and basin of attraction has remained a complex task, as the parameter trajectories during learning can lead the system through a variety of undesirable unstable behaviors, such that learning may never succeed. In this presentation, we review a recently developed learning framework for a class of recurrent neural networks that employs a more structured network approach. We assume that the canonical system behavior is known a priori, e.g., it is a point attractor or a limit cycle. With either supervised learning or reinforcement learning, it is possible to acquire the transformation from a simple representative of this canonical behavior (e.g., a 2nd-order linear point attractor, or a simple limit cycle oscillator) to the desired highly complex attractor form. For supervised learning, one-shot learning based on locally weighted regression techniques is possible. For reinforcement learning, stochastic policy gradient techniques can be employed. In any case, the recurrent network learned by these methods inherits the stability properties of the simple dynamic system that underlies the nonlinear transformation, such that stability of the learning approach is not a problem. We demonstrate the success of this approach for learning various skills on a humanoid robot, including tasks that require incorporating additional sensory signals as coupling terms to modify the recurrent network evolution online.
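The inherited-stability argument can be illustrated with a 2nd-order linear point attractor plus a nonlinear term that is gated by a decaying phase variable. This is a sketch in the spirit of the framework described, not its implementation: the gains, the phase dynamics and the single constant `weight` (standing in for the learned transformation) are hypothetical. Because the forcing term fades with the phase, the trajectory can be reshaped early on yet still converges to the goal g.

```python
def point_attractor(y0, g, weight=0.0, alpha=25.0, alpha_x=4.0, dt=0.001, t_end=3.0):
    """Second-order linear point attractor with a decaying forcing term.

    ydd = alpha * (beta * (g - y) - yd) + f, with beta = alpha / 4 (critical
    damping) and phase dynamics xd = -alpha_x * x. The forcing term
    f = weight * x * (g - y0) vanishes as x decays, so the system inherits the
    stability of the linear attractor and converges to g.
    """
    beta = alpha / 4.0
    y, yd, x = y0, 0.0, 1.0
    t = 0.0
    while t < t_end:
        f = weight * x * (g - y0)
        ydd = alpha * (beta * (g - y) - yd) + f
        yd += ydd * dt                    # semi-implicit Euler step
        y += yd * dt
        x += -alpha_x * x * dt
        t += dt
    return y


print(point_attractor(0.0, 1.0), point_attractor(0.0, 1.0, weight=50.0))
```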

Web [BibTex]

Support Vector Channel Selection in BCI

Lal, T., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B.

(120), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, December 2003 (techreport)

Abstract
When designing a Brain Computer Interface (BCI) system, one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying EEG signals, we propose the use of the state-of-the-art feature selection algorithms Recursive Feature Elimination [3] and Zero-Norm Optimization [13], which are based on the training of Support Vector Machines (SVM) [11]. These algorithms can provide more accurate solutions than standard filter methods for feature selection [14]. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore, we show how time-dependent task-specific information can be visualized.
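The channel-level elimination loop can be sketched without dependencies. In this sketch a ridge least-squares classifier on ±1 labels stands in for the linear SVM (a swapped-in simplification, not the report's method); each round retrains on the remaining channels, scores each channel by the sum of squared weights of its features, and drops the lowest-scoring one. The channel names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_weights(rows, y, lam=1.0):
    """Least-squares classifier on +/-1 labels: (X^T X + lam I)^-1 X^T y."""
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(xtx, xty)


def recursive_channel_elimination(rows, y, channels):
    """Rank channels by repeatedly dropping the one with the smallest sum of w^2.

    channels maps a channel name to the feature (column) indices it owns.
    Returns channel names from most to least important.
    """
    remaining = dict(channels)
    eliminated = []
    while remaining:
        cols = [c for name in remaining for c in remaining[name]]
        w = ridge_weights([[r[c] for c in cols] for r in rows], y)
        weight_of = dict(zip(cols, w))
        scores = {name: sum(weight_of[c] ** 2 for c in idx)
                  for name, idx in remaining.items()}
        worst = min(scores, key=scores.get)
        eliminated.append(worst)
        del remaining[worst]
    return eliminated[::-1]


# Toy data: only the first channel carries label information.
rows = [[1, 1, 1], [-1, 1, 1], [1, -1, 1], [-1, -1, 1], [1, 0, -2], [-1, 0, -2]]
y = [1, -1, 1, -1, 1, -1]
print(recursive_channel_elimination(rows, y, {"ch0": [0], "ch1": [1], "ch2": [2]}))
```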

PDF Web [BibTex]

Texture and haptic cues in slant discrimination: Measuring the effect of texture type on cue combination

Rosas, P., Wichmann, F., Ernst, M., Wagemans, J.

Journal of Vision, 3(12):26, 2003 Fall Vision Meeting of the Optical Society of America, December 2003 (poster)

Abstract
In a number of models of depth cue combination, the depth percept is constructed via a weighted average combination of independent depth estimates. The influence of each cue in such an average depends on the reliability of the source of information (Young, Landy, & Maloney, 1993; Ernst & Banks, 2002). In particular, Ernst & Banks (2002) formulate the combination performed by the human brain as that of the minimum variance unbiased estimator that can be constructed from the available cues. Using slant discrimination and slant judgment via probe adjustment as tasks, we have observed systematic differences in performance of human observers when a number of different types of textures were used as cues to slant (Rosas, Wichmann & Wagemans, 2003). If the depth percept behaves as described above, our measurements of the slopes of the psychometric functions provide the predicted weights for the texture cue for the ranked texture types. We have combined these texture types with object motion, but the obtained results are difficult to reconcile with the unbiased minimum variance estimator model (Rosas & Wagemans, 2003). This apparent failure of such a model might be explained by the existence of a coupling of texture and motion, violating the assumption of independence of cues. Hillis, Ernst, Banks, & Landy (2002) have shown that while for between-modality combination the human visual system has access to the single-cue information, for within-modality combination (visual cues: disparity and texture) the single-cue information is lost, suggesting a coupling between these cues. In the present study, we therefore combine the different texture types with haptic information in a slant discrimination task, to test whether in the between-modality condition the texture cue and the haptic cue to slant are combined as predicted by an unbiased, minimum variance estimator model.
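The minimum variance unbiased estimator referred to above weights each cue by its reliability, the inverse of its variance, and the combined estimate is more reliable than either cue alone. A minimal sketch, assuming independent, unbiased cues; the slant estimates and standard deviations are hypothetical numbers, not measurements from the study.

```python
def combine_cues(estimates, sigmas):
    """Minimum-variance unbiased combination of independent cue estimates.

    Each cue i gives an unbiased estimate with standard deviation sigmas[i];
    weights are proportional to the reliability 1 / sigma_i^2.
    """
    reliabilities = [1.0 / s ** 2 for s in sigmas]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_sigma = (1.0 / total) ** 0.5
    return combined, combined_sigma, weights


# Hypothetical slant estimates (degrees) from a texture cue and a haptic cue.
print(combine_cues([30.0, 36.0], [4.0, 2.0]))
```

With these numbers the haptic cue gets weight 0.8 and the combined standard deviation falls below 2 degrees; a systematic departure from such weights is what the study tests for.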

Web DOI [BibTex]

How to Deal with Large Dataset, Class Imbalance and Binary Output in SVM based Response Model

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 93-107, Korean Data Mining Conference, December 2003, Best Paper Award (inproceedings)

Abstract
Various machine learning methods have rapidly been adopted for response modeling in search of improved performance, and the support vector machine (SVM) has recently attracted much attention. This paper presents an SVM response model. We focus specifically on how to circumvent practical obstacles: how to deal with the class imbalance problem, how to produce scores from an SVM classifier for lift chart analysis, and how to evaluate the models on accuracy and profit. To cope with the intractability of SVM training on a large marketing dataset, we also introduce a previously proposed pattern selection algorithm: since SVM training time scales with the cube of the training set size, the algorithm selects important training patterns before SVM response modeling. We compared the SVM training results obtained with the pattern selection algorithm against those obtained with random sampling. Three aspects of the SVM response models were evaluated: accuracy, lift chart analysis, and computational efficiency. The SVM trained with the selected patterns showed high accuracy, a large uplift in profit and in response rate, and high computational efficiency.

PDF [BibTex]

Blind separation of post-nonlinear mixtures using linearizing transformations and temporal decorrelation

Ziehe, A., Kawanabe, M., Harmeling, S., Müller, K.

Journal of Machine Learning Research, 4(7-8):1319-1338, November 2003 (article)

Abstract
We propose two methods that reduce the post-nonlinear blind source separation problem (PNL-BSS) to a linear BSS problem. The first method is based on the concept of maximal correlation: we apply the alternating conditional expectation (ACE) algorithm, a powerful technique from non-parametric statistics, to approximately invert the componentwise nonlinear functions. The second method is a Gaussianizing transformation, motivated by the fact that linearly mixed signals are approximately Gaussian distributed before the nonlinear transformation. This heuristic but simple and efficient procedure works as well as the ACE method. Using the framework provided by ACE, convergence can be proven. The optimal transformations obtained by ACE coincide with the sought-after inverse functions of the nonlinearities. After the nonlinearities have been equalized, temporal decorrelation separation (TDSEP) allows us to recover the source signals. Numerical simulations testing "ACE-TD" and "Gauss-TD" on realistic examples give excellent results.
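
The abstract does not spell out the Gaussianizing transformation. One common way to Gaussianize a single channel, assumed here, is to pass each sample through its empirical CDF (its rank) and then through the inverse standard normal CDF. This minimal sketch covers only that per-channel step; the function name is hypothetical, and the subsequent TDSEP separation is not shown.

```python
import math
from statistics import NormalDist
from typing import List, Sequence

def gaussianize(x: Sequence[float]) -> List[float]:
    """Rank-based Gaussianization of one channel.

    Each sample is replaced by Phi^{-1}(rank / (n + 1)), where rank is its 1-based
    position in sorted order (ties keep input order). The map depends on x only
    through ranks, so a strictly increasing distortion of x leaves the output unchanged.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf(rank / (n + 1))
    return out

s = [0.3, -1.2, 2.5, 0.0, -0.4]          # illustrative pre-nonlinearity samples
x = [math.exp(3 * v) for v in s]          # observed through a monotone nonlinearity
print(gaussianize(x) == gaussianize(s))   # True: same ranks, same outputs
```

If the signal before the nonlinearity is approximately Gaussian, as the abstract argues for linear mixtures, the output approximates a standardized version of it, which is what makes a subsequent linear separation step applicable.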

PDF PDF DOI [BibTex]

Correlated stage- and subfield-associated hippocampal gene expression patterns in experimental and human temporal lobe epilepsy

Becker, A., Chen, J., Zien, A., Sochivko, D., Normann, S., Schramm, J., Elger, C., Wiestler, O., Blumcke, I.

European Journal of Neuroscience, 18(10):2792-2802, November 2003 (article)

Abstract
Epileptic activity evokes profound alterations of hippocampal organization and function. Genomic responses may reflect immediate consequences of excitatory stimulation as well as sustained molecular processes related to neuronal plasticity and structural remodeling. Using oligonucleotide microarrays with 8799 sequences, we determined subregional gene expression profiles in rats subjected to pilocarpine-induced epilepsy (U34A arrays, Affymetrix, Santa Clara, CA, USA; P < 0.05, twofold change, n = 3 per stage). Patterns of gene expression corresponded to distinct stages of epilepsy development. The highest number of differentially expressed genes (dentate gyrus, approx. 400 genes; CA1, approx. 700 genes) was observed 3 days after status epilepticus. The majority of up-regulated genes were associated with mechanisms of cellular stress and injury. Fourteen days after status epilepticus, numerous transcription factors and genes linked to cytoskeletal and synaptic reorganization were differentially expressed, and, in the stage of chronic spontaneous seizures, distinct changes were observed in the transcription of genes involved in various neurotransmission pathways and between animals with low vs. high seizure frequency. A number of genes (n = 18) differentially expressed during the chronic epileptic stage showed corresponding expression patterns in hippocampal subfields of patients with pharmacoresistant temporal lobe epilepsy (n = 5 temporal lobe epilepsy patients; U133A microarrays, Affymetrix; covering 22284 human sequences). These data provide novel insights into the molecular mechanisms of epileptogenesis and seizure-associated cellular and structural remodeling of the hippocampus.

[BibTex]

Concentration Inequalities for Sub-Additive Functions Using the Entropy Method

Bousquet, O.

Stochastic Inequalities and Applications, 56, pages: 213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003 (article)

Abstract
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, such as the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.

PostScript [BibTex]

Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop (COLT/Kernel 2003), LNCS Vol. 2777

Schölkopf, B., Warmuth, M.

Proceedings of the 16th Annual Conference on Learning Theory and 7th Kernel Workshop (COLT/Kernel 2003), COLT/Kernel 2003, pages: 746, Springer, Berlin, Germany, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, November 2003, Lecture Notes in Computer Science 2777 (proceedings)

DOI [BibTex]

Bayesian Monte Carlo

Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 15, pages: 489-496, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem, we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperforms Annealed Importance Sampling, although BMC may be less adequate for very high-dimensional problems or problems with massive multimodality. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.
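
As a minimal illustration of the Bayesian Monte Carlo idea, assume a one-dimensional integrand with a zero-mean Gaussian process prior using a squared-exponential kernel, and a Gaussian density p(x). Under these assumptions the integral of the GP posterior mean against p(x) has a closed form, which yields quadrature weights that do not depend on f. The function name and the hand-fixed hyperparameters (`lengthscale`, `jitter`) are assumptions for illustration; they are not fitted here as they would be in a full treatment.

```python
import numpy as np

def bmc_gaussian_1d(f, x, mean=0.0, var=1.0, lengthscale=1.0, jitter=1e-8):
    """Bayesian Monte Carlo estimate of E_p[f(X)] for p = N(mean, var), 1-D.

    Prior: f ~ GP(0, k) with k(a, b) = exp(-(a - b)^2 / (2 * lengthscale^2)).
    The posterior mean of the integral is z^T K^{-1} y, where y = f(x) and
    z_i = integral of k(t, x_i) p(t) dt, which is Gaussian-shaped in closed form.
    """
    x = np.asarray(x, dtype=float)
    y = np.array([f(v) for v in x], dtype=float)
    l2 = lengthscale ** 2
    diff = x[:, None] - x[None, :]
    # Gram matrix plus a small jitter for numerical stability.
    K = np.exp(-diff ** 2 / (2 * l2)) + jitter * np.eye(len(x))
    # Kernel integrated against the Gaussian density, one entry per sample point.
    z = np.sqrt(l2 / (l2 + var)) * np.exp(-(x - mean) ** 2 / (2 * (l2 + var)))
    weights = np.linalg.solve(K, z)  # quadrature weights, independent of f
    return float(weights @ y)

# E[X^2] under N(0, 1) is 1; f is evaluated on a fixed grid rather than random samples.
estimate = bmc_gaussian_1d(lambda t: t ** 2, np.linspace(-5.0, 5.0, 25))
print(estimate)
```

Because the sample locations are arbitrary, a deterministic grid works here just as random draws would, which echoes the point that samples need not come from p(x).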

PDF Web [BibTex]

Technical report on Separation methods for nonlinear mixtures

Jutten, C., Karhunen, J., Almeida, L., Harmeling, S.

(D29), EU-Project BLISS, October 2003 (techreport)

PDF [BibTex]

On the Complexity of Learning the Kernel Matrix

Bousquet, O., Herrmann, D.

In Advances in Neural Information Processing Systems 15, pages: 399-406, (Editors: Becker, S. , S. Thrun, K. Obermayer), The MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate data-based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular, we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow the spectrum to vary while keeping the eigenvectors fixed. This bound is only a logarithmic factor larger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting, so we propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.

PDF Web [BibTex]

Image Reconstruction by Linear Programming

Tsuda, K., Rätsch, G.

(118), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, October 2003 (techreport)

PDF [BibTex]

Control, Planning, Learning, and Imitation with Dynamic Movement Primitives

Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.

In IROS 2003, pages: 1-21, Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE International Conference on Intelligent Robots and Systems, October 2003 (inproceedings)

PDF [BibTex]

Discriminative Learning for Label Sequences via Boosting

Altun, Y., Hofmann, T., Johnson, M.

In Advances in Neural Information Processing Systems 15, pages: 977-984, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function.

PDF Web [BibTex]

Multiple-step ahead prediction for non linear dynamic systems: A Gaussian Process treatment with propagation of the uncertainty

Girard, A., Rasmussen, CE., Quiñonero-Candela, J., Murray-Smith, R.

In Advances in Neural Information Processing Systems 15, pages: 529-536, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. k-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model of the form y_t = f(y_{t-1},...,y_{t-L}), the prediction of y at time t + k is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction.
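
The naive scheme described above, which feeds each point prediction back in as an input, can be sketched as follows. This illustrates only the baseline the paper improves on, not its analytical Gaussian approximation. `predict_one_step` stands in for any fitted one-step model (hypothetical here), and the toy linear predictor in the usage line is an assumption.

```python
from collections import deque
from typing import Callable, List, Sequence

def iterate_one_step(
    predict_one_step: Callable[[Sequence[float]], float],
    history: Sequence[float],
    k: int,
    lag: int,
) -> List[float]:
    """Naive k-step-ahead forecast by feeding point predictions back as inputs.

    predict_one_step maps the regressor (y_{t-1}, ..., y_{t-L}) to a point
    estimate of y_t. The uncertainty of fed-back values is ignored, which is
    the shortcoming the paper addresses.
    """
    if len(history) < lag:
        raise ValueError("history must contain at least `lag` values")
    # Most recent value first, matching the (y_{t-1}, ..., y_{t-L}) ordering.
    window = deque(reversed(history[-lag:]), maxlen=lag)
    forecasts = []
    for _ in range(k):
        y_next = predict_one_step(tuple(window))
        forecasts.append(y_next)
        window.appendleft(y_next)  # the oldest value drops off the right end
    return forecasts

# Toy stand-in for a fitted model: y_t = 0.5 * y_{t-1} + 0.25 * y_{t-2}.
print(iterate_one_step(lambda r: 0.5 * r[0] + 0.25 * r[1], [1.0, 2.0], k=3, lag=2))
# [1.25, 1.125, 0.875]
```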

PDF Web [BibTex]

Cluster Kernels for Semi-Supervised Learning

Chapelle, O., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 15, pages: 585-592, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We propose a framework to incorporate unlabeled data in kernel classifiers, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach.
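
A generic way to modify the eigenspectrum of a kernel matrix is to eigendecompose it, apply a transfer function to the eigenvalues, and reassemble the matrix. The sketch below assumes a simple step transfer function that keeps the r largest eigenvalues; the paper's own normalization and choice of transfer function may differ, and the function names are hypothetical. The kernel matrix is built over labeled and unlabeled points together, which is how the unlabeled data enter.

```python
import numpy as np

def modify_spectrum(K: np.ndarray, transfer) -> np.ndarray:
    """Return U diag(transfer(eigvals)) U^T for a symmetric kernel matrix K."""
    K = (K + K.T) / 2                     # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
    return (eigvecs * transfer(eigvals)) @ eigvecs.T

def step_transfer(r: int):
    """Transfer function that sets the r largest eigenvalues to 1 and the rest to 0."""
    def phi(eigvals: np.ndarray) -> np.ndarray:
        out = np.zeros_like(eigvals)
        out[np.argsort(eigvals)[::-1][:r]] = 1.0
        return out
    return phi

# Two well-separated groups of 1-D points (labeled or unlabeled), RBF kernel.
X = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
K = np.exp(-np.subtract.outer(X, X) ** 2 / 2)
K_cluster = modify_spectrum(K, step_transfer(2))
print(np.round(K_cluster, 2))  # about 0.33 within a group, about 0 across groups
```

With a step transfer function, the new kernel is a projection onto the leading eigenvectors, so points that share a cluster become uniformly similar regardless of their original distances within it.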

PDF Web [BibTex]

Mismatch String Kernels for SVM Protein Classification

Leslie, C., Eskin, E., Weston, J., Noble, W.

In Advances in Neural Information Processing Systems 15, pages: 1417-1424, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
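
The (k, m)-mismatch feature map described above can be computed naively: for every k-length substring, enumerate all k-mers within m mismatches and count them. The kernel is then the inner product of the two count vectors. This sketch deliberately skips the mismatch tree the authors use for efficiency, so it is only practical for small k and m; the function names are hypothetical, and no kernel normalization is applied.

```python
from collections import Counter
from itertools import combinations, product
from typing import Iterator

def mismatch_neighborhood(kmer: str, m: int, alphabet: str) -> Iterator[str]:
    """Yield each string over `alphabet` within Hamming distance m of `kmer`, once."""
    yield kmer
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            # At each chosen position, substitute any letter other than the original.
            choices = [[a for a in alphabet if a != kmer[p]] for p in positions]
            for letters in product(*choices):
                chars = list(kmer)
                for p, a in zip(positions, letters):
                    chars[p] = a
                yield "".join(chars)

def mismatch_features(seq: str, k: int, m: int, alphabet: str) -> Counter:
    """For every k-mer, count the k-length substrings of seq within m mismatches of it."""
    feats: Counter = Counter()
    for i in range(len(seq) - k + 1):
        feats.update(mismatch_neighborhood(seq[i:i + k], m, alphabet))
    return feats

def mismatch_kernel(x: str, y: str, k: int, m: int, alphabet: str) -> int:
    """Inner product of the (k, m)-mismatch feature vectors of x and y."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[kmer] for kmer, count in fx.items())

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
print(mismatch_kernel("MKVLA", "MKILA", k=3, m=1, alphabet=AMINO_ACIDS))
```

With m = 0 this reduces to counting shared exact k-mers (the spectrum kernel); each increase in m enlarges every substring's neighborhood combinatorially, which is why an efficient data structure matters for realistic protein datasets.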

PDF Web [BibTex]

Real-Time Face Detection

Kienzle, W.

Biologische Kybernetik, Eberhard-Karls-Universitaet Tuebingen, Tuebingen, Germany, October 2003 (diplomathesis)

[BibTex]

YKL-39 (chitinase 3-like protein 2), but not YKL-40 (chitinase 3-like protein 1), is up regulated in osteoarthritic chondrocytes

Knorr, T., Obermayr, F., Bartnik, E., Zien, A., Aigner, T.

Annals of the Rheumatic Diseases, 62(10):995-998, October 2003 (article)

Abstract
OBJECTIVE: To investigate quantitatively the mRNA expression levels of YKL-40, an established marker of rheumatoid and osteoarthritic cartilage degeneration in synovial fluid and serum, and a closely related molecule YKL-39, in articular chondrocytes. METHODS: cDNA array and online quantitative polymerase chain reaction (PCR) were used to measure mRNA expression levels of YKL-39 and YKL-40 in chondrocytes in normal, early degenerative, and late stage osteoarthritic cartilage samples. RESULTS: Expression analysis showed high levels of both proteins in normal articular chondrocytes, with lower levels of YKL-39 than YKL-40. Whereas YKL-40 was significantly down regulated in late stage osteoarthritic chondrocytes, YKL-39 was significantly up regulated. In vitro both YKLs were down regulated by interleukin 1beta. CONCLUSIONS: The up regulation of YKL-39 in osteoarthritic cartilage suggests that YKL-39 may be a more accurate marker of chondrocyte activation than YKL-40, although it has yet to be established as a suitable marker in synovial fluid and serum. The decreased expression of YKL-40 by osteoarthritic chondrocytes is surprising, as increased levels have been reported in rheumatoid and osteoarthritic synovial fluid, where it may derive from activated synovial cells or osteophytic tissue or from increased matrix destruction in the osteoarthritic joint. YKL-39 and YKL-40 are potentially interesting marker molecules for arthritic joint disease because they are abundantly expressed by both normal and osteoarthritic chondrocytes.

[BibTex]

Incremental Gaussian Processes

Quinonero Candela, J., Winther, O.

In Advances in Neural Information Processing Systems 15, pages: 1001-1008, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
In this paper, we consider Tipping's relevance vector machine (RVM) and formalize an incremental training strategy as a variant of the expectation-maximization (EM) algorithm that we call subspace EM. Working with a subset of active basis functions, the sparsity of the RVM solution ensures that the number of basis functions, and thereby the computational complexity, is kept low. We also introduce a mean field approach to the intractable classification model that is expected to give a very good approximation to exact Bayesian inference and contains the Laplace approximation as a special case. We test the algorithms on two large data sets with O(10^3-10^4) examples. The results indicate that Bayesian learning of large data sets, e.g. the MNIST database, is realistic.

PDF Web [BibTex]

Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

In Advances in Neural Information Processing Systems 15, pages: 873-880, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

PDF Web [BibTex]

Derivative observations in Gaussian Process models of dynamic systems

Solak, E., Murray-Smith, R., Leithead, WE., Leith, D., Rasmussen, CE.

In Advances in Neural Information Processing Systems 15, pages: 1033-1040, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in the identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and its associated uncertainty, with normal function observations in the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It dramatically improves the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data with a handful of linearisations, thereby reducing the training set size, traditionally a problem for Gaussian process models.
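
Because differentiation is a linear operator, a Gaussian process and its derivative are jointly Gaussian, so function and derivative observations can be conditioned on together. The sketch below assumes a 1-D squared-exponential covariance, noise-free observations (with a tiny jitter) and hypothetical function names; it shows only the joint conditioning, not the paper's treatment of local linear models or equilibrium linearisations.

```python
import numpy as np

def se_cov_blocks(a, b, ell=1.0):
    """Covariance blocks for a 1-D GP with k(a, b) = exp(-(a - b)^2 / (2 ell^2)).

    Returns (k_ff, k_fd, k_dd) where, with r = a_i - b_j,
      k_ff[i, j] = cov(f(a_i), f(b_j))   = k
      k_fd[i, j] = cov(f(a_i), f'(b_j))  = r / ell^2 * k
      k_dd[i, j] = cov(f'(a_i), f'(b_j)) = (1 / ell^2 - r^2 / ell^4) * k
    """
    r = np.subtract.outer(np.asarray(a, float), np.asarray(b, float))
    k = np.exp(-r ** 2 / (2 * ell ** 2))
    return k, r / ell ** 2 * k, (1 / ell ** 2 - r ** 2 / ell ** 4) * k

def posterior_mean(x_test, x_f, y_f, x_d, y_d, ell=1.0, jitter=1e-10):
    """GP posterior mean of f at x_test given values y_f at x_f and slopes y_d at x_d."""
    k_ff, _, _ = se_cov_blocks(x_f, x_f, ell)
    _, k_fd, _ = se_cov_blocks(x_f, x_d, ell)
    _, _, k_dd = se_cov_blocks(x_d, x_d, ell)
    # Joint covariance of [f(x_f), f'(x_d)]; cov(f'(b), f(a)) = cov(f(a), f'(b)).
    K = np.block([[k_ff, k_fd], [k_fd.T, k_dd]])
    K += jitter * np.eye(K.shape[0])
    s_ff, _, _ = se_cov_blocks(x_test, x_f, ell)
    _, s_fd, _ = se_cov_blocks(x_test, x_d, ell)
    k_star = np.hstack([s_ff, s_fd])  # cov(f(x_test), [f(x_f), f'(x_d)])
    alpha = np.linalg.solve(K, np.concatenate([y_f, y_d]))
    return k_star @ alpha

# One function value f(0) = 0 plus one slope f'(0) = 1, a single "linearisation".
print(posterior_mean([0.1], [0.0], [0.0], [0.0], [1.0]))  # about 0.0995
```

The slope observation alone pulls the prediction near x = 0 toward the local linear model f(x) ≈ x, illustrating how a single derivative can stand in for many nearby function observations.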

PDF Web [BibTex]

Linear Combinations of Optic Flow Vectors for Estimating Self-Motion: a Real-World Test of a Neural Model

Franz, MO., Chahl, JS.

In Advances in Neural Information Processing Systems 15, pages: 1319-1326, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and self-motion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

PDF Web [BibTex]

Clustering with the Fisher score

Tsuda, K., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 15, pages: 729-736, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
Recently, the Fisher score (or the Fisher kernel) has been increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of the log-likelihood of a probabilistic model. This paper gives a theoretical analysis of how class information is preserved in the space of the Fisher score, showing that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. We therefore develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit the important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
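
To give intuition for what a Fisher score looks like, take a univariate Gaussian model with parameters (mu, var): the score is the gradient of log p(x | mu, var) with respect to those two parameters. The model choice and function name are assumptions for illustration; the abstract does not say which probabilistic models the paper uses, and the proposed clustering algorithm itself is not shown.

```python
from typing import Tuple

def gaussian_fisher_score(x: float, mu: float, var: float) -> Tuple[float, float]:
    """Gradient of log N(x | mu, var) with respect to (mu, var)."""
    d_mu = (x - mu) / var                                # d/dmu of the log-likelihood
    d_var = -0.5 / var + 0.5 * (x - mu) ** 2 / var ** 2  # d/dvar of the log-likelihood
    return d_mu, d_var

# Each sample becomes a vector in parameter space; these vectors are what a
# Fisher-score-based classifier or clustering method operates on.
scores = [gaussian_fisher_score(x, mu=0.0, var=1.0) for x in (-1.0, 0.0, 2.0)]
print(scores)  # [(-1.0, 0.0), (0.0, -0.5), (2.0, 1.5)]
```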

PDF Web [BibTex]
