Header logo is ei


2009


no image
Efficient factor GARCH models and factor-DCC models

Zhang, K., Chan, L.

Quantitative Finance, 9(1):71-91, 2009 (article)

Abstract
We report that, in the estimation of univariate GARCH or multivariate generalized orthogonal GARCH (GO-GARCH) models, maximizing the likelihood is equivalent to making the standardized residuals as independent as possible. Based on this, we propose three factor GARCH models in the framework of GO-GARCH: independent-factor GARCH exploits factors that are statistically as independent as possible; factors in best-factor GARCH have the largest autocorrelation in their squared values such that their volatilities could be forecast well by univariate GARCH; and factors in conditional-decorrelation GARCH are conditionally as uncorrelated as possible. A convenient two-step method for estimating these models is introduced. Since the extracted factors may still have weak conditional correlations, we further propose factor-DCC models as an extension to the above factor GARCH models with dynamic conditional correlation (DCC) modelling the remaining conditional correlations between factors. Experimental results for the Hong Kong stock market show that conditional-decorrelation GARCH and independent-factor GARCH have better generalization performance than the original GO-GARCH, and that conditional-decorrelation GARCH (among factor GARCH models) and its extension with DCC embedded (among factor-DCC models) behave best.

PDF Web DOI [BibTex]

2009

PDF Web DOI [BibTex]


no image
Graphical models for decoding in BCI visual speller systems

Martens, S., Farquhar, J., Hill, J., Schölkopf, B.

In pages: 470-473, IEEE, 4th International IEEE EMBS Conference on Neural Engineering (NER), 2009 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
A Fast, Consistent Kernel Two-Sample Test

Gretton, A., Fukumizu, K., Harchaoui, Z., Sriperumbudur, B.

In Advances in Neural Information Processing Systems 22, pages: 673-681, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of x2 random variables. Prior finite sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high dimensional multivariate data, and text.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

Blaschko, M., Shelton, J., Bartels, A.

In Advances in Neural Information Processing Systems 22, pages: 126-134, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
Resting state activity is brain activation that arises in the absence of any task, and is usually measured in awake subjects during prolonged fMRI scanning sessions where the only instruction given is to close the eyes and do nothing. It has been recognized in recent years that resting state activity is implicated in a wide variety of brain function. While certain networks of brain areas have different levels of activation at rest and during a task, there is nevertheless significant similarity between activations in the two cases. This suggests that recordings of resting state activity can be used as a source of unlabeled data to augment discriminative regression techniques in a semi-supervised setting. We evaluate this setting empirically yielding three main results: (i) regression tends to be improved by the use of Laplacian regularization even when no additional unlabeled data are available, (ii) resting state data seem to have a similar marginal distribution to that recorded during the execution of a visual processing task implying largely similar types of activation, and (iii) this source of information can be broadly exploited to improve the robustness of empirical inference in fMRI studies, an inherently data poor domain.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Non-linear System Identification: Visual Saliency Inferred from Eye-Movement Data

Wichmann, F., Kienzle, W., Schölkopf, B., Franz, M.

Journal of Vision, 9(8):article 32, 2009 (article)

Abstract
For simple visual patterns under the experimenter's control we impose which information, or features, an observer can use to solve a given perceptual task. For natural vision tasks, however, there are typically a multitude of potential features in a given visual scene which the visual system may be exploiting when analyzing it: edges, corners, contours, etc. Here we describe a novel non-linear system identification technique based on modern machine learning methods that allows the critical features an observer uses to be inferred directly from the observer's data. The method neither requires stimuli to be embedded in noise nor is it limited to linear perceptive fields (classification images). We demonstrate our technique by deriving the critical image features observers fixate in natural scenes (bottom-up visual saliency). Unlike previous studies where the relevant structure is determined manually—e.g. by selecting Gabors as visual filters—we do not make any assumptions in this regard, but numerically infer number and properties them from the eye-movement data. We show that center-surround patterns emerge as the optimal solution for predicting saccade targets from local image structure. The resulting model, a one-layer feed-forward network with contrast gain-control, is surprisingly simple compared to previously suggested saliency models. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.

Web DOI [BibTex]


no image
mGene.web: a web service for accurate computational gene finding

Schweikert, G., Behr, J., Zien, A., Zeller, G., Ong, C., Sonnenburg, S., Rätsch, G.

Nucleic Acids Research, 37, pages: W312-6, 2009 (article)

Abstract
We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms. With mGene.web, users have the additional possibility to train the system with their own data for other organisms on the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web, it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).

DOI [BibTex]

DOI [BibTex]


no image
Fast subtree kernels on graphs

Shervashidze, N., Borgwardt, K.

In Advances in Neural Information Processing Systems 22, pages: 1660-1668, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & G{\"a}rtner scales as O(n24dh). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime.

PDF Web [BibTex]

PDF Web [BibTex]


no image
An introduction to Kernel Learning Algorithms

Gehler, P., Schölkopf, B.

In Kernel Methods for Remote Sensing Data Analysis, pages: 25-48, 2, (Editors: Gustavo Camps-Valls and Lorenzo Bruzzone), Wiley, New York, NY, USA, 2009 (inbook)

Abstract
Kernel learning algorithms are currently becoming a standard tool in the area of machine learning and pattern recognition. In this chapter we review the fundamental theory of kernel learning. As the basic building block we introduce the kernel function, which provides an elegant and general way to compare possibly very complex objects. We then review the concept of a reproducing kernel Hilbert space and state the representer theorem. Finally we give an overview of the most prominent algorithms, which are support vector classification and regression, Gaussian Processes and kernel principal analysis. With multiple kernel learning and structured output prediction we also introduce some more recent advancements in the field.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Thumb xl screen shot 2012 02 21 at 15.56.00  2
On feature combination for multiclass object classification

Gehler, P., Nowozin, S.

In Proceedings of the Twelfth IEEE International Conference on Computer Vision, pages: 221-228, ICCV, 2009, oral presentation (inproceedings)

project page, code, data GoogleScholar pdf DOI [BibTex]

project page, code, data GoogleScholar pdf DOI [BibTex]

2003


no image
Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

NIPS Workshop " Planning for the Real World: The promises and challenges of dealing with uncertainty", December 2003 (poster)

PDF Web [BibTex]

2003

PDF Web [BibTex]


no image
Learning Control and Planning from the View of Control Theory and Imitation

Peters, J., Schaal, S.

NIPS Workshop "Planning for the Real World: The promises and challenges of dealing with uncertainty", December 2003 (talk)

Abstract
Learning control and planning in high dimensional continuous state-action systems, e.g., as needed in a humanoid robot, has so far been a domain beyond the applicability of generic planning techniques like reinforcement learning and dynamic programming. This talk describes an approach we have taken in order to enable complex robotics systems to learn to accomplish control tasks. Adaptive learning controllers equipped with statistical learning techniques can be used to learn tracking controllers -- missing state information and uncertainty in the state estimates are usually addressed by observers or direct adaptive control methods. Imitation learning is used as an ingredient to seed initial control policies whose output is a desired trajectory suitable to accomplish the task at hand. Reinforcement learning with stochastic policy gradients using a natural gradient forms the third component that allows refining the initial control policy until the task is accomplished. In comparison to general learning control, this approach is highly prestructured and thus more domain specific. However, it seems to be a theoretically clean and feasible strategy for control systems of the complexity that we need to address.

Web [BibTex]

Web [BibTex]


no image
Molecular phenotyping of human chondrocyte cell lines T/C-28a2, T/C-28a4, and C-28/I2

Finger, F., Schorle, C., Zien, A., Gebhard, P., Goldring, M., Aigner, T.

Arthritis & Rheumatism, 48(12):3395-3403, December 2003 (article)

[BibTex]

[BibTex]


no image
A Study on Rainfall - Runoff Models for Improving Ensemble Streamflow Prediction: 1. Rainfallrunoff Models Using Artificial Neural Networks

Jeong, D., Kim, Y., Cho, S., Shin, H.

Journal of the Korean Society of Civil Engineers, 23(6B):521-530, December 2003 (article)

Abstract
The previous ESP (Ensemble Streamflow Prediction) studies conducted in Korea reported that the modeling error is a major source of the ESP forecast error in winter and spring (i.e. dry seasons), and thus suggested that improving the rainfall-runoff model would be critical to obtain more accurate probabilistic forecasts with ESP. This study used two types of Artificial Neural Networks (ANN), such as a Single Neural Network (SNN) and an Ensemble Neural Networks (ENN), to improve the simulation capability of the rainfall-runoff model of the ESP forecasting system for the monthly inflow to the Daecheong dam. Applied for the first time to Korean hydrology, ENN combines the outputs of member models so that it can control the generalization error better than SNN. Because the dry and the flood season in Korea shows considerably different streamflow characteristics, this study calibrated the rainfall-runoff model separately for each season. Therefore, four rainfall-runoff models were developed according to the ANN types and the seasons. This study compared the ANN models with a conceptual rainfall-runoff model called TANK and verified that the ANN models were superior to TANK. Among the ANN models, ENN was more accurate than SNN. The ANN model performance was improved when the model was calibrated separately for the dry and the flood season. The best ANN model developed in this article will be incorporated into the ESP system to increase the forecast capability of ESP for the monthly inflow to the Daecheong dam.

[BibTex]

[BibTex]


no image
Quantitative Cerebral Blood Flow Measurements in the Rat Using a Beta-Probe and H215O

Weber, B., Spaeth, N., Wyss, M., Wild, D., Burger, C., Stanley, R., Buck, A.

Journal of Cerebral Blood Flow and Metabolism, 23(12):1455-1460, December 2003 (article)

Abstract
Beta-probes are a relatively new tool for tracer kinetic studies in animals. They are highly suited to evaluate new positron emission tomography tracers or measure physiologic parameters at rest and after some kind of stimulation or intervention. In many of these experiments, the knowledge of CBF is highly important. Thus, the purpose of this study was to evaluate the method of CBF measurements using a beta-probe and H215O. CBF was measured in the barrel cortex of eight rats at baseline and after acetazolamide challenge. Trigeminal nerve stimulation was additionally performed in five animals. In each category, three injections of 250 to 300 MBq H215O were performed at 10-minute intervals. Data were analyzed using a standard one-tissue compartment model (K1 = CBF, k2 = CBF/p, where p is the partition coefficient). Values for K1 were 0.35 plusminus 0.09, 0.58 plusminus 0.16, and 0.49 plusminus 0.03 mL dot min-1 dot mL-1 at rest, after acetazolamide challenge, and during trigeminal nerve stimulation, respectively. The corresponding values for k2 were 0.55 plusminus 0.12, 0.94 plusminus 0.16, and 0.85 plusminus 0.12 min-7, and for p were 0.64 plusminus 0.05, 0.61 plusminus 0.07, and 0.59 plusminus 0.06.The standard deviation of the difference between two successive experiments, a measure for the reproducibility of the method, was 10.1%, 13.0%, and 5.7% for K1, k2, and p, respectively. In summary, beta-probes in conjunction with H215O allow the reproducible quantitative measurement of CBF, although some systematic underestimation seems to occur, probably because of partial volume effects.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Recurrent neural networks from learning attractor dynamics

Schaal, S., Peters, J.

NIPS Workshop on RNNaissance: Recurrent Neural Networks, December 2003 (talk)

Abstract
Many forms of recurrent neural networks can be understood in terms of dynamic systems theory of difference equations or differential equations. Learning in such systems corresponds to adjusting some internal parameters to obtain a desired time evolution of the network, which can usually be characterized in term of point attractor dynamics, limit cycle dynamics, or, in some more rare cases, as strange attractor or chaotic dynamics. Finding a stable learning process to adjust the open parameters of the network towards shaping the desired attractor type and basin of attraction has remain a complex task, as the parameter trajectories during learning can lead the system through a variety of undesirable unstable behaviors, such that learning may never succeed. In this presentation, we review a recently developed learning framework for a class of recurrent neural networks that employs a more structured network approach. We assume that the canonical system behavior is known a priori, e.g., it is a point attractor or a limit cycle. With either supervised learning or reinforcement learning, it is possible to acquire the transformation from a simple representative of this canonical behavior (e.g., a 2nd order linear point attractor, or a simple limit cycle oscillator) to the desired highly complex attractor form. For supervised learning, one shot learning based on locally weighted regression techniques is possible. For reinforcement learning, stochastic policy gradient techniques can be employed. In any case, the recurrent network learned by these methods inherits the stability properties of the simple dynamic system that underlies the nonlinear transformation, such that stability of the learning approach is not a problem. We demonstrate the success of this approach for learning various skills on a humanoid robot, including tasks that require to incorporate additional sensory signals as coupling terms to modify the recurrent network evolution on-line.

Web [BibTex]

Web [BibTex]


no image
Support Vector Channel Selection in BCI

Lal, T., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B.

(120), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, December 2003 (techreport)

Abstract
Designing a Brain Computer Interface (BCI) system one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying EEG signals we propose the usage of the state of the art feature selection algorithms Recursive Feature Elimination [3] and Zero-Norm Optimization [13] which are based on the training of Support Vector Machines (SVM) [11]. These algorithms can provide more accurate solutions than standard filter methods for feature selection [14]. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore we show how time dependent task specific information can be visualized.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Texture and haptic cues in slant discrimination: Measuring the effect of texture type on cue combination

Rosas, P., Wichmann, F., Ernst, M., Wagemans, J.

Journal of Vision, 3(12):26, 2003 Fall Vision Meeting of the Optical Society of America, December 2003 (poster)

Abstract
In a number of models of depth cue combination the depth percept is constructed via a weighted average combination of independent depth estimations. The influence of each cue in such average depends on the reliability of the source of information. (Young, Landy, & Maloney, 1993; Ernst & Banks, 2002.) In particular, Ernst & Banks (2002) formulate the combination performed by the human brain as that of the minimum variance unbiased estimator that can be constructed from the available cues. Using slant discrimination and slant judgment via probe adjustment as tasks, we have observed systematic differences in performance of human observers when a number of different types of textures were used as cue to slant (Rosas, Wichmann & Wagemans, 2003). If the depth percept behaves as described above, our measurements of the slopes of the psychometric functions provide the predicted weights for the texture cue for the ranked texture types. We have combined these texture types with object motion but the obtained results are difficult to reconcile with the unbiased minimum variance estimator model (Rosas & Wagemans, 2003). This apparent failure of such model might be explained by the existence of a coupling of texture and motion, violating the assumption of independence of cues. Hillis, Ernst, Banks, & Landy (2002) have shown that while for between-modality combination the human visual system has access to the single-cue information, for within-modality combination (visual cues: disparity and texture) the single-cue information is lost, suggesting a coupling between these cues. Then, in the present study we combine the different texture types with haptic information in a slant discrimination task, to test whether in the between-modality condition the texture cue and the haptic cue to slant are combined as predicted by an unbiased, minimum variance estimator model.

Web DOI [BibTex]

Web DOI [BibTex]


no image
How to Deal with Large Dataset, Class Imbalance and Binary Output in SVM based Response Model

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 93-107, Korean Data Mining Conference, December 2003, Best Paper Award (inproceedings)

Abstract
[Abstract]: Various machine learning methods have made a rapid transition to response modeling in search of improved performance. And support vector machine (SVM) has also been attracting much attention lately. This paper presents an SVM response model. We are specifically focusing on the how-to’s to circumvent practical obstacles, such as how to face with class imbalance problem, how to produce the scores from an SVM classifier for lift chart analysis, and how to evaluate the models on accuracy and profit. Besides coping with the intractability problem of SVM training caused by large marketing dataset, a previously proposed pattern selection algorithm is introduced. SVM training accompanies time complexity of the cube of training set size. The pattern selection algorithm picks up important training patterns before SVM response modeling. We made comparison on SVM training results between the pattern selection algorithm and random sampling. Three aspects of SVM response models were evaluated: accuracies, lift chart analysis, and computational efficiency. The SVM trained with selected patterns showed a high accuracy, a high uplift in profit and in response rate, and a high computational efficiency.

PDF [BibTex]

PDF [BibTex]


no image
Blind separation of post-nonlinear mixtures using linearizing transformations and temporal decorrelation

Ziehe, A., Kawanabe, M., Harmeling, S., Müller, K.

Journal of Machine Learning Research, 4(7-8):1319-1338, November 2003 (article)

Abstract
We propose two methods that reduce the post-nonlinear blind source separation problem (PNL-BSS) to a linear BSS problem. The first method is based on the concept of maximal correlation: we apply the alternating conditional expectation (ACE) algorithm--a powerful technique from non-parametric statistics--to approximately invert the componentwise nonlinear functions. The second method is a Gaussianizing transformation, which is motivated by the fact that linearly mixed signals before nonlinear transformation are approximately Gaussian distributed. This heuristic, but simple and efficient procedure works as good as the ACE method. Using the framework provided by ACE, convergence can be proven. The optimal transformations obtained by ACE coincide with the sought-after inverse functions of the nonlinearities. After equalizing the nonlinearities, temporal decorrelation separation (TDSEP) allows us to recover the source signals. Numerical simulations testing "ACE-TD" and "Gauss-TD" on realistic examples are performed with excellent results.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Correlated stage- and subfield-associated hippocampal gene expression patterns in experimental and human temporal lobe epilepsy

Becker, A., Chen, J., Zien, A., Sochivko, D., Normann, S., Schramm, J., Elger, C., Wiestler, O., Blumcke, I.

European Journal of Neuroscience, 18(10):2792-2802, November 2003 (article)

Abstract
Epileptic activity evokes profound alterations of hippocampal organization and function. Genomic responses may reflect immediate consequences of excitatory stimulation as well as sustained molecular processes related to neuronal plasticity and structural remodeling. Using oligonucleotide microarrays with 8799 sequences, we determined subregional gene expression profiles in rats subjected to pilocarpine-induced epilepsy (U34A arrays, Affymetrix, Santa Clara, CA, USA; P < 0.05, twofold change, n = 3 per stage). Patterns of gene expression corresponded to distinct stages of epilepsy development. The highest number of differentially expressed genes (dentate gyrus, approx. 400 genes and CA1, approx. 700 genes) was observed 3 days after status epilepticus. The majority of up-regulated genes was associated with mechanisms of cellular stress and injury - 14 days after status epilepticus, numerous transcription factors and genes linked to cytoskeletal and synaptic reorganization were differentially expressed and, in the stage of chronic spontaneous seizures, distinct changes were observed in the transcription of genes involved in various neurotransmission pathways and between animals with low vs. high seizure frequency. A number of genes (n = 18) differentially expressed during the chronic epileptic stage showed corresponding expression patterns in hippocampal subfields of patients with pharmacoresistant temporal lobe epilepsy (n = 5 temporal lobe epilepsy patients; U133A microarrays, Affymetrix; covering 22284 human sequences). These data provide novel insights into the molecular mechanisms of epileptogenesis and seizure-associated cellular and structural remodeling of the hippocampus.

[BibTex]

[BibTex]


no image
Concentration Inequalities for Sub-Additive Functions Using the Entropy Method

Bousquet, O.

Stochastic Inequalities and Applications, 56, pages: 213-247, Progress in Probability, (Editors: Giné, E., C. Houdré and D. Nualart), November 2003 (article)

Abstract
We obtain exponential concentration inequalities for sub-additive functions of independent random variables under weak conditions on the increments of those functions, like the existence of exponential moments for these increments. As a consequence of these general inequalities, we obtain refinements of Talagrand's inequality for empirical processes and new bounds for randomized empirical processes. These results are obtained by further developing the entropy method introduced by Ledoux.

PostScript [BibTex]

PostScript [BibTex]


no image
Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop (COLT/Kernel 2003), LNCS Vol. 2777

Schölkopf, B., Warmuth, M.

Proceedings of the 16th Annual Conference on Learning Theory and 7th Kernel Workshop (COLT/Kernel 2003), COLT/Kernel 2003, pages: 746, Springer, Berlin, Germany, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, November 2003, Lecture Notes in Computer Science ; 2777 (proceedings)

DOI [BibTex]

DOI [BibTex]


no image
Bayesian Monte Carlo

Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 15, pages: 489-496, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a. partition functions and model evidences). We find that Bayesian Monte Carlo outperformed Annealed Importance Sampling, although for very high dimensional problems or problems with massive multimodality BMC may be less adequate. One advantage of the Bayesian approach to Monte Carlo is that samples can be drawn from any distribution. This allows for the possibility of active design of sample points so as to maximise information gain.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Technical report on Separation methods for nonlinear mixtures

Jutten, C., Karhunen, J., Almeida, L., Harmeling, S.

(D29), EU-Project BLISS, October 2003 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
On the Complexity of Learning the Kernel Matrix

Bousquet, O., Herrmann, D.

In Advances in Neural Information Processing Systems 15, pages: 399-406, (Editors: Becker, S. , S. Thrun, K. Obermayer), The MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We investigate data based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow to vary the spectrum and keep the eigenvectors fix. This bound is only a logarithmic factor bigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Image Reconstruction by Linear Programming

Tsuda, K., Rätsch, G.

(118), Max Planck Institute for Biological Cybernetics, Tübingen, Germany, October 2003 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Control, Planning, Learning, and Imitation with Dynamic Movement Primitives

Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.

In IROS 2003, pages: 1-21, Workshop on Bilateral Paradigms on Humans and Humanoids, IEEE International Conference on Intelligent Robots and Systems, October 2003 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Discriminative Learning for Label Sequences via Boosting

Altun, Y., Hofmann, T., Johnson, M.

In Advances in Neural Information Processing Systems 15, pages: 977-984, (Editors: Becker, S. , S. Thrun, K. Obermayer ), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Multiple-step ahead prediction for non linear dynamic systems: A Gaussian Process treatment with propagation of the uncertainty

Girard, A., Rasmussen, CE., Quiñonero-Candela, J., Murray-Smith, R.

In Advances in Neural Information Processing Systems 15, pages: 529-536, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. k-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model of the form y_t = f(y_{t-1},...,y_{t-L}), the prediction of y at time t + k is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Cluster Kernels for Semi-Supervised Learning

Chapelle, O., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 15, pages: 585-592, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Mismatch String Kernels for SVM Protein Classification

Leslie, C., Eskin, E., Weston, J., Noble, W.

In Advances in Neural Information Processing Systems 15, pages: 1417-1424, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Real-Time Face Detection

Kienzle, W.

Biologische Kybernetik, Eberhard-Karls-Universitaet Tuebingen, Tuebingen, Germany, October 2003 (diplomathesis)

[BibTex]

[BibTex]


no image
YKL-39 (chitinase 3-like protein 2), but not YKL-40 (chitinase 3-like protein 1), is up regulated in osteoarthritic chondrocytes

Knorr, T., Obermayr, F., Bartnik, E., Zien, A., Aigner, T.

Annals of the Rheumatic Diseases, 62(10):995-998, October 2003 (article)

Abstract
OBJECTIVE: To investigate quantitatively the mRNA expression levels of YKL-40, an established marker of rheumatoid and osteoarthritic cartilage degeneration in synovial fluid and serum, and a closely related molecule YKL-39, in articular chondrocytes. METHODS: cDNA array and online quantitative polymerase chain reaction (PCR) were used to measure mRNA expression levels of YKL-39 and YKL-40 in chondrocytes in normal, early degenerative, and late stage osteoarthritic cartilage samples. RESULTS: Expression analysis showed high levels of both proteins in normal articular chondrocytes, with lower levels of YKL-39 than YKL-40. Whereas YKL-40 was significantly down regulated in late stage osteoarthritic chondrocytes, YKL-39 was significantly up regulated. In vitro both YKLs were down regulated by interleukin 1beta. CONCLUSIONS: The up regulation of YKL-39 in osteoarthritic cartilage suggests that YKL-39 may be a more accurate marker of chondrocyte activation than YKL-40, although it has yet to be established as a suitable marker in synovial fluid and serum. The decreased expression of YKL-40 by osteoarthritic chondrocytes is surprising as increased levels have been reported in rheumatoid and osteoarthritic synovial fluid, where it may derive from activated synovial cells or osteophytic tissue or by increased matrix destruction in the osteoarthritic joint. YKL-39 and YKL-40 are potentially interesting marker molecules for arthritic joint disease because they are abundantly expressed by both normal and osteoarthritic chondrocytes.

[BibTex]

[BibTex]


no image
Incremental Gaussian Processes

Quinonero Candela, J., Winther, O.

In Advances in Neural Information Processing Systems 15, pages: 1001-1008, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
In this paper, we consider Tipping‘s relevance vector machine (RVM) and formalize an incremental training strategy as a variant of the expectation-maximization (EM) algorithm that we call subspace EM. Working with a subset of active basis functions, the sparsity of the RVM solution will ensure that the number of basis functions and thereby the computational complexity is kept low. We also introduce a mean field approach to the intractable classification model that is expected to give a very good approximation to exact Bayesian inference and contains the Laplace approximation as a special case. We test the algorithms on two large data sets with O(10^3-10^4) examples. The results indicate that Bayesian learning of large data sets, e.g. the MNIST database is realistic.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Dependency Estimation

Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B., Vapnik, V.

In Advances in Neural Information Processing Systems 15, pages: 873-880, (Editors: S Becker and S Thrun and K Obermayer), MIT Press, Cambridge, MA, USA, 16th Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

PDF Web [BibTex]

PDF Web [BibTex]


no image
Derivative observations in Gaussian Process models of dynamic systems

Solak, E., Murray-Smith, R., Leithead, WE., Leith, D., Rasmussen, CE.

In Advances in Neural Information Processing Systems 15, pages: 1033-1040, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and associated uncertainty with normal function observations into the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It improves dramatically the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size - traditionally a problem for Gaussian process models.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Linear Combinations of Optic Flow Vectors for Estimating Self-Motion: a Real-World Test of a Neural Model

Franz, MO., Chahl, JS.

In Advances in Neural Information Processing Systems 15, pages: 1319-1326, (Editors: Becker, S., S. Thrun and K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an estimator consisting of a linear combination of optic flow vectors that incorporates prior knowledge both about the distance distribution of the environment, and about the noise and self-motion statistics of the sensor. The estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Clustering with the Fisher score

Tsuda, K., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 15, pages: 729-736, (Editors: Becker, S. , S. Thrun, K. Obermayer), MIT Press, Cambridge, MA, USA, Sixteenth Annual Conference on Neural Information Processing Systems (NIPS), October 2003 (inproceedings)

Abstract
Recently the Fisher score (or the Fisher kernel) is increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of loglikelihood of a probabilistic model. This paper gives a theoretical analysis about how class information is preserved in the space of the Fisher score, which turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are obviously inappropriate because they make use of all dimensions. So we will develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).

PDF Web [BibTex]

PDF Web [BibTex]


no image
Large Margin Methods for Label Sequence Learning

Altun, Y., Hofmann, T.

In pages: 993-996, International Speech Communication Association, Bonn, Germany, 8th European Conference on Speech Communication and Technology (EuroSpeech), September 2003 (inproceedings)

Web [BibTex]

Web [BibTex]


no image
Technical report on implementation of linear methods and validation on acoustic sources

Harmeling, S., Bünau, P., Ziehe, A., Pham, D.

EU-Project BLISS, September 2003 (techreport)

PDF [BibTex]

PDF [BibTex]


no image
Fast Pattern Selection Algorithm for Support Vector Classifiers: "Time Complexity Analysis"

Shin, H., Cho, S.

In Lecture Notes in Computer Science (LNCS 2690), LNCS 2690, pages: 1008-1015, Springer-Verlag, Heidelberg, The 4th International Conference on Intelligent Data Engineering (IDEAL), September 2003 (inproceedings)

Abstract
Training SVM requires large memory and long cpu time when the pattern set is large. To alleviate the computational burden in SVM training, we propose a fast preprocessing algorithm which selects only the patterns near the decision boundary. The time complexity of the proposed algorithm is much smaller than that of the naive M^2 algorithm

PDF [BibTex]

PDF [BibTex]


no image
Marginalized Kernels between Labeled Graphs

Kashima, H., Tsuda, K., Inokuchi, A.

In 20th International Conference on Machine Learning, pages: 321-328, (Editors: Faucett, T. and N. Mishra), 20th International Conference on Machine Learning, August 2003 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Sparse Gaussian Processes: inference, subspace identification and model selection

Csato, L., Opper, M.

In Proceedings, pages: 1-6, (Editors: Van der Hof, , Wahlberg), The Netherlands, 13th IFAC Symposium on System Identifiaction, August 2003, electronical version; Index ThA02-2 (inproceedings)

Abstract
Gaussian Process (GP) inference is a probabilistic kernel method where the GP is treated as a latent function. The inference is carried out using the Bayesian online learning and its extension to the more general iterative approach which we call TAP/EP learning. Sparsity is introduced in this context to make the TAP/EP method applicable to large datasets. We address the prohibitive scaling of the number of parameters by defining a subset of the training data that is used as the support the GP, thus the number of required parameters is independent of the training set, similar to the case of ``Support--‘‘ or ``Relevance--Vectors‘‘. An advantage of the full probabilistic treatment is that allows the computation of the marginal data likelihood or evidence, leading to hyper-parameter estimation within the GP inference. An EM algorithm to choose the hyper-parameters is proposed. The TAP/EP learning is the E-step and the M-step then updates the hyper-parameters. Due to the sparse E-step the resulting algorithm does not involve manipulation of large matrices. The presented algorithm is applicable to a wide variety of likelihood functions. We present results of applying the algorithm on classification and nonstandard regression problems for artificial and real datasets.

PDF GZIP [BibTex]

PDF GZIP [BibTex]


no image
Adaptive, Cautious, Predictive control with Gaussian Process Priors

Murray-Smith, R., Sbarbaro, D., Rasmussen, CE., Girard, A.

In Proceedings of the 13th IFAC Symposium on System Identification, pages: 1195-1200, (Editors: Van den Hof, P., B. Wahlberg and S. Weiland), Proceedings of the 13th IFAC Symposium on System Identification, August 2003 (inproceedings)

Abstract
Nonparametric Gaussian Process models, a Bayesian statistics approach, are used to implement a nonlinear adaptive control law. Predictions, including propagation of the state uncertainty are made over a k-step horizon. The expected value of a quadratic cost function is minimised, over this prediction horizon, without ignoring the variance of the model predictions. The general method and its main features are illustrated on a simulation example.

PDF [BibTex]

PDF [BibTex]


no image
Statistical Learning Theory

Bousquet, O.

Machine Learning Summer School, August 2003 (talk)

PDF [BibTex]

PDF [BibTex]


no image
Generative Model-based Clustering of Directional Data

Banerjee, A., Dhillon, I., Ghosh, J., Sra, S.

In Proc. ACK SIGKDD, pages: 00-00, KDD, August 2003 (inproceedings)

GZIP [BibTex]

GZIP [BibTex]


no image
Hidden Markov Support Vector Machines

Altun, Y., Tsochantaridis, I., Hofmann, T.

In pages: 4-11, (Editors: Fawcett, T. , N. Mishra), AAAI Press, Menlo Park, CA, USA, Twentieth International Conference on Machine Learning (ICML), August 2003 (inproceedings)

Web [BibTex]

Web [BibTex]