2007


Collaborative Filtering via Ensembles of Matrix Factorizations

Wu, M.

In KDD Cup and Workshop 2007, pages: 43-47, KDD Cup and Workshop, August 2007 (inproceedings)

Abstract
We present a Matrix Factorization (MF) based approach for the Netflix Prize competition. MF-based algorithms are currently popular and have proved successful for collaborative filtering tasks. For the Netflix Prize competition, we adopt three different types of MF algorithms: regularized MF, maximum margin MF and non-negative MF. Furthermore, for each MF algorithm, instead of selecting the optimal parameters, we combine the results obtained with several parameters. With this method, we achieve a performance that is more than 6% better than Netflix's own system.
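
As a rough illustration of the recipe in this abstract (a minimal sketch, not the authors' code), the following trains several regularized MFs, one per hyperparameter setting, on a toy rating matrix and averages their predictions. The learning rate, ranks, regularizers and toy data are invented for the example; the maximum margin and non-negative variants are omitted.

```python
import numpy as np

def train_reg_mf(R, mask, k, lam, lr=0.01, epochs=200, seed=0):
    """Regularized MF via stochastic gradient steps on observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]
            Ui = U[i].copy()
            U[i] += lr * (err * V[j] - lam * Ui)
            V[j] += lr * (err * Ui - lam * V[j])
    return U @ V.T

# Ensemble over several (rank, regularization) settings instead of
# selecting a single "optimal" one, as the abstract describes.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
preds = [train_reg_mf(R, mask, k, lam)
         for k in (2, 3) for lam in (0.02, 0.1)]
R_hat = np.mean(preds, axis=0)  # averaged ensemble prediction
```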

PDF Web [BibTex]

Thinking Out Loud: Research and Development of Brain Computer Interfaces

Hill, NJ.

Invited keynote talk at the Max Planck Society's PhDNet Workshop, July 2007 (talk)

Abstract
My principal interest is in applying machine-learning methods to the development of Brain-Computer Interfaces (BCI). This involves the classification of a user's intentions or mental states, or regression against some continuous intentional control signal, using brain signals obtained for example by EEG, ECoG or MEG. The long-term aim is to develop systems that a completely paralysed person (such as someone suffering from advanced Amyotrophic Lateral Sclerosis) could use to communicate. Such systems have the potential to improve the lives of many people who would be otherwise completely unable to communicate, but they are still very much in the research and development stages.

PDF [BibTex]

Fusion of spectral and spatial information by a novel SVM classification technique

Bruzzone, L., Marconcini, M., Persello, C.

In IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages: 4838-4841, IEEE, Piscataway, NJ, USA, July 2007 (inproceedings)

Abstract
A novel context-sensitive semisupervised classification technique based on support vector machines is proposed. This technique aims at exploiting the SVM method for image classification by properly fusing spectral information with spatial-context information. This results in: i) an increased robustness to noisy training sets in the learning phase of the classifier; ii) a higher and more stable classification accuracy with respect to the specific patterns included in the training set; and iii) a regularized classification map. The main property of the proposed context-sensitive semisupervised SVM (CS4VM) is to adaptively exploit the contextual information in the training phase of the classifier, without any critical assumption on the expected labels of the pixels included in the same neighborhood system. This is done by defining a novel context-sensitive term in the objective function used in the learning of the classifier. In addition, the proposed CS4VM can be integrated with a Markov random field (MRF) approach for exploiting the contextual information also to regularize the classification map. Experiments carried out on very high geometrical resolution images confirmed the effectiveness of the proposed technique.

Web DOI [BibTex]

Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces

Deisenroth, MP., Weissel, F., Ohtsuka, T., Hanebeck, UD.

In ECC'07, pages: 3664-3671, 9th European Control Conference, July 2007 (inproceedings)

Abstract
A novel online-computation approach to optimal control of nonlinear, noise-affected systems with continuous state and control spaces is presented. In the proposed algorithm, system noise is explicitly incorporated into the control decision. This leads to superior results compared to state-of-the-art nonlinear controllers that neglect this influence. The solution of an optimal nonlinear controller for a corresponding deterministic system is employed to find a meaningful state space restriction. This restriction is obtained by means of approximate state prediction using the noisy system equation. Within this constrained state space, an optimal closed-loop solution for a finite decision-making horizon (prediction horizon) is determined within an adaptively restricted optimization space. Interleaving stochastic dynamic programming and value function approximation yields a solution to the considered optimal control problem. The enhanced performance of the proposed discrete-time controller is illustrated by means of a scalar example system. Nonlinear model predictive control is applied to address approximate treatment of infinite-horizon problems by the finite-horizon controller.
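
A toy sketch of one ingredient named in the abstract: finite-horizon stochastic dynamic programming with the value function held on a discretized (restricted) scalar state space, and the noise entering through an explicit expectation. The dynamics, cost, grids and noise model below are all invented for the illustration and are not the paper's benchmark system.

```python
import numpy as np

xs = np.linspace(-3.0, 3.0, 121)    # restricted, discretized state space
us = np.linspace(-1.0, 1.0, 21)     # control grid
wn = np.array([-0.2, 0.0, 0.2])     # discretized noise support (assumed)
ww = np.array([0.25, 0.5, 0.25])    # noise probabilities (assumed)

def f(x, u):                        # invented nonlinear dynamics
    return x + 0.5 * np.sin(x) + u

def cost(x, u):
    return x**2 + 0.1 * u**2

N = 20                              # decision-making (prediction) horizon
V = xs**2                           # terminal cost V_N(x)
policy = np.zeros((N, xs.size))
for k in reversed(range(N)):        # backward induction
    Q = np.empty((xs.size, us.size))
    for j, u in enumerate(us):
        # expected cost-to-go: noise is incorporated into the decision
        EV = sum(w * np.interp(f(xs, u) + d, xs, V) for d, w in zip(wn, ww))
        Q[:, j] = cost(xs, u) + EV
    policy[k] = us[np.argmin(Q, axis=1)]
    V = Q.min(axis=1)               # value function approximation on the grid
```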

PDF Web [BibTex]

Error Correcting Codes for the P300 Visual Speller

Biessmann, F.

Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, July 2007 (diplomathesis)

Abstract
The aim of brain-computer interface (BCI) research is to establish a communication system based on intentional modulation of brain activity. This is accomplished by classifying patterns of brain activity, volitionally induced by the user. The BCI presented in this study is based on a classical paradigm as proposed by (Farwell and Donchin, 1988), the P300 visual speller. Recording electroencephalograms (EEG) from the scalp while presenting letters successively to the user, the speller can infer from the brain signal which letter the user was focussing on. Since EEG recordings are noisy, usually many repetitions are needed to detect the correct letter. The focus of this study was to improve the accuracy of the visual speller by applying some basic principles from information theory: stimulus sequences of the speller have been modified into error-correcting codes. Additionally, a language model was incorporated into the probabilistic letter decoder. Classification of single EEG epochs was less accurate using error-correcting codes. However, the novel code could compensate for that, such that overall, letter accuracies were as high as or even higher than for classical stimulus codes. In particular at high noise levels, error-correcting decoding achieved higher letter accuracies.
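
A hedged sketch of the central idea as stated in the abstract: choose binary stimulus codewords with large pairwise Hamming distance, so that letter-level decoding can correct single-epoch classification errors. The code length, alphabet size, distance threshold and greedy selection below are assumptions for illustration, not the thesis's actual codes.

```python
import itertools
import numpy as np

def min_distance(codebook):
    return min(int(np.sum(a != b))
               for a, b in itertools.combinations(codebook, 2))

def greedy_codebook(n_letters=36, code_len=12, n_candidates=5000, seed=0):
    """Greedily keep random codewords that stay far, in Hamming distance,
    from every codeword chosen so far."""
    rng = np.random.default_rng(seed)
    cands = rng.integers(0, 2, size=(n_candidates, code_len))
    book = [cands[0]]
    for c in cands[1:]:
        if len(book) == n_letters:
            break
        if min(np.sum(c != b) for b in book) >= code_len // 3:
            book.append(c)
    return np.array(book)

book = greedy_codebook()
print(book.shape, "min pairwise Hamming distance:", min_distance(book))
# Decoding a sequence of noisy single-epoch decisions then amounts to
# nearest-codeword search, which is where the error correction happens.
```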

PDF [BibTex]

Feature Selection for Trouble Shooting in Complex Assembly Lines

Pfingsten, T., Herrmann, D., Schnitzler, T., Feustel, A., Schölkopf, B.

IEEE Transactions on Automation Science and Engineering, 4(3):465-469, July 2007 (article)

Abstract
The final properties of sophisticated products can be affected by many unapparent dependencies within the manufacturing process, and the products' integrity can often only be checked in a final measurement. Troubleshooting can therefore be very tedious if not impossible in large assembly lines. In this paper we show that Feature Selection is an efficient tool for serial-grouped lines to reveal causes for irregularities in product attributes. We compare the performance of several methods for Feature Selection on real-world problems in mass-production of semiconductor devices. Note to Practitioners: We present a data-based procedure to localize flaws in large production lines: using the results of final quality inspections and information about which machines processed which batches, we are able to identify machines which cause low yield.

PDF Web DOI [BibTex]

A Kernel Approach to Comparing Distributions

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

In Proceedings of the 22. AAAI Conference on Artificial Intelligence, pages: 1637-1641, AAAI Press, Menlo Park, CA, USA, Twenty-Second AAAI Conference on Artificial Intelligence (AAAI), July 2007 (inproceedings)

Abstract
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a Reproducing Kernel Hilbert Space. We apply this technique to construct a two-sample test, which is used for determining whether two sets of observations arise from the same distribution. We use this test in attribute matching for databases using the Hungarian marriage method, where it performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
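
The statistic underlying this test is the Maximum Mean Discrepancy: the distance between the two samples' mean embeddings in the RKHS. A minimal sketch with a Gaussian kernel follows; the bandwidth and the biased estimator are simplifying assumptions, and the permutation test that would supply a rejection threshold is only indicated in a comment.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))   # shifted distribution
print("MMD^2 estimate:", mmd2_biased(X, Y))
# A permutation test over the pooled sample would calibrate the threshold
# for deciding whether the two sets arise from the same distribution.
```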

PDF Web [BibTex]

Gene selection via the BAHSIC family of algorithms

Song, L., Bedo, J., Borgwardt, K., Gretton, A., Smola, A.

Bioinformatics, 23(13: ISMB/ECCB 2007 Conference Proceedings):i490-i498, July 2007 (article)

Abstract
Motivation: Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. The ultimate goal is to detect the genes that are involved in disease outbreak and progression. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between different methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical connections and the differences between these methods. In this article, we define a kernel-based framework for feature selection based on the Hilbert–Schmidt independence criterion and backward elimination, called BAHSIC. We show that several well-known feature selectors are instances of BAHSIC, thereby clarifying their relationship. Furthermore, by choosing a different kernel, BAHSIC allows us to easily define novel feature selection algorithms. As a further advantage, feature selection via BAHSIC works directly on multiclass problems. Results: In a broad experimental evaluation, the members of the BAHSIC family reach high levels of accuracy and robustness when compared to other feature selection techniques. Experiments show that features selected with a linear kernel provide the best classification performance in general, but if strong non-linearities are present in the data then non-linear kernels can be more suitable.
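
A minimal sketch of the BAHSIC recipe as the abstract describes it: score a feature subset by an empirical HSIC between the kernelized data and the labels, then repeatedly eliminate the feature whose removal decreases HSIC least. The Gaussian kernel on features, delta kernel on labels, one-at-a-time elimination and toy data are simplifications, not the paper's exact protocol.

```python
import numpy as np

def hsic(X, y, sigma=1.0):
    """Biased empirical HSIC between data X and class labels y."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    L = np.equal.outer(y, y).astype(float)   # delta kernel on labels
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def bahsic(X, y, n_keep):
    feats = list(range(X.shape[1]))
    while len(feats) > n_keep:
        scores = [hsic(X[:, [f for f in feats if f != j]], y) for j in feats]
        feats.pop(int(np.argmax(scores)))    # drop the least informative one
    return feats

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)
X = rng.standard_normal((100, 5))
X[:, 0] += 2.0 * y                           # only feature 0 is informative
print(bahsic(X, y, n_keep=2))                # feature 0 should survive
```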

Web DOI [BibTex]

Phenotyping of Chondrocytes In Vivo and In Vitro Using cDNA Array Technology

Zien, A., Gebhard, P., Fundel, K., Aigner, T.

Clinical Orthopaedics and Related Research, 460, pages: 226-233, July 2007 (article)

Abstract
The cDNA array technology is a powerful tool to analyze a high number of genes in parallel. We investigated whether large-scale gene expression analysis allows clustering and identification of cellular phenotypes of chondrocytes in different in vivo and in vitro conditions. In 100% of cases, clustering analysis distinguished between in vivo and in vitro samples, suggesting fundamental differences in chondrocytes in situ and in vitro regardless of the culture conditions or disease status. It also allowed us to differentiate between healthy and osteoarthritic cartilage. The clustering also revealed the relative importance of the investigated culturing conditions (stimulation agent, stimulation time, bead/monolayer). We augmented the cluster analysis with a statistical search for genes showing differential expression. The identified genes provided hints to the molecular basis of the differences between the sample classes. Our approach shows the power of modern bioinformatic algorithms for understanding and classifying chondrocytic phenotypes in vivo and in vitro. Although it does not generate new experimental data per se, it provides valuable information regarding the biology of chondrocytes and may provide tools for diagnosing and staging the osteoarthritic disease process.

DOI [BibTex]

Manifold Denoising as Preprocessing for Finding Natural Representations of Data

Hein, M., Maier, M.

In AAAI-07, pages: 1646-1649, AAAI Press, Menlo Park, CA, USA, Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), July 2007 (inproceedings)

Abstract
A natural representation of data is the set of parameters which generated the data. If the parameter space is continuous, we can regard it as a manifold. In practice we usually do not know this manifold; we just have some representation of the data, often in a very high-dimensional feature space. Since the number of internal parameters does not change with the representation, the data will effectively lie on a low-dimensional submanifold in feature space. Due to measurement errors this data is usually corrupted by noise, which, particularly in high-dimensional feature spaces, makes it almost impossible to find the manifold structure. This paper reviews a method called Manifold Denoising, which projects the data onto the submanifold using a diffusion process on a graph generated by the data. We demonstrate that the method is capable of dealing with non-trivial high-dimensional noise. Moreover, we show that using the method as a preprocessing step can significantly improve the results of a semi-supervised learning algorithm.
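
The core step, as the abstract describes it, is a diffusion process on a graph generated by the data, which contracts the sample toward the submanifold. The sketch below takes implicit Euler steps of the graph heat equation on a symmetrized kNN graph; the weighting scheme, neighborhood size, step length and noisy-circle data are assumptions for illustration.

```python
import numpy as np

def knn_graph_laplacian(X, k=10):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros_like(d2)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]   # k nearest neighbors
    for i, nbrs in enumerate(idx):
        W[i, nbrs] = np.exp(-d2[i, nbrs] / d2[i, nbrs].mean())
    W = np.maximum(W, W.T)                     # symmetric kNN graph
    return np.diag(W.sum(1)) - W               # unnormalized L = D - W

def denoise_step(X, dt=0.5, k=10):
    L = knn_graph_laplacian(X, k)
    # implicit diffusion step: solve (I + dt*L) X_new = X
    return np.linalg.solve(np.eye(len(X)) + dt * L, X)

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 300)             # noisy circle in the plane
X = np.c_[np.cos(t), np.sin(t)] + 0.15 * rng.standard_normal((300, 2))
for _ in range(5):                             # a few diffusion steps
    X = denoise_step(X)                        # points contract to the circle
```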

PDF Web [BibTex]

Learning the Influence of Spatio-Temporal Variations in Local Image Structure on Visual Saliency

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

10th Tübinger Wahrnehmungskonferenz (TWK 2007), 10, pages: 1, July 2007 (poster)

Abstract
Computational models for bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians filters and a nonlinear combination scheme which combines the filter responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine learning algorithm can be used to derive a saliency model from human eye movement data with a very small number of additional assumptions. The learned model is much simpler than previous models, but nevertheless has state-of-the-art prediction performance [2]. A central result from this study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model. Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes into account temporal intensity variations. We find that the learned model responds strongly to temporal intensity changes occurring 200-250ms before a saccade is initiated. This delay coincides with the typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.

Web [BibTex]

Graph-based Protein Functional Classification

Shin, H., Lisewski, A., Lichtarge, O.

In BIOCOMP'07, pages: 738-744, (Editors: Arabnia, H. R., M. Q. Yang, J. Y. Yang), CSREA Press, Las Vegas, NV, USA, 2007 International Conference on Bioinformatics and Computational Biology, July 2007 (inproceedings)

Web [BibTex]

Data-driven goodness-of-fit tests

Langovoy, MA.

Biologische Kybernetik, Georg-August-Universität Göttingen, Göttingen, Germany, July 2007 (phdthesis)

Web [BibTex]

Common Sequence Polymorphisms Shaping Genetic Diversity in Arabidopsis thaliana

Clark, R., Schweikert, G., Toomajian, C., Ossowski, S., Zeller, G., Shinn, P., Warthmann, N., Hu, T., Fu, G., Hinds, D., Chen, H., Frazer, K., Huson, D., Schölkopf, B., Nordborg, M., Rätsch, G., Ecker, J., Weigel, D.

Science, 317(5836):338-342, July 2007 (article)

Abstract
The genomes of individuals from the same species vary in sequence as a result of different evolutionary processes. To examine the patterns of, and the forces shaping, sequence variation in Arabidopsis thaliana, we performed high-density array resequencing of 20 diverse strains (accessions). More than 1 million nonredundant single-nucleotide polymorphisms (SNPs) were identified at moderate false discovery rates (FDRs), and ~4% of the genome was identified as being highly dissimilar or deleted relative to the reference genome sequence. Patterns of polymorphism are highly nonrandom among gene families, with genes mediating interaction with the biotic environment having exceptional polymorphism levels. At the chromosomal scale, regional variation in polymorphism was readily apparent. A scan for recent selective sweeps revealed several candidate regions, including a notable example in which almost all variation was removed in a 500-kilobase window. Analyzing the polymorphisms we describe in larger sets of accessions will enable a detailed understanding of forces shaping population-wide sequence variation in A. thaliana.

PDF DOI [BibTex]

Supervised Feature Selection via Dependence Estimation

Song, L., Smola, A., Gretton, A., Borgwardt, K., Bedo, J.

In Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages: 823-830, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, Twenty-Fourth Annual International Conference on Machine Learning (ICML), June 2007 (inproceedings)

Abstract
We introduce a framework for filtering features that employs the Hilbert-Schmidt Independence Criterion (HSIC) as a measure of dependence between the features and the labels. The key idea is that good features should maximise such dependence. Feature selection for various supervised learning problems (including classification and regression) is unified under this framework, and the solutions can be approximated using a backward-elimination algorithm. We demonstrate the usefulness of our method on both artificial and real world datasets.

PDF Web DOI [BibTex]

A Kernel-Based Causal Learning Algorithm

Sun, X., Janzing, D., Schölkopf, B., Fukumizu, K.

In Proceedings of the 24th International Conference on Machine Learning, pages: 855-862, (Editors: Z Ghahramani), ACM Press, New York, NY, USA, ICML, June 2007 (inproceedings)

Abstract
We describe a causal learning method, which employs measuring the strength of statistical dependences in terms of the Hilbert-Schmidt norm of kernel-based cross-covariance operators. Following the line of the common faithfulness assumption of constraint-based causal learning, our approach assumes that a variable Z is likely to be a common effect of X and Y, if conditioning on Z increases the dependence between X and Y. Based on this assumption, we collect "votes" for hypothetical causal directions and orient the edges by the majority principle. In most experiments with known causal structures, our method provided plausible results and outperformed the conventional constraint-based PC algorithm.

PDF Web DOI [BibTex]

Entire Regularization Paths for Graph Data

Tsuda, K.

In ICML 2007, pages: 919-926, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th Annual International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
Graph data such as chemical compounds and XML documents are getting more common in many application domains. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraph patterns, the dimensionality gets too large for usual statistical methods. We propose an efficient method to select a small number of salient patterns by regularization path tracking. The generation of useless patterns is minimized by progressive extension of the search space. In experiments, it is shown that our technique is considerably more efficient than a simpler approach based on frequent substructure mining.

PDF Web DOI [BibTex]

Les Représentations Prédictives des États et des Politiques

Boularias, A., Chaib-Draa, B.

In MFI 2007, pages: 37-48, Quatrièmes Journées Francophones Modèles Formels de l'Interaction, June 2007 (inproceedings)

Abstract
In this paper we propose a new approach for representing policies (strategies) in stochastic, partially observable environments. We are particularly interested in multi-agent systems, in which each agent knows only its own policies and must choose the best one among them according to its belief state about the policies of the other agents. Our model uses fewer parameters than the usual representation methods, such as decision trees or stochastic finite-state controllers, and thereby speeds up planning algorithms. We also show how this model can be used effectively for cooperative multi-agent planning without communication; the empirical results are compared with the DEC-POMDP model (Decentralized Partially Observable Markov Decision Process).

PDF Web [BibTex]

Graph Laplacians and their Convergence on Random Neighborhood Graphs

Hein, M., Audibert, J., von Luxburg, U.

Journal of Machine Learning Research, 8, pages: 1325-1370, June 2007 (article)

Abstract
Given a sample from a probability measure with support on a submanifold in Euclidean space, one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods like semi-supervised learning, dimensionality reduction and clustering. In this paper we determine the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold all graph Laplacians have the same limit up to constants. However, in the case of a non-uniform measure on the submanifold, only the so-called random walk graph Laplacian converges to the weighted Laplace-Beltrami operator.
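
For reference, here are the three graph Laplacians usually compared under these names, written from the standard definitions (a sketch; the paper's normalization constants may differ). W is the weighted adjacency matrix of the neighborhood graph and D the diagonal degree matrix.

```python
import numpy as np

def graph_laplacians(W):
    """Unnormalized, random walk, and symmetric normalized Laplacians.
    Assumes every vertex has positive degree."""
    d = W.sum(axis=1)
    D_inv = np.diag(1.0 / d)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    I = np.eye(len(W))
    L_unnorm = np.diag(d) - W                    # L = D - W
    L_rw = I - D_inv @ W                         # random walk Laplacian
    L_sym = I - D_inv_sqrt @ W @ D_inv_sqrt      # symmetric normalization
    return L_unnorm, L_rw, L_sym
```

Per the abstract, it is the random walk Laplacian L_rw whose pointwise limit is the weighted Laplace-Beltrami operator when the sampling measure is non-uniform.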

PDF PDF [BibTex]

Dirichlet Process Mixtures of Factor Analysers

Görür, D., Rasmussen, C.

Fifth Workshop on Bayesian Inference in Stochastic Processes (BSP5), June 2007 (talk)

Abstract
Mixture of factor analysers (MFA) is a well-known model that combines the dimensionality reduction technique of Factor Analysis (FA) with mixture modeling. The key issue in MFA is deciding on the latent dimension and the number of mixture components to be used. The Bayesian treatment of MFA has been considered by Beal and Ghahramani (2000) using variational approximation and by Fokoué and Titterington (2003) using birth-and-death Markov chain Monte Carlo (MCMC). Here, we present the nonparametric MFA model utilizing a Dirichlet process (DP) prior on the component parameters (that is, the factor loading matrix and the mean vector of each component) and describe an MCMC scheme for inference. The clustering property of the DP provides automatic selection of the number of mixture components. The latent dimensionality of each component is inferred by automatic relevance determination (ARD). Identifying the action potentials of individual neurons from extracellular recordings, known as spike sorting, is a challenging clustering problem. We apply our model for clustering the waveforms recorded from the cortex of a macaque monkey.

Web [BibTex]

An Extensible Probabilistic Transformation-based Approach to the Third Recognizing Textual Entailment Challenge

Harmeling, S.

In TextEntail 2007, pages: 137-142, ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, June 2007 (inproceedings)

Abstract
We introduce a system for textual entailment that is based on a probabilistic model of entailment. The model is defined using some calculus of transformations on dependency trees, which is characterized by the fact that derivations in that calculus preserve the truth only with a certain probability. We also describe a possible set of transformations (and with it implicitly a calculus) that was successfully applied to the RTE3 challenge data. However, our system can be improved in many ways and we see it as the starting point for a promising new approach to textual entailment.

Web [BibTex]

Weighted Substructure Mining for Image Analysis

Nowozin, S., Tsuda, K., Uno, T., Kudo, T., BakIr, G.

In CVPR 2007, pages: 1-8, IEEE Computer Society, Los Alamitos, CA, USA, 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2007 (inproceedings)

Abstract
In web-related applications of image categorization, it is desirable to derive an interpretable classification rule with high accuracy. Using the bag-of-words representation and the linear support vector machine, one can partly fulfill the goal, but the accuracy of linear classifiers is not high and the obtained features are not informative for users. We propose to combine item set mining and large margin classifiers to select features from the power set of all visual words. Our resulting classification rule is easier to browse and simpler to understand, because each feature has richer information. As a next step, each image is represented as a graph where nodes correspond to local image features and edges encode geometric relations between features. Combining graph mining and boosting, we can obtain a classification rule based on subgraph features that contain more information than the set features. We evaluate our algorithm in a web-retrieval ranking task where the goal is to reject outliers from a set of images returned for a keyword query. Furthermore, it is evaluated on the supervised classification tasks with the challenging VOC2005 data set. Our approach yields excellent accuracy in the unsupervised ranking task compared to a recently proposed probabilistic model and competitive results in the supervised classification task.

PDF DOI [BibTex]

Local Learning Projections

Wu, M., Yu, K., Yu, S., Schölkopf, B.

In Proceedings of the 24th International Conference on Machine Learning, pages: 1039-1046, (Editors: Z Ghahramani), ACM Press, New York, NY, USA, ICML, June 2007 (inproceedings)

Abstract
This paper presents a Local Learning Projection (LLP) approach for linear dimensionality reduction. We first point out that the well known Principal Component Analysis (PCA) essentially seeks the projection that has the minimal global estimation error. Then we propose a dimensionality reduction algorithm that leads to the projection with the minimal local estimation error, and elucidate its advantages for classification tasks. We also indicate that LLP keeps the local information in the sense that the projection value of each point can be well estimated based on its neighbors and their projection values. Experimental results are provided to validate the effectiveness of the proposed algorithm.

PDF Web DOI [BibTex]

Training and Approximation of a Primal Multiclass Support Vector Machine

Zien, A., Bona, F., Ong, C.

In ASMDA 2007, pages: 1-8, (Editors: Skiadas, C. H.), 12th International Conference on Applied Stochastic Models and Data Analysis, June 2007 (inproceedings)

Abstract
We revisit the multiclass support vector machine (SVM) and generalize the formulation to convex loss functions and joint feature maps. Motivated by recent work [Chapelle, 2006], we use logistic loss and softmax to enable gradient-based primal optimization. Kernels are incorporated via kernel principal component analysis (KPCA), which naturally leads to approximation methods for large scale problems. We investigate similarities and differences to previous multiclass SVM approaches. Experimental comparisons to previous approaches and to the popular one-vs-rest SVM are presented on several different datasets.

PDF PostScript Web [BibTex]

New BCI approaches: Selective Attention to Auditory and Tactile Stimulus Streams

Hill, N., Raths, C.

Invited talk at the PASCAL Workshop on Methods of Data Analysis in Computational Neuroscience and Brain Computer Interfaces, June 2007 (talk)

Abstract
When considering Brain-Computer Interface (BCI) development for patients in the most severely paralysed states, there is considerable motivation to move away from BCI systems based on either motor cortex activity, or on visual stimuli. Together these account for most of current BCI research. I present the results of our recent exploration of new auditory- and tactile-stimulus-driven BCIs. The talk includes a tutorial on the construction and interpretation of classifiers which extract spatio-temporal features from event-related potential data. The effects and implications of whitening are discussed, and preliminary results on the effectiveness of a low-rank constraint (Tomioka and Aihara 2007) are shown.

PDF Web [BibTex]

Towards Motor Skill Learning in Robotics

Peters, J.

Interactive Robot Learning - RSS workshop, June 2007 (talk)

Web [BibTex]

Nonlinear independent component analysis with minimum nonlinear distortion

Zhang, K., Chan, L.

In ICML ’07: Proceedings of the 24th international conference on Machine learning, pages: 1127-1134, (Editors: Z Ghahramani), ACM, New York, NY, USA, 24th International Conference on Machine Learning (ICML), June 2007 (inproceedings)

PDF [BibTex]

Information-theoretic Metric Learning

Davis, J., Kulis, B., Jain, P., Sra, S., Dhillon, I.

In ICML 2007, pages: 209-216, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th Annual International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the distance function. We express this problem as a particular Bregman optimization problem: that of minimizing the LogDet divergence subject to linear constraints. Our resulting algorithm has several advantages over existing methods. First, our method can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Second, it is fast and scalable. Unlike most existing methods, no eigenvalue computations or semi-definite programming are required. We also present an online version and derive regret bounds for the resulting algorithm. Finally, we evaluate our method on a recent error reporting system for software called Clarify, in the context of metric learning for nearest neighbor classification, as well as on standard data sets.
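
A small sketch of the objective's main building block, not the paper's Bregman-projection solver: the LogDet divergence between the learned Mahalanobis matrix A and the prior A0, which the method minimizes subject to the distance constraints. The matrices below are invented for the example.

```python
import numpy as np

def logdet_div(A, A0):
    """LogDet divergence D(A, A0) = tr(A A0^-1) - log det(A A0^-1) - n."""
    M = A @ np.linalg.inv(A0)
    sign, logdet = np.linalg.slogdet(M)
    assert sign > 0, "arguments must be positive definite"
    return np.trace(M) - logdet - A.shape[0]

def mahalanobis(A, x, y):
    d = x - y
    return float(d @ A @ d)

A0 = np.eye(3)                       # prior: plain Euclidean metric
A = np.diag([2.0, 1.0, 0.5])         # a candidate learned metric
print(logdet_div(A, A0))             # zero iff A == A0
```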

Web DOI [BibTex]

A Dependence Maximization View of Clustering

Song, L., Smola, A., Gretton, A., Borgwardt, K.

In Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages: 815-822, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, Twenty-Fourth Annual International Conference on Machine Learning (ICML), June 2007 (inproceedings)

Abstract
We propose a family of clustering algorithms based on the maximization of dependence between the input variables and their cluster labels, as expressed by the Hilbert-Schmidt Independence Criterion (HSIC). Under this framework, we unify the geometric, spectral, and statistical dependence views of clustering, and subsume many existing algorithms as special cases (e.g. k-means and spectral clustering). Distinctive to our framework is that kernels can also be applied on the labels, which can endow them with particular structures. We also obtain a perturbation bound on the change in k-means clustering.

PDF Web DOI [BibTex]

Multiclass Multiple Kernel Learning

Zien, A., Ong, C.

In ICML 2007, pages: 1191-1198, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
In many applications it is desirable to learn from several kernels. "Multiple kernel learning" (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets.
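
The sketch below is not the paper's QCQP or SILP solver; it only illustrates the object being optimized, a convex combination of base kernels with weights on the simplex, with the weights chosen here by naive validation over a grid. The toy data, base kernels and grid are invented for the example.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def combine(kernels, beta):
    return sum(b * K for b, K in zip(beta, kernels))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # nonlinear toy labels
tr, te = np.arange(150), np.arange(150, 200)

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
kernels = [X @ X.T, np.exp(-d2), np.exp(-d2 / 10.0)]  # base kernels

best = None
for b0 in np.linspace(0, 1, 11):               # grid over the weight simplex
    for b1 in np.linspace(0, 1 - b0, 11):
        beta = (b0, b1, 1 - b0 - b1)
        K = combine(kernels, beta)
        clf = SVC(kernel="precomputed").fit(K[np.ix_(tr, tr)], y[tr])
        acc = accuracy_score(y[te], clf.predict(K[np.ix_(te, tr)]))
        best = max(best or (0.0, beta), (acc, beta))
print("best weights:", best[1], "validation accuracy:", best[0])
```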

PDF Web DOI [BibTex]

Transductive Support Vector Machines for Structured Variables

Zien, A., Brefeld, U., Scheffer, T.

In ICML 2007, pages: 1183-1190, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
We study the problem of learning kernel machines transductively for structured output variables. Transductive learning can be reduced to combinatorial optimization problems over all possible labelings of the unlabeled data. In order to scale transductive learning to structured variables, we transform the corresponding non-convex, combinatorial, constrained optimization problems into continuous, unconstrained optimization problems. The discrete optimization parameters are eliminated and the resulting differentiable problems can be optimized efficiently. We study the effectiveness of the generalized TSVM on multiclass classification and label-sequence learning problems empirically.

PDF Web DOI [BibTex]

Transductive Support Vector Machines for Structured Variables

Zien, A., Brefeld, U., Scheffer, T.

International Conference on Machine Learning (ICML), June 2007 (talk)

Abstract
We study the problem of learning kernel machines transductively for structured output variables. Transductive learning can be reduced to combinatorial optimization problems over all possible labelings of the unlabeled data. In order to scale transductive learning to structured variables, we transform the corresponding non-convex, combinatorial, constrained optimization problems into continuous, unconstrained optimization problems. The discrete optimization parameters are eliminated and the resulting differentiable problems can be optimized efficiently. We study the effectiveness of the generalized TSVM on multiclass classification and label-sequence learning problems empirically.

PDF PDF Web [BibTex]

Pattern detection

Blake, A., Romdhani, S., Schölkopf, B., Torr, P. H. S.

United States Patent, No 7236626, June 2007 (patent)

[BibTex]

Asymptotic stability of the solution of the M/M^B/1 queueing model

Haji, A., Radl, A.

Computers and Mathematics with Applications, 53(9):1411-1420, May 2007 (article)

PDF DOI [BibTex]

Competition and Coordination in Stochastic Games

Burkov, A., Boularias, A., Chaib-Draa, B.

In Canadian AI 2007, pages: 26-37, (Editors: Kobti, Z. , D. Wu), Springer, Berlin, Germany, 20th Conference of the Canadian Society for Computational Studies of Intelligence, May 2007 (inproceedings)

Abstract
Agent competition and coordination are two classical and most important tasks in multiagent systems. In recent years, a number of learning algorithms have been proposed to resolve such problems. Among them, there is an important class of algorithms, called adaptive learning algorithms, that were shown to be able to converge in self-play to a solution in a wide variety of repeated matrix games. Although certain algorithms of this class, such as Infinitesimal Gradient Ascent (IGA), Policy Hill-Climbing (PHC) and Adaptive Play Q-learning (APQ), have been extensively studied in the recent literature, the question of how these algorithms perform against each other in general-form stochastic games remains little studied. In this work we try to answer this question. To do that, we analyse these algorithms in detail and give a comparative analysis of their behavior on a set of competition and coordination stochastic games. We also introduce a new multiagent learning algorithm, called ModIGA. This is an extension of the IGA algorithm, which is able to estimate the strategy of its opponents in cases when they do not explicitly play mixed strategies (e.g., APQ) and which can be applied to games with more than two actions.

PDF Web DOI [BibTex]

MR Angiography of Dural Arteriovenous Fistulas: Diagnosis and Follow-Up after Treatment Using a Time-Resolved 3D Contrast-Enhanced Technique

Meckel, S., Maier, M., San Millan Ruiz, D., Yilmaz, H., Scheffler, K., Radü, E., Wetzel, S.

American Journal of Neuroradiology, 28(5):877-884, May 2007 (article)

Abstract
BACKGROUND AND PURPOSE: Digital subtraction angiography (DSA) is the method of reference for imaging of dural arteriovenous fistula (DAVF). The goal of this study was to analyze the value of different MR images including 3D contrast-enhanced MR angiography (MRA) with a high temporal resolution in diagnostic and follow-up imaging of DAVFs. MATERIALS AND METHODS: A total of 18 MR/MRA examinations from 14 patients with untreated (n = 9) and/or treated (n = 9) DAVFs were evaluated. Two observers assessed all MR and MRA investigations for signs indicating the presence of a DAVF, for fistula characteristics such as fistula grading, location of fistulous point, and fistula obliteration after treatment. All results were compared with DSA findings. RESULTS: On time-resolved 3D contrast-enhanced (TR 3D) MRA, the side and presence of all patent fistulas (n = 13) were correctly indicated, and no false-positive findings were observed in occluded DAVFs (n = 5). Grading of fistulas with this imaging technique was correct in 77% and 85% of patent fistulas for both readers, respectively. On T2-weighted images, signs indicative of a DAVF were encountered only in fistulas with cortical venous reflux (56%), whereas on 3D time-of-flight (TOF) MRA, most fistulas (88%) were correctly detected. In complete fistula occlusion, false-positive findings were encountered on both T2-weighted images and on TOF MRA images. CONCLUSION: In this study, TR 3D MRA proved reliable in detecting DAVFs and suitable for follow-up imaging. The technique also allowed, within limitations, grading of DAVFs. Although 3D TOF MRA can depict signs of DAVFs, its value for follow-up imaging is limited.

Web [BibTex]

Bayesian Reconstruction of the Density of States

Habeck, M.

Physical Review Letters, 98(20, 200601):1-4, May 2007 (article)

Abstract
A Bayesian framework is developed to reconstruct the density of states from multiple canonical simulations. The framework encompasses the histogram reweighting method of Ferrenberg and Swendsen. The new approach applies to nonparametric as well as parametric models and does not require simulation data to be discretized. It offers a means to assess the precision of the reconstructed density of states and of derived thermodynamic quantities.

Web DOI [BibTex]

PALMA: mRNA to Genome Alignments using Large Margin Algorithms

Schulze, U., Hepp, B., Ong, C., Rätsch, G.

Bioinformatics, 23(15):1892-1900, May 2007 (article)

Abstract
Motivation: Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. Results: We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm – called PALMA – tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from C. elegans and human it correctly identifies the intron boundaries in all except two cases. The best other method is a recently proposed method called exalin, which misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels. Availability: Datasets for training, evaluation and testing, additional results and a stand-alone alignment tool implemented in C++ and python are available at http://www.fml.mpg.de/raetsch/projects/palma.

Web DOI [BibTex]

Impact of target-to-target interval on classification performance in the P300 speller

Martens, S., Hill, J., Farquhar, J., Schölkopf, B.

Scientific Meeting "Applied Neuroscience for Healthy Brain Function", May 2007 (talk)

PDF Web [BibTex]

Cluster Identification in Nearest-Neighbor Graphs

Maier, M., Hein, M., von Luxburg, U.

(163), Max-Planck-Institute for Biological Cybernetics, Tübingen, Germany, May 2007 (techreport)

Abstract
Assume we are given a sample of points from some underlying distribution which contains several distinct clusters. Our goal is to construct a neighborhood graph on the sample points such that clusters are "identified": that is, the subgraph induced by points from the same cluster is connected, while subgraphs corresponding to different clusters are not connected to each other. We derive bounds on the probability that cluster identification is successful, and use them to predict "optimal" values of k for the mutual and symmetric k-nearest-neighbor graphs. We point out different properties of the mutual and symmetric nearest-neighbor graphs related to the cluster identification problem.
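
Shown below is a sketch of the two graph constructions the report compares, together with the cluster-identification check itself: each true cluster should induce exactly one connected component. The data, choice of k and two-cluster setup are invented for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def knn_adjacency(X, k):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]
    A = np.zeros((len(X), len(X)), dtype=bool)
    A[np.arange(len(X))[:, None], nbrs] = True
    return A

def mutual_knn(X, k):      # edge iff i in kNN(j) AND j in kNN(i)
    A = knn_adjacency(X, k)
    return A & A.T

def symmetric_knn(X, k):   # edge iff i in kNN(j) OR j in kNN(i)
    A = knn_adjacency(X, k)
    return A | A.T

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels = np.repeat([0, 1], 50)
for name, G in [("mutual", mutual_knn(X, 5)), ("symmetric", symmetric_knn(X, 5))]:
    n_comp, comp = connected_components(csr_matrix(G), directed=False)
    identified = n_comp == 2 and all(
        len(set(comp[labels == c])) == 1 for c in (0, 1))
    print(name, "components:", n_comp, "identified:", identified)
```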

PDF [BibTex]

The role of the striatum in adaptation learning: a computational model

Grosse-Wentrup, M., Contreras-Vidal, J.

Biological Cybernetics, 96(4):377-388, April 2007 (article)

Abstract
To investigate the functional role of the striatum in visuo-motor adaptation, we extend the DIRECT model for visuo-motor reaching movements formulated by Bullock et al. (J Cogn Neurosci 5:408–435, 1993) through two parallel loops, each modeling a distinct contribution of the cortico–cerebellar–thalamo–cortical and the cortico–striato–thalamo–cortical networks to visuo-motor adaptation. Based on evidence of Robertson and Miall (Neuroreport 10(5):1029–1034, 1999), we implement the function of the cortico–cerebellar–thalamo–cortical loop as a module that gradually adapts to small changes in sensorimotor relationships. The cortico–striato–thalamo–cortical loop on the other hand is hypothesized to act as an adaptive search element, guessing new sensorimotor transformations and reinforcing successful guesses while punishing unsuccessful ones. In a first step, we show that the model reproduces trajectories and error curves of healthy subjects in a two dimensional center-out reaching task with rotated screen cursor visual feedback. In a second step, we disable learning processes in the cortico–striato–thalamo–cortical loop to simulate subjects with Parkinson's disease (PD), and show that this leads to error curves typical of subjects with PD. We conclude that the results support our hypothesis, i.e., that the role of the cortico–striato–thalamo–cortical loop in visuo-motor adaptation is that of an adaptive search element.

PDF PDF DOI [BibTex]

Benchmarking of Policy Gradient Methods

Peters, J.

ADPRL Workshop, April 2007 (talk)

[BibTex]

Change-Point Detection using Krylov Subspace Learning

Ide, T., Tsuda, K.

In SDM 2007, pages: 515-520, (Editors: Apte, C. ), Society for Industrial and Applied Mathematics, Pittsburgh, PA, USA, SIAM International Conference on Data Mining, April 2007 (inproceedings)

Abstract
We propose an efficient algorithm for principal component analysis (PCA) that is applicable when only the inner product with a given vector is needed. We show that Krylov subspace learning works well both in matrix compression and implicit calculation of the inner product by taking full advantage of the arbitrariness of the seed vector. We apply our algorithm to a PCA-based change-point detection algorithm, and show that it results in about 50 times improvement in computational time.
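
A hedged sketch of the main trick as the abstract presents it: when only inner products with one given vector are required, approximate the leading principal directions inside the Krylov subspace seeded by that vector, rather than running a full eigendecomposition. Matrix sizes and the subspace order are illustrative.

```python
import numpy as np

def krylov_basis(A, u, m):
    """Orthonormal basis of the order-m Krylov subspace span{u, Au, ...}."""
    Q = np.empty((len(u), m))
    Q[:, 0] = u / np.linalg.norm(u)
    for j in range(1, m):
        v = A @ Q[:, j - 1]
        v -= Q[:, :j] @ (Q[:, :j].T @ v)   # Gram-Schmidt against the basis
        Q[:, j] = v / np.linalg.norm(v)
    return Q

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40))
A = X.T @ X / 500                  # sample covariance matrix
u = rng.standard_normal(40)        # the fixed seed vector of the abstract

Q = krylov_basis(A, u, m=8)        # cheap: 8 matrix-vector products
T = Q.T @ A @ Q                    # small projected matrix
w, S = np.linalg.eigh(T)
V = Q @ S[:, ::-1][:, :3]          # approximate top-3 principal directions
print("inner products with the seed vector:", V.T @ u)
```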

PDF Web [BibTex]

Nonparametric Bayesian Discrete Latent Variable Models for Unsupervised Learning

Görür, D.

Biologische Kybernetik, Technische Universität Berlin, Berlin, Germany, April 2007, published online (phdthesis)

PDF PDF [BibTex]

A Bayesian Approach to Nonlinear Parameter Identification for Rigid Body Dynamics

Ting, J., Mistry, M., Peters, J., Schaal, S., Nakanishi, J.

In RSS 2006, pages: 247-254, (Editors: Sukhatme, G. S., S. Schaal, W. Burgard, D. Fox), MIT Press, Cambridge, MA, USA, Robotics: Science and Systems II (RSS), April 2007 (inproceedings)

Abstract
For robots of increasing complexity such as humanoid robots, conventional identification of rigid body dynamics models based on CAD data and actuator models becomes difficult and inaccurate due to the large number of additional nonlinear effects in these systems, e.g., stemming from stiff wires, hydraulic hoses, protective shells, skin, etc. Data-driven parameter estimation offers an alternative model identification method, but it is often burdened by various other problems, such as significant noise in all measured or inferred variables of the robot. The danger of physically inconsistent results also exists due to unmodeled nonlinearities or insufficiently rich data. In this paper, we address all these problems by developing a Bayesian parameter identification method that can automatically detect noise in both input and output data for the regression algorithm that performs system identification. A post-processing step ensures physically consistent rigid body parameters by nonlinearly projecting the result of the Bayesian estimation onto constraints given by positive definite inertia matrices and the parallel axis theorem. We demonstrate on synthetic and actual robot data that our technique performs parameter identification with 5 to 20% higher accuracy than traditional methods. Due to the resulting physically consistent parameters, our algorithm enables us to apply advanced control methods that algebraically require physical consistency on robotic platforms.

PDF Web [BibTex]

A robust fetal ECG detection method for abdominal recordings

Martens, SMM., Rabotti, C., Mischi, M., Sluijter, RJ.

Physiological Measurement, 28(4):373-388, April 2007, Martin Black Prize for best paper Physiological Measurement 2007 (article)

Abstract
In this paper, we propose a new method for FECG detection in abdominal recordings. The method consists of a sequential analysis approach, in which the a priori information about the interference signals is used for the detection of the FECG. Our method is evaluated on a set of 20 abdominal recordings from pregnant women with different gestational ages. Its performance in terms of fetal heart rate (FHR) detection success is compared with that of independent component analysis (ICA). The results show that our sequential estimation method outperforms ICA with a FHR detection rate of 85% versus 60% of ICA. The superior performance of our method is especially evident in recordings with a low signal-to-noise ratio (SNR). This indicates that our method is more robust than ICA for FECG detection.

DOI [BibTex]

Learning causality by identifying common effects with kernel-based dependence measures

Sun, X., Janzing, D.

In ESANN 2007, pages: 453-458, D-Side, Evere, Belgium, 15th European Symposium on Artificial Neural Networks, April 2007 (inproceedings)

Abstract
We describe a method for causal inference that measures the strength of statistical dependence by the Hilbert-Schmidt norm of kernel-based conditional cross-covariance operators. We consider the increase of the dependence of two variables X and Y by conditioning on a third variable Z as a hint for Z being a common effect of X and Y. Based on this assumption, we collect "votes" for hypothetical causal directions and orient the edges according to the majority vote. For most of our experiments with artificial and real-world data our method has outperformed the conventional constraint-based inductive causation (IC) algorithm.

PDF Web [BibTex]
