Header logo is ei


2008


no image
At-TAX: A Whole Genome Tiling Array Resource for Developmental Expression Analysis and Transcript Identification in Arabidopsis thaliana

Laubinger, S., Zeller, G., Henz, S., Sachsenberg, T., Widmer, C., Naouar, N., Vuylsteke, M., Schölkopf, B., Rätsch, G., Weigel, D.

Genome Biology, 9(7: R112):1-16, July 2008 (article)

Abstract
Gene expression maps for model organisms, including Arabidopsis thaliana, have typically been created using gene-centric expression arrays. Here, we describe a comprehensive expression atlas, Arabidopsis thaliana Tiling Array Express (At-TAX), which is based on whole-genome tiling arrays. We demonstrate that tiling arrays are accurate tools for gene expression analysis and identified more than 1,000 unannotated transcribed regions. Visualizations of gene expression estimates, transcribed regions, and tiling probe measurements are accessible online at the At-TAX homepage.

PDF DOI [BibTex]

2008

PDF DOI [BibTex]


no image
Motor Skill Learning for Cognitive Robotics

Peters, J.

6th International Cognitive Robotics Workshop (CogRob), July 2008 (talk)

Abstract
Autonomous robots that can assist humans in situations of daily life have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. A first step towards this goal is to create robots that can learn tasks triggered by environmental context or higher level instruction. However, learning techniques have yet to live up to this promise as only few methods manage to scale to high-dimensional manipulator or humanoid robots. In this tutorial, we give a general overview on motor skill learning for cognitive robotics using research at ATR, USC, CMU and Max-Planck in order to illustrate the problems in motor skill learning. For doing so, we discuss task-appropriate representations and algorithms for learning robot motor skills. Among the topics are the learning basic movements or motor primitives by imitation and reinforcement learning, learning rhytmic and discrete movements, fast regression methods for learning inverse dynamics and setups for learning task-space policies. Examples on various robots, e.g., SARCOS DB, the SARCOS Master Arm, BDI Little Dog and a Barrett WAM, are shown and include Ball-in-a-Cup, T-Ball, Juggling, Devil-Sticking, Operational Space Control and many others.

Web [BibTex]

Web [BibTex]


no image
Painless Embeddings of Distributions: the Function Space View (Part 1)

Fukumizu, K., Gretton, A., Smola, A.

25th International Conference on Machine Learning (ICML), July 2008 (talk)

Abstract
This tutorial will give an introduction to the recent understanding and methodology of the kernel method: dealing with higher order statistics by embedding painlessly random variables/probability distributions. In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear algorithms from linear ones. More recently, however, it has become clear that a potentially more far reaching use of kernels is as a linear way of dealing with higher order statistics by embedding distributions in a suitable reproducing kernel Hilbert space (RKHS). Notably, unlike the straightforward expansion of higher order moments or conventional characteristic function approach, the use of kernels or RKHS provides a painless, tractable way of embedding distributions. This line of reasoning leads naturally to the questions: what does it mean to embed a distribution in an RKHS? when is this embedding injective (and thus, when do different distributions have unique mappings)? what implications are there for learning algorithms that make use of these embeddings? This tutorial aims at answering these questions. There are a great variety of applications in machine learning and computer science, which require distribution estimation and/or comparison.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Reinforcement Learning for Robotics

Peters, J.

8th European Workshop on Reinforcement Learning for Robotics (EWRL), July 2008 (talk)

Web [BibTex]

Web [BibTex]


no image
Graphical Analysis of NMR Structural Quality and Interactive Contact Map of NOE Assignments in ARIA

Bardiaux, B., Bernard, A., Rieping, W., Habeck, M., Malliavin, T., Nilges, M.

BMC Structural Biology, 8(30):1-5, June 2008 (article)

Abstract
BACKGROUND: The Ambiguous Restraints for Iterative Assignment (ARIA) approach is widely used for NMR structure determination. It is based on simultaneously calculating structures and assigning NOE through an iterative protocol. The final solution consists of a set of conformers and a list of most probable assignments for the input NOE peak list. RESULTS: ARIA was extended with a series of graphical tools to facilitate a detailed analysis of the intermediate and final results of the ARIA protocol. These additional features provide (i) an interactive contact map, serving as a tool for the analysis of assignments, and (ii) graphical representations of structure quality scores and restraint statistics. The interactive contact map between residues can be clicked to obtain information about the restraints and their contributions. Profiles of quality scores are plotted along the protein sequence, and contact maps provide information of the agreement with the data on a residue pair level. CONCLUSIONS: The g raphical tools and outputs described here significantly extend the validation and analysis possibilities of NOE assignments given by ARIA as well as the analysis of the quality of the final structure ensemble. These tools are included in the latest version of ARIA, which is available at http://aria.pasteur.fr. The Web site also contains an installation guide, a user manual and example calculations.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Operational Space Control: A Theoretical and Empirical Comparison

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

International Journal of Robotics Research, 27(6):737-757, June 2008 (article)

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Kernel Methods in Machine Learning

Hofmann, T., Schölkopf, B., Smola, A.

Annals of Statistics, 36(3):1171-1220, June 2008 (article)

Abstract
We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of function has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Cross-validation Optimization for Large Scale Structured Classification Kernel Methods

Seeger, M.

Journal of Machine Learning Research, 9, pages: 1147-1178, June 2008 (article)

Abstract
We propose a highly efficient framework for penalized likelihood kernel methods applied to multi-class models with a large, structured set of classes. As opposed to many previous approaches which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels, and focusing code optimization efforts to these primitives only. Kernel parameters are learned automatically, by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Multi-Classification by Categorical Features via Clustering

Seldin, Y.

25th International Conference on Machine Learning (ICML), June 2008 (talk)

Abstract
We derive a generalization bound for multi-classification schemes based on grid clustering in categorical parameter product spaces. Grid clustering partitions the parameter space in the form of a Cartesian product of partitions for each of the parameters. The derived bound provides a means to evaluate clustering solutions in terms of the generalization power of a built-on classifier. For classification based on a single feature the bound serves to find a globally optimal classification rule. Comparison of the generalization power of individual features can then be used for feature ranking. Our experiments show that in this role the bound is much more precise than mutual information or normalized correlation indices.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Thin-Plate Splines Between Riemannian Manifolds

Steinke, F., Hein, M., Schölkopf, B.

Workshop on Geometry and Statistics of Shapes, June 2008 (talk)

Abstract
With the help of differential geometry we describe a framework to define a thin-plate spline like energy for maps between arbitrary Riemannian manifolds. The so-called Eells energy only depends on the intrinsic geometry of the input and output manifold, but not on their respective representation. The energy can then be used for regression between manifolds, we present results for cases where the outputs are rotations, sets of angles, or points on 3D surfaces. In the future we plan to also target regression where the output is an element of "shape space", understood as a Riemannian manifold. One could also further explore the meaning of the Eells energy when applied to diffeomorphisms between shapes, especially with regard to its potential use as a distance measure between shapes that does not depend on the embedding or the parametrisation of the shapes.

Web [BibTex]

Web [BibTex]


no image
Integrated Information in Discrete Dynamical Systems: Motivation and Theoretical Framework

Balduzzi, D., Tononi, G.

PLoS Computational Biology, 4(6):1-18, June 2008 (article)

Abstract
This paper introduces a time- and state-dependent measure of integrated information, φ, which captures the repertoire of causal states available to a system as a whole. Specifically, φ quantifies how much information is generated (uncertainty is reduced) when a system enters a particular state through causal interactions among its elements, above and beyond the information generated independently by its parts. Such mathematical characterization is motivated by the observation that integrated information captures two key phenomenological properties of consciousness: (i) there is a large repertoire of conscious experiences so that, when one particular experience occurs, it generates a large amount of information by ruling out all the others; and (ii) this information is integrated, in that each experience appears as a whole that cannot be decomposed into independent parts. This paper extends previous work on stationary systems and applies integrated information to discrete networks as a function of their dynamics and causal architecture. An analysis of basic examples indicates the following: (i) φ varies depending on the state entered by a network, being higher if active and inactive elements are balanced and lower if the network is inactive or hyperactive. (ii) φ varies for systems with identical or similar surface dynamics depending on the underlying causal architecture, being low for systems that merely copy or replay activity states. (iii) φ varies as a function of network architecture. High φ values can be obtained by architectures that conjoin functional specialization with functional integration. Strictly modular and homogeneous systems cannot generate high φ because the former lack integration, whereas the latter lack information. Feedforward and lattice architectures are capable of generating high φ but are inefficient. (iv) In Hopfield networks, φ is low for attractor states and neutral states, but increases if the networks are optimized to achieve tension between local and global interactions. These basic examples appear to match well against neurobiological evidence concerning the neural substrates of consciousness. More generally, φ appears to be a useful metric to characterize the capacity of any physical system to integrate information.

Web DOI [BibTex]


no image
Information Consistency of Nonparametric Gaussian Process Methods

Seeger, MW., Kakade, SM., Foster, DP.

IEEE Transactions on Information Theory, 54(5):2376-2382, May 2008 (article)

Abstract
Abstract—Bayesian nonparametric models are widely and successfully used for statistical prediction. While posterior consistency properties are well studied in quite general settings, results have been proved using abstract concepts such as metric entropy, and they come with subtle conditions which are hard to validate and not intuitive when applied to concrete models. Furthermore, convergence rates are difficult to obtain. By focussing on the concept of information consistency for Bayesian Gaussian process (GP)models, consistency results and convergence rates are obtained via a regret bound on cumulative log loss. These results depend strongly on the covariance function of the prior process, thereby giving a novel interpretation to penalization with reproducing kernel Hilbert space norms and to commonly used covariance function classes and their parameters. The proof of the main result employs elementary convexity arguments only. A theorem of Widom is used in order to obtain precise convergence rates for several covariance functions widely used in practice.

Web DOI [BibTex]

Web DOI [BibTex]


no image
New Frontiers in Characterizing Structure and Dynamics by NMR

Nilges, M., Markwick, P., Malliavin, TE., Rieping, W., Habeck, M.

In Computational Structural Biology: Methods and Applications, pages: 655-680, (Editors: Schwede, T. , M. C. Peitsch), World Scientific, New Jersey, NJ, USA, May 2008 (inbook)

Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy has emerged as the method of choice for studying both the structure and the dynamics of biological macromolecule in solution. Despite the maturity of the NMR method for structure determination, its application faces a number of challenges. The method is limited to systems of relatively small molecular mass, data collection times are long, data analysis remains a lengthy procedure, and it is difficult to evaluate the quality of the final structures. The last years have seen significant advances in experimental techniques to overcome or reduce some limitations. The function of bio-macromolecules is determined by both their 3D structure and conformational dynamics. These molecules are inherently flexible systems displaying a broad range of dynamics on time–scales from picoseconds to seconds. NMR is unique in its ability to obtain dynamic information on an atomic scale. The experimental information on structure and dynamics is intricately mixed. It is however difficult to unite both structural and dynamical information into one consistent model, and protocols for the determination of structure and dynamics are performed independently. This chapter deals with the challenges posed by the interpretation of NMR data on structure and dynamics. We will first relate the standard structure calculation methods to Bayesian probability theory. We will then briefly describe the advantages of a fully Bayesian treatment of structure calculation. Then, we will illustrate the advantages of using Bayesian reasoning at least partly in standard structure calculations. The final part will be devoted to interpretation of experimental data on dynamics.

Web [BibTex]

Web [BibTex]


no image
Learning resolved velocity control

Peters, J.

2008 IEEE International Conference on Robotics and Automation (ICRA), May 2008 (talk)

Web [BibTex]

Web [BibTex]


no image
Relating the Thermodynamic Arrow of Time to the Causal Arrow

Allahverdyan, A., Janzing, D.

Journal of Statistical Mechanics, 2008(P04001):1-21, April 2008 (article)

Abstract
Consider a Hamiltonian system that consists of a slow subsystem S and a fast subsystem F. The autonomous dynamics of S is driven by an effective Hamiltonian, but its thermodynamics is unexpected. We show that a well-defined thermodynamic arrow of time (second law) emerges for S whenever there is a well-defined causal arrow from S to F and the back-action is negligible. This is because the back-action of F on S is described by a non-globally Hamiltonian Born–Oppenheimer term that violates the Liouville theorem, and makes the second law inapplicable to S. If S and F are mixing, under the causal arrow condition they are described by microcanonical distributions P(S) and P(S|F). Their structure supports a causal inference principle proposed recently in machine learning.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Generalization and Similarity in Exemplar Models of Categorization: Insights from Machine Learning

Jäkel, F., Schölkopf, B., Wichmann, F.

Psychonomic Bulletin and Review, 15(2):256-271, April 2008 (article)

Abstract
Exemplar theories of categorization depend on similarity for explaining subjects’ ability to generalize to new stimuli. A major criticism of exemplar theories concerns their lack of abstraction mechanisms and thus, seemingly, generalization ability. Here, we use insights from machine learning to demonstrate that exemplar models can actually generalize very well. Kernel methods in machine learning are akin to exemplar models and very successful in real-world applications. Their generalization performance depends crucially on the chosen similaritymeasure. While similarity plays an important role in describing generalization behavior it is not the only factor that controls generalization performance. In machine learning, kernel methods are often combined with regularization techniques to ensure good generalization. These same techniques are easily incorporated in exemplar models. We show that the Generalized Context Model (Nosofsky, 1986) and ALCOVE (Kruschke, 1992) are closely related to a statistical model called kernel logistic regression. We argue that generalization is central to the enterprise of understanding categorization behavior and suggest how insights from machine learning can offer some guidance. Keywords: kernel, similarity, regularization, generalization, categorization.

PDF Web DOI [BibTex]


no image
Bayesian methods for protein structure determination

Habeck, M.

Machine Learning in Structural Bioinformatics, April 2008 (talk)

Web [BibTex]

Web [BibTex]


no image
Manifold-valued Thin-plate Splines with Applications in Computer Graphics

Steinke, F., Hein, M., Peters, J., Schölkopf, B.

Computer Graphics Forum, 27(2):437-448, April 2008 (article)

Abstract
We present a generalization of thin-plate splines for interpolation and approximation of manifold-valued data, and demonstrate its usefulness in computer graphics with several applications from different fields. The cornerstone of our theoretical framework is an energy functional for mappings between two Riemannian manifolds which is independent of parametrization and respects the geometry of both manifolds. If the manifolds are Euclidean, the energy functional reduces to the classical thin-plate spline energy. We show how the resulting optimization problems can be solved efficiently in many cases. Our example applications range from orientation interpolation and motion planning in animation over geometric modelling tasks to color interpolation.

PDF AVI Web DOI [BibTex]


no image
Data-driven efficient score tests for deconvolution hypotheses

Langovoy, M.

Inverse Problems, 24(2):1-17, April 2008 (article)

Abstract
We consider testing statistical hypotheses about densities of signals in deconvolution models. A new approach to this problem is proposed. We constructed score tests for the deconvolution density testing with the known noise density and efficient score tests for the case of unknown density. The tests are incorporated with model selection rules to choose reasonable model dimensions automatically by the data. Consistency of the tests is proved.

PDF DOI [BibTex]


no image
The Metric Nearness Problem

Brickell, J., Dhillon, I., Sra, S., Tropp, J.

SIAM Journal on Matrix Analysis and Applications, 30(1):375-396, April 2008 (article)

Abstract
Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustratio n. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Bayesian Inference and Optimal Design for the Sparse Linear Model

Seeger, MW.

Journal of Machine Learning Research, 9, pages: 759-813, April 2008 (article)

Abstract
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Consistency of Spectral Clustering

von Luxburg, U., Belkin, M., Bousquet, O.

Annals of Statistics, 36(2):555-586, April 2008 (article)

Abstract
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Plant Classification from Bat-Like Echolocation Signals

Yovel, Y., Franz, MO., Stilz, P., Schnitzler, H-U.

PLoS Computational Biology, 4(3, e1000032):1-13, March 2008 (article)

Abstract
Classification of plants according to their echoes is an elementary component of bat behavior that plays an important role in spatial orientation and food acquisition. Vegetation echoes are, however, highly complex stochastic signals: from an acoustical point of view, a plant can be thought of as a three-dimensional array of leaves reflecting the emitted bat call. The received echo is therefore a superposition of many reflections. In this work we suggest that the classification of these echoes might not be such a troublesome routine for bats as formerly thought. We present a rather simple approach to classifying signals from a large database of plant echoes that were created by ensonifying plants with a frequency-modulated bat-like ultrasonic pulse. Our algorithm uses the spectrogram of a single echo from which it only uses features that are undoubtedly accessible to bats. We used a standard machine learning algorithm (SVM) to automatically extract suitable linear combinations of time and frequency cues from the spectrograms such that classification with high accuracy is enabled. This demonstrates that ultrasonic echoes are highly informative about the species membership of an ensonified plant, and that this information can be extracted with rather simple, biologically plausible analysis. Thus, our findings provide a new explanatory basis for the poorly understood observed abilities of bats in classifying vegetation and other complex objects.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Causal Reasoning by Evaluating the Complexity of Conditional Densities with Kernel Methods

Sun, X., Janzing, D., Schölkopf, B.

Neurocomputing, 71(7-9):1248-1256, March 2008 (article)

Abstract
We propose a method to quantify the complexity of conditional probability measures by a Hilbert space seminorm of the logarithm of its density. The concept of reproducing kernel Hilbert spaces (RKHSs) is a flexible tool to define such a seminorm by choosing an appropriate kernel. We present several examples with artificial data sets where our kernel-based complexity measure is consistent with our intuitive understanding of complexity of densities. The intention behind the complexity measure is to provide a new approach to inferring causal directions. The idea is that the factorization of the joint probability measure P(effect, cause) into P(effect|cause)P(cause) leads typically to "simpler" and "smoother" terms than the factorization into P(cause|effect)P(effect). Since the conventional constraint-based approach of causal discovery is not able to determine the causal direction between only two variables, our inference principle can in particular be useful when combined with other existing methods. We provide several simple examples with real-world data where the true causal directions indeed lead to simpler (conditional) densities.

Web DOI [BibTex]


no image
Natural Actor-Critic

Peters, J., Schaal, S.

Neurocomputing, 71(7-9):1180-1190, March 2008 (article)

Abstract
In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients em- ploying Amari’s natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by lin- ear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gra- dients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke’s Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Inferring Spike Trains From Local Field Potentials

Rasch, M., Gretton, A., Murayama, Y., Maass, W., Logothetis, N.

Journal of Neurophysiology, 99(3):1461-1476, March 2008 (article)

Abstract
We investigated whether it is possible to infer spike trains solely on the basis of the underlying local field potentials (LFPs). Using support vector machines and linear regression models, we found that in the primary visual cortex (V1) of monkeys, spikes can indeed be inferred from LFPs, at least with moderate success. Although there is a considerable degree of variation across electrodes, the low-frequency structure in spike trains (in the 100-ms range) can be inferred with reasonable accuracy, whereas exact spike positions are not reliably predicted. Two kinds of features of the LFP are exploited for prediction: the frequency power of bands in the high gamma-range (40–90 Hz) and information contained in lowfrequency oscillations ( 10 Hz), where both phase and power modulations are informative. Information analysis revealed that both features code (mainly) independent aspects of the spike-to-LFP relationship, with the low-frequency LFP phase coding for temporally clustered spiking activity. Although both features and prediction quality are similar during seminatural movie stimuli and spontaneous activity, prediction performance during spontaneous activity degrades much more slowly with increasing electrode distance. The general trend of data obtained with anesthetized animals is qualitatively mirrored in that of a more limited data set recorded in V1 of non-anesthetized monkeys. In contrast to the cortical field potentials, thalamic LFPs (e.g., LFPs derived from recordings in the dorsal lateral geniculate nucleus) hold no useful information for predicting spiking activity.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Poisson Geometry of Parabolic Bundles on Elliptic Curves

Balduzzi, D.

International Journal of Mathematics , 19(3):339-367, March 2008 (article)

Abstract
The moduli space of G-bundles on an elliptic curve with additional flag structure admits a Poisson structure. The bivector can be defined using double loop group, loop group and sheaf cohomology constructions. We investigate the links between these methods and for the case SL2 perform explicit computations, describing the bracket and its leaves in detail.

Web DOI [BibTex]

Web DOI [BibTex]


no image
ISD: A Software Package for Bayesian NMR Structure Calculation

Rieping, W., Nilges, M., Habeck, M.

Bioinformatics, 24(8):1104-1105, February 2008 (article)

Abstract
SUMMARY: The conventional approach to calculating biomolecular structures from nuclear magnetic resonance (NMR) data is often viewed as subjective due to its dependence on rules of thumb for deriving geometric constraints and suitable values for theory parameters from noisy experimental data. As a result, it can be difficult to judge the precision of an NMR structure in an objective manner. The Inferential Structure Determination (ISD) framework, which has been introduced recently, addresses this problem by using Bayesian inference to derive a probability distribution that represents both the unknown structure and its uncertainty. It also determines additional unknowns, such as theory parameters, that normally need be chosen empirically. Here we give an overview of the ISD software package, which implements this methodology. AVAILABILITY: The program is available at http://www.bioc.cam.ac.uk/isd

Web DOI [BibTex]

Web DOI [BibTex]


no image
Probabilistic Structure Calculation

Nilges, M., Habeck, M., Rieping, W.

Comptes Rendus Chimie, 11(4-5):356-369, February 2008 (article)

Abstract
Molecular structures are usually calculated from experimental data with some method of energy minimisation or non-linear optimisation. Key aims of a structure calculation are to estimate the coordinate uncertainty, and to provide a meaningful measure of the quality of the fit to the data. We discuss approaches to optimally combine prior information and experimental data and the connection to probability theory. We analyse the appropriate statistics for NOEs and NOE-derived distances, and the related question of restraint potentials. Finally, we will discuss approaches to determine the appropriate weight on the experimental evidence and to obtain in this way an estimate of the data quality from the structure calculation. Whereas objective estimates of coordinates and their uncertainties can only be obtained by a full Bayesian treatment of the problem, standard structure calculation methods continue to play an important role. To obtain the full benefit of these methods, they should be founded on a rigorous Baye sian analysis.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Fast Projection-based Methods for the Least Squares Nonnegative Matrix Approximation Problem

Kim, D., Sra, S., Dhillon, I.

Statistical Analysis and Data Mining, 1(1):38-51, February 2008 (article)

Abstract
Nonnegative matrix approximation (NNMA) is a popular matrix decomposition technique that has proven to be useful across a diverse variety of fields with applications ranging from document analysis and image processing to bioinformatics and signal processing. Over the years, several algorithms for NNMA have been proposed, e.g. Lee and Seung‘s multiplicative updates, alternating least squares (ALS), and gradient descent-based procedures. However, most of these procedures suffer from either slow convergence, numerical instability, or at worst, serious theoretical drawbacks. In this paper, we develop a new and improved algorithmic framework for the least-squares NNMA problem, which is not only theoretically well-founded, but also overcomes many deficiencies of other methods. Our framework readily admits powerful optimization techniques and as concrete realizations we present implementations based on the Newton, BFGS and conjugate gradient methods. Our algorithms provide numerical resu lts supe rior to both Lee and Seung‘s method as well as to the alternating least squares heuristic, which was reported to work well in some situations but has no theoretical guarantees[1]. Our approach extends naturally to include regularization and box-constraints without sacrificing convergence guarantees. We present experimental results on both synthetic and real-world datasets that demonstrate the superiority of our methods, both in terms of better approximations as well as computational efficiency.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Optimization Techniques for Semi-Supervised Support Vector Machines

Chapelle, O., Sindhwani, V., Keerthi, S.

Journal of Machine Learning Research, 9, pages: 203-233, February 2008 (article)

Abstract
Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms have recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VMs algorithms is studied together, under a common experimental setting.

PDF [BibTex]

PDF [BibTex]


no image
A Unifying Probabilistic Framework for Analyzing Residual Dipolar Couplings

Habeck, M., Nilges, M., Rieping, W.

Journal of Biomolecular NMR, 40(2):135-144, February 2008 (article)

Abstract
Residual dipolar couplings provide complementary information to the nuclear Overhauser effect measurements that are traditionally used in biomolecular structure determination by NMR. In a de novo structure determination, however, lack of knowledge about the degree and orientation of molecular alignment complicates the analysis of dipolar coupling data. We present a probabilistic framework for analyzing residual dipolar couplings and demonstrate that it is possible to estimate the atomic coordinates, the complete molecular alignment tensor, and the error of the couplings simultaneously. As a by-product, we also obtain estimates of the uncertainty in the coordinates and the alignment tensor. We show that our approach encompasses existing methods for determining the alignment tensor as special cases, including least squares estimation, histogram fitting, and elimination of an explicit alignment tensor in the restraint energy.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Contour-propagation Algorithms for Semi-automated Reconstruction of Neural Processes

Macke, J., Maack, N., Gupta, R., Denk, W., Schölkopf, B., Borst, A.

Journal of Neuroscience Methods, 167(2):349-357, January 2008 (article)

Abstract
A new technique, ”Serial Block Face Scanning Electron Microscopy” (SBFSEM), allows for automatic sectioning and imaging of biological tissue with a scanning electron microscope. Image stacks generated with this technology have a resolution sufficient to distinguish different cellular compartments, including synaptic structures, which should make it possible to obtain detailed anatomical knowledge of complete neuronal circuits. Such an image stack contains several thousands of images and is recorded with a minimal voxel size of 10-20nm in the x and y- and 30nm in z-direction. Consequently, a tissue block of 1mm3 (the approximate volume of the Calliphora vicina brain) will produce several hundred terabytes of data. Therefore, highly automated 3D reconstruction algorithms are needed. As a first step in this direction we have developed semiautomated segmentation algorithms for a precise contour tracing of cell membranes. These algorithms were embedded into an easy-to-operate user interface, which allows direct 3D observation of the extracted objects during the segmentation of image stacks. Compared to purely manual tracing, processing time is greatly accelerated.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Quantum-Statistical-Mechanical Extension of Gaussian Mixture Model

Tanaka, K., Tsuda, K.

Journal of Physics: Conference Series, 95(012023):1-9, January 2008 (article)

Abstract
We propose an extension of Gaussian mixture models in the statistical-mechanical point of view. The conventional Gaussian mixture models are formulated to divide all points in given data to some kinds of classes. We introduce some quantum states constructed by superposing conventional classes in linear combinations. Our extension can provide a new algorithm in classifications of data by means of linear response formulas in the statistical mechanics.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
A Robot System for Biomimetic Navigation: From Snapshots to Metric Embeddings of View Graphs

Franz, MO., Stürzl, W., Reichardt, W., Mallot, HA.

In Robotics and Cognitive Approaches to Spatial Mapping, pages: 297-314, Springer Tracts in Advanced Robotics ; 38, (Editors: Jefferies, M.E. , W.-K. Yeap), Springer, Berlin, Germany, 2008 (inbook)

Abstract
Complex navigation behaviour (way-finding) involves recognizing several places and encoding a spatial relationship between them. Way-finding skills can be classified into a hierarchy according to the complexity of the tasks that can be performed [8]. The most basic form of way-finding is route navigation, followed by topological navigation where several routes are integrated into a graph-like representation. The highest level, survey navigation, is reached when this graph can be embedded into a common reference frame. In this chapter, we present the building blocks for a biomimetic robot navigation system that encompasses all levels of this hierarchy. As a local navigation method, we use scene-based homing. In this scheme, a goal location is characterized either by a panoramic snapshot of the light intensities as seen from the place, or by a record of the distances to the surrounding objects. The goal is found by moving in the direction that minimizes the discrepancy between the recorded intensities or distances and the current sensory input. For learning routes, the robot selects distinct views during exploration that are close enough to be reached by snapshot-based homing. When it encounters already visited places during route learning, it connects the routes and thus forms a topological representation of its environment termed a view graph. The final stage, survey navigation, is achieved by a graph embedding procedure which complements the topologic information of the view graph with odometric position estimates. Calculation of the graph embedding is done with a modified multidimensional scaling algorithm which makes use of distances and angles between nodes.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Minimal Nonlinear Distortion Principle for Nonlinear Independent Component Analysis

Zhang, K., Chan, L.

Journal of Machine Learning Research, 9, pages: 2455-2487, 2008 (article)

Abstract
It is well known that solutions to the nonlinear independent component analysis (ICA) problem are highly non-unique. In this paper we propose the "minimal nonlinear distortion" (MND) principle for tackling the ill-posedness of nonlinear ICA problems. MND prefers the nonlinear ICA solution with the estimated mixing procedure as close as possible to linear, among all possible solutions. It also helps to avoid local optima in the solutions. To achieve MND, we exploit a regularization term to minimize the mean square error between the nonlinear mixing mapping and the best-fitting linear one. The effect of MND on the inherent trivial and non-trivial indeterminacies in nonlinear ICA solutions is investigated. Moreover, we show that local MND is closely related to the smoothness regularizer penalizing large curvature, which provides another useful regularization condition for nonlinear ICA. Experiments on synthetic data show the usefulness of the MND principle for separating various nonlinear mixtures. Finally, as an application, we use nonlinear ICA with MND to separate daily returns of a set of stocks in Hong Kong, and the linear causal relations among them are successfully discovered. The resulting causal relations give some interesting insights into the stock market. Such a result can not be achieved by linear ICA. Simulation studies also verify that when doing causality discovery, sometimes one should not ignore the nonlinear distortion in the data generation procedure, even if it is weak.

Web [BibTex]

Web [BibTex]


no image
Transport processes in networks with scattering ramification nodes

Radl, A.

Journal of Applied Functional Analysis, 3, pages: 461-483, 2008 (article)

Web [BibTex]

Web [BibTex]


no image
Learning to control in operational space

Peters, J., Schaal, S.

International Journal of Robotics Research, 27, pages: 197-212, 2008, clmc (article)

Abstract
One of the most general frameworks for phrasing control problems for complex, redundant robots is operational space control. However, while this framework is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in com- plex robots, e.g., humanoid robots. In this paper, we suggest a learning approach for opertional space control as a direct inverse model learning problem. A first important insight for this paper is that a physically cor- rect solution to the inverse problem with redundant degrees-of-freedom does exist when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on the insight that many operational space controllers can be understood in terms of a constrained optimal control problem. The cost function as- sociated with this optimal control problem allows us to formulate a learn- ing algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the machine learning point of view, this learning problem corre- sponds to a reinforcement learning problem that maximizes an immediate reward. We employ an expectation-maximization policy search algorithm in order to solve this problem. Evaluations on a three degrees of freedom robot arm are used to illustrate the suggested approach. The applica- tion to a physically realistic simulator of the anthropomorphic SARCOS Master arm demonstrates feasibility for complex high degree-of-freedom robots. We also show that the proposed method works in the setting of learning resolved motion rate control on real, physical Mitsubishi PA-10 medical robotics arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2006


no image
A Kernel Method for the Two-Sample-Problem

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

20th Annual Conference on Neural Information Processing Systems (NIPS), December 2006 (talk)

Abstract
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. We show that the test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.

PDF [BibTex]

2006

PDF [BibTex]


no image
Ab-initio gene finding using machine learning

Schweikert, G., Zeller, G., Zien, A., Ong, C., de Bona, F., Sonnenburg, S., Phillips, P., Rätsch, G.

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

Web [BibTex]

Web [BibTex]


no image
Reinforcement Learning by Reward-Weighted Regression

Peters, J.

NIPS Workshop: Towards a New Reinforcement Learning? , December 2006 (talk)

Web [BibTex]

Web [BibTex]


no image
Graph boosting for molecular QSAR analysis

Saigo, H., Kadowaki, T., Kudo, T., Tsuda, K.

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

Abstract
We propose a new boosting method that systematically combines graph mining and mathematical programming-based machine learning. Informative and interpretable subgraph features are greedily found by a series of graph mining calls. Due to our mathematical programming formulation, subgraph features and pre-calculated real-valued features are seemlessly integrated. We tested our algorithm on a quantitative structure-activity relationship (QSAR) problem, which is basically a regression problem when given a set of chemical compounds. In benchmark experiments, the prediction accuracy of our method favorably compared with the best results reported on each dataset.

Web [BibTex]

Web [BibTex]


no image
Inferring Causal Directions by Evaluating the Complexity of Conditional Distributions

Sun, X., Janzing, D., Schölkopf, B.

NIPS Workshop on Causality and Feature Selection, December 2006 (talk)

Abstract
We propose a new approach to infer the causal structure that has generated the observed statistical dependences among n random variables. The idea is that the factorization of the joint measure of cause and effect into P(cause)P(effect|cause) leads typically to simpler conditionals than non-causal factorizations. To evaluate the complexity of the conditionals we have tried two methods. First, we have compared them to those which maximize the conditional entropy subject to the observed first and second moments since we consider the latter as the simplest conditionals. Second, we have fitted the data with conditional probability measures being exponents of functions in an RKHS space and defined the complexity by a Hilbert-space semi-norm. Such a complexity measure has several properties that are useful for our purpose. We describe some encouraging results with both methods applied to real-world data. Moreover, we have combined constraint-based approaches to causal discovery (i.e., methods using only information on conditional statistical dependences) with our method in order to distinguish between causal hypotheses which are equivalent with respect to the imposed independences. Furthermore, we compare the performance to Bayesian approaches to causal inference.

Web [BibTex]


no image
Structure validation of the Josephin domain of ataxin-3: Conclusive evidence for an open conformation

Nicastro, G., Habeck, M., Masino, L., Svergun, DI., Pastore, A.

Journal of Biomolecular NMR, 36(4):267-277, December 2006 (article)

Abstract
The availability of new and fast tools in structure determination has led to a more than exponential growth of the number of structures solved per year. It is therefore increasingly essential to assess the accuracy of the new structures by reliable approaches able to assist validation. Here, we discuss a specific example in which the use of different complementary techniques, which include Bayesian methods and small angle scattering, resulted essential for validating the two currently available structures of the Josephin domain of ataxin-3, a protein involved in the ubiquitin/proteasome pathway and responsible for neurodegenerative spinocerebellar ataxia of type 3. Taken together, our results demonstrate that only one of the two structures is compatible with the experimental information. Based on the high precision of our refined structure, we show that Josephin contains an open cleft which could be directly implicated in the interaction with polyubiquitin chains and other partners.

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Unifying View of Wiener and Volterra Theory and Polynomial Kernel Regression

Franz, M., Schölkopf, B.

Neural Computation, 18(12):3097-3118, December 2006 (article)

Abstract
Volterra and Wiener series are perhaps the best understood nonlinear system representations in signal processing. Although both approaches have enjoyed a certain popularity in the past, their application has been limited to rather low-dimensional and weakly nonlinear systems due to the exponential growth of the number of terms that have to be estimated. We show that Volterra and Wiener series can be represented implicitly as elements of a reproducing kernel Hilbert space by utilizing polynomial kernels. The estimation complexity of the implicit representation is linear in the input dimensionality and independent of the degree of nonlinearity. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Learning Optimal EEG Features Across Time, Frequency and Space

Farquhar, J., Hill, J., Schölkopf, B.

NIPS Workshop on Current Trends in Brain-Computer Interfacing, December 2006 (talk)

PDF Web [BibTex]

PDF Web [BibTex]


no image
Semi-Supervised Learning

Zien, A.

Advanced Methods in Sequence Analysis Lectures, November 2006 (talk)

Web [BibTex]

Web [BibTex]


no image
Prediction of Protein Function from Networks

Shin, H., Tsuda, K.

In Semi-Supervised Learning, pages: 361-376, Adaptive Computation and Machine Learning, (Editors: Chapelle, O. , B. Schölkopf, A. Zien), MIT Press, Cambridge, MA, USA, November 2006 (inbook)

Abstract
In computational biology, it is common to represent domain knowledge using graphs. Frequently there exist multiple graphs for the same set of nodes, representing information from different sources, and no single graph is sufficient to predict class labels of unlabelled nodes reliably. One way to enhance reliability is to integrate multiple graphs, since individual graphs are partly independent and partly complementary to each other for prediction. In this chapter, we describe an algorithm to assign weights to multiple graphs within graph-based semi-supervised learning. Both predicting class labels and searching for weights for combining multiple graphs are formulated into one convex optimization problem. The graph-combining method is applied to functional class prediction of yeast proteins.When compared with individual graphs, the combined graph with optimized weights performs significantly better than any single graph.When compared with the semidefinite programming-based support vector machine (SDP/SVM), it shows comparable accuracy in a remarkably short time. Compared with a combined graph with equal-valued weights, our method could select important graphs without loss of accuracy, which implies the desirable property of integration with selectivity.

Web [BibTex]

Web [BibTex]