Header logo is ei


2013


no image
Comparative Classifier Evaluation for Web-Scale Taxonomies Using Power Law

Babbar, R., Partalas, I., Metzig, C., Gaussier, E., Amini, M.

In The Semantic Web: ESWC 2013 Satellite Events, Lecture Notes in Computer Science, Vol. 7955 , pages: 310-311, (Editors: P Cimiano and M Fernández and V Lopez and S Schlobach and J Völker), Springer, ESWC, 2013 (inproceedings)

Web [BibTex]

2013

Web [BibTex]


no image
Model-based Imitation Learning by Probabilistic Trajectory Matching

Englert, P., Paraschos, A., Peters, J., Deisenroth, M.

In Proceedings of 2013 IEEE International Conference on Robotics and Automation (ICRA 2013), pages: 1922-1927, 2013 (inproceedings)

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Towards neurofeedback for improving visual attention

Zander, T., Battes, B., Schölkopf, B., Grosse-Wentrup, M.

In Proceedings of the Fifth International Brain-Computer Interface Meeting: Defining the Future, pages: Article ID: 086, (Editors: J.d.R. Millán, S. Gao, R. Müller-Putz, J.R. Wolpaw, and J.E. Huggins), Verlag der Technischen Universität Graz, 5th International Brain-Computer Interface Meeting, 2013, Article ID: 086 (inproceedings)

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
A Guided Hybrid Genetic Algorithm for Feature Selection with Expensive Cost Functions

Jung, M., Zscheischler, J.

In Proceedings of the International Conference on Computational Science, 18, pages: 2337 - 2346, Procedia Computer Science, (Editors: Alexandrov, V and Lees, M and Krzhizhanovskaya, V and Dongarra, J and Sloot, PMA), Elsevier, Amsterdam, Netherlands, ICCS, 2013 (inproceedings)

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning responsive robot behavior by imitation

Ben Amor, H., Vogt, D., Ewerton, M., Berger, E., Jung, B., Peters, J.

In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2013), pages: 3257-3264, IEEE, 2013 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
Learning Skills with Motor Primitives

Peters, J., Kober, J., Mülling, K., Kroemer, O., Neumann, G.

In Proceedings of the 16th Yale Workshop on Adaptive and Learning Systems, 2013 (inproceedings)

[BibTex]

[BibTex]


no image
Scalable Influence Estimation in Continuous-Time Diffusion Networks

Du, N., Song, L., Gomez Rodriguez, M., Zha, H.

In Advances in Neural Information Processing Systems 26, pages: 3147-3155, (Editors: C.J.C. Burges and L. Bottou and M. Welling and Z. Ghahramani and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Rapid Distance-Based Outlier Detection via Sampling

Sugiyama, M., Borgwardt, KM.

In Advances in Neural Information Processing Systems 26, pages: 467-475, (Editors: C.J.C. Burges and L. Bottou and M. Welling and Z. Ghahramani and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Probabilistic Movement Primitives

Paraschos, A., Daniel, C., Peters, J., Neumann, G.

In Advances in Neural Information Processing Systems 26, pages: 2616-2624, (Editors: C.J.C. Burges and L. Bottou and M. Welling and Z. Ghahramani and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Causal Inference on Time Series using Restricted Structural Equation Models

Peters, J., Janzing, D., Schölkopf, B.

In Advances in Neural Information Processing Systems 26, pages: 154-162, (Editors: C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Regression-tree Tuning in a Streaming Setting

Kpotufe, S., Orabona, F.

In Advances in Neural Information Processing Systems 26, pages: 1788-1796, (Editors: C.J.C. Burges and L. Bottou and M. Welling and Z. Ghahramani and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Density estimation from unweighted k-nearest neighbor graphs: a roadmap

von Luxburg, U., Alamgir, M.

In Advances in Neural Information Processing Systems 26, pages: 225-233, (Editors: C.J.C. Burges and L. Bottou and M. Welling and Z. Ghahramani and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
PAC-Bayes-Empirical-Bernstein Inequality

Tolstikhin, I. O., Seldin, Y.

In Advances in Neural Information Processing Systems 26, pages: 109-117, (Editors: C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)

link (url) [BibTex]

link (url) [BibTex]


no image
Linear mixed models for genome-wide association studies

Lippert, C.

University of Tübingen, Germany, 2013 (phdthesis)

[BibTex]

[BibTex]


no image
PLAL: Cluster-based active learning

Urner, R., Wulff, S., Ben-David, S.

In Proceedings of the 26th Annual Conference on Learning Theory, 30, pages: 376-397, (Editors: Shalev-Shwartz, S. and Steinwart, I.), JMLR, COLT, 2013 (inproceedings)

link (url) [BibTex]

link (url) [BibTex]


no image
Monochromatic Bi-Clustering

Wulff, S., Urner, R., Ben-David, S.

In Proceedings of the 30th International Conference on Machine Learning, 28, pages: 145-153, (Editors: Dasgupta, S. and McAllester, D.), JMLR, ICML, 2013 (inproceedings)

link (url) [BibTex]

link (url) [BibTex]


no image
Significance of variable height-bandwidth group delay filters in the spectral reconstruction of speech

Devanshu, A., Raj, A., Hegde, R. M.

INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, pages: 1682-1686, 2013 (conference)

link (url) [BibTex]

link (url) [BibTex]


no image
Generative Multiple-Instance Learning Models For Quantitative Electromyography

Adel, T., Smith, B., Urner, R., Stashuk, D., Lizotte, D. J.

In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, AUAI Press, UAI, 2013 (inproceedings)

link (url) [BibTex]

link (url) [BibTex]


no image
Modeling and Learning Complex Motor Tasks: A case study on Robot Table Tennis

Mülling, K.

Technical University Darmstadt, Germany, 2013 (phdthesis)

[BibTex]


no image
Automatic Malaria Diagnosis system

Mehrjou, A., Abbasian, T., Izadi, M.

In First RSI/ISM International Conference on Robotics and Mechatronics (ICRoM), pages: 205-211, 2013 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
Abstraction in Decision-Makers with Limited Information Processing Capabilities

Genewein, T, Braun, DA

pages: 1-9, NIPS Workshop Planning with Information Constraints for Control, Reinforcement Learning, Computational Neuroscience, Robotics and Games, December 2013 (conference)

Abstract
A distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information which allows to separate structure from noise. From an information theoretic point of view abstractions are desirable because they allow for very efficient information processing. In artificial systems abstractions are often implemented through computationally costly formations of groups or clusters. In this work we establish the relation between the free-energy framework for decision-making and rate-distortion theory and demonstrate how the application of rate-distortion for decision-making leads to the emergence of abstractions. We argue that abstractions are induced due to a limit in information processing capacity.

link (url) [BibTex]

link (url) [BibTex]


no image
Bounded Rational Decision-Making in Changing Environments

Grau-Moya, J, Braun, DA

pages: 1-9, NIPS Workshop Planning with Information Constraints for Control, Reinforcement Learning, Computational Neuroscience, Robotics and Games, December 2013 (conference)

Abstract
A perfectly rational decision-maker chooses the best action with the highest utility gain from a set of possible actions. The optimality principles that describe such decision processes do not take into account the computational costs of finding the optimal action. Bounded rational decision-making addresses this problem by specifically trading off information-processing costs and expected utility. Interestingly, a similar trade-off between energy and entropy arises when describing changes in thermodynamic systems. This similarity has been recently used to describe bounded rational agents. Crucially, this framework assumes that the environment does not change while the decision-maker is computing the optimal policy. When this requirement is not fulfilled, the decision-maker will suffer inefficiencies in utility, that arise because the current policy is optimal for an environment in the past. Here we borrow concepts from non-equilibrium thermodynamics to quantify these inefficiencies and illustrate with simulations its relationship with computational resources.

link (url) [BibTex]

link (url) [BibTex]

2006


no image
Global Biclustering of Microarray Data

Wolf, T., Brors, B., Hofmann, T., Georgii, E.

In ICDMW 2006, pages: 125-129, (Editors: Tsumoto, S. , C. W. Clifton, N. Zhong, X. Wu, J. Liu, B. W. Wah, Y.-M. Cheung), IEEE Computer Society, Los Alamitos, CA, USA, Sixth IEEE International Conference on Data Mining, December 2006 (inproceedings)

Abstract
We consider the problem of simultaneously clustering genes and conditions of a gene expression data matrix. A bicluster is defined as a subset of genes that show similar behavior within a subset of conditions. Finding biclusters can be useful for revealing groups of genes involved in the same molecular process as well as groups of conditions where this process takes place. Previous work either deals with local, bicluster-based criteria or assumes a very specific structure of the data matrix (e.g. checkerboard or block-diagonal) [11]. In contrast, our goal is to find a set of flexibly arranged biclusters which is optimal in regard to a global objective function. As this is a NP-hard combinatorial problem, we describe several techniques to obtain approximate solutions. We benchmarked our approach successfully on the Alizadeh B-cell lymphoma data set [1].

Web DOI [BibTex]

2006

Web DOI [BibTex]


no image
Conformal Multi-Instance Kernels

Blaschko, M., Hofmann, T.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-6, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
In the multiple instance learning setting, each observation is a bag of feature vectors of which one or more vectors indicates membership in a class. The primary task is to identify if any vectors in the bag indicate class membership while ignoring vectors that do not. We describe here a kernel-based technique that defines a parametric family of kernels via conformal transformations and jointly learns a discriminant function over bags together with the optimal parameter settings of the kernel. Learning a conformal transformation effectively amounts to weighting regions in the feature space according to their contribution to classification accuracy; regions that are discriminative will be weighted higher than regions that are not. This allows the classifier to focus on regions contributing to classification accuracy while ignoring regions that correspond to vectors found both in positive and in negative bags. We show how parameters of this transformation can be learned for support vector machines by posing the problem as a multiple kernel learning problem. The resulting multiple instance classifier gives competitive accuracy for several multi-instance benchmark datasets from different domains.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Some observations on the pedestal effect or dipper function

Henning, B., Wichmann, F.

Journal of Vision, 6(13):50, 2006 Fall Vision Meeting of the Optical Society of America, December 2006 (poster)

Abstract
The pedestal effect is the large improvement in the detectabilty of a sinusoidal “signal” grating observed when the signal is added to a masking or “pedestal” grating of the same spatial frequency, orientation, and phase. We measured the pedestal effect in both broadband and notched noise - noise from which a 1.5-octave band centred on the signal frequency had been removed. Although the pedestal effect persists in broadband noise, it almost disappears in the notched noise. Furthermore, the pedestal effect is substantial when either high- or low-pass masking noise is used. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies different from that of the signal and pedestal. The spatial-frequency components of the notched noise above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies that are very much different from that of the signal and pedestal. Thus the pedestal or dipper effect measured without notched noise is not a characteristic of individual spatial-frequency tuned channels.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Information-theoretic Metric Learning

Davis, J., Kulis, B., Sra, S., Dhillon, I.

In NIPS 2006 Workshop on Learning to Compare Examples, pages: 1-5, NIPS Workshop on Learning to Compare Examples, December 2006 (inproceedings)

Abstract
We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a low-rank kernel learning problem. Specifically, we minimize the Burg divergence of a low-rank kernel to an input kernel, subject to pairwise distance constraints. Our approach has several advantages over existing methods. First, we present a natural information-theoretic formulation for the problem. Second, the algorithm utilizes the methods developed by Kulis et al. [6], which do not involve any eigenvector computation; in particular, the running time of our method is faster than most existing techniques. Third, the formulation offers insights into connections between metric learning and kernel learning.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Pattern Mining in Frequent Dynamic Subgraphs

Borgwardt, KM., Kriegel, H-P., Wackersreuther, P.

In pages: 818-822, (Editors: Clifton, C.W.), IEEE Computer Society, Los Alamitos, CA, USA, Sixth International Conference on Data Mining (ICDM), December 2006 (inproceedings)

Abstract
Graph-structured data is becoming increasingly abundant in many application domains. Graph mining aims at finding interesting patterns within this data that represent novel knowledge. While current data mining deals with static graphs that do not change over time, coming years will see the advent of an increasing number of time series of graphs. In this article, we investigate how pattern mining on static graphs can be extended to time series of graphs. In particular, we are considering dynamic graphs with edge insertions and edge deletions over time. We define frequency in this setting and provide algorithmic solutions for finding frequent dynamic subgraph patterns. Existing subgraph mining algorithms can be easily integrated into our framework to make them handle dynamic graphs. Experimental results on real-world data confirm the practical feasibility of our approach.

Web DOI [BibTex]

Web DOI [BibTex]


no image
3DString: a feature string kernel for 3D object classification on voxelized data

Assfalg, J., Borgwardt, KM., Kriegel, H-P.

In pages: 198-207, (Editors: Yu, P.S. , V.J. Tsotras, E.A. Fox, B. Liu), ACM Press, New York, NY, USA, 15th ACM International Conference on Information and Knowledge Management (CIKM), November 2006 (inproceedings)

Abstract
Classification of 3D objects remains an important task in many areas of data management such as engineering, medicine or biology. As a common preprocessing step in current approaches to classification of voxelized 3D objects, voxel representations are transformed into a feature vector description.In this article, we introduce an approach of transforming 3D objects into feature strings which represent the distribution of voxels over the voxel grid. Attractively, this feature string extraction can be performed in linear runtime with respect to the number of voxels. We define a similarity measure on these feature strings that counts common k-mers in two input strings, which is referred to as the spectrum kernel in the field of kernel methods. We prove that on our feature strings, this similarity measure can be computed in time linear to the number of different characters in these strings. This linear runtime behavior makes our kernel attractive even for large datasets that occur in many application domains. Furthermore, we explain that our similarity measure induces a metric which allows to combine it with an M-tree for handling of large volumes of data. Classification experiments on two published benchmark datasets show that our novel approach is competitive with the best state-of-the-art methods for 3D object classification.

DOI [BibTex]

DOI [BibTex]


no image
Adapting Spatial Filter Methods for Nonstationary BCIs

Tomioka, R., Hill, J., Blankertz, B., Aihara, K.

In IBIS 2006, pages: 65-70, 2006 Workshop on Information-Based Induction Sciences, November 2006 (inproceedings)

Abstract
A major challenge in applying machine learning methods to Brain-Computer Interfaces (BCIs) is to overcome the possible nonstationarity in the data from the datablock the method is trained on and that the method is applied to. Assuming the joint distributions of the whitened signal and the class label to be identical in two blocks, where the whitening is done in each block independently, we propose a simple adaptation formula that is applicable to a broad class of spatial filtering methods including ICA, CSP, and logistic regression classifiers. We characterize the class of linear transformations for which the above assumption holds. Experimental results on 60 BCI datasets show improved classification accuracy compared to (a) fixed spatial filter approach (no adaptation) and (b) fixed spatial pattern approach (proposed by Hill et al., 2006 [1]).

PDF [BibTex]

PDF [BibTex]


no image
Optimizing Spatial Filters for BCI: Margin- and Evidence-Maximization Approaches

Farquhar, J., Hill, N., Schölkopf, B.

Challenging Brain-Computer Interfaces: MAIA Workshop 2006, pages: 1, November 2006 (poster)

Abstract
We present easy-to-use alternatives to the often-used two-stage Common Spatial Pattern + classifier approach for spatial filtering and classification of Event-Related Desychnronization signals in BCI. We report two algorithms that aim to optimize the spatial filters according to a criterion more directly related to the ability of the algorithms to generalize to unseen data. Both are based upon the idea of treating the spatial filter coefficients as hyperparameters of a kernel or covariance function. We then optimize these hyper-parameters directly along side the normal classifier parameters with respect to our chosen learning objective function. The two objectives considered are margin maximization as used in Support-Vector Machines and the evidence maximization framework used in Gaussian Processes. Our experiments assessed generalization error as a function of the number of training points used, on 9 BCI competition data sets and 5 offline motor imagery data sets measured in Tubingen. Both our approaches sho w consistent improvements relative to the commonly used CSP+linear classifier combination. Strikingly, the improvement is most significant in the higher noise cases, when either few trails are used for training, or with the most poorly performing subjects. This a reversal of the usual "rich get richer" effect in the development of CSP extensions, which tend to perform best when the signal is strong enough to accurately find their additional parameters. This makes our approach particularly suitable for clinical application where high levels of noise are to be expected.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Extraction of visual features from natural video data using Slow Feature Analysis

Nickisch, H.

Biologische Kybernetik, Technische Universität Berlin, Berlin, Germany, September 2006 (diplomathesis)

Abstract
Das Forschungsprojekt NeuRoBot hat das un{\"u}berwachte Erlernen einer neuronal inspirierten Steuerungsarchitektur zum Ziel, und zwar unter den Randbedingungen biologischer Plausibilit{\"a}t und der Benutzung einer Kamera als einzigen Sensor. Visuelle Merkmale, die ein angemessenes Abbild der Umgebung liefern, sind unerl{\"a}sslich, um das Ziel kollisionsfreier Navigation zu erreichen. Zeitliche Koh{\"a}renz ist ein neues Lernprinzip, das in der Lage ist, Erkenntnisse aus der Biologie des Sehens zu reproduzieren. Es wird durch die Beobachtung motiviert, dass die “Sensoren” der Retina auf deutlich k{\"u}rzeren Zeitskalen variieren als eine abstrakte Beschreibung. Zeitliche Langsamkeitsanalyse l{\"o}st das Problem, indem sie zeitlich langsam ver{\"a}nderliche Signale aus schnell ver{\"a}nderlichen Eingabesignalen extrahiert. Eine Verallgemeinerung auf Signale, die nichtlinear von den Eingaben abh{\"a}ngen, ist durch die Anwendung des Kernel-Tricks m{\"o}glich. Das einzig benutzte Vorwissen ist die zeitliche Glattheit der gewonnenen Signale. In der vorliegenden Diplomarbeit wird Langsamkeitsanalyse auf Bildausschnitte von Videos einer Roboterkamera und einer Simulationsumgebung angewendet. Zuallererst werden mittels Parameterexploration und Kreuzvalidierung die langsamst m{\"o}glichen Funktionen bestimmt. Anschließend werden die Merkmalsfunktionen analysiert und einige Ansatzpunkte f{\"u}r ihre Interpretation angegeben. Aufgrund der sehr großen Datens{\"a}tze und der umfangreichen Berechnungen behandelt ein Großteil dieser Arbeit auch Aufwandsbetrachtungen und Fragen der effizienten Berechnung. Kantendetektoren in verschiedenen Phasen und mit haupts{\"a}chlich horizontaler Orientierung stellen die wichtigsten aus der Analyse hervorgehenden Funktionen dar. Eine Anwendung auf konkrete Navigationsaufgaben des Roboters konnte bisher nicht erreicht werden. Eine visuelle Interpretation der erlernten Merkmale ist jedoch durchaus gegeben.

PDF [BibTex]

PDF [BibTex]


no image
A Linear Programming Approach for Molecular QSAR analysis

Saigo, H., Kadowaki, T., Tsuda, K.

In MLG 2006, pages: 85-96, (Editors: Gärtner, T. , G. C. Garriga, T. Meinl), International Workshop on Mining and Learning with Graphs, September 2006, Best Paper Award (inproceedings)

Abstract
Small molecules in chemistry can be represented as graphs. In a quantitative structure-activity relationship (QSAR) analysis, the central task is to find a regression function that predicts the activity of the molecule in high accuracy. Setting a QSAR as a primal target, we propose a new linear programming approach to the graph-based regression problem. Our method extends the graph classification algorithm by Kudo et al. (NIPS 2004), which is a combination of boosting and graph mining. Instead of sequential multiplicative updates, we employ the linear programming boosting (LP) for regression. The LP approach allows to include inequality constraints for the parameter vector, which turns out to be particularly useful in QSAR tasks where activity values are sometimes unavailable. Furthermore, the efficiency is improved significantly by employing multiple pricing.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Incremental Aspect Models for Mining Document Streams

Surendran, A., Sra, S.

In PKDD 2006, pages: 633-640, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
In this paper we introduce a novel approach for incrementally building aspect models, and use it to dynamically discover underlying themes from document streams. Using the new approach we present an application which we call “query-line tracking” i.e., we automatically discover and summarize different themes or stories that appear over time, and that relate to a particular query. We present evaluation on news corpora to demonstrate the strength of our method for both query-line tracking, online indexing and clustering.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

Sensory Coding And The Natural Environment, 2006, pages: 1, September 2006 (poster)

Abstract
The human visual system samples images through saccadic eye movements which rapidly change the point of fixation. Although the selection of eye movement targets depends on numerous top-down mechanisms, a number of recent studies have shown that low-level image features such as local contrast or edges play an important role. These studies typically used predefined image features which were afterwards experimentally verified. Here, we follow a complementary approach: instead of testing a set of candidate image features, we infer these hypotheses from the data, using methods from statistical learning. To this end, we train a non-linear classifier on fixated vs. randomly selected image patches without making any physiological assumptions. The resulting classifier can be essentially characterized by a nonlinear combination of two center-surround receptive fields. We find that the prediction performance of this simple model on our eye movement data is indistinguishable from the physiologically motivated model of Itti & Koch (2000) which is far more complex. In particular, we obtain a comparable performance without using any multi-scale representations, long-range interactions or oriented image features.

Web [BibTex]

Web [BibTex]


no image
PALMA: Perfect Alignments using Large Margin Algorithms

Rätsch, G., Hepp, B., Schulze, U., Ong, C.

In GCB 2006, pages: 104-113, (Editors: Huson, D. , O. Kohlbacher, A. Lupas, K. Nieselt, A. Zell), Gesellschaft für Informatik, Bonn, Germany, German Conference on Bioinformatics, September 2006 (inproceedings)

Abstract
Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm -- called PALMA -- tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: It perfectly aligns all 4358 sequences on an hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool see http://www.fml.mpg.de/raetsch/projects/palma.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Graph Based Semi-Supervised Learning with Sharper Edges

Shin, H., Hill, N., Rätsch, G.

In ECML 2006, pages: 401-412, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning (ECML), September 2006 (inproceedings)

Abstract
In many graph-based semi-supervised learning algorithms, edge weights are assumed to be fixed and determined by the data points‘ (often symmetric)relationships in input space, without considering directionality. However, relationships may be more informative in one direction (e.g. from labelled to unlabelled) than in the reverse direction, and some relationships (e.g. strong weights between oppositely labelled points) are unhelpful in either direction. Undesirable edges may reduce the amount of influence an informative point can propagate to its neighbours -- the point and its outgoing edges have been ``blunted.‘‘ We present an approach to ``sharpening‘‘ in which weights are adjusted to meet an optimization criterion wherever they are directed towards labelled points. This principle can be applied to a wide variety of algorithms. In the current paper, we present one ad hoc solution satisfying the principle, in order to show that it can improve performance on a number of publicly available benchmark data sets.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Robust MEG Source Localization of Event Related Potentials: Identifying Relevant Sources by Non-Gaussianity

Breun, P., Grosse-Wentrup, M., Utschick, W., Buss, M.

In DAGM 2006, pages: 394-403, (Editors: Franke, K. , K.-R. Müller, B. Nickolay, R. Schäfer), Springer, Berlin, Germany, 28th Annual Symposium of the German Association for Pattern Recognition, September 2006 (inproceedings)

Abstract
Independent Component Analysis (ICA) is a frequently used preprocessing step in source localization of MEG and EEG data. By decomposing the measured data into maximally independent components (ICs), estimates of the time course and the topographies of neural sources are obtained. In this paper, we show that when using estimated source topographies for localization, correlations between neural sources introduce an error into the obtained source locations. This error can be avoided by reprojecting ICs onto the observation space, but requires the identification of relevant ICs. For Event Related Potentials (ERPs), we identify relevant ICs by estimating their non-Gaussianity. The efficacy of the approach is tested on auditory evoked potentials (AEPs) recorded by MEG. It is shown that ten trials are sufficient for reconstructing all important characteristics of the AEP, and source localization of the reconstructed ERP yields the same focus of activity as the average of 250 trials.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Finite-Horizon Optimal State-Feedback Control of Nonlinear Stochastic Systems Based on a Minimum Principle

Deisenroth, MP., Ohtsuka, T., Weissel, F., Brunn, D., Hanebeck, UD.

In MFI 2006, pages: 371-376, (Editors: Hanebeck, U. D.), IEEE Service Center, Piscataway, NJ, USA, 6th IEEE International Conference on Multisensor Fusion and Integration, September 2006 (inproceedings)

Abstract
In this paper, an approach to the finite-horizon optimal state-feedback control problem of nonlinear, stochastic, discrete-time systems is presented. Starting from the dynamic programming equation, the value function will be approximated by means of Taylor series expansion up to second-order derivatives. Moreover, the problem will be reformulated, such that a minimum principle can be applied to the stochastic problem. Employing this minimum principle, the optimal control problem can be rewritten as a two-point boundary-value problem to be solved at each time step of a shrinking horizon. To avoid numerical problems, the two-point boundary-value problem will be solved by means of a continuation method. Thus, the curse of dimensionality of dynamic programming is avoided, and good candidates for the optimal state-feedback controls are obtained. The proposed approach will be evaluated by means of a scalar example system.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Uniform Convergence of Adaptive Graph-Based Regularization

Hein, M.

In COLT 2006, pages: 50-64, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
The regularization functional induced by the graph Laplacian of a random neighborhood graph based on the data is adaptive in two ways. First it adapts to an underlying manifold structure and second to the density of the data-generating probability measure. We identify in this paper the limit of the regularizer and show uniform convergence over the space of Hoelder functions. As an intermediate step we derive upper bounds on the covering numbers of Hoelder functions on compact Riemannian manifolds, which are of independent interest for the theoretical analysis of manifold-based learning methods.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Efficient Large Scale Linear Programming Support Vector Machines

Sra, S.

In ECML 2006, pages: 767-774, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning, September 2006 (inproceedings)

Abstract
This paper presents a decomposition method for efficiently constructing ℓ1-norm Support Vector Machines (SVMs). The decomposition algorithm introduced in this paper possesses many desirable properties. For example, it is provably convergent, scales well to large datasets, is easy to implement, and can be extended to handle support vector regression and other SVM variants. We demonstrate the efficiency of our algorithm by training on (dense) synthetic datasets of sizes up to 20 million points (in ℝ32). The results show our algorithm to be several orders of magnitude faster than a previously published method for the same task. We also present experimental results on real data sets—our method is seen to be not only very fast, but also highly competitive against the leading SVM implementations.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Regularised CSP for Sensor Selection in BCI

Farquhar, J., Hill, N., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 14-15, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
The Common Spatial Pattern (CSP) algorithm is a highly successful method for efficiently calculating spatial filters for brain signal classification. Spatial filtering can improve classification performance considerably, but demands that a large number of electrodes be mounted, which is inconvenient in day-to-day BCI usage. The CSP algorithm is also known for its tendency to overfit, i.e. to learn the noise in the training set rather than the signal. Both problems motivate an approach in which spatial filters are sparsified. We briefly sketch a reformulation of the problem which allows us to do this, using 1-norm regularisation. Focusing on the electrode selection issue, we present preliminary results on EEG data sets that suggest that effective spatial filters may be computed with as few as 10--20 electrodes, hence offering the potential to simplify the practical realisation of BCI systems significantly.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Time-Dependent Demixing of Task-Relevant EEG Signals

Hill, N., Farquhar, J., Lal, T., Schölkopf, B.

In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages: 20-21, (Editors: GR Müller-Putz and C Brunner and R Leeb and R Scherer and A Schlögl and S Wriessnegger and G Pfurtscheller), Verlag der Technischen Universität Graz, Graz, Austria, 3rd International Brain-Computer Interface Workshop and Training Course, September 2006 (inproceedings)

Abstract
Given a spatial filtering algorithm that has allowed us to identify task-relevant EEG sources, we present a simple approach for monitoring the activity of these sources while remaining relatively robust to changes in other (task-irrelevant) brain activity. The idea is to keep spatial *patterns* fixed rather than spatial filters, when transferring from training to test sessions or from one time window to another. We show that a fixed spatial pattern (FSP) approach, using a moving-window estimate of signal covariances, can be more robust to non-stationarity than a fixed spatial filter (FSF) approach.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Transductive Gaussian Process Regression with Automatic Model Selection

Le, Q., Smola, A., Gärtner, T., Altun, Y.

In Machine Learning: ECML 2006, pages: 306-317, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning (ECML), September 2006 (inproceedings)

Abstract
n contrast to the standard inductive inference setting of predictive machine learning, in real world learning problems often the test instances are already available at training time. Transductive inference tries to improve the predictive accuracy of learning algorithms by making use of the information contained in these test instances. Although this description of transductive inference applies to predictive learning problems in general, most transductive approaches consider the case of classification only. In this paper we introduce a transductive variant of Gaussian process regression with automatic model selection, based on approximate moment matching between training and test data. Empirical results show the feasibility and competitiveness of this approach.

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Sober Look at Clustering Stability

Ben-David, S., von Luxburg, U., Pal, D.

In COLT 2006, pages: 5-19, (Editors: Lugosi, G. , H.-U. Simon), Springer, Berlin, Germany, 19th Annual Conference on Learning Theory, September 2006 (inproceedings)

Abstract
Stability is a common tool to verify the validity of sample based algorithms. In clustering it is widely used to tune the parameters of the algorithm, such as the number k of clusters. In spite of the popularity of stability in practical applications, there has been very little theoretical analysis of this notion. In this paper we provide a formal definition of stability and analyze some of its basic properties. Quite surprisingly, the conclusion of our analysis is that for large sample size, stability is fully determined by the behavior of the objective function which the clustering algorithm is aiming to minimize. If the objective function has a unique global minimizer, the algorithm is stable, otherwise it is unstable. In particular we conclude that stability is not a well-suited tool to determine the number of clusters - it is determined by the symmetries of the data which may be unrelated to clustering parameters. We prove our results for center-based clusterings and for spectral clustering, and support our conclusions by many examples in which the behavior of stability is counter-intuitive.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Information Marginalization on Subgraphs

Huang, J., Zhu, T., Rereiner, R., Zhou, D., Schuurmans, D.

In ECML/PKDD 2006, pages: 199-210, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2006 (inproceedings)

Abstract
Real-world data often involves objects that exhibit multiple relationships; for example, ‘papers’ and ‘authors’ exhibit both paper-author interactions and paper-paper citation relationships. A typical learning problem requires one to make inferences about a subclass of objects (e.g. ‘papers’), while using the remaining objects and relations to provide relevant information. We present a simple, unified mechanism for incorporating information from multiple object types and relations when learning on a targeted subset. In this scheme, all sources of relevant information are marginalized onto the target subclass via random walks. We show that marginalized random walks can be used as a general technique for combining multiple sources of information in relational data. With this approach, we formulate new algorithms for transduction and ranking in relational data, and quantify the performance of new schemes on real world data—achieving good results in many problems.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Bayesian Active Learning for Sensitivity Analysis

Pfingsten, T.

In ECML 2006, pages: 353-364, (Editors: Fürnkranz, J. , T. Scheffer, M. Spiliopoulou), Springer, Berlin, Germany, 17th European Conference on Machine Learning, September 2006 (inproceedings)

Abstract
Designs of micro electro-mechanical devices need to be robust against fluctuations in mass production. Computer experiments with tens of parameters are used to explore the behavior of the system, and to compute sensitivity measures as expectations over the input distribution. Monte Carlo methods are a simple approach to estimate these integrals, but they are infeasible when the models are computationally expensive. Using a Gaussian processes prior, expensive simulation runs can be saved. This Bayesian quadrature allows for an active selection of inputs where the simulation promises to be most valuable, and the number of simulation runs can be reduced further. We present an active learning scheme for sensitivity analysis which is rigorously derived from the corresponding Bayesian expected loss. On three fully featured, high dimensional physical models of electro-mechanical sensors, we show that the learning rate in the active learning scheme is significantly better than for passive learning.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
An Online-Computation Approach to Optimal Finite-Horizon State-Feedback Control of Nonlinear Stochastic Systems

Deisenroth, MP.

Biologische Kybernetik, Universität Karlsruhe (TH), Karlsruhe, Germany, August 2006 (diplomathesis)

PDF [BibTex]

PDF [BibTex]


no image
Semi-supervised Hyperspectral Image Classification with Graphs

Bandos, T., Zhou, D., Camps-Valls, G.

In IGARSS 2006, pages: 3883-3886, IEEE Computer Society, Los Alamitos, CA, USA, IEEE International Conference on Geoscience and Remote Sensing, August 2006 (inproceedings)

Abstract
This paper presents a semi-supervised graph-based method for the classification of hyperspectral images. The method is designed to exploit the spatial/contextual information in the images through composite kernels. The proposed method produces smoother classifications with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. Good accuracy in high dimensional spaces and low number of labeled samples (ill-posed situations) are produced as compared to standard inductive support vector machines.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Supervised Probabilistic Principal Component Analysis

Yu, S., Yu, K., Tresp, V., Kriegel, H., Wu, M.

In KDD 2006, pages: 464-473, (Editors: Ungar, L. ), ACM Press, New York, NY, USA, 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006 (inproceedings)

Abstract
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g.,~in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e.,~in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S$^2$PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e.,~in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S$^2$PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]