Header logo is ei


2007


no image
Towards compliant humanoids: an experimental assessment of suitable task space position/orientation controllers

Nakanishi, J., Mistry, M., Peters, J., Schaal, S.

In IROS 2007, 2007, pages: 2520-2527, (Editors: Grant, E. , T. C. Henderson), IEEE Service Center, Piscataway, NJ, USA, IEEE/RSJ International Conference on Intelligent Robots and Systems, November 2007 (inproceedings)

Abstract
Compliant control will be a prerequisite for humanoid robotics if these robots are supposed to work safely and robustly in human and/or dynamic environments. One view of compliant control is that a robot should control a minimal number of degrees-of-freedom (DOFs) directly, i.e., those relevant DOFs for the task, and keep the remaining DOFs maximally compliant, usually in the null space of the task. This view naturally leads to task space control. However, surprisingly few implementations of task space control can be found in actual humanoid robots. This paper makes a first step towards assessing the usefulness of task space controllers for humanoids by investigating which choices of controllers are available and what inherent control characteristics they have—this treatment will concern position and orientation control, where the latter is based on a quaternion formulation. Empirical evaluations on an anthropomorphic Sarcos master arm illustrate the robustness of the different controllers as well as the eas e of implementing and tuning them. Our extensive empirical results demonstrate that simpler task space controllers, e.g., classical resolved motion rate control or resolved acceleration control can be quite advantageous in face of inevitable modeling errors in model-based control, and that well chosen formulations are easy to implement and quite robust, such that they are useful for humanoids.

PDF Web DOI [BibTex]

2007

PDF Web DOI [BibTex]


no image
MR-Based PET Attenuation Correction: Method and Validation

Hofmann, M., Steinke, F., Scheel, V., Charpiat, G., Brady, M., Schölkopf, B., Pichler, B.

2007 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS-MIC 2007), 2007(M16-6):1-2, November 2007 (poster)

Abstract
PET/MR combines the high soft tissue contrast of Magnetic Resonance Imaging (MRI) and the functional information of Positron Emission Tomography (PET). For quantitative PET information, correction of tissue photon attenuation is mandatory. Usually in conventional PET, the attenuation map is obtained from a transmission scan, which uses a rotating source, or from the CT scan in case of combined PET/CT. In the case of a PET/MR scanner, there is insufficient space for the rotating source and ideally one would want to calculate the attenuation map from the MR image instead. Since MR images provide information about proton density of the different tissue types, it is not trivial to use this data for PET attenuation correction. We present a method for predicting the PET attenuation map from a given the MR image, using a combination of atlas-registration and recognition of local patterns. Using "leave one out cross validation" we show on a database of 16 MR-CT image pairs that our method reliably allows estimating the CT image from the MR image. Subsequently, as in PET/CT, the PET attenuation map can be predicted from the CT image. On an additional dataset of MR/CT/PET triplets we quantitatively validate that our approach allows PET quantification with an error that is smaller than what would be clinically significant. We demonstrate our approach on T1-weighted human brain scans. However, the presented methods are more general and current research focuses on applying the established methods to human whole body PET/MRI applications.

PDF PDF [BibTex]

PDF PDF [BibTex]


no image
Estimating receptive fields without spike-triggering

Macke, J., Zeck, G., Bethge, M.

37th annual Meeting of the Society for Neuroscience (Neuroscience 2007), 37(768.1):1, November 2007 (poster)

Web [BibTex]

Web [BibTex]


no image
Evaluation of Deformable Registration Methods for MR-CT Atlas Alignment

Scheel, V., Hofmann, M., Rehfeld, N., Judenhofer, M., Claussen, C., Pichler, B.

2007 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS-MIC 2007), 2007(M13-121):1, November 2007 (poster)

Abstract
Deformable registration methods are essential for multimodality imaging. Many different methods exist but due to the complexity of the deformed images a direct comparison of the methods is difficult. One particular application that requires high accuracy registration of MR-CT images is atlas-based attenuation correction for PET/MR. We compare four deformable registration algorithms for 3D image data included in the Open Source "National Library of Medicine Insight Segmentation and Registration Toolkit" (ITK). An interactive landmark based registration using MiraView (Siemens) has been used as gold standard. The automatic algorithms provided by ITK are based on the metrics Mattes mutual information as well as on normalized mutual information. The transformations are calculated by interpolating over a uniform B-Spline grid laying over the image to be warped. The algorithms were tested on head images from 10 subjects. We implemented a measure which segments head interior bone and air based on the CT images and l ow intensity classes of corresponding MRI images. The segmentation of bone is performed by individually calculating the lowest Hounsfield unit threshold for each CT image. The compromise is made by quantifying the number of overlapping voxels of the remaining structures. We show that the algorithms provided by ITK achieve similar or better accuracy than the time-consuming interactive landmark based registration. Thus, ITK provides an ideal platform to generate accurately fused datasets from different modalities, required for example for building training datasets for Atlas-based attenuation correction.

PDF [BibTex]

PDF [BibTex]


no image
Some Theoretical Aspects of Human Categorization Behavior: Similarity and Generalization

Jäkel, F.

Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, November 2007, passed with "ausgezeichnet", summa cum laude, published online (phdthesis)

PDF [BibTex]

PDF [BibTex]


no image
Statistical Learning Theory Approaches to Clustering

Jegelka, S.

Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, November 2007 (diplomathesis)

PDF [BibTex]

PDF [BibTex]


no image
Performance Stabilization and Improvement in Graph-based Semi-supervised Learning with Ensemble Method and Graph Sharpening

Choi, I., Shin, H.

In Korean Data Mining Society Conference, pages: 257-262, Korean Data Mining Society, Seoul, Korea, Korean Data Mining Society Conference, November 2007 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
A time/frequency decomposition of information transmission by LFPs and spikes in the primary visual cortex

Belitski, A., Gretton, A., Magri, C., Murayama, Y., Montemurro, M., Logothetis, N., Panzeri, S.

37th Annual Meeting of the Society for Neuroscience (Neuroscience 2007), 37, pages: 1, November 2007 (poster)

Web [BibTex]

Web [BibTex]


no image
Mining expression-dependent modules in the human interaction network

Georgii, E., Dietmann, S., Uno, T., Pagel, P., Tsuda, K.

BMC Bioinformatics, 8(Suppl. 8):S4, November 2007 (poster)

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Discriminative Subsequence Mining for Action Classification

Nowozin, S., BakIr, G., Tsuda, K.

In ICCV 2007, pages: 1919-1923, IEEE Computer Society, Los Alamitos, CA, USA, 11th IEEE International Conference on Computer Vision, October 2007 (inproceedings)

Abstract
Recent approaches to action classification in videos have used sparse spatio-temporal words encoding local appearance around interesting movements. Most of these approaches use a histogram representation, discarding the temporal order among features. But this ordering information can contain important information about the action itself, e.g. consider the sport disciplines of hurdle race and long jump, where the global temporal order of motions (running, jumping) is important to discriminate between the two. In this work we propose to use a sequential representation which retains this temporal order. Further, we introduce Discriminative Subsequence Mining to find optimal discriminative subsequence patterns. In combination with the LPBoost classifier, this amounts to simultaneously learning a classification function and performing feature selection in the space of all possible feature sequences. The resulting classifier linearly combines a small number of interpretable decision functions, each checking for the presence of a single discriminative pattern. The classifier is benchmarked on the KTH action classification data set and outperforms the best known results in the literature.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Hilbert Space Embedding for Distributions

Smola, A., Gretton, A., Song, L., Schölkopf, B.

Proceedings of the 10th International Conference on Discovery Science (DS 2007), 10, pages: 40-41, October 2007 (poster)

Abstract
While kernel methods are the basis of many popular techniques in supervised learning, they are less commonly used in testing, estimation, and analysis of probability distributions, where information theoretic approaches rule the roost. However it becomes difficult to estimate mutual information or entropy if the data are high dimensional.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
A Hilbert Space Embedding for Distributions

Smola, A., Gretton, A., Song, L., Schölkopf, B.

In Algorithmic Learning Theory, Lecture Notes in Computer Science 4754 , pages: 13-31, (Editors: M Hutter and RA Servedio and E Takimoto), Springer, Berlin, Germany, 18th International Conference on Algorithmic Learning Theory (ALT), October 2007 (inproceedings)

Abstract
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Cluster Identification in Nearest-Neighbor Graphs

Maier, M., Hein, M., von Luxburg, U.

In ALT 2007, pages: 196-210, (Editors: Hutter, M. , R. A. Servedio, E. Takimoto), Springer, Berlin, Germany, 18th International Conference on Algorithmic Learning Theory, October 2007 (inproceedings)

Abstract
Assume we are given a sample of points from some underlying distribution which contains several distinct clusters. Our goal is to construct a neighborhood graph on the sample points such that clusters are ``identified‘‘: that is, the subgraph induced by points from the same cluster is connected, while subgraphs corresponding to different clusters are not connected to each other. We derive bounds on the probability that cluster identification is successful, and use them to predict ``optimal‘‘ values of k for the mutual and symmetric k-nearest-neighbor graphs. We point out different properties of the mutual and symmetric nearest-neighbor graphs related to the cluster identification problem.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Inducing Metric Violations in Human Similarity Judgements

Laub, J., Macke, J., Müller, K., Wichmann, F.

In Advances in Neural Information Processing Systems 19, pages: 777-784, (Editors: Schölkopf, B. , J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
Attempting to model human categorization and similarity judgements is both a very interesting but also an exceedingly difficult challenge. Some of the difficulty arises because of conflicting evidence whether human categorization and similarity judgements should or should not be modelled as to operate on a mental representation that is essentially metric. Intuitively, this has a strong appeal as it would allow (dis)similarity to be represented geometrically as distance in some internal space. Here we show how a single stimulus, carefully constructed in a psychophysical experiment, introduces l2 violations in what used to be an internal similarity space that could be adequately modelled as Euclidean. We term this one influential data point a conflictual judgement. We present an algorithm of how to analyse such data and how to identify the crucial point. Thus there may not be a strict dichotomy between either a metric or a non-metric internal space but rather degrees to which potentially large subsets of stimuli are represented metrically with a small subset causing a global violation of metricity.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods

Seeger, M.

In Advances in Neural Information Processing Systems 19, pages: 1233-1240, (Editors: Schölkopf, B. , J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We propose a highly efficient framework for kernel multi-class models with a large and structured set of classes. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical class structure, achieving state-of-the-art results in an order of magnitude less time than previous work.

PDF Web [BibTex]

PDF Web [BibTex]


no image
A Local Learning Approach for Clustering

Wu, M., Schölkopf, B.

In Advances in Neural Information Processing Systems 19, pages: 1529-1536, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Branch and Bound for Semi-Supervised Support Vector Machines

Chapelle, O., Sindhwani, V., Keerthi, S.

In Advances in Neural Information Processing Systems 19, pages: 217-224, (Editors: Schölkopf, B. , J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
Semi-supervised SVMs (S3VMs) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms.

PDF Web [BibTex]

PDF Web [BibTex]


no image
A Kernel Method for the Two-Sample-Problem

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

In Advances in Neural Information Processing Systems 19, pages: 513-520, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. The test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.

PDF Web [BibTex]

PDF Web [BibTex]


no image
An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models

Keerthi, S., Sindhwani, V., Chapelle, O.

In Advances in Neural Information Processing Systems 19, pages: 673-680, (Editors: Schölkopf, B. , J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold cross-validation error, using non-linear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be very efficiently done; often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning Dense 3D Correspondence

Steinke, F., Schölkopf, B., Blanz, V.

In Advances in Neural Information Processing Systems 19, pages: 1313-1320, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
Establishing correspondence between distinct objects is an important and nontrivial task: correctness of the correspondence hinges on properties which are difficult to capture in an a priori criterion. While previous work has used a priori criteria which in some cases led to very good results, the present paper explores whether it is possible to learn a combination of features that, for a given training set of aligned human heads, characterizes the notion of correct correspondence. By optimizing this criterion, we are then able to compute correspondence and morphs for novel heads.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Optimal Dominant Motion Estimation using Adaptive Search of Transformation Space

Ulges, A., Lampert, CH., Keysers, D., Breuel, TM.

In DAGM 2007, pages: 204-215, (Editors: Hamprecht, F. A., C. Schnörr, B. Jähne), Springer, Berlin, Germany, 29th Annual Symposium of the German Association for Pattern Recognition, September 2007 (inproceedings)

Abstract
The extraction of a parametric global motion from a motion field is a task with several applications in video processing. We present two probabilistic formulations of the problem and carry out optimization using the RAST algorithm, a geometric matching method novel to motion estimation in video. RAST uses an exhaustive and adaptive search of transformation space and thus gives -- in contrast to local sampling optimization techniques used in the past -- a globally optimal solution. Among other applications, our framework can thus be used as a source of ground truth for benchmarking motion estimation algorithms. Our main contributions are: first, the novel combination of a state-of- the-art MAP criterion for dominant motion estimation with a search procedure that guarantees global optimality. Second, experimental re- sults that illustrate the superior performance of our approach on synthetic flow fields as well as real-world video streams. Third, a significant speedup of the search achieved by extending the mod el with an additional smoothness prior.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Studying the effects of noise correlations on population coding using a sampling method

Ecker, A., Berens, P., Bethge, M., Logothetis, N., Tolias, A.

Neural Coding, Computation and Dynamics (NCCD 07), 1, pages: 21, September 2007 (poster)

PDF [BibTex]

PDF [BibTex]


no image
Output Grouping using Dirichlet Mixtures of Linear Gaussian State-Space Models

Chiappa, S., Barber, D.

In ISPA 2007, pages: 446-451, IEEE Computer Society, Los Alamitos, CA, USA, 5th International Symposium on Image and Signal Processing and Analysis, September 2007 (inproceedings)

Abstract
We consider a model to cluster the components of a vector time-series. The task is to assign each component of the vector time-series to a single cluster, basing this assignment on the simultaneous dynamical similarity of the component to other components in the cluster. This is in contrast to the more familiar task of clustering a set of time-series based on global measures of their similarity. The model is based on a Dirichlet Mixture of Linear Gaussian State-Space models (LGSSMs), in which each LGSSM is treated with a prior to encourage the simplest explanation. The resulting model is approximated using a ‘collapsed’ variational Bayes implementation.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Manifold Denoising

Hein, M., Maier, M.

In Advances in Neural Information Processing Systems 19, pages: 561-568, (Editors: Schölkopf, B. , J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We consider the problem of denoising a noisily sampled submanifold $M$ in $R^d$, where the submanifold $M$ is a priori unknown and we are only given a noisy point sample. The presented denoising algorithm is based on a graph-based diffusion process of the point sample. We analyze this diffusion process using recent results about the convergence of graph Laplacians. In the experiments we show that our method is capable of dealing with non-trivial high-dimensional noise. Moreover using the denoising algorithm as pre-processing method we can improve the results of a semi-supervised learning algorithm.

PDF Web [BibTex]

PDF Web [BibTex]


no image
How to Find Interesting Locations in Video: A Spatiotemporal Interest Point Detector Learned from Human Eye movements

Kienzle, W., Schölkopf, B., Wichmann, F., Franz, M.

In Pattern Recognition, pages: 405-414, (Editors: FA Hamprecht and C Schnörr and B Jähne), Springer, Berlin, Germany, 29th Annual Symposium of the German Association for Pattern Recognition (DAGM), September 2007 (inproceedings)

Abstract
Interest point detection in still images is a well-studied topic in computer vision. In the spatiotemporal domain, however, it is still unclear which features indicate useful interest points. In this paper we approach the problem by emph{learning} a detector from examples: we record eye movements of human subjects watching video sequences and train a neural network to predict which locations are likely to become eye movement targets. We show that our detector outperforms current spatiotemporal interest point architectures on a standard classification dataset.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Bayesian Inference for Sparse Generalized Linear Models

Seeger, M., Gerwinn, S., Bethge, M.

In ECML 2007, pages: 298-309, Lecture Notes in Computer Science ; 4701, (Editors: Kok, J. N., J. Koronacki, R. Lopez de Mantaras, S. Matwin, D. Mladenic, A. Skowron), Springer, Berlin, Germany, 18th European Conference on Machine Learning, September 2007 (inproceedings)

Abstract
We present a framework for efficient, accurate approximate Bayesian inference in generalized linear models (GLMs), based on the expectation propagation (EP) technique. The parameters can be endowed with a factorizing prior distribution, encoding properties such as sparsity or non-negativity. The central role of posterior log-concavity in Bayesian GLMs is emphasized and related to stability issues in EP. In particular, we use our technique to infer the parameters of a point process model for neuronal spiking data from multiple electrodes, demonstrating significantly superior predictive performance when a sparsity assumption is enforced via a Laplace prior distribution.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions

Walder, C., Schölkopf, B., Chapelle, O.

In Advances in Neural Information Processing Systems 19, pages: 273-280, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We consider the problem of constructing a function whose zero set is to represent a surface, given sample points with surface normal vectors. The contributions include a novel means of regularising multi-scale compactly supported basis functions that leads to the desirable properties previously only associated with fully supported bases, and show equivalence to a Gaussian process with modified covariance function. We also provide a regularisation framework for simpler and more direct treatment of surface normals, along with a corresponding generalisation of the representer theorem. We demonstrate the techniques on 3D problems of up to 14 million data points, as well as 4D time series data.

PDF Web [BibTex]

PDF Web [BibTex]


no image
A Nonparametric Approach to Bottom-Up Visual Saliency

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

In Advances in Neural Information Processing Systems 19, pages: 689-696, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to emph{learn} a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that - despite the lack of any biological prior knowledge - our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Information Bottleneck for Non Co-Occurrence Data

Seldin, Y., Slonim, N., Tishby, N.

In Advances in Neural Information Processing Systems 19, pages: 1241-1248, (Editors: Schölkopf, B. , J. Platt, T. Hofmann), MIT Press, Cambridge, MA, USA, Twentieth Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We present a general model-independent approach to the analysis of data in cases when these data do not appear in the form of co-occurrence of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X,Y). For example, in gene expression data, the expression level Z is a function of gene X and condition Y; or in movie ratings data the rating Z is a function of viewer X and movie Y . The approach represents a consistent extension of the Information Bottleneck method that has previously relied on the availability of co-occurrence statistics. By altering the relevance variable we eliminate the need in the sample of joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering) it improves the currently best known results.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning with Hypergraphs: Clustering, Classification, and Embedding

Zhou, D., Huang, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 19, pages: 1601-1608, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We usually endow the investigated objects with pairwise relationships, which can be illustrated as graphs. In many real-world problems, however, relationships among the objects of our interest are more complex than pairwise. Naively squeezing the complex relationships into pairwise ones will inevitably lead to loss of information which can be expected valuable for our learning tasks however. Therefore we consider using hypergraphs instead to completely represent complex relationships among the objects of our interest, and thus the problem of learning with hypergraphs arises. Our main contribution in this paper is to generalize the powerful methodology of spectral clustering which originally operates on undirected graphs to hypergraphs, and further develop algorithms for hypergraph embedding and transductive classi¯cation on the basis of the spectral hypergraph clustering approach. Our experiments on a number of benchmarks showed the advantages of hypergraphs over usual graphs.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Correcting Sample Selection Bias by Unlabeled Data

Huang, J., Smola, A., Gretton, A., Borgwardt, K., Schölkopf, B.

In Advances in Neural Information Processing Systems 19, pages: 601-608, (Editors: B Schölkopf and J Platt and T Hofmann), MIT Press, Cambridge, MA, USA, 20th Annual Conference on Neural Information Processing Systems (NIPS), September 2007 (inproceedings)

Abstract
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

Berens, P., Bethge, M.

Neural Coding, Computation and Dynamics (NCCD 07), 1, pages: 19, September 2007 (poster)

Abstract
Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data---the model parameters can be derived in closed form and sampling is easy. We demonstrate its usefulness by studying a simple neural representation model of natural images. For the first time, we are able to directly compare predictions from a pairwise maximum entropy model not only in small groups of neurons, but also in larger populations of more than thousand units. Our results indicate that in such larger networks interactions exist that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics extrem ely well up to the limit of dimensionality where estimation of the full joint distribution is feasible.

PDF [BibTex]

PDF [BibTex]


no image
Collaborative Filtering via Ensembles of Matrix Factorizations

Wu, M.

In KDD Cup and Workshop 2007, pages: 43-47, KDD Cup and Workshop, August 2007 (inproceedings)

Abstract
We present a Matrix Factorization(MF) based approach for the Netflix Prize competition. Currently MF based algorithms are popular and have proved successful for collaborative filtering tasks. For the Netflix Prize competition, we adopt three different types of MF algorithms: regularized MF, maximum margin MF and non-negative MF. Furthermore, for each MF algorithm, instead of selecting the optimal parameters, we combine the results obtained with several parameters. With this method, we achieve a performance that is more than 6% better than the Netflix‘s own system.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Online-Computation Approach to Optimal Control of Noise-Affected Nonlinear Systems with Continuous State and Control Spaces

Deisenroth, MP., Weissel, F., Ohtsuka, T., Hanebeck, UD.

In ECC‘07, pages: 3664-3671, 9th European Control Conference, July 2007 (inproceedings)

Abstract
A novel online-computation approach to optimal control of nonlinear, noise-affected systems with continuous state and control spaces is presented. In the proposed algorithm, system noise is explicitly incorporated into the control decision. This leads to superior results compared to state-of-the-art nonlinear controllers that neglect this influence. The solution of an optimal nonlinear controller for a corresponding deterministic system is employed to find a meaningful state space restriction. This restriction is obtained by means of approximate state prediction using the noisy system equation. Within this constrained state space, an optimal closed-loop solution for a finite decision-making horizon (prediction horizon) is determined within an adaptively restricted optimization space. Interleaving stochastic dynamic programming and value function approximation yields a solution to the considered optimal control problem. The enhanced performance of the proposed discrete-time controller is illustrated by means o f a scalar example system. Nonlinear model predictive control is applied to address approximate treatment of infinite-horizon problems by the finite-horizon controller.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Error Correcting Codes for the P300 Visual Speller

Biessmann, F.

Biologische Kybernetik, Eberhard-Karls-Universität Tübingen, Tübingen, Germany, July 2007 (diplomathesis)

Abstract
The aim of brain-computer interface (BCI) research is to establish a communication system based on intentional modulation of brain activity. This is accomplished by classifying patterns of brain ac- tivity, volitionally induced by the user. The BCI presented in this study is based on a classical paradigm as proposed by (Farwell and Donchin, 1988), the P300 visual speller. Recording electroencephalo- grams (EEG) from the scalp while presenting letters successively to the user, the speller can infer from the brain signal which letter the user was focussing on. Since EEG recordings are noisy, usually many repetitions are needed to detect the correct letter. The focus of this study was to improve the accuracy of the visual speller applying some basic principles from information theory: Stimulus sequences of the speller have been modified into error-correcting codes. Additionally a language model was incorporated into the probabilistic letter de- coder. Classification of single EEG epochs was less accurate using error correcting codes. However, the novel code could compensate for that such that overall, letter accuracies were as high as or even higher than for classical stimulus codes. In particular at high noise levels, error-correcting decoding achieved higher letter accuracies.

PDF [BibTex]

PDF [BibTex]


no image
A Kernel Approach to Comparing Distributions

Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.

In Proceedings of the 22. AAAI Conference on Artificial Intelligence, pages: 1637-1641, AAAI Press, Menlo Park, CA, USA, Twenty-Second AAAI Conference on Artificial Intelligence (AAAI), July 2007 (inproceedings)

Abstract
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a Reproducing Kernel Hilbert Space. We apply this technique to construct a two-sample test, which is used for determining whether two sets of observations arise from the same distribution. We use this test in attribute matching for databases using the Hungarian marriage method, where it performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Manifold Denoising as Preprocessing for Finding Natural Representations of Data

Hein, M., Maier, M.

In AAAI-07, pages: 1646-1649, AAAI Press, Menlo Park, CA, USA, Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), July 2007 (inproceedings)

Abstract
A natural representation of data are the parameters which generated the data. If the parameter space is continuous we can regard it as a manifold. In practice we usually do not know this manifold but we just have some representation of the data, often in a very high-dimensional feature space. Since the number of internal parameters does not change with the representation, the data will effectively lie on a low-dimensional submanifold in feature space. Due to measurement errors this data is usually corrupted by noise which particularly in high-dimensional feature spaces makes it almost impossible to find the manifold structure. This paper reviews a method called Manifold Denoising which projects the data onto the submanifold using a diffusion process on a graph generated by the data. We will demonstrate that the method is capable of dealing with non-trival high-dimensional noise. Moreover we will show that using the method as a preprocessing step one can significantly improve the results of a semi-supervised learning algorithm.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning the Influence of Spatio-Temporal Variations in Local Image Structure on Visual Saliency

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

10th T{\"u}binger Wahrnehmungskonferenz (TWK 2007), 10, pages: 1, July 2007 (poster)

Abstract
Computational models for bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians filters and a nonlinear combination scheme which combines the filter responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine learning algorithm can be used to derive a saliency model from human eye movement data with a very small number of additional assumptions. The learned model is much simpler than previous models, but nevertheless has state-of-the-art prediction performance [2]. A central result from this study is that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity of the model. Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency based on local pixel intensities in a static image, our model also takes into account temporal intensity variations. We find that the learned model responds strongly to temporal intensity changes ocurring 200-250ms before a saccade is initiated. This delay coincides with the typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.

Web [BibTex]

Web [BibTex]


no image
Supervised Feature Selection via Dependence Estimation

Song, L., Smola, A., Gretton, A., Borgwardt, K., Bedo, J.

In Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages: 823-830, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, Twenty-Fourth Annual International Conference on Machine Learning (ICML), June 2007 (inproceedings)

Abstract
We introduce a framework for filtering features that employs the Hilbert-Schmidt Independence Criterion (HSIC) as a measure of dependence between the features and the labels. The key idea is that good features should maximise such dependence. Feature selection for various supervised learning problems (including classification and regression) is unified under this framework, and the solutions can be approximated using a backward-elimination algorithm. We demonstrate the usefulness of our method on both artificial and real world datasets.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A Kernel-Based Causal Learning Algorithm

Sun, X., Janzing, D., Schölkopf, B., Fukumizu, K.

In Proceedings of the 24th International Conference on Machine Learning, pages: 855-862, (Editors: Z Ghahramani), ACM Press, New York, NY, USA, ICML, June 2007 (inproceedings)

Abstract
We describe a causal learning method, which employs measuring the strength of statistical dependences in terms of the Hilbert-Schmidt norm of kernel-based cross-covariance operators. Following the line of the common faithfulness assumption of constraint-based causal learning, our approach assumes that a variable Z is likely to be a common effect of X and Y, if conditioning on Z increases the dependence between X and Y. Based on this assumption, we collect "votes" for hypothetical causal directions and orient the edges by the majority principle. In most experiments with known causal structures, our method provided plausible results and outperformed the conventional constraint-based PC algorithm.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Entire Regularization Paths for Graph Data

Tsuda, K.

In ICML 2007, pages: 919-926, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th Annual International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
Graph data such as chemical compounds and XML documents are getting more common in many application domains. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraph patterns, the dimensionality gets too large for usual statistical methods. We propose an efficient method to select a small number of salient patterns by regularization path tracking. The generation of useless patterns is minimized by progressive extension of the search space. In experiments, it is shown that our technique is considerably more efficient than a simpler approach based on frequent substructure mining.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Weighted Substructure Mining for Image Analysis

Nowozin, S., Tsuda, K., Uno, T., Kudo, T., BakIr, G.

In CVPR 2007, pages: 1-8, IEEE Computer Society, Los Alamitos, CA, USA, 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2007 (inproceedings)

Abstract
In web-related applications of image categorization, it is desirable to derive an interpretable classification rule with high accuracy. Using the bag-of-words representation and the linear support vector machine, one can partly fulfill the goal, but the accuracy of linear classifiers is not high and the obtained features are not informative for users. We propose to combine item set mining and large margin classifiers to select features from the power set of all visual words. Our resulting classification rule is easier to browse and simpler to understand, because each feature has richer information. As a next step, each image is represented as a graph where nodes correspond to local image features and edges encode geometric relations between features. Combining graph mining and boosting, we can obtain a classification rule based on subgraph features that contain more information than the set features. We evaluate our algorithm in a web-retrieval ranking task where the goal is to reject outliers from a set of images returned for a keyword query. Furthermore, it is evaluated on the supervised classification tasks with the challenging VOC2005 data set. Our approach yields excellent accuracy in the unsupervised ranking task compared to a recently proposed probabilistic model and competitive results in the supervised classification task.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Local Learning Projections

Wu, M., Yu, K., Yu, S., Schölkopf, B.

In Proceedings of the 24th International Conference on Machine Learning, pages: 1039-1046, (Editors: Z Ghahramani), ACM Press, New York, NY, USA, ICML, June 2007 (inproceedings)

Abstract
This paper presents a Local Learning Projection (LLP) approach for linear dimensionality reduction. We first point out that the well known Principal Component Analysis (PCA) essentially seeks the projection that has the minimal global estimation error. Then we propose a dimensionality reduction algorithm that leads to the projection with the minimal local estimation error, and elucidate its advantages for classification tasks. We also indicate that LLP keeps the local information in the sense that the projection value of each point can be well estimated based on its neighbors and their projection values. Experimental results are provided to validate the effectiveness of the proposed algorithm.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Training and Approximation of a Primal Multiclass Support Vector Machine

Zien, A., Bona, F., Ong, C.

In ASMDA 2007, pages: 1-8, (Editors: Skiadas, C. H.), 12th International Conference on Applied Stochastic Models and Data Analysis, June 2007 (inproceedings)

Abstract
We revisit the multiclass support vector machine (SVM) and generalize the formulation to convex loss functions and joint feature maps. Motivated by recent work [Chapelle, 2006] we use logistic loss and softmax to enable gradient based primal optimization. Kernels are incorporated via kernel principal component analysis (KPCA), which naturally leads to approximation methods for large scale problems. We investigate similarities and differences to previous multiclass SVM approaches. Experimental comparisons to previous approaches and to the popular one-vs-rest SVM are presented on several different datasets.

PDF PostScript Web [BibTex]

PDF PostScript Web [BibTex]


no image
Information-theoretic Metric Learning

Davis, J., Kulis, B., Jain, P., Sra, S., Dhillon, I.

In ICML 2007, pages: 209-216, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th Annual International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the distance function. We express this problem as a particular Bregman optimization problem---that of minimizing the LogDet divergence subject to linear constraints. Our resulting algorithm has several advantages over existing methods. First, our method can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Second, it is fast and scalable. Unlike most existing methods, no eigenvalue computations or semi-definite programming are required. We also present an online version and derive regret bounds for the resulting algorithm. Finally, we evaluate our method on a recent error reporting system for software called Clarify, in the context of metric learning for nearest neighbor classification, as well as on standard data sets.

Web DOI [BibTex]

Web DOI [BibTex]


no image
A Dependence Maximization View of Clustering

Song, L., Smola, A., Gretton, A., Borgwardt, K.

In Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages: 815-822, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, Twenty-Fourth Annual International Conference on Machine Learning (ICML), June 2007 (inproceedings)

Abstract
We propose a family of clustering algorithms based on the maximization of dependence between the input variables and their cluster labels, as expressed by the Hilbert-Schmidt Independence Criterion (HSIC). Under this framework, we unify the geometric, spectral, and statistical dependence views of clustering, and subsume many existing algorithms as special cases (e.g. k-means and spectral clustering). Distinctive to our framework is that kernels can also be applied on the labels, which can endow them with particular structures. We also obtain a perturbation bound on the change in k-means clustering.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Multiclass Multiple Kernel Learning

Zien, A., Ong, C.

In ICML 2007, pages: 1191-1198, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Transductive Support Vector Machines for Structured Variables

Zien, A., Brefeld, U., Scheffer, T.

In ICML 2007, pages: 1183-1190, (Editors: Ghahramani, Z. ), ACM Press, New York, NY, USA, 24th International Conference on Machine Learning, June 2007 (inproceedings)

Abstract
We study the problem of learning kernel machines transductively for structured output variables. Transductive learning can be reduced to combinatorial optimization problems over all possible labelings of the unlabeled data. In order to scale transductive learning to structured variables, we transform the corresponding non-convex, combinatorial, constrained optimization problems into continuous, unconstrained optimization problems. The discrete optimization parameters are eliminated and the resulting differentiable problems can be optimized efficiently. We study the effectiveness of the generalized TSVM on multiclass classification and label-sequence learning problems empirically.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Change-Point Detection using Krylov Subspace Learning

Ide, T., Tsuda, K.

In SDM 2007, pages: 515-520, (Editors: Apte, C. ), Society for Industrial and Applied Mathematics, Pittsburgh, PA, USA, SIAM International Conference on Data Mining, April 2007 (inproceedings)

Abstract
We propose an efficient algorithm for principal component analysis (PCA) that is applicable when only the inner product with a given vector is needed. We show that Krylov subspace learning works well both in matrix compression and implicit calculation of the inner product by taking full advantage of the arbitrariness of the seed vector. We apply our algorithm to a PCA-based change-point detection algorithm, and show that it results in about 50 times improvement in computational time.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Nonparametric Bayesian Discrete Latent Variable Models for Unsupervised Learning

Görür, D.

Biologische Kybernetik, Technische Universität Berlin, Berlin, Germany, April 2007, published online (phdthesis)

PDF PDF [BibTex]

PDF PDF [BibTex]