Header logo is ei


2006


no image
Inference with the Universum

Weston, J., Collobert, R., Sinz, F., Bottou, L., Vapnik, V.

In ICML 2006, pages: 1009-1016, (Editors: Cohen, W. W., A. Moore), ACM Press, New York, NY, USA, 23rd International Conference on Machine Learning, June 2006 (inproceedings)

Abstract
WIn this paper we study a new framework introduced by Vapnik (1998) and Vapnik (2006) that is an alternative capacity concept to the large margin approach. In the particular case of binary classification, we are given a set of labeled examples, and a collection of "non-examples" that do not belong to either class of interest. This collection, called the Universum, allows one to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. We describe an algorithm to leverage the Universum by maximizing the number of observed contradictions, and show experimentally that this approach delivers accuracy improvements over using labeled data alone.

PDF Web DOI [BibTex]

2006

PDF Web DOI [BibTex]


no image
Statistical Convergence of Kernel CCA

Fukumizu, K., Bach, F., Gretton, A.

In Advances in neural information processing systems 18, pages: 387-394, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
While kernel canonical correlation analysis (kernel CCA) has been applied in many problems, the asymptotic convergence of the functions estimated from a finite sample to the true functions has not yet been established. This paper gives a rigorous proof of the statistical convergence of kernel CCA and a related method (NOCCO), which provides a theoretical justification for these methods. The result also gives a sufficient condition on the decay of the regularization coefficient in the methods to ensure convergence.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Maximum Margin Semi-Supervised Learning for Structured Variables

Altun, Y., McAllester, D., Belkin, M.

In Advances in neural information processing systems 18, pages: 33-40, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Many real-world classification problems involve the prediction of multiple inter-dependent variables forming some structural dependency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables. Unlike transductive algorithms, our formulation naturally extends to new test points.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Generalized Nonnegative Matrix Approximations with Bregman Divergences

Dhillon, I., Sra, S.

In Advances in neural information processing systems 18, pages: 283-290, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Nonnegative matrix approximation (NNMA) is a recent technique for dimensionality reduction and data analysis that yields a parts based, sparse nonnegative representation for nonnegative input data. NNMA has found a wide variety of applications, including text analysis, document clustering, face/image recognition, language modeling, speech processing and many others. Despite these numerous applications, the algorithmic development for computing the NNMA factors has been relatively efficient. This paper makes algorithmic progress by modeling and solving (using multiplicative updates) new generalized NNMA problems that minimize Bregman divergences between the input matrix and its lowrank approximation. The multiplicative update formulae in the pioneering work by Lee and Seung [11] arise as a special case of our algorithms. In addition, the paper shows how to use penalty functions for incorporating constraints other than nonnegativity into the problem. Further, some interesting extensions to the use of "link" functions for modeling nonlinear relationships are also discussed.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Fast Gaussian Process Regression using KD-Trees

Shen, Y., Ng, A., Seeger, M.

In Advances in neural information processing systems 18, pages: 1225-1232, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
The computation required for Gaussian process regression with n training examples is about O(n3) during training and O(n) for each prediction. This makes Gaussian process regression too slow for large datasets. In this paper, we present a fast approximation method, based on kd-trees, that significantly reduces both the prediction and the training times of Gaussian process regression.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Products of "Edge-perts"

Gehler, PV., Welling, M.

In Advances in neural information processing systems 18, pages: 419-426, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the “products of edge-perts model” to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-ofthe-art denoising performance on benchmark images.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Assessing Approximations for Gaussian Process Classification

Kuss, M., Rasmussen, C.

In Advances in neural information processing systems 18, pages: 699-706, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
Gaussian processes are attractive models for probabilistic classification but unfortunately exact inference is analytically intractable. We compare Laplace‘s method and Expectation Propagation (EP) focusing on marginal likelihood estimates and predictive performance. We explain theoretically and corroborate empirically that EP is superior to Laplace. We also compare to a sophisticated MCMC scheme and show that EP is surprisingly accurate.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Worst-Case Bounds for Gaussian Process Models

Kakade, S., Seeger, M., Foster, D.

In Advances in neural information processing systems 18, pages: 619-626, (Editors: Weiss, Y. , B. Schölkopf, J. Platt), MIT Press, Cambridge, MA, USA, Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), May 2006 (inproceedings)

Abstract
We present a competitive analysis of some non-parametric Bayesian algorithms in a worst-case online learning setting, where no probabilistic assumptions about the generation of the data are made. We consider models which use a Gaussian process prior (over the space of all functions) and provide bounds on the regret (under the log loss) for commonly used non-parametric Bayesian algorithms - including Gaussian regression and logistic regression - which show how these algorithms can perform favorably under rather general conditions. These bounds explicitly handle the infinite dimensionality of these non-parametric classes in a natural way. We also make formal connections to the minimax and emph{minimum description length} (MDL) framework. Here, we show precisely how Bayesian Gaussian regression is a minimax strategy.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Row-Action Methods for Compressed Sensing

Sra, S., Tropp, J.

In ICASSP 2006, pages: 868-871, IEEE Operations Center, Piscataway, NJ, USA, IEEE International Conference on Acoustics, Speech and Signal Processing, May 2006 (inproceedings)

Abstract
Compressed Sensing uses a small number of random, linear measurements to acquire a sparse signal. Nonlinear algorithms, such as l1 minimization, are used to reconstruct the signal from the measured data. This paper proposes rowaction methods as a computational approach to solving the l1 optimization problem. This paper presents a specific rowaction method and provides extensive empirical evidence that it is an effective technique for signal reconstruction. This approach offers several advantages over interior-point methods, including minimal storage and computational requirements, scalability, and robustness.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Learning an Interest Operator from Human Eye Movements

Kienzle, W., Wichmann, F., Schölkopf, B., Franz, M.

In CVPWR 2006, pages: page 24, (Editors: C Schmid and S Soatto and C Tomasi), IEEE Computer Society, Los Alamitos, CA, USA, 2006 Conference on Computer Vision and Pattern Recognition Workshop, April 2006 (inproceedings)

Abstract
We present an approach for designing interest operators that are based on human eye movement statistics. In contrast to existing methods which use hand-crafted saliency measures, we use machine learning methods to infer an interest operator directly from eye movement data. That way, the operator provides a measure of biologically plausible interestingness. We describe the data collection, training, and evaluation process, and show that our learned saliency measure significantly accounts for human eye movements. Furthermore, we illustrate connections to existing interest operators, and present a multi-scale interest point detector based on the learned function.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Evaluating Predictive Uncertainty Challenge

Quinonero Candela, J., Rasmussen, C., Sinz, F., Bousquet, O., Schölkopf, B.

In Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment, pages: 1-27, (Editors: J Quiñonero Candela and I Dagan and B Magnini and F d’Alché-Buc), Springer, Berlin, Germany, First PASCAL Machine Learning Challenges Workshop (MLCW), April 2006 (inproceedings)

Abstract
This Chapter presents the PASCAL Evaluating Predictive Uncertainty Challenge, introduces the contributed Chapters by the participants who obtained outstanding results, and provides a discussion with some lessons to be learnt. The Challenge was set up to evaluate the ability of Machine Learning algorithms to provide good “probabilistic predictions”, rather than just the usual “point predictions” with no measure of uncertainty, in regression and classification problems. Parti-cipants had to compete on a number of regression and classification tasks, and were evaluated by both traditional losses that only take into account point predictions and losses we proposed that evaluate the quality of the probabilistic predictions.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Estimating Predictive Variances with Kernel Ridge Regression

Cawley, G., Talbot, N., Chapelle, O.

In MLCW 2005, pages: 56-77, (Editors: Quinonero-Candela, J. , I. Dagan, B. Magnini, F. D‘Alché-Buc), Springer, Berlin, Germany, First PASCAL Machine Learning Challenges Workshop, April 2006 (inproceedings)

Abstract
In many regression tasks, in addition to an accurate estimate of the conditional mean of the target distribution, an indication of the predictive uncertainty is also required. There are two principal sources of this uncertainty: the noise process contaminating the data and the uncertainty in estimating the model parameters based on a limited sample of training data. Both of them can be summarised in the predictive variance which can then be used to give confidence intervals. In this paper, we present various schemes for providing predictive variances for kernel ridge regression, especially in the case of a heteroscedastic regression, where the variance of the noise process contaminating the data is a smooth function of the explanatory variables. The use of leave-one-out cross-validation is shown to eliminate the bias inherent in estimates of the predictive variance. Results obtained on all three regression tasks comprising the predictive uncertainty challenge demonstrate the value of this approach.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
ICA by PCA Approach: Relating Higher-Order Statistics to Second-Order Moments

Zhang, K., Chan, L.

In Independent Component Analysis and Blind Signal Separation, pages: 311-318, (Editors: J P Rosca and D Erdogmus and J C Príncipe and S Haykin), Springer, 6th International Conference on Independent Component Analysis and Blind Signal Separation (ICA), March 2006, Series: Lecture Notes in Computer Science, Vol. 3889 (inproceedings)

Web [BibTex]

Web [BibTex]


no image
Machine Learning Methods For Estimating Operator Equations

Steinke, F., Schölkopf, B.

In Proceedings of the 14th IFAC Symposium on System Identification (SYSID 2006), pages: 6, (Editors: B Ninness and H Hjalmarsson), Elsevier, Oxford, United Kingdom, 14th IFAC Symposium on System Identification (SYSID), March 2006 (inproceedings)

Abstract
We consider the problem of fitting a linear operator induced equation to point sampled data. In order to do so we systematically exploit the duality between minimizing a regularization functional derived from an operator and kernel regression methods. Standard machine learning model selection algorithms can then be interpreted as a search of the equation best fitting given data points. For many kernels this operator induced equation is a linear differential equation. Thus, we link a continuous-time system identification task with common machine learning methods. The presented link opens up a wide variety of methods to be applied to this system identification problem. In a series of experiments we demonstrate an example algorithm working on non-uniformly spaced data, giving special focus to the problem of identifying one system from multiple data recordings.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Implicit Volterra and Wiener Series for Higher-Order Image Analysis

Franz, M., Schölkopf, B.

In Advances in Data Analysis: Proceedings of the 30th Annual Conference of The Gesellschaft für Klassifikation, 30, pages: 1, March 2006 (inproceedings)

Abstract
The computation of classical higher-order statistics such as higher-order moments or spectra is difficult for images due to the huge number of terms to be estimated and interpreted. We propose an alternative approach in which multiplicative pixel interactions are described by a series of Wiener functionals. Since the functionals are estimated implicitly via polynomial kernels, the combinatorial explosion associated with the classical higher-order statistics is avoided. In addition, the kernel framework allows for estimating infinite series expansions and for the regularized estimation of the Wiener series. First results show that image structures such as lines or corners can be predicted correctly, and that pixel interactions up to the order of five play an important role in natural images.

PDF [BibTex]

PDF [BibTex]


no image
Class prediction from time series gene expression profiles using dynamical systems kernels

Borgwardt, KM., Vishwanathan, SVN., Kriegel, H-P.

In pages: 547-558, (Editors: Altman, R.B. A.K. Dunker, L. Hunter, T. Murray, T.E. Klein), World Scientific, Singapore, Pacific Symposium on Biocomputing (PSB), January 2006 (inproceedings)

Abstract
We present a kernel-based approach to the classification of time series of gene expression profiles. Our method takes into account the dynamic evolution over time as well as the temporal characteristics of the data. More specifically, we model the evolution of the gene expression profiles as a Linear Time Invariant (LTI) dynamical system and estimate its model parameters. A kernel on dynamical systems is then used to classify these time series. We successfully test our approach on a published dataset to predict response to drug therapy in Multiple Sclerosis patients. For pharmacogenomics, our method offers a huge potential for advanced computational tools in disease diagnosis, and disease and drug therapy outcome prognosis.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Causal Inference by Choosing Graphs with Most Plausible Markov Kernels

Sun, X., Janzing, D., Schölkopf, B.

In Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics, pages: 1-11, ISAIM, January 2006 (inproceedings)

Abstract
We propose a new inference rule for estimating causal structure that underlies the observed statistical dependencies among n random variables. Our method is based on comparing the conditional distributions of variables given their direct causes (the so-called Markov kernels") for all hypothetical causal directions and choosing the most plausible one. We consider those Markov kernels most plausible, which maximize the (conditional) entropies constrained by their observed first moment (expectation) and second moments (variance and covariance with its direct causes) based on their given domain. In this paper, we discuss our inference rule for causal relationships between two variables in detail, apply it to a real-world temperature data set with known causality and show that our method provides a correct result for the example.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Extensions of ICA for Causality Discovery in the Hong Kong Stock Market

Zhang, K., Chan, L.

In Neural Information Processing, 13th International Conference, ICONIP 2006, pages: 400-409, (Editors: I King and J Wang and L Chan and D L Wang), Springer, 13th International Conference on Neural Information Processing (ICONIP), 2006, Lecture Notes in Computer Science, 2006, Volume 4234/2006 (inproceedings)

Web DOI [BibTex]

Web DOI [BibTex]


no image
Enhancement of source independence for blind source separation

Zhang, K., Chan, L.

In Independent Component Analysis and Blind Signal Separation, LNCS 3889, pages: 731-738, (Editors: J. Rosca and D. Erdogmus and JC Príncipe und S. Haykin), Springer, Berlin, Germany, 6th International Conference on Independent Component Analysis and Blind Signal Separation (ICA), 2006, Lecture Notes in Computer Science, 2006, Volume 3889/2006 (inproceedings)

Web DOI [BibTex]

Web DOI [BibTex]


no image
ICA with Sparse Connections

Zhang, K., Chan, L.

In Intelligent Data Engineering and Automated Learning – IDEAL 2006, pages: 530-537, (Editors: E Corchado and H Yin and V Botti und Colin Fyfe), Springer, 7th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 2006, Lecture Notes in Computer Science, 2006, Volume 4224/2006 (inproceedings)

Web DOI [BibTex]

Web DOI [BibTex]


no image
How to choose the covariance for Gaussian process regression independently of the basis

Franz, M., Gehler, P.

In Proceedings of the Workshop Gaussian Processes in Practice, Workshop Gaussian Processes in Practice (GPIP), 2006 (inproceedings)

pdf [BibTex]

pdf [BibTex]


no image
Learning operational space control

Peters, J., Schaal, S.

In Robotics: Science and Systems II (RSS 2006), pages: 255-262, (Editors: Gaurav S. Sukhatme and Stefan Schaal and Wolfram Burgard and Dieter Fox), Cambridge, MA: MIT Press, RSS , 2006, clmc (inproceedings)

Abstract
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-covexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. A first important insight for this paper is that, nevertheless, a physically correct solution to the inverse problem does exits when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on a recent insight that many operational space controllers can be understood in terms of a constraint optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the view of machine learning, the learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward and that employs an expectation-maximization policy search algorithm. Evaluations on a three degrees of freedom robot arm illustrate the feasability of our suggested approach.

link (url) [BibTex]

link (url) [BibTex]


no image
Reinforcement Learning for Parameterized Motor Primitives

Peters, J., Schaal, S.

In Proceedings of the 2006 International Joint Conference on Neural Networks, pages: 73-80, IJCNN, 2006, clmc (inproceedings)

Abstract
One of the major challenges in both action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in this paper, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Thumb xl screen shot 2012 06 06 at 11.30.03 am
The rate adapting poisson model for information retrieval and object recognition

Gehler, P. V., Holub, A. D., Welling, M.

In Proceedings of the 23rd international conference on Machine learning, pages: 337-344, ICML ’06, ACM, New York, NY, USA, 2006 (inproceedings)

project page pdf DOI [BibTex]

project page pdf DOI [BibTex]


no image
Policy gradient methods for robotics

Peters, J., Schaal, S.

In Proceedings of the IEEE International Conference on Intelligent Robotics Systems, pages: 2219-2225, IROS, 2006, clmc (inproceedings)

Abstract
The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-structured environments. However, to date only few existing reinforcement learning methods have been scaled into the domains of highdimensional robots such as manipulator, legged or humanoid robots. Policy gradient methods remain one of the few exceptions and have found a variety of applications. Nevertheless, the application of such methods is not without peril if done in an uninformed manner. In this paper, we give an overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm in the application of hitting a baseball with an anthropomorphic arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2002


no image
Gender Classification of Human Faces

Graf, A., Wichmann, F.

In Biologically Motivated Computer Vision, pages: 1-18, (Editors: Bülthoff, H. H., S.W. Lee, T. A. Poggio and C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV), November 2002 (inproceedings)

Abstract
This paper addresses the issue of combining pre-processing methods—dimensionality reduction using Principal Component Analysis (PCA) and Locally Linear Embedding (LLE)—with Support Vector Machine (SVM) classification for a behaviorally important task in humans: gender classification. A processed version of the MPI head database is used as stimulus set. First, summary statistics of the head database are studied. Subsequently the optimal parameters for LLE and the SVM are sought heuristically. These values are then used to compare the original face database with its processed counterpart and to assess the behavior of a SVM with respect to changes in illumination and perspective of the face images. Overall, PCA was superior in classification performance and allowed linear separability.

PDF PDF DOI [BibTex]

2002

PDF PDF DOI [BibTex]


no image
Insect-Inspired Estimation of Self-Motion

Franz, MO., Chahl, JS.

In Biologically Motivated Computer Vision, (2525):171-180, LNCS, (Editors: Bülthoff, H.H. , S.W. Lee, T.A. Poggio, C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV), November 2002 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an optimal linear estimator incorporating prior knowledge about the environment. The optimal estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Combining sensory Information to Improve Visualization

Ernst, M., Banks, M., Wichmann, F., Maloney, L., Bülthoff, H.

In Proceedings of the Conference on Visualization ‘02 (VIS ‘02), pages: 571-574, (Editors: Moorhead, R. , M. Joy), IEEE, Piscataway, NJ, USA, IEEE Conference on Visualization (VIS '02), October 2002 (inproceedings)

Abstract
Seemingly effortlessly the human brain reconstructs the three-dimensional environment surrounding us from the light pattern striking the eyes. This seems to be true across almost all viewing and lighting conditions. One important factor for this apparent easiness is the redundancy of information provided by the sensory organs. For example, perspective distortions, shading, motion parallax, or the disparity between the two eyes' images are all, at least partly, redundant signals which provide us with information about the three-dimensional layout of the visual scene. Our brain uses all these different sensory signals and combines the available information into a coherent percept. In displays visualizing data, however, the information is often highly reduced and abstracted, which may lead to an altered perception and therefore a misinterpretation of the visualized data. In this panel we will discuss mechanisms involved in the combination of sensory information and their implications for simulations using computer displays, as well as problems resulting from current display technology such as cathode-ray tubes.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Sampling Techniques for Kernel Methods

Achlioptas, D., McSherry, F., Schölkopf, B.

In Advances in neural information processing systems 14 , pages: 335-342, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations.

PDF Web [BibTex]

PDF Web [BibTex]


no image
The Infinite Hidden Markov Model

Beal, MJ., Ghahramani, Z., Rasmussen, CE.

In Advances in Neural Information Processing Systems 14, pages: 577-584, (Editors: Dietterich, T.G. , S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite - consider, for example, symbols being possible words appearing in English text.

PDF Web [BibTex]

PDF Web [BibTex]


no image
A new discriminative kernel from probabilistic models

Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.

In Advances in Neural Information Processing Systems 14, pages: 977-984, (Editors: Dietterich, T.G. , S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
Recently, Jaakkola and Haussler proposed a method for constructing kernel functions from probabilistic models. Their so called \Fisher kernel" has been combined with discriminative classi ers such as SVM and applied successfully in e.g. DNA and protein analysis. Whereas the Fisher kernel (FK) is calculated from the marginal log-likelihood, we propose the TOP kernel derived from Tangent vectors Of Posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments our new discriminative TOP kernel compares favorably to the Fisher kernel.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Incorporating Invariances in Non-Linear Support Vector Machines

Chapelle, O., Schölkopf, B.

In Advances in Neural Information Processing Systems 14, pages: 609-616, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
The choice of an SVM kernel corresponds to the choice of a representation of the data in a feature space and, to improve performance, it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in nonlinear kernels. We show on a digit recognition task that the proposed approach is superior to the Virtual Support Vector method, which previously had been the method of choice.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel feature spaces and nonlinear blind source separation

Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 14, pages: 761-768, (Editors: Dietterich, T. G., S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
In kernel based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. We propose a new mathematical construction that permits to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations get much simpler and more important our theoretical framework allows to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Algorithms for Learning Function Distinguishable Regular Languages

Fernau, H., Radl, A.

In Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pages: 64-73, (Editors: Caelli, T. , A. Amin, R. P.W. Duin, M. Kamel, D. de Ridder), Springer, Berlin, Germany, Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, August 2002 (inproceedings)

Abstract
Function distinguishable languages were introduced as a new methodology of defining characterizable subclasses of the regular languages which are learnable from text. Here, we give details on the implementation and the analysis of the corresponding learning algorithms. We also discuss problems which might occur in practical applications.

PDF DOI [BibTex]

PDF DOI [BibTex]


no image
Decision Boundary Pattern Selection for Support Vector Machines

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 33-41, Korean Data Mining Conference, May 2002 (inproceedings)

[BibTex]

[BibTex]


no image
k-NN based Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In Proc. of the Korean Industrial Engineers Conference, pages: 645-651, Korean Industrial Engineers Conference, May 2002 (inproceedings)

[BibTex]

[BibTex]


no image
Microarrays: How Many Do You Need?

Zien, A., Fluck, J., Zimmer, R., Lengauer, T.

In RECOMB 2002, pages: 321-330, ACM Press, New York, NY, USA, Sixth Annual International Conference on Research in Computational Molecular Biology, April 2002 (inproceedings)

Abstract
We estimate the number of microarrays that is required in order to gain reliable results from a common type of study: the pairwise comparison of different classes of samples. Current knowlegde seems to suffice for the construction of models that are realistic with respect to searches for individual differentially expressed genes. Such models allow to investigate the dependence of the required number of samples on the relevant parameters: the biological variability of the samples within each class; the fold changes in expression; the detection sensitivity of the microarrays; and the acceptable error rates of the results. We supply experimentalists with general conclusions as well as a freely accessible Java applet at http://cartan.gmd.de/~zien/classsize/ for fine tuning simulations to their particular actualities. Since the situation can be assumed to be very similar for large scale proteomics and metabolomics studies, our methods and results might also apply there.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In Ideal 2002, pages: 97-103, (Editors: Yin, H. , N. Allinson, R. Freeman, J. Keane, S. Hubbard), Springer, Berlin, Germany, Third International Conference on Intelligent Data Engineering and Automated Learning, January 2002 (inproceedings)

Abstract
SVMs tend to take a very long time to train with a large data set. If "redundant" patterns are identified and deleted in pre-processing, the training time could be reduced significantly. We propose a k-nearest neighbors(k-NN) based pattern selection method. The method tries to select the patterns that are near the decision boundary and that are correctly labeled. The simulations over synthetic data sets showed promising results: (1) By converting a non-separable problem to a separable one, the search for an optimal error tolerance parameter became unnecessary. (2) SVM training time decreased by two orders of magnitude without any loss of accuracy. (3) The redundant SVs were substantially reduced.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
The leave-one-out kernel

Tsuda, K., Kawanabe, M.

In Artificial Neural Networks -- ICANN 2002, 2415, pages: 727-732, LNCS, (Editors: Dorronsoro, J. R.), Artificial Neural Networks -- ICANN, 2002 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Localized Rademacher Complexities

Bartlett, P., Bousquet, O., Mendelson, S.

In Proceedings of the 15th annual conference on Computational Learning Theory, pages: 44-58, Proceedings of the 15th annual conference on Computational Learning Theory, 2002 (inproceedings)

Abstract
We investigate the behaviour of global and local Rademacher averages. We present new error bounds which are based on the local averages and indicate how data-dependent local averages can be estimated without {it a priori} knowledge of the class at hand.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Film Cooling: A Comparative Study of Different Heaterfoil Configurations for Liquid Crystals Experiments

Vogel, G., Graf, ABA., Weigand, B.

In ASME TURBO EXPO 2002, Amsterdam, GT-2002-30552, ASME TURBO EXPO, Amsterdam, 2002 (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Some Local Measures of Complexity of Convex Hulls and Generalization Bounds

Bousquet, O., Koltchinskii, V., Panchenko, D.

In Proceedings of the 15th annual conference on Computational Learning Theory, Proceedings of the 15th annual conference on Computational Learning Theory, 2002 (inproceedings)

Abstract
We investigate measures of complexity of function classes based on continuity moduli of Gaussian and Rademacher processes. For Gaussian processes, we obtain bounds on the continuity modulus on the convex hull of a function class in terms of the same quantity for the class itself. We also obtain new bounds on generalization error in terms of localized Rademacher complexities. This allows us to prove new results about generalization performance for convex hulls in terms of characteristics of the base class. As a byproduct, we obtain a simple proof of some of the known bounds on the entropy of convex hulls.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
A kernel approach for learning from almost orthogonal patterns

Schölkopf, B., Weston, J., Eskin, E., Leslie, C., Noble, W.

In Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, 2430/2431, pages: 511-528, Lecture Notes in Computer Science, (Editors: T Elomaa and H Mannila and H Toivonen), Springer, Berlin, Germany, 13th European Conference on Machine Learning (ECML) and 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'2002), 2002 (inproceedings)

PostScript DOI [BibTex]

PostScript DOI [BibTex]


no image
Infinite Mixtures of Gaussian Process Experts

Rasmussen, CE., Ghahramani, Z.

In (Editors: Dietterich, Thomas G.; Becker, Suzanna; Ghahramani, Zoubin), 2002 (inproceedings)

Abstract
We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using a input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets -- thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.

PDF PostScript [BibTex]

PDF PostScript [BibTex]


no image
Marginalized kernels for RNA sequence data analysis

Kin, T., Tsuda, K., Asai, K.

In Genome Informatics 2002, pages: 112-122, (Editors: Lathtop, R. H.; Nakai, K.; Miyano, S.; Takagi, T.; Kanehisa, M.), Genome Informatics, 2002, (Best Paper Award) (inproceedings)

Web [BibTex]

Web [BibTex]


no image
Luminance Artifacts on CRT Displays

Wichmann, F.

In IEEE Visualization, pages: 571-574, (Editors: Moorhead, R.; Gross, M.; Joy, K. I.), IEEE Visualization, 2002 (inproceedings)

Abstract
Most visualization panels today are still built around cathode-ray tubes (CRTs), certainly on personal desktops at work and at home. Whilst capable of producing pleasing images for common applications ranging from email writing to TV and DVD presentation, it is as well to note that there are a number of nonlinear transformations between input (voltage) and output (luminance) which distort the digital and/or analogue images send to a CRT. Some of them are input-independent and hence easy to fix, e.g. gamma correction, but others, such as pixel interactions, depend on the content of the input stimulus and are thus harder to compensate for. CRT-induced image distortions cause problems not only in basic vision research but also for applications where image fidelity is critical, most notably in medicine (digitization of X-ray images for diagnostic purposes) and in forms of online commerce, such as the online sale of images, where the image must be reproduced on some output device which will not have the same transfer function as the customer's CRT. I will present measurements from a number of CRTs and illustrate how some of their shortcomings may be problematic for the aforementioned applications.

[BibTex]

[BibTex]

1995


no image
View-based cognitive map learning by an autonomous robot

Mallot, H., Bülthoff, H., Georg, P., Schölkopf, B., Yasuhara, K.

In Proceedings International Conference on Artificial Neural Networks, vol. 2, pages: 381-386, (Editors: Fogelman-Soulié, F.), EC2, Paris, France, Conférence Internationale sur les Réseaux de Neurones Artificiels (ICANN '95), October 1995 (inproceedings)

Abstract
This paper presents a view-based approach to map learning and navigation in mazes. By means of graph theory we have shown that the view-graph is a sufficient representation for map behaviour such as path planning. A neural network for unsupervised learning of the view-graph from sequences of views is constructed. We use a modified Kohonen (1988) learning rule that transforms temporal sequence (rather than featural similarity) into connectedness. In the main part of the paper, we present a robot implementation of the scheme. The results show that the proposed network is able to support map behaviour in simple environments.

PDF [BibTex]

1995

PDF [BibTex]


no image
Extracting support data for a given task

Schölkopf, B., Burges, C., Vapnik, V.

In First International Conference on Knowledge Discovery & Data Mining (KDD-95), pages: 252-257, (Editors: UM Fayyad and R Uthurusamy), AAAI Press, Menlo Park, CA, USA, August 1995 (inproceedings)

Abstract
We report a novel possibility for extracting a small subset of a data base which contains all the information necessary to solve a given classification task: using the Support Vector Algorithm to train three different types of handwritten digit classifiers, we observed that these types of classifiers construct their decision surface from strongly overlapping small (k: 4%) subsets of the data base. This finding opens up the possibiiity of compressing data bases significantly by disposing of the data which is not important for the solution of a given task. In addition, we show that the theory allows us to predict the classifier that will have the best generalization ability, based solely on performance on the training set and characteristics of the learning machines. This finding is important for cases where the amount of available data is limited.

PDF [BibTex]

PDF [BibTex]