2004


Minimum Sum-Squared Residue based clustering of Gene Expression Data

Cho, H., Guan, Y., Dhillon, I., Sra, S.

In SIAM Data Mining, pages: 00-00, SDM, April 2004 (inproceedings)

GZIP [BibTex]

Preservation of Neighborhood Relation under Input to Feature Space Mapping in SVM Training

Shin, H., Cho, S.

In Proc. of the 33rd International Conference on Computers and Industrial Engineering (C&IE 2004), pages: 1-10, the 33rd International Conference on Computers and Industrial Engineering (C&IE), April 2004, in CD (inproceedings)

[BibTex]

Local Alignment Kernels for Biological Sequences

Vert, J., Saigo, H., Akutsu, T.

In Kernel Methods in Computational Biology, pages: 131-153, MIT Press, Cambridge, MA, 2004 (inbook)

Web [BibTex]

Unifying Collaborative and Content-Based Filtering

Basilico, J., Hofmann, T.

In ACM International Conference Proceeding Series, pages: 65, (Editors: Greiner, R., D. Schuurmans), ACM Press, New York, USA, ICML, 2004 (inproceedings)

Abstract
Collaborative and content-based filtering are two paradigms that have been applied in the context of recommender systems and user preference prediction. This paper proposes a novel, unified approach that systematically integrates all available training information such as past user-item ratings as well as attributes of items or users to learn a prediction function. The key ingredient of our method is the design of a suitable kernel or similarity function between user-item pairs that allows simultaneous generalization across the user and item dimensions. We propose an on-line algorithm (JRank) that generalizes perceptron learning. Experimental results on the EachMovie data set show significant improvements over standard approaches.

PDF [BibTex]

Clustering Protein Sequence and Structure Space with Infinite Gaussian Mixture Models

Dubey, A., Hwang, S., Rangel, C., Rasmussen, CE., Ghahramani, Z., Wild, DL.

In Pacific Symposium on Biocomputing 2004; Vol. 9, pages: 399-410, World Scientific Publishing, Singapore, Pacific Symposium on Biocomputing, 2004 (inproceedings)

Abstract
We describe a novel approach to the problem of automatically clustering protein sequences and discovering protein families, subfamilies etc., based on the theory of infinite Gaussian mixture models. This method allows the data itself to dictate how many mixture components are required to model it, and provides a measure of the probability that two proteins belong to the same cluster. We illustrate our methods with application to three data sets: globin sequences, globin sequences with known three-dimensional structures and G-protein coupled receptor sequences. The consistency of the clusters indicates that our methods are producing biologically meaningful results, which provide a very good indication of the underlying families and subfamilies. With the inclusion of secondary structure and residue solvent accessibility information, we obtain a classification of sequences of known structure which reflects and extends their SCOP classifications. A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/~wild/PSB04

PDF [BibTex]

Efficient Approximations for Support Vector Machines in Object Detection

Kienzle, W., BakIr, G., Franz, M., Schölkopf, B.

In DAGM 2004, pages: 54-61, (Editors: CE Rasmussen and HH Bülthoff and B Schölkopf and MA Giese), Springer, Berlin, Germany, Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004 (inproceedings)

Abstract
We present a new approximation scheme for support vector decision functions in object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller so-called reduced set of synthetic points. Instead of finding the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic vectors such that the resulting approximation can be evaluated via separable filters. Applications that require scanning an entire image can benefit from this representation: when using separable filters, the average computational complexity for evaluating a reduced set vector on a test patch of size (h x w) drops from O(hw) to O(h+w). We show experimental results on handwritten digits and face detection.
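
The complexity claim in this abstract can be illustrated with a minimal sketch. It assumes a filter that is exactly the outer product of a column vector and a row vector; the function names and the plain-list image representation are illustrative assumptions, not the authors' implementation. Scanning an image with two one-dimensional passes then gives the same responses as the direct two-dimensional correlation, at an amortized cost of h + w rather than h * w multiplications per position.

```python
def correlate2d_full(image, kernel):
    """Direct 'valid' 2-D correlation: h * w multiplications per output position."""
    h, w = len(kernel), len(kernel[0])
    n_rows, n_cols = len(image) - h + 1, len(image[0]) - w + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(h) for j in range(w))
             for c in range(n_cols)]
            for r in range(n_rows)]


def correlate2d_separable(image, col_vec, row_vec):
    """Same responses for kernel[i][j] = col_vec[i] * row_vec[j], using two 1-D passes."""
    h, w = len(col_vec), len(row_vec)
    n_rows, n_cols = len(image), len(image[0])
    # Horizontal pass: w multiplications per position.
    horiz = [[sum(row_vec[j] * image[r][c + j] for j in range(w))
              for c in range(n_cols - w + 1)]
             for r in range(n_rows)]
    # Vertical pass over the intermediate result: h multiplications per position.
    return [[sum(col_vec[i] * horiz[r + i][c] for i in range(h))
             for c in range(n_cols - w + 1)]
            for r in range(n_rows - h + 1)]
```

The saving only applies when the filter really is rank one, which is why the abstract describes imposing this structure as a constraint on the reduced set vectors rather than applying it after the fact.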

PDF [BibTex]

Kernel Methods for Manifold Estimation

Schölkopf, B.

In Proceedings in Computational Statistics, pages: 441-452, (Editors: J Antoch), Physica-Verlag/Springer, Heidelberg, Germany, COMPSTAT, 2004 (inproceedings)

[BibTex]

A Regularization Framework for Learning from Graph Data

Zhou, D., Schölkopf, B.

In ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields, pages: 132-137, ICML, 2004 (inproceedings)

Abstract
The data in many real-world problems can be thought of as a graph, such as the web, co-author networks, and biological networks. We propose a general regularization framework on graphs, which is applicable to the classification, ranking, and link prediction problems. We also show that the method can be explained as lazy random walks. We evaluate the method on a number of experiments.

PDF PostScript [BibTex]

Gaussian Processes in Machine Learning

Rasmussen, CE.

In Lecture Notes in Computer Science, Vol. 3176, pages: 63-71, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, 2004, Copyright by Springer (inbook)

Abstract
We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters using the marginal likelihood. We explain the practical advantages of Gaussian Processes and end with conclusions and a look at the current trends in GP work.
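
As a rough illustration of the "simple equations for incorporating training data", the sketch below computes the standard Gaussian Process posterior mean and variance at a single test input. The squared-exponential kernel, the fixed hyperparameter values, and all function names are assumptions chosen for the example; hyperparameter learning via the marginal likelihood is not shown.

```python
import math


def rbf(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * (x1 - x2) ** 2 / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of the latent function at x_test."""
    # Covariance of the noisy training targets: K + noise_var * I.
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf(x_test, a) for a in x_train]
    alpha = solve(K, y_train)            # (K + noise_var I)^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                 # (K + noise_var I)^-1 k_star
    var = rbf(x_test, x_test) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var
```

Far from the training inputs the predictive variance returns to the prior variance, and close to them it shrinks, which is the behaviour the abstract refers to when it speaks of a distribution over functions.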

PDF PostScript [BibTex]

A kernel view of the dimensionality reduction of manifolds

Ham, J., Lee, D., Mika, S., Schölkopf, B.

In Proceedings of the Twenty-First International Conference on Machine Learning, pages: 369-376, (Editors: CE Brodley), ACM, New York, NY, USA, ICML, 2004, also appeared as MPI-TR 110 (inproceedings)

Abstract
We interpret several well-known algorithms for dimensionality reduction of manifolds as kernel methods. Isomap, graph Laplacian eigenmap, and locally linear embedding (LLE) all utilize local neighborhood information to construct a global embedding of the manifold. We show how all three algorithms can be described as kernel PCA on specially constructed Gram matrices, and illustrate the similarities and differences between the algorithms with representative examples.
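
To make the "kernel PCA on a Gram matrix" step concrete, here is a minimal sketch that centers a given Gram matrix and extracts a one-dimensional embedding from its leading eigenvector. It does not construct the specific Gram matrices for Isomap, Laplacian eigenmaps, or LLE; the function names and the use of plain power iteration are assumptions made for the example.

```python
import math


def center_gram(K):
    """Double-center a symmetric Gram matrix (centering in feature space)."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so column means equal row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)]
            for i in range(n)]


def top_eigenpair(A, iters=200):
    """Leading eigenvalue and unit eigenvector of a PSD matrix by power iteration."""
    n = len(A)
    # Arbitrary start vector; it must not be orthogonal to the leading eigenvector.
    v = [float(i + 1) for i in range(n)]
    eigval = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigval = norm
    return eigval, v


def kernel_pca_1d(K):
    """One-dimensional kernel PCA embedding of the points behind Gram matrix K."""
    eigval, v = top_eigenpair(center_gram(K))
    return [math.sqrt(eigval) * x for x in v]
```

With a linear kernel on points along a line, the embedding recovers the centered coordinates up to sign; swapping in a different Gram matrix changes the embedding without changing this procedure.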

PDF [BibTex]

Protein Functional Class Prediction with a Combined Graph

Shin, H., Tsuda, K., Schölkopf, B.

In Proceedings of the Korean Data Mining Conference, pages: 200-219, Proceedings of the Korean Data Mining Conference, 2004 (inproceedings)

Abstract
In bioinformatics, there exist multiple descriptions of graphs for the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as protein-protein interactions, genetic interactions, or co-participation in a protein complex, etc. Relying on similarities between nodes, each graph can be used independently for prediction of protein function. However, since different graphs contain partly independent and partly complementary information about the problem at hand, one can enhance the total information extracted by combining all graphs. In this paper, we propose a method for integrating multiple graphs within a framework of semi-supervised learning. The method alternates between minimizing the objective function with respect to network output and with respect to combining weights. We apply the method to the task of protein functional class prediction in yeast. The proposed method performs significantly better than the same algorithm trained on any single graph.

PDF [BibTex]

Learning from Labeled and Unlabeled Data Using Random Walks

Zhou, D., Schölkopf, B.

In Pattern Recognition, Proceedings of the 26th DAGM Symposium, pages: 237-244, (Editors: Rasmussen, C.E., H.H. Bülthoff, M.A. Giese and B. Schölkopf), Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004 (inproceedings)

Abstract
We consider the general problem of learning from labeled and unlabeled data. Given a set of points, some of which are labeled and the rest unlabeled, the goal is to predict the labels of the unlabeled points. Any supervised learning algorithm can be applied to this problem, for instance, Support Vector Machines (SVMs). The question of interest is whether we can implement a classifier that uses the unlabeled data in some way and achieves higher accuracy than classifiers that use the labeled data only. Recently we proposed a simple algorithm, which can substantially benefit from large amounts of unlabeled data and demonstrates clear superiority to supervised learning methods. In this paper we further investigate the algorithm using random walks and spectral graph theory, which shed light on the key steps in this algorithm.
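
The abstract does not spell out the algorithm, so the following is only a sketch of one widely used graph-based scheme of this kind: labels are spread along the edges of a similarity graph by a damped iteration. The update rule, the symmetric normalization, the value of alpha, and the function name are all assumptions for illustration and may differ from the paper's formulation.

```python
import math


def propagate_labels(W, Y, alpha=0.5, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y on a weighted graph.

    W is a symmetric weight matrix with zero diagonal, Y holds +1 / -1 for
    labeled nodes and 0 for unlabeled ones, and S = D^-1/2 W D^-1/2.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)]
         for i in range(n)]
    F = list(Y)
    for _ in range(iters):
        F = [alpha * sum(S[i][j] * F[j] for j in range(n)) + (1 - alpha) * Y[i]
             for i in range(n)]
    return F
```

On a four-node path graph with opposite labels at the two ends, each unlabeled node takes the sign of the labeled end it is closer to, which is the sense in which unlabeled points contribute to the prediction.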

PDF PostScript [BibTex]

Multivariate Regression via Stiefel Manifold Constraints

BakIr, G., Gretton, A., Franz, M., Schölkopf, B.

In Lecture Notes in Computer Science, Vol. 3175, pages: 262-269, (Editors: CE Rasmussen and HH Bülthoff and B Schölkopf and MA Giese), Springer, Berlin, Germany, Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004 (inproceedings)

Abstract
We introduce a learning technique for regression between high-dimensional spaces. Standard methods typically reduce this task to many one-dimensional problems, with each output dimension considered independently. By contrast, in our approach the feature construction and the regression estimation are performed jointly, directly minimizing a loss function that we specify, subject to a rank constraint. A major advantage of this approach is that the loss is no longer chosen according to the algorithmic requirements, but can be tailored to the characteristics of the task at hand; the features will then be optimal with respect to this objective, and dependence between the outputs can be exploited.

PostScript [BibTex]

Implicit estimation of Wiener series

Franz, M., Schölkopf, B.

In Machine Learning for Signal Processing XIV, Proc. 2004 IEEE Signal Processing Society Workshop, pages: 735-744, (Editors: A Barros and J Principe and J Larsen and T Adali and S Douglas), IEEE, New York, Machine Learning for Signal Processing XIV, Proc. IEEE Signal Processing Society Workshop, 2004 (inproceedings)

Abstract
The Wiener series is one of the standard methods to systematically characterize the nonlinearity of a system. The classical estimation method of the expansion coefficients via cross-correlation suffers from severe problems that prevent its application to high-dimensional and strongly nonlinear systems. We propose an implicit estimation method based on regression in a reproducing kernel Hilbert space that alleviates these problems. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.

PDF PostScript [BibTex]

Hilbertian Metrics on Probability Measures and their Application in SVM’s

Hein, M., Lal, T., Bousquet, O.

In Pattern Recognition, Proceedings of the 26th DAGM Symposium, 3175, pages: 270-277, Lecture Notes in Computer Science, (Editors: Rasmussen, C. E., H. H. Bülthoff, M. Giese and B. Schölkopf), Pattern Recognition, Proceedings of the 26th DAGM Symposium, 2004 (inproceedings)

Abstract
The goal of this article is to investigate the field of Hilbertian metrics on probability measures. Since they are very versatile and can therefore be applied in various problems, they are of great interest in kernel methods. Quite recently Topsøe and Fuglede introduced a family of Hilbertian metrics on probability measures. We give basic properties of the Hilbertian metrics of this family and of other metrics used in the literature. Then we propose an extension of the considered metrics which incorporates structural information of the probability space into the Hilbertian metric. Finally we compare all proposed metrics in an image and text classification problem using histogram data.

PDF PostScript [BibTex]

Gaussian process model based predictive control

Kocijan, J., Murray-Smith, R., Rasmussen, CE., Girard, A.

In Proceedings of the ACC 2004, pages: 2214-2219, Proceedings of the ACC, 2004 (inproceedings)

Abstract
Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identification of non-linear dynamic systems. The Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. Gaussian process models contain noticeably fewer coefficients to be optimised. This paper illustrates a possible application of Gaussian process models within model-based predictive control. The extra information provided by the Gaussian process model is used in predictive control, where optimisation of the control signal takes the variance information into account. The predictive control principle is demonstrated on control of a pH process benchmark.

PDF PostScript [BibTex]

A New Variational Framework for Rigid-Body Alignment

Kato, T., Tsuda, K., Tomii, K., Asai, K.

In Joint IAPR International Workshops on Syntactical and Structural Pattern Recognition (SSPR 2004) and Statistical Pattern Recognition (SPR 2004), pages: 171-179, (Editors: Fred, A., T. Caelli, R.P.W. Duin, A. Campilho and D. de Ridder), Joint IAPR International Workshops on Syntactical and Structural Pattern Recognition (SSPR) and Statistical Pattern Recognition (SPR), 2004 (inproceedings)

PDF [BibTex]

Protein Classification via Kernel Matrix Completion

Kin, T., Kato, T., Tsuda, K.

In Kernel Methods in Computational Biology, pages: 261-274, (Editors: Schoelkopf, B., K. Tsuda and J.P. Vert), MIT Press, Cambridge, MA, USA, 2004 (inbook)

PDF [BibTex]

Practical Method for Blind Inversion of Wiener Systems

Zhang, K., Chan, L.

In Proceedings of International Joint Conference on Neural Networks (IJCNN 2004), pages: 2163-2168, International Joint Conference on Neural Networks (IJCNN), 2004, Volume 3 (inproceedings)

DOI [BibTex]

Semi-supervised kernel regression using whitened function classes

Franz, M., Kwon, Y., Rasmussen, C., Schölkopf, B.

In Pattern Recognition, Proceedings of the 26th DAGM Symposium, Lecture Notes in Computer Science, Vol. 3175, pages: 18-26, (Editors: CE Rasmussen and HH Bülthoff and MA Giese and B Schölkopf), Springer, Berlin, Germany, 26th DAGM Symposium, 2004 (inproceedings)

Abstract
The use of non-orthonormal basis functions in ridge regression leads to an often undesired non-isotropic prior in function space. In this study, we investigate an alternative regularization technique that results in an implicit whitening of the basis functions by penalizing directions in function space with a large prior variance. The regularization term is computed from unlabelled input data that characterizes the input distribution. Tests on two datasets using polynomial basis functions showed an improved average performance compared to standard ridge regression.

PDF PostScript [BibTex]

Maximal Margin Classification for Metric Spaces

Hein, M., Bousquet, O.

In Learning Theory and Kernel Machines, pages: 72-86, (Editors: Schölkopf, B. and Warmuth, M. K.), Springer, Heidelberg, Germany, 16. Annual Conference on Computational Learning Theory / COLT Kernel, 2004 (inproceedings)

Abstract
In this article we construct a maximal margin classification algorithm for arbitrary metric spaces. At first we show that the Support Vector Machine (SVM) is a maximal margin algorithm for the class of metric spaces where the negative squared distance is conditionally positive definite (CPD). This means that the metric space can be isometrically embedded into a Hilbert space, where one performs linear maximal margin separation. We will show that the solution only depends on the metric, but not on the kernel. Following the framework we develop for the SVM, we construct an algorithm for maximal margin classification in arbitrary metric spaces. The main difference compared with SVM is that we no longer embed isometrically into a Hilbert space, but a Banach space. We further give an estimate of the capacity of the function class involved in this algorithm via Rademacher averages. We recover an algorithm of Graepel et al. [6].

PDF PostScript PDF DOI [BibTex]

On the Convergence of Spectral Clustering on Random Samples: The Normalized Case

von Luxburg, U., Bousquet, O., Belkin, M.

In Proceedings of the 17th Annual Conference on Learning Theory, pages: 457-471, Proceedings of the 17th Annual Conference on Learning Theory, 2004 (inproceedings)

PDF PostScript [BibTex]

Introduction to Statistical Learning Theory

Bousquet, O., Boucheron, S., Lugosi, G.

In Lecture Notes in Artificial Intelligence 3176, pages: 169-207, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, Germany, 2004 (inbook)

PDF [BibTex]

A Primer on Kernel Methods

Vert, J., Tsuda, K., Schölkopf, B.

In Kernel Methods in Computational Biology, pages: 35-70, (Editors: B Schölkopf and K Tsuda and JP Vert), MIT Press, Cambridge, MA, USA, 2004 (inbook)

PDF [BibTex]

Concentration Inequalities

Boucheron, S., Lugosi, G., Bousquet, O.

In Lecture Notes in Artificial Intelligence 3176, pages: 208-240, (Editors: Bousquet, O., U. von Luxburg and G. Rätsch), Springer, Heidelberg, Germany, 2004 (inbook)

PDF [BibTex]

Kernels for graphs

Kashima, H., Tsuda, K., Inokuchi, A.

In Kernel Methods in Computational Biology, pages: 155-170, (Editors: Schoelkopf, B., K. Tsuda and J.P. Vert), MIT Press, Cambridge, MA, USA, 2004 (inbook)

PDF [BibTex]

A primer on molecular biology

Zien, A.

In Kernel Methods in Computational Biology, pages: 3-34, (Editors: Schoelkopf, B., K. Tsuda and J. P. Vert), MIT Press, Cambridge, MA, USA, 2004 (inbook)

Abstract
Modern molecular biology provides a rich source of challenging machine learning problems. This tutorial chapter aims to provide the necessary biological background knowledge required to communicate with biologists and to understand and properly formalize a number of most interesting problems in this application domain. The largest part of the chapter (its first section) is devoted to the cell as the basic unit of life. Four aspects of cells are reviewed in sequence: (1) the molecules that cells make use of (above all, proteins, RNA, and DNA); (2) the spatial organization of cells ("compartmentalization"); (3) the way cells produce proteins ("protein expression"); and (4) cellular communication and evolution (of cells and organisms). In the second section, an overview is provided of the most frequent measurement technologies, data types, and data sources. Finally, important open problems in the analysis of these data (bioinformatics challenges) are briefly outlined.

PDF PostScript Web [BibTex]

Learning Movement Primitives

Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.

In 11th International Symposium on Robotics Research (ISRR2003), pages: 561-572, (Editors: Dario, P. and Chatila, R.), Springer, ISRR, 2004, clmc (inproceedings)

Abstract
This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.

link (url) DOI [BibTex]

2002


Gender Classification of Human Faces

Graf, A., Wichmann, F.

In Biologically Motivated Computer Vision, pages: 1-18, (Editors: Bülthoff, H. H., S.W. Lee, T. A. Poggio and C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV), November 2002 (inproceedings)

Abstract
This paper addresses the issue of combining pre-processing methods (dimensionality reduction using Principal Component Analysis (PCA) and Locally Linear Embedding (LLE)) with Support Vector Machine (SVM) classification for a behaviorally important task in humans: gender classification. A processed version of the MPI head database is used as stimulus set. First, summary statistics of the head database are studied. Subsequently the optimal parameters for LLE and the SVM are sought heuristically. These values are then used to compare the original face database with its processed counterpart and to assess the behavior of an SVM with respect to changes in illumination and perspective of the face images. Overall, PCA was superior in classification performance and allowed linear separability.

PDF PDF DOI [BibTex]

Insect-Inspired Estimation of Self-Motion

Franz, MO., Chahl, JS.

In Biologically Motivated Computer Vision, (2525):171-180, LNCS, (Editors: Bülthoff, H.H., S.W. Lee, T.A. Poggio, C. Wallraven), Springer, Berlin, Germany, Second International Workshop on Biologically Motivated Computer Vision (BMCV), November 2002 (inproceedings)

Abstract
The tangential neurons in the fly brain are sensitive to the typical optic flow patterns generated during self-motion. In this study, we examine whether a simplified linear model of these neurons can be used to estimate self-motion from the optic flow. We present a theory for the construction of an optimal linear estimator incorporating prior knowledge about the environment. The optimal estimator is tested on a gantry carrying an omnidirectional vision sensor. The experiments show that the proposed approach leads to accurate and robust estimates of rotation rates, whereas translation estimates turn out to be less reliable.

PDF PDF DOI [BibTex]

Combining sensory Information to Improve Visualization

Ernst, M., Banks, M., Wichmann, F., Maloney, L., Bülthoff, H.

In Proceedings of the Conference on Visualization '02 (VIS '02), pages: 571-574, (Editors: Moorhead, R., M. Joy), IEEE, Piscataway, NJ, USA, IEEE Conference on Visualization (VIS '02), October 2002 (inproceedings)

Abstract
Seemingly effortlessly the human brain reconstructs the three-dimensional environment surrounding us from the light pattern striking the eyes. This seems to be true across almost all viewing and lighting conditions. One important factor for this apparent easiness is the redundancy of information provided by the sensory organs. For example, perspective distortions, shading, motion parallax, or the disparity between the two eyes' images are all, at least partly, redundant signals which provide us with information about the three-dimensional layout of the visual scene. Our brain uses all these different sensory signals and combines the available information into a coherent percept. In displays visualizing data, however, the information is often highly reduced and abstracted, which may lead to an altered perception and therefore a misinterpretation of the visualized data. In this panel we will discuss mechanisms involved in the combination of sensory information and their implications for simulations using computer displays, as well as problems resulting from current display technology such as cathode-ray tubes.

PDF Web [BibTex]

Sampling Techniques for Kernel Methods

Achlioptas, D., McSherry, F., Schölkopf, B.

In Advances in Neural Information Processing Systems 14, pages: 335-342, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations.

PDF Web [BibTex]

The Infinite Hidden Markov Model

Beal, MJ., Ghahramani, Z., Rasmussen, CE.

In Advances in Neural Information Processing Systems 14, pages: 577-584, (Editors: Dietterich, T.G., S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite - consider, for example, symbols being possible words appearing in English text.

PDF Web [BibTex]

A new discriminative kernel from probabilistic models

Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.

In Advances in Neural Information Processing Systems 14, pages: 977-984, (Editors: Dietterich, T.G., S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
Recently, Jaakkola and Haussler proposed a method for constructing kernel functions from probabilistic models. Their so-called "Fisher kernel" has been combined with discriminative classifiers such as SVM and applied successfully in e.g. DNA and protein analysis. Whereas the Fisher kernel (FK) is calculated from the marginal log-likelihood, we propose the TOP kernel derived from Tangent vectors Of Posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments our new discriminative TOP kernel compares favorably to the Fisher kernel.

PDF Web [BibTex]

Incorporating Invariances in Non-Linear Support Vector Machines

Chapelle, O., Schölkopf, B.

In Advances in Neural Information Processing Systems 14, pages: 609-616, (Editors: TG Dietterich and S Becker and Z Ghahramani), MIT Press, Cambridge, MA, USA, 15th Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
The choice of an SVM kernel corresponds to the choice of a representation of the data in a feature space and, to improve performance, it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in nonlinear kernels. We show on a digit recognition task that the proposed approach is superior to the Virtual Support Vector method, which previously had been the method of choice.

PDF Web [BibTex]

Kernel feature spaces and nonlinear blind source separation

Harmeling, S., Ziehe, A., Kawanabe, M., Müller, K.

In Advances in Neural Information Processing Systems 14, pages: 761-768, (Editors: Dietterich, T. G., S. Becker, Z. Ghahramani), MIT Press, Cambridge, MA, USA, Fifteenth Annual Neural Information Processing Systems Conference (NIPS), September 2002 (inproceedings)

Abstract
In kernel based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. We propose a new mathematical construction that makes it possible to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations get much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.

PDF Web [BibTex]

Algorithms for Learning Function Distinguishable Regular Languages

Fernau, H., Radl, A.

In Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pages: 64-73, (Editors: Caelli, T., A. Amin, R. P.W. Duin, M. Kamel, D. de Ridder), Springer, Berlin, Germany, Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, August 2002 (inproceedings)

Abstract
Function distinguishable languages were introduced as a new methodology of defining characterizable subclasses of the regular languages which are learnable from text. Here, we give details on the implementation and the analysis of the corresponding learning algorithms. We also discuss problems which might occur in practical applications.

PDF DOI [BibTex]

Decision Boundary Pattern Selection for Support Vector Machines

Shin, H., Cho, S.

In Proc. of the Korean Data Mining Conference, pages: 33-41, Korean Data Mining Conference, May 2002 (inproceedings)

[BibTex]

k-NN based Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In Proc. of the Korean Industrial Engineers Conference, pages: 645-651, Korean Industrial Engineers Conference, May 2002 (inproceedings)

[BibTex]

Microarrays: How Many Do You Need?

Zien, A., Fluck, J., Zimmer, R., Lengauer, T.

In RECOMB 2002, pages: 321-330, ACM Press, New York, NY, USA, Sixth Annual International Conference on Research in Computational Molecular Biology, April 2002 (inproceedings)

Abstract
We estimate the number of microarrays required to obtain reliable results from a common type of study: the pairwise comparison of different classes of samples. Current knowledge seems to suffice for the construction of models that are realistic with respect to searches for individual differentially expressed genes. Such models allow one to investigate the dependence of the required number of samples on the relevant parameters: the biological variability of the samples within each class; the fold changes in expression; the detection sensitivity of the microarrays; and the acceptable error rates of the results. We supply experimentalists with general conclusions as well as a freely accessible Java applet at http://cartan.gmd.de/~zien/classsize/ for fine-tuning simulations to their particular circumstances. Since the situation can be assumed to be very similar for large-scale proteomics and metabolomics studies, our methods and results might also apply there.
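
As a rough illustration of how the required number of samples depends on these parameters, the sketch below uses the textbook normal-approximation sample-size formula for a two-class comparison of a single gene. It is not the simulation model of the paper or of its applet; the function name `required_samples_per_class`, the log2 scale, and the default error rates are assumptions made for this example.

```python
import math
from statistics import NormalDist


def required_samples_per_class(fold_change, sd_log2, alpha=0.05, power=0.8):
    """Approximate number of arrays per class needed to detect one gene.

    Assumed model (not taken from the paper): log2 expression values are
    normally distributed with the same standard deviation in both classes,
    and the classes are compared with a two-sided test:
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sd^2 / delta^2
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    # Effect size on the log2 scale, e.g. a 2-fold change gives delta = 1.
    delta = abs(math.log2(fold_change))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # acceptable false positive rate
    z_power = NormalDist().inv_cdf(power)          # acceptable false negative rate
    n = 2 * (z_alpha + z_power) ** 2 * sd_log2 ** 2 / delta ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for fc in (1.5, 2.0, 4.0):
        print(fc, required_samples_per_class(fc, sd_log2=0.5))
```

Under this approximation, smaller fold changes, larger within-class variability, and stricter error rates all push the estimate up, which is the qualitative dependence the abstract describes.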

Web DOI [BibTex]


Pattern Selection for Support Vector Classifiers

Shin, H., Cho, S.

In Ideal 2002, pages: 97-103, (Editors: Yin, H. , N. Allinson, R. Freeman, J. Keane, S. Hubbard), Springer, Berlin, Germany, Third International Conference on Intelligent Data Engineering and Automated Learning, January 2002 (inproceedings)

Abstract
SVMs tend to take a very long time to train on a large data set. If "redundant" patterns are identified and deleted in pre-processing, the training time can be reduced significantly. We propose a k-nearest neighbors (k-NN) based pattern selection method. The method tries to select the patterns that are near the decision boundary and that are correctly labeled. Simulations on synthetic data sets showed promising results: (1) By converting a non-separable problem to a separable one, the search for an optimal error tolerance parameter became unnecessary. (2) SVM training time decreased by two orders of magnitude without any loss of accuracy. (3) The redundant SVs were substantially reduced.
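
The selection idea can be sketched as follows. The abstract does not give the exact criterion, so this example assumes one plausible rule: a pattern counts as near the decision boundary when its k nearest neighbors carry more than one label, and as correctly labeled when its own label agrees with the majority label of those neighbors. The name `select_patterns` and the toy data are made up for illustration.

```python
import math
from collections import Counter


def select_patterns(points, labels, k=3):
    """Return indices of patterns to keep for SVM training.

    Assumed rule (not taken from the paper):
      * near the boundary: the k nearest neighbors have mixed labels
      * correctly labeled: the pattern's label equals the neighbors' majority
    Use an odd k with two classes so the majority vote cannot tie.
    """
    selected = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Brute-force neighbor search: distance from p to every other pattern.
        by_distance = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbor_labels = [labels[j] for _, j in by_distance[:k]]
        counts = Counter(neighbor_labels)
        is_near_boundary = len(counts) > 1
        majority_label = counts.most_common(1)[0][0]
        if is_near_boundary and majority_label == label:
            selected.append(i)
    return selected


if __name__ == "__main__":
    # Two well-separated 1-D classes; only the two patterns facing the gap remain.
    xs = [0.0, 1.0, 2.1, 3.0, 4.6, 5.5, 6.6, 7.5]
    points = [(x,) for x in xs]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(select_patterns(points, labels, k=3))
```

The brute-force neighbor search is quadratic in the number of patterns; it keeps the sketch short and is not meant to reflect the efficiency of the published method.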

PDF Web DOI [BibTex]


The leave-one-out kernel

Tsuda, K., Kawanabe, M.

In Artificial Neural Networks -- ICANN 2002, 2415, pages: 727-732, LNCS, (Editors: Dorronsoro, J. R.), Artificial Neural Networks -- ICANN, 2002 (inproceedings)

PDF [BibTex]


Localized Rademacher Complexities

Bartlett, P., Bousquet, O., Mendelson, S.

In Proceedings of the 15th annual conference on Computational Learning Theory, pages: 44-58, Proceedings of the 15th annual conference on Computational Learning Theory, 2002 (inproceedings)

Abstract
We investigate the behaviour of global and local Rademacher averages. We present new error bounds which are based on the local averages and indicate how data-dependent local averages can be estimated without a priori knowledge of the class at hand.

PDF PostScript [BibTex]


Film Cooling: A Comparative Study of Different Heaterfoil Configurations for Liquid Crystals Experiments

Vogel, G., Graf, ABA., Weigand, B.

In ASME TURBO EXPO 2002, Amsterdam, GT-2002-30552, ASME TURBO EXPO, Amsterdam, 2002 (inproceedings)

PDF [BibTex]


Some Local Measures of Complexity of Convex Hulls and Generalization Bounds

Bousquet, O., Koltchinskii, V., Panchenko, D.

In Proceedings of the 15th annual conference on Computational Learning Theory, Proceedings of the 15th annual conference on Computational Learning Theory, 2002 (inproceedings)

Abstract
We investigate measures of complexity of function classes based on continuity moduli of Gaussian and Rademacher processes. For Gaussian processes, we obtain bounds on the continuity modulus on the convex hull of a function class in terms of the same quantity for the class itself. We also obtain new bounds on generalization error in terms of localized Rademacher complexities. This allows us to prove new results about generalization performance for convex hulls in terms of characteristics of the base class. As a byproduct, we obtain a simple proof of some of the known bounds on the entropy of convex hulls.

PDF PostScript [BibTex]


A kernel approach for learning from almost orthogonal patterns

Schölkopf, B., Weston, J., Eskin, E., Leslie, C., Noble, W.

In Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, 2430/2431, pages: 511-528, Lecture Notes in Computer Science, (Editors: T Elomaa and H Mannila and H Toivonen), Springer, Berlin, Germany, 13th European Conference on Machine Learning (ECML) and 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'2002), 2002 (inproceedings)

PostScript DOI [BibTex]


Infinite Mixtures of Gaussian Process Experts

Rasmussen, CE., Ghahramani, Z.

In (Editors: Dietterich, Thomas G.; Becker, Suzanna; Ghahramani, Zoubin), 2002 (inproceedings)

Abstract
We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets -- thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.

PDF PostScript [BibTex]


Marginalized kernels for RNA sequence data analysis

Kin, T., Tsuda, K., Asai, K.

In Genome Informatics 2002, pages: 112-122, (Editors: Lathrop, R. H.; Nakai, K.; Miyano, S.; Takagi, T.; Kanehisa, M.), Genome Informatics, 2002, (Best Paper Award) (inproceedings)

Web [BibTex]


Luminance Artifacts on CRT Displays

Wichmann, F.

In IEEE Visualization, pages: 571-574, (Editors: Moorhead, R.; Gross, M.; Joy, K. I.), IEEE Visualization, 2002 (inproceedings)

Abstract
Most visualization panels today are still built around cathode-ray tubes (CRTs), certainly on personal desktops at work and at home. Whilst capable of producing pleasing images for common applications ranging from email writing to TV and DVD presentation, it is as well to note that there are a number of nonlinear transformations between input (voltage) and output (luminance) which distort the digital and/or analogue images sent to a CRT. Some of them are input-independent and hence easy to fix, e.g. gamma correction, but others, such as pixel interactions, depend on the content of the input stimulus and are thus harder to compensate for. CRT-induced image distortions cause problems not only in basic vision research but also for applications where image fidelity is critical, most notably in medicine (digitization of X-ray images for diagnostic purposes) and in forms of online commerce, such as the online sale of images, where the image must be reproduced on some output device which will not have the same transfer function as the customer's CRT. I will present measurements from a number of CRTs and illustrate how some of their shortcomings may be problematic for the aforementioned applications.

[BibTex]

1999


Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites in DNA

Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lemmen, C., Smola, A., Lengauer, T., Müller, K.

In German Conference on Bioinformatics (GCB 1999), October 1999 (inproceedings)

Abstract
In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points from which regions encoding proteins start, the so-called translation initiation sites (TIS). This can be modeled as a classification problem. We demonstrate the power of support vector machines (SVMs) for this task, and show how to successfully incorporate biological prior knowledge by engineering an appropriate kernel function.
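
The abstract does not spell out the kernel, so the following is only a minimal sketch of one way prior knowledge about local sequence context could be built into a kernel on fixed-length DNA windows: nucleotide matches are counted within small neighborhoods and raised to a power, so that runs of nearby matches weigh more than scattered ones. The function name `local_match_kernel` and the parameters `half_window` and `degree` are hypothetical choices for illustration, not the authors' definition.

```python
def local_match_kernel(x, y, half_window=1, degree=2):
    """Similarity of two equal-length DNA windows that rewards local matches.

    Illustrative construction (an assumption, not the published kernel):
    for every position p, count the matching nucleotides within
    [p - half_window, p + half_window], raise the count to `degree`,
    and sum over all positions.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    # 1 where the nucleotides agree at a position, 0 otherwise.
    matches = [1 if a == b else 0 for a, b in zip(x, y)]
    total = 0
    for p in range(len(matches)):
        lo = max(0, p - half_window)
        hi = min(len(matches), p + half_window + 1)
        total += sum(matches[lo:hi]) ** degree
    return total


if __name__ == "__main__":
    print(local_match_kernel("ATGC", "ATGC"))
    print(local_match_kernel("ATGC", "ATGA"))
```

Because it is built from sums and integer powers of position-wise match indicators, this function is a valid (positive semidefinite) kernel and could in principle be passed to any SVM implementation that accepts a precomputed kernel matrix.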

Web [BibTex]
