Header logo is ei


2009


no image
Efficient Graphlet Kernels for Large Graph Comparison

Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 488-495, (Editors: Van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
State-of-the-art graph kernels do not scale to large graphs with hundreds of nodes and thousands of edges. In this article we propose to compare graphs by counting {it graphlets}, ie subgraphs with $k$ nodes where $k in { 3, 4, 5 }$. Exhaustive enumeration of all graphlets being prohibitively expensive, we introduce two theoretically grounded speedup schemes, one based on sampling and the second one specifically designed for bounded degree graphs. In our experimental evaluation, our novel kernels allow us to efficiently compare large graphs that cannot be tackled by existing graph kernels.

PDF Web [BibTex]

2009

PDF Web [BibTex]


no image
Online blind deconvolution for astronomical imaging

Harmeling, S., Hirsch, M., Sra, S., Schölkopf, B.

In Proceedings of the First IEEE International Conference Computational Photography (ICCP 2009), pages: 1-7, IEEE, Piscataway, NJ, USA, First IEEE International Conference on Computational Photography (ICCP), April 2009 (inproceedings)

Abstract
Atmospheric turbulences blur astronomical images taken by earth-based telescopes. Taking many short-time exposures in such a situation provides noisy images of the same object, where each noisy image has a different blur. Commonly astronomers apply a technique called “Lucky Imaging” that selects a few of the recorded frames that fulfill certain criteria, such as reaching a certain peak intensity (“Strehl ratio”). The selected frames are then averaged to obtain a better image. In this paper we introduce and analyze a new method that exploits all the frames and generates an improved image in an online fashion. Our initial experiments with controlled artificial data and real-world astronomical datasets yields promising results.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
A kernel method for unsupervised structured network inference

Lippert, C., Stegle, O., Ghahramani, Z., Borgwardt, KM.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 368-375, (Editors: Van Dyk, D. , M. Welling), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
Network inference is the problem of inferring edges between a set of real-world objects, for instance, interactions between pairs of proteins in bioinformatics. Current kernel-based approaches to this problem share a set of common features: (i) they are supervised and hence require labeled training data; (ii) edges in the network are treated as mutually independent and hence topological properties are largely ignored; (iii) they lack a statistical interpretation. We argue that these common assumptions are often undesirable for network inference, and propose (i) an unsupervised kernel method (ii) that takes the global structure of the network into account and (iii) is statistically motivated. We show that our approach can explain commonly used heuristics in statistical terms. In experiments on social networks, different variants of our method demonstrate appealing predictive performance.

PDF Web [BibTex]

PDF Web [BibTex]


no image
PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering

Seldin, Y., Tishby, N.

In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 472-479, MIT Press, Cambridge, MA, USA, 12th International Conference on Artificial Intelligence and Statistics, April 2009 (inproceedings)

Abstract
We derive a PAC-Bayesian generalization bound for density estimation. Similar to the PAC-Bayesian generalization bound for classification, the result has the appealingly simple form of a tradeoff between empirical performance and the KL-divergence of the posterior from the prior. Moreover, the PAC-Bayesian generalization bound for classification can be derived as a special case of the bound for density estimation. To illustrate a possible application of our bound we derive a generalization bound for co-clustering. The bound provides a criterion to evaluate the ability of co-clustering to predict new co-occurrences, thus introducing a supervised flavor to this traditionally unsupervised task.

PDF Web [BibTex]

PDF Web [BibTex]


no image
ICA with Sparse Connections: Revisited

Zhang, K., Peng, H., Chan, L., Hyvärinen, A.

In Independent Component Analysis and Signal Separation, pages: 195-202, (Editors: Adali, T. , Christian Jutten, J.M. Travassos Romano, A. Kardec Barros), Springer, Berlin, Germany, 8th International Conference on Independent Component Analysis and Signal Separation (ICA), March 2009 (inproceedings)

Abstract
When applying independent component analysis (ICA), sometimes we expect the connections between the observed mixtures and the recovered independent components (or the original sources) to be sparse, to make the interpretation easier or to reduce the random effect in the results. In this paper we propose two methods to tackle this problem. One is based on adaptive Lasso, which exploits the L 1 penalty with data-adaptive weights. We show the relationship between this method and the classic information criteria such as BIC and AIC. The other is based on optimal brain surgeon, and we show how its stopping criterion is related to the information criteria. This method produces the solution path of the transformation matrix, with different number of zero entries. These methods involve low computational loads. Moreover, in each method, the parameter controlling the sparsity level of the transformation matrix has clear interpretations. By setting such parameters to certain values, the results of the proposed methods are consistent with those produced by classic information criteria.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Tech-note: Iterative design and test of a multimodal experience

Reckter, H., Geiger, C., Singer, J., Streuber, S.

In Proceedings of the IEEE Symposium on 3D User Interfaces (3DUI 2009), pages: 99-102, (Editors: Kiyokawa, K. , S. Coquillart, R. Balakrishnan), IEEE Service Center, Piscataway, NJ, USA, IEEE Symposium on 3D User Interfaces (3DUI), March 2009 (inproceedings)

Abstract
The goal of the Turtle surf project described in this tech-note is to design, implement and evaluate a multimodal installation that should provide a good user experience in a virtual 3D world. For this purpose we combine audio-visual media forms and different types of haptic/tactile feedback. For the latter, we focus on the application of vibrational feedback, wind and water spray and heat. We follow a user-centered design approach and try to get user feedback as early as possible during the iterative design process. We present the conceptual idea of the Turtle surf project, and the iterative design and test of prototypes that helped us to refine the final design based on collected user feedback.

Web DOI [BibTex]

Web DOI [BibTex]


no image
Efficient Bregman Range Search

Cayton, L.

In Advances in Neural Information Processing Systems 22, pages: 243-251, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

Sriperumbudur, B., Fukumizu, K., Gretton, A., Lanckriet, G., Schölkopf, B.

In Advances in Neural Information Processing Systems 22, pages: 1750-1758, (Editors: Y Bengio and D Schuurmans and J Lafferty and C Williams and A Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over many classical metrics on distributions, namely easy computability, fast convergence and low bias of finite sample estimates. An important requirement of the embedding RKHS is that it be characteristic: in this case, the MMD between two distributions is zero if and only if the distributions coincide. Three new results on the MMD are introduced in the present study. First, it is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification. An important consequence is that a kernel must be characteristic to guarantee classifiability between distributions in the RKHS. Second, the class of characteristic kernels is broadened to incorporate all strictly positive definite kernels: these include non-translation invariant kernels and kernels on non-compact domains. Third, a generalization of the MMD is proposed for families of kernels, as the supremum over MMDs on a class of kernels (for instance the Gaussian kernels with different bandwidths). This extension is necessary to obtain a single distance measure if a large selection or class of characteristic kernels is potentially appropriate. This generalization is reasonable, given that it corresponds to the problem of learning the kernel by minimizing the risk of the corresponding kernel classifier. The generalized MMD is shown to have consistent finite sample estimates, and its performance is demonstrated on a homogeneity testing example.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Nonlinear directed acyclic structure learning with weakly additive noise models

Tillman, R., Gretton, A., Spirtes, P.

In Advances in Neural Information Processing Systems 22, pages: 1847-1855, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
The recently proposed emph{additive noise model} has advantages over previous structure learning algorithms, when attempting to recover some true data generating mechanism, since it (i) does not assume linearity or Gaussianity and (ii) can recover a unique DAG rather than an equivalence class. However, its original extension to the multivariate case required enumerating all possible DAGs, and for some special distributions, e.g. linear Gaussian, the model is invertible and thus cannot be used for structure learning. We present a new approach which combines a PC style search using recent advances in kernel measures of conditional dependence with local searches for additive noise models in substructures of the equivalence class. This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible. Experiments with synthetic and real data show that this method is more accurate than previous methods when data are nonlinear and/or non-Gaussian.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Graphical models for decoding in BCI visual speller systems

Martens, S., Farquhar, J., Hill, J., Schölkopf, B.

In pages: 470-473, IEEE, 4th International IEEE EMBS Conference on Neural Engineering (NER), 2009 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
A Fast, Consistent Kernel Two-Sample Test

Gretton, A., Fukumizu, K., Harchaoui, Z., Sriperumbudur, B.

In Advances in Neural Information Processing Systems 22, pages: 673-681, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of x2 random variables. Prior finite sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high dimensional multivariate data, and text.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

Blaschko, M., Shelton, J., Bartels, A.

In Advances in Neural Information Processing Systems 22, pages: 126-134, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
Resting state activity is brain activation that arises in the absence of any task, and is usually measured in awake subjects during prolonged fMRI scanning sessions where the only instruction given is to close the eyes and do nothing. It has been recognized in recent years that resting state activity is implicated in a wide variety of brain function. While certain networks of brain areas have different levels of activation at rest and during a task, there is nevertheless significant similarity between activations in the two cases. This suggests that recordings of resting state activity can be used as a source of unlabeled data to augment discriminative regression techniques in a semi-supervised setting. We evaluate this setting empirically yielding three main results: (i) regression tends to be improved by the use of Laplacian regularization even when no additional unlabeled data are available, (ii) resting state data seem to have a similar marginal distribution to that recorded during the execution of a visual processing task implying largely similar types of activation, and (iii) this source of information can be broadly exploited to improve the robustness of empirical inference in fMRI studies, an inherently data poor domain.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Fast subtree kernels on graphs

Shervashidze, N., Borgwardt, K.

In Advances in Neural Information Processing Systems 22, pages: 1660-1668, (Editors: Bengio, Y. , D. Schuurmans, J. Lafferty, C. Williams, A. Culotta), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)

Abstract
In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & G{\"a}rtner scales as O(n24dh). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime.

PDF Web [BibTex]

PDF Web [BibTex]


On feature combination for multiclass object classification
On feature combination for multiclass object classification

Gehler, P., Nowozin, S.

In Proceedings of the Twelfth IEEE International Conference on Computer Vision, pages: 221-228, ICCV, 2009, oral presentation (inproceedings)

project page, code, data GoogleScholar pdf DOI [BibTex]

project page, code, data GoogleScholar pdf DOI [BibTex]

2004


no image
Attentional Modulation of Auditory Event-Related Potentials in a Brain-Computer Interface

Hill, J., Lal, T., Bierig, K., Birbaumer, N., Schölkopf, B.

In BioCAS04, (S3/5/INV- S3/17-20):4, IEEE Computer Society, Los Alamitos, CA, USA, 2004 IEEE International Workshop on Biomedical Circuits and Systems, December 2004 (inproceedings)

Abstract
Motivated by the particular problems involved in communicating with "locked-in" paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user‘s EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.

PDF Web DOI [BibTex]

2004

PDF Web DOI [BibTex]


no image
Discrete vs. Continuous: Two Sides of Machine Learning

Zhou, D.

October 2004 (talk)

Abstract
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data. This talk is mainly based on the followiing contribution: (1) D. Zhou and B. Sch{\"o}lkopf: Transductive Inference with Graphs, MPI Technical report, August, 2004; (2) D. Zhou, B. Sch{\"o}lkopf and T. Hofmann. Semi-supervised Learning on Directed Graphs. NIPS 2004; (3) D. Zhou, O. Bousquet, T.N. Lal, J. Weston and B. Sch{\"o}lkopf. Learning with Local and Global Consistency. NIPS 2003.

PDF [BibTex]


no image
Using kernel PCA for Initialisation of Variational Bayesian Nonlinear Blind Source Separation Method

Honkela, A., Harmeling, S., Lundqvist, L., Valpola, H.

In ICA 2004, pages: 790-797, (Editors: Puntonet, C. G., A. Prieto), Springer, Berlin, Germany, Fifth International Conference on Independent Component Analysis and Blind Signal Separation, October 2004 (inproceedings)

Abstract
The variational Bayesian nonlinear blind source separation method introduced by Lappalainen and Honkela in 2000 is initialised with linear principal component analysis (PCA). Because of the multilayer perceptron (MLP) network used to model the nonlinearity, the method is susceptible to local minima and therefore sensitive to the initialisation used. As the method is used for nonlinear separation, the linear initialisation may in some cases lead it astray. In this paper we study the use of kernel PCA (KPCA) in the initialisation. KPCA is a rather straightforward generalisation of linear PCA and it is much faster to compute than the variational Bayesian method. The experiments show that it can produce significantly better initialisations than linear PCA. Additionally, the model comparison methods provided by the variational Bayesian framework can be easily applied to compare different kernels.

DOI [BibTex]

DOI [BibTex]


no image
Robust ICA for Super-Gaussian Sources

Meinecke, F., Harmeling, S., Müller, K.

In ICA 2004, pages: 217-224, (Editors: Puntonet, C. G., A. Prieto), Springer, Berlin, Germany, Fifth International Conference on Independent Component Analysis and Blind Signal Separation, October 2004 (inproceedings)

Abstract
Most ICA algorithms are sensitive to outliers. Instead of robustifying existing algorithms by outlier rejection techniques, we show how a simple outlier index can be used directly to solve the ICA problem for super-Gaussian source signals. This ICA method is outlier-robust by construction and can be used for standard ICA as well as for over-complete ICA (i.e. more source signals than observed signals (mixtures)).

DOI [BibTex]

DOI [BibTex]


no image
Modelling Spikes with Mixtures of Factor Analysers

Görür, D., Rasmussen, C., Tolias, A., Sinz, F., Logothetis, N.

In Pattern Recognition, pages: 391-398, LNCS 3175, (Editors: Rasmussen, C. E. , H.H. Bülthoff, B. Schölkopf, M.A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004 (inproceedings)

Abstract
Identifying the action potentials of individual neurons from extracellular recordings, known as spike sorting, is a challenging problem. We consider the spike sorting problem using a generative model,mixtures of factor analysers, which concurrently performs clustering and feature extraction. The most important advantage of this method is that it quantifies the certainty with which the spikes are classified. This can be used as a means for evaluating the quality of clustering and therefore spike isolation. Using this method, nearly simultaneously occurring spikes can also be modelled which is a hard task for many of the spike sorting methods. Furthermore, modelling the data with a generative model allows us to generate simulated data.

PDF PDF DOI [BibTex]

PDF PDF DOI [BibTex]


no image
Learning Depth From Stereo

Sinz, F., Candela, J., BakIr, G., Rasmussen, C., Franz, M.

In 26th DAGM Symposium, pages: 245-252, LNCS 3175, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, September 2004 (inproceedings)

Abstract
We compare two approaches to the problem of estimating the depth of a point in space from observing its image position in two different cameras: 1.~The classical photogrammetric approach explicitly models the two cameras and estimates their intrinsic and extrinsic parameters using a tedious calibration procedure; 2.~A generic machine learning approach where the mapping from image to spatial coordinates is directly approximated by a Gaussian Process regression. Our results show that the generic learning approach, in addition to simplifying the procedure of calibration, can lead to higher depth accuracies than classical calibration although no specific domain knowledge is used.

PDF PostScript Web [BibTex]

PDF PostScript Web [BibTex]


no image
Grundlagen von Support Vector Maschinen und Anwendungen in der Bildverarbeitung

Eichhorn, J.

September 2004 (talk)

Abstract
Invited talk at the workshop "Numerical, Statistical and Discrete Methods in Image Processing" at the TU M{\"u}nchen (in GERMAN)

PDF [BibTex]


no image
Stability of Hausdorff-based Distance Measures

Shapiro, MD., Blaschko, MB.

In VIIP, pages: 1-6, VIIP, September 2004 (inproceedings)

[BibTex]

[BibTex]


no image
Kernel Methods in Computational Biology

Schölkopf, B., Tsuda, K., Vert, J.

pages: 410, Computational Molecular Biology, MIT Press, Cambridge, MA, USA, August 2004 (book)

Abstract
Modern machine learning techniques are proving to be extremely valuable for the analysis of data in computational biology problems. One branch of machine learning, kernel methods, lends itself particularly well to the difficult aspects of biological data, which include high dimensionality (as in microarray measurements), representation as discrete and structured data (as in DNA or amino acid sequences), and the need to combine heterogeneous sources of information. This book provides a detailed overview of current research in kernel methods and their applications to computational biology. Following three introductory chapters—an introduction to molecular and computational biology, a short review of kernel methods that focuses on intuitive concepts rather than technical details, and a detailed survey of recent applications of kernel methods in computational biology—the book is divided into three sections that reflect three general trends in current research. The first part presents different ideas for the design of kernel functions specifically adapted to various biological data; the second part covers different approaches to learning from heterogeneous data; and the third part offers examples of successful applications of support vector machine methods.

Web [BibTex]

Web [BibTex]


no image
The benefit of liquid Helium cooling for Cryo-Electron Tomography: A quantitative comparative study

Schweikert, G., Luecken, U., Pfeifer, G., Baumeister, W., Plitzko, J.

The thirteenth European Microscopy Congress, August 2004 (talk)

[BibTex]

[BibTex]


no image
Learning to Find Graph Pre-Images

BakIr, G., Zien, A., Tsuda, K.

In Pattern Recognition, pages: 253-261, (Editors: Rasmussen, C. E., H. H. Bülthoff, B. Schölkopf, M. A. Giese), Springer, Berlin, Germany, 26th DAGM Symposium, August 2004 (inproceedings)

Abstract
The recent development of graph kernel functions has made it possible to apply well-established machine learning methods to graphs. However, to allow for analyses that yield a graph as a result, it is necessary to solve the so-called pre-image problem: to reconstruct a graph from its feature space representation induced by the kernel. Here, we suggest a practical solution to this problem.

PostScript PDF DOI [BibTex]

PostScript PDF DOI [BibTex]


no image
Gaussian Process Classification for Segmenting and Annotating Sequences

Altun, Y., Hofmann, T., Smola, A.

In Proceedings of the 21st International Conference on Machine Learning (ICML 2004), pages: 25-32, (Editors: Greiner, R. , D. Schuurmans), ACM Press, New York, USA, 21st International Conference on Machine Learning (ICML), July 2004 (inproceedings)

Abstract
Many real-world classification tasks involve the prediction of multiple, inter-dependent class labels. A prototypical case of this sort deals with prediction of a sequence of labels for a sequence of observations. Such problems arise naturally in the context of annotating and segmenting observation sequences. This paper generalizes Gaussian Process classification to predict multiple labels by taking dependencies between neighboring labels into account. Our approach is motivated by the desire to retain rigorous probabilistic semantics, while overcoming limitations of parametric methods like Conditional Random Fields, which exhibit conceptual and computational difficulties in high-dimensional input spaces. Experiments on named entity recognition and pitch accent prediction tasks demonstrate the competitiveness of our approach.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Learning with Non-Positive Kernels

Ong, CS., Mary, X., Canu, S., Smola, AJ.

In ICML 2004, pages: 81-81, ACM Press, New York, NY, USA, Twenty-First International Conference on Machine Learning, July 2004 (inproceedings)

Abstract
n this paper we show that many kernel methods can be adapted to deal with indefinite kernels, that is, kernels which are not positive semidefinite. They do not satisfy Mercer‘s condition and they induce associated functional spaces called Reproducing Kernel Kre&icaron;n Spaces (RKKS), a generalization of Reproducing Kernel Hilbert Spaces (RKHS).Machine learning in RKKS shares many "nice" properties of learning in RKHS, such as orthogonality and projection. However, since the kernels are indefinite, we can no longer minimize the loss, instead we stabilize it. We show a general representer theorem for constrained stabilization and prove generalization bounds by computing the Rademacher averages of the kernel class. We list several examples of indefinite kernels and investigate regularization methods to solve spline interpolation. Some preliminary experiments with indefinite kernels for spline smoothing are reported for truncated spectral factorization, Landweber-Fridman iterations, and MR-II.

PDF Web DOI [BibTex]

PDF Web DOI [BibTex]


no image
Exponential Families for Conditional Random Fields

Altun, Y., Smola, A., Hofmann, T.

In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI 2004), pages: 2-9, (Editors: Chickering, D.M. , J.Y. Halpern), Morgan Kaufmann, San Francisco, CA, USA, 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI), July 2004 (inproceedings)

Abstract
In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Using Conditional Random Fields to Predict Pitch Accent in Conversational Speech

Gregory, M., Altun, Y.

In pages: 677-684, (Editors: Scott, D. , W. Daelemans, M. Walker), ACL, East Stroudsburg, PA, USA, 42nd Annual Meeting of the Association for Computational Linguistics (ACL), July 2004 (inproceedings)

Abstract
The detection of prosodic characteristics is an important aspect of both speech synthesis and speech recognition. Correct placement of pitch accents aids in more natural sounding speech, while automatic detection of accents can contribute to better wordlevel recognition and better textual understanding. In this paper we investigate probabilistic, contextual, and phonological factors that influence pitch accent placement in natural, conversational speech in a sequence labeling setting. We introduce Conditional Random Fields (CRFs) to pitch accent prediction task in order to incorporate these factors efficiently in a sequence model. We demonstrate the usefulness and the incremental effect of these factors in a sequence model by performing experiments on hand labeled data from the Switchboard Corpus. Our model outperforms the baseline and previous models of pitch accent prediction on the Switchboard Corpus.

Web [BibTex]

Web [BibTex]


no image
Support vector machine learning for interdependent and structured output spaces

Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.

In pages: 1-8, (Editors: Greiner, R. , D. Schuurmans), AAAI Press, Menlo Park, CA, USA, Twenty-first International Conference on Machine Learning (ICML), July 2004 (inproceedings)

Web DOI [BibTex]

Web DOI [BibTex]


no image
Riemannian Geometry on Graphs and its Application to Ranking and Classification

Zhou, D.

June 2004 (talk)

Abstract
We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data.

PDF [BibTex]


no image
PAC-Bayesian Generic Chaining

Audibert, J., Bousquet, O.

In Advances in Neural Information Processing Systems 16, pages: 1125-1132 , (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
There exist many different generalization error bounds for classification. Each of these bounds contains an improvement over the others for certain situations. Our goal is to combine these different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester, which is interesting for averaging classifiers, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand. This combination is quite natural since the generic chaining is based on the notion of majorizing measures, which can be considered as priors on the set of classifiers, and such priors also arise in the PAC-bayesian setting.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Prediction on Spike Data Using Kernel Algorithms

Eichhorn, J., Tolias, A., Zien, A., Kuss, M., Rasmussen, C., Weston, J., Logothetis, N., Schölkopf, B.

In Advances in Neural Information Processing Systems 16, pages: 1367-1374, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Warped Gaussian Processes

Snelson, E., Rasmussen, CE., Ghahramani, Z.

In Advances in Neural Information Processing Systems 16, pages: 337-344, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We generalise the Gaussian process (GP) framework for regression by learning a nonlinear transformation of the GP outputs. This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Ranking on Data Manifolds

Zhou, D., Weston, J., Gretton, A., Bousquet, O., Schölkopf, B.

In Advances in neural information processing systems 16, pages: 169-176, (Editors: S Thrun and L Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
The Google search engine has enjoyed a huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Gaussian Processes in Reinforcement Learning

Rasmussen, C., Kuss, M.

In Advances in Neural Information Processing Systems 16, pages: 751-759, (Editors: Thrun, S., L. K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning with Local and Global Consistency

Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 16, pages: 321-328, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Learning to Find Pre-Images

Bakir, G., Weston, J., Schölkopf, B.

In Advances in Neural Information Processing Systems 16, pages: 449-456, (Editors: S Thrun and LK Saul and B Schölkopf), MIT Press, Cambridge, MA, USA, 17th Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Measure Based Regularization

Bousquet, O., Chapelle, O., Hein, M.

In Advances in Neural Information Processing Systems 16, pages: 1221-1228, (Editors: Thrun, S., L. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We address in this paper the question of how the knowledge of the marginal distribution $P(x)$ can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Insights from Machine Learning Applied to Human Visual Classification

Graf, A., Wichmann, F.

In Advances in Neural Information Processing Systems 16, pages: 905-912, (Editors: Thrun, S., L. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and flowfield representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Image Construction by Linear Programming

Tsuda, K., Rätsch, G.

In Advances in Neural Information Processing Systems 16, pages: 57-64, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1-norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. Especially, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP leading a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Semi-Supervised Protein Classification using Cluster Kernels

Weston, J., Leslie, C., Zhou, D., Elisseeff, A., Noble, W.

In Advances in Neural Information Processing Systems 16, pages: 595-602, (Editors: Thrun, S., L.K. Saul, B. Schölkopf), MIT Press, Cambridge, MA, USA, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), June 2004 (inproceedings)

Abstract
A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data --- examples with known 3D structures, organized into structural classes --- while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Kernel Hebbian Algorithm for single-frame super-resolution

Kim, K., Franz, M., Schölkopf, B.

In Computer Vision - ECCV 2004, LNCS vol. 3024, pages: 135-149, (Editors: A Leonardis and H Bischof), Springer, Berlin, Germany, 8th European Conference on Computer Vision (ECCV), May 2004 (inproceedings)

Abstract
This paper presents a method for single-frame image super-resolution using an unsupervised learning technique. The required prior knowledge about the high-resolution images is obtained from Kernel Principal Component Analysis (KPCA). The original form of KPCA, however, can be only applied to strongly restricted image classes due to the limited number of training examples that can be processed. We therefore propose a new iterative method for performing KPCA, the {em Kernel Hebbian Algorithm}. By kernelizing the Generalized Hebbian Algorithm, one can iteratively estimate the Kernel Principal Components with only linear order memory complexity. The resulting super-resolution algorithm shows a comparable performance to the existing supervised methods on images containing faces and natural scenes.

PDF Web [BibTex]

PDF Web [BibTex]


no image
Pattern Selection for SVM based "Futures Trading System"

Sun, J., Cho, S., Shin, H.

In Proc. of the Korean Data Mining Conference, pages: 175-183, Korean Data Mining Society Conference, April 2004 (inproceedings)

[BibTex]

[BibTex]


no image
Minimum Sum-Squared Residue based clustering of Gene Expression Data

Cho, H., Guan, Y., Dhillon, I., Sra, S.

In SIAM Data Mining, pages: 00-00, SDM, April 2004 (inproceedings)

GZIP [BibTex]

GZIP [BibTex]


no image
Preservation of Neighborhood Relation under Input to Feature Space Mapping in SVM Training

Shin, H., Cho, S.

In Proc. of the 33rd International Conference on Computers and Industrial Engineering (C&IE 2004), pages: 1-10, the 33rd International Conference on Computers and Industrial Engineering (C&IE), April 2004, in CD (inproceedings)

[BibTex]

[BibTex]


no image
Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking

Zhou, D.

January 2004 (talk)

Abstract
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.

PDF [BibTex]


no image
Introduction to Category Theory

Bousquet, O.

Internal Seminar, January 2004 (talk)

Abstract
A brief introduction to the general idea behind category theory with some basic definitions and examples. A perspective on higher dimensional categories is given.

PDF [BibTex]

PDF [BibTex]