Journal of Machine Learning Research, 13:723–773, March 2012
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
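As a concrete illustration of the quadratic-time statistic, the sketch below computes the unbiased estimate of MMD² between two samples under a Gaussian RBF kernel. The kernel choice, the bandwidth, and the function names are assumptions made for illustration; this is not the paper's reference implementation.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples X (m x d) and Y (n x d).

    Diagonal terms are excluded from the within-sample sums, giving the
    unbiased U-statistic; the cost is O((m + n)^2) kernel evaluations.
    """
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, (200, 2))
X2 = rng.normal(0.0, 1.0, (200, 2))   # same distribution as X1
Y = rng.normal(0.5, 1.0, (200, 2))    # mean-shifted distribution
print(mmd2_unbiased(X1, X2))  # close to zero
print(mmd2_unbiased(X1, Y))   # clearly positive
```

To turn the statistic into a test, one compares it against a threshold obtained, for example, from the large deviation bounds or the asymptotic null distribution discussed in the paper.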
Grünewälder, S., Lever, G., Baldassarre, L., Patterson, S., Gretton, A., Pontil, M. (2012). Conditional mean embeddings as regressors. In: Proceedings of the 29th International Conference on Machine Learning (ICML 2012), 1823–1830, Omnipress, New York, NY, USA.
Journal of Machine Learning Research, 13:1393–1434, May 2012
We introduce a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion (HSIC). The key idea is that good features should be highly dependent on the labels. Our approach leads to a greedy procedure for feature selection. We show that a number of existing feature selectors are special cases of this framework. Experiments on both artificial and real-world data show that our feature selector works well in practice.
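To make the greedy procedure concrete, here is a minimal sketch of a backward-elimination variant: at each step, drop the feature whose removal leaves the HSIC between the remaining features and the labels highest. The biased HSIC estimate, the Gaussian/linear kernel choices, and the helper names are illustrative assumptions, not the paper's reference code.

```python
import numpy as np

def centered(K):
    """Center a Gram matrix: H K H with H = I - (1/m) 1 1^T."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def hsic(X, y, sigma=1.0):
    """Biased HSIC estimate tr(K H L H) / (m - 1)^2, with a Gaussian kernel
    on the features and a linear kernel on the labels."""
    m = X.shape[0]
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    K = np.exp(-sq / (2.0 * sigma**2))
    L = np.outer(y, y)
    return np.trace(centered(K) @ centered(L)) / (m - 1) ** 2

def backward_hsic_selection(X, y, n_keep):
    """Greedy backward elimination: repeatedly remove the feature whose
    removal keeps the HSIC of the remaining features with y highest."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        scores = [hsic(X[:, [f for f in active if f != j]], y) for j in active]
        active.pop(int(np.argmax(scores)))
    return active
```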
Kernel canonical correlation analysis (KCCA) is a general technique for subspace learning that incorporates principal components analysis (PCA) and Fisher linear discriminant analysis (LDA) as special cases. By finding directions that maximize correlation, KCCA learns representations that are more closely tied to the underlying process that generates the data and can ignore high-variance noise directions. However, for data where acquisition in one or more modalities is expensive or otherwise limited, KCCA may suffer from small sample effects. We propose to use semi-supervised Laplacian regularization to utilize data that are present in only one modality. This approach is able to find highly correlated directions that also lie along the data manifold, resulting in a more robust estimate of correlated subspaces.
Data acquired by functional magnetic resonance imaging (fMRI) are naturally amenable to subspace techniques, as the data are well aligned; fMRI data of the human brain are a particularly interesting candidate. In this study we implemented various supervised and semi-supervised versions of KCCA on human fMRI data, with regression to univariate and multivariate labels (corresponding to video content the subjects viewed during image acquisition). In both label conditions, the semi-supervised variants of KCCA performed better than the supervised variants, including a supervised variant with Laplacian regularization. We additionally analyze the weights learned by the regression in order to infer brain regions that are important to different types of visual processing.
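Regularized KCCA can be posed as a generalized eigenvalue problem, and a Laplacian penalty enters through the within-view terms. The sketch below is one plausible formulation under simplifying assumptions: the penalty placement, the constants kappa and gamma, and the restriction of the graph Laplacian to the paired samples are all choices made for illustration, not the paper's exact construction.

```python
import numpy as np
from scipy.linalg import eigh

def kcca_laplacian(Kx, Ky, L, kappa=0.1, gamma=0.1):
    """First canonical pair for regularized KCCA with a graph-Laplacian
    penalty (semi-supervised sketch).

    Kx, Ky : (m, m) Gram matrices of the two views on paired samples
    L      : (m, m) graph Laplacian, here restricted to the paired points
             (the semi-supervised setting would build it from all data)
    """
    m = Kx.shape[0]
    I = np.eye(m)
    # Regularized within-view operators; the Laplacian term pulls the
    # projections toward the data manifold.
    Rx = (Kx + kappa * I) @ (Kx + kappa * I) + gamma * Kx @ L @ Kx
    Ry = (Ky + kappa * I) @ (Ky + kappa * I) + gamma * Ky @ L @ Ky
    A = np.block([[np.zeros((m, m)), Kx @ Ky],
                  [Ky @ Kx, np.zeros((m, m))]])
    B = np.block([[Rx, np.zeros((m, m))],
                  [np.zeros((m, m)), Ry]])
    vals, vecs = eigh(A, B)        # generalized symmetric eigenproblem
    alpha, beta = vecs[:m, -1], vecs[m:, -1]
    return alpha, beta, vals[-1]   # dual coefficients and top correlation
```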
Fukumizu, K., Song, L., Gretton, A. (2011). Kernel Bayes' Rule. In: Advances in Neural Information Processing Systems 24 (NIPS 2011), 1737–1745, Curran Associates, Inc., Red Hook, NY, USA.
Song, L., Gretton, A., Bickson, D., Low, Y., Guestrin, C. (2011). Kernel Belief Propagation. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS 2011), Vol. 15, 707–715, JMLR.
Proceedings of the IEEE International Symposium on Information Theory (ISIT 2010), 1428–1432, IEEE, June 2010
In this paper, we develop and analyze a nonparametric method for estimating the class of integral probability metrics (IPMs), examples of which include the Wasserstein distance, the Dudley metric, and the maximum mean discrepancy (MMD). We show that these distances can be estimated efficiently: by solving a linear program in the case of the Wasserstein distance and the Dudley metric, and in closed form in the case of the MMD. All these estimators are shown to be strongly consistent, and their convergence rates are analyzed. Based on these results, we show that IPMs are simple to estimate and that their estimators exhibit good convergence behavior compared to φ-divergence estimators.
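As an illustration of the linear-programming route, the sketch below computes the empirical 1-Wasserstein distance between two uniform samples by solving the primal optimal-transport LP; by Kantorovich duality this equals the value of the dual (Lipschitz-ball) program on which the IPM estimator is based. The function name and the use of scipy are assumptions for this sketch; the MMD, by contrast, needs no optimization and is computable in closed form from the kernel matrices.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def wasserstein1_lp(X, Y):
    """Empirical 1-Wasserstein distance between uniform samples X (m x d)
    and Y (n x d), via the primal optimal-transport linear program."""
    m, n = len(X), len(Y)
    C = cdist(X, Y)                       # ground costs ||x_i - y_j||
    # Transport plan p (flattened row-major) must have uniform marginals.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # row i sums to 1/m
    for j in range(n):
        A_eq[m + j, j::n] = 1.0           # column j sums to 1/n
    b_eq = np.concatenate([np.full(m, 1.0 / m), np.full(n, 1.0 / n)])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

rng = np.random.default_rng(0)
print(wasserstein1_lp(rng.normal(0, 1, (50, 2)), rng.normal(1, 1, (60, 2))))
```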