
Hilbert Space Representations of Probability Distributions




Many problems in unsupervised learning require the analysis of features of probability distributions. At the most fundamental level, we might wish to determine whether two distributions are the same, based on samples from each; this is known as the two-sample or homogeneity problem. We use kernel methods to address this problem by mapping probability distributions to elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently rich RKHS, these representations are unique, so comparing feature space representations allows us to compare distributions without ambiguity. Applications include testing whether cancer subtypes are distinguishable on the basis of DNA microarray data, and whether low-frequency oscillations measured at an electrode in the cortex have a different distribution during a neural spike. A more difficult problem is to discover whether two random variables drawn from a joint distribution are independent. It turns out that any dependence between pairs of random variables can be encoded in a cross-covariance operator between appropriate RKHS representations of the variables, and we may test independence by looking at a norm of the operator. We demonstrate this independence test by establishing dependence between an English text and its French translation, as opposed to French text on the same topic but otherwise unrelated. Finally, we show that this operator norm is itself a difference in feature means.
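The two quantities described in the abstract can be illustrated with a minimal sketch. It is not taken from the talk itself: the Gaussian kernel, the bandwidth `sigma`, the biased (V-statistic) estimators, and the function names `gaussian_gram`, `mmd2_biased` and `hsic_biased` are all assumptions chosen for illustration. The first statistic is the squared RKHS distance between the empirical feature means of two samples; the second is one common biased estimate of the squared Hilbert-Schmidt norm of the empirical cross-covariance operator.

```python
import numpy as np


def gaussian_gram(a, b, sigma=1.0):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)).

    The Gaussian kernel is an assumption of this sketch.
    """
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2_biased(x, y, sigma=1.0):
    """Squared RKHS distance between the empirical feature means of x and y."""
    return (gaussian_gram(x, x, sigma).mean()
            + gaussian_gram(y, y, sigma).mean()
            - 2.0 * gaussian_gram(x, y, sigma).mean())


def hsic_biased(x, y, sigma=1.0):
    """Biased estimate of the squared Hilbert-Schmidt norm of the
    empirical cross-covariance operator for paired samples (x_i, y_i)."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k = gaussian_gram(x, x, sigma)
    l = gaussian_gram(y, y, sigma)
    return np.trace(k @ h @ l @ h) / n**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 1))
    same = rng.normal(size=(200, 1))            # same distribution as x
    shifted = rng.normal(loc=2.0, size=(200, 1))  # mean shifted by 2
    print("MMD^2, same distribution:   ", mmd2_biased(x, same))
    print("MMD^2, shifted distribution:", mmd2_biased(x, shifted))
    print("HSIC, dependent pair:       ", hsic_biased(x, x**2))
    print("HSIC, independent pair:     ", hsic_biased(x, same))
```

Both functions return only a statistic. Turning either into a test requires a rejection threshold, for example from a permutation procedure, which this sketch leaves out.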

Author(s): Gretton, A.
Year: 2007
Month: October
Day: 0

Department(s): Empirical Inference
Bibtex Type: Talk (talk)

Digital: 0
Event Name: 2nd Workshop on Machine Learning and Optimization at the ISM
Event Place: Tokyo, Japan
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik



  title = {Hilbert Space Representations of Probability Distributions},
  author = {Gretton, A.},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = oct,
  year = {2007},
  month_numeric = {10}