My research is concerend with discrete optimization problems in machine learning. Most recently, I have been working on submodular functions and graph cuts.

I have also worked on theoretical aspects of clustering and density estimation, from the perspective of approximation as well as learning theory.

Below is a summary of projects, and links to code. I have also co-organized the NIPS workshop on Discrete Optimization in Machine Learning.

**Structured problems with submodular cost**

The generic setting here is as follows: take a combinatorial optimization problem, and replace the usual sum-of-weights cost by a nonlinear, submodular function. This makes the problem much harder but has nice applications. In particular, we worked on this setting with graph cuts, for which there are applications in inference and computer vision.

In addition, we derived online algorithms for structured decision problems (e.g., online spanning tree, or s-t cut) with submodular cost (in contrast, almost all previous results are for linear costs).

During our work, we came across several functions that are actually not submodular (but maybe some people hoped them to be), and thus I am collecting those functions in a bag of submodular non-examples.

**Clustering & Graph Cuts**

**Nearest Neighbor Clustering:** we developed a generic algorithm to minimize popular clustering objective functions, such as Normalized Cut or k-means. This method is consistent, thanks to pooling neighboring points. (with Ulrike von Luxburg, Sebastien Bubeck, Michael Kaufmann).

Download Demo Code

**Generalized Clustering via Kernel Embeddings**: The concept of clustering can be generalized to finding distributions that are maximally separated. The standard case is to measure separation by means of the distribution (k-means). MMD, the maximum mean discrepancy in a feature space, allows to also take higher-order moments into account. As a result, it is possible to e.g. separate two Gaussians with the same mean but different variance. Interestingly, maximizing MMD is closely related to kernel k-means and various other clustering criteria. (with Ulrike von Luxburg, Bernhard Schoelkopf, Arthur Gretton, Bharath Sriperumbudur)

**Approximation Algorithms for Tensor Clustering**: We prove an approximation factor for tensor clustering (i.e., "cutting a big cube into little cubes") for maybe the simplest possible algorithm. Various divergences are possible, such as Euclidean distance, Bregman divergences etc. (with Suvrit Sra, Arindam Banerjee)

**Solution Stability in Linear Programming Relaxations: Graph Partitioning and Unsupervised Learning. **Data is often noisy. If there are several good solutions to an optimization problem (here, graph partitioning), then the noise can be the determining part to make one solution optimal. However, how much do we want to trust this solution, if a small perturbation in the data suddenly makes another solution the best? this work proposes a method to test the stability of an optimal graph cut solution to perturbation of the edge weights (i.e., noise).We also show that several common clustering and graph partitioning objectives fall in our framework. (with Sebastian Nowozin)

**ICA**

**Fast Kernel ICA: **We developed a Newton-like method for kernel ICA that uses HSIC as the independence criterion (with Hao Shen, Arthur Gretton).

Download Code for Fast kernel ICA

27 results
(BibTeX)

**Reflection methods for user-friendly submodular optimization**
In *Advances in Neural Information Processing Systems 26*, pages: 1313-1321, (Editors: C.J.C. Burges and L. Bottou and M. Welling and Z. Ghahramani and K.Q. Weinberger), 27th Annual Conference on Neural Information Processing Systems (NIPS), 2013 (inproceedings)

**Combinatorial Problems with Submodular Coupling in Machine Learning and Computer Vision**
ETH Zürich, Switzerland, 2012 (phdthesis)

**Online submodular minimization for combinatorial structures**
In pages: 345-352, (Editors: Getoor, L. , T. Scheffer), International Machine Learning Society, Madison, WI, USA, 28th International Conference on Machine Learning (ICML), July 2011 (inproceedings)

**Cooperative Cuts: a new use of submodularity in image segmentation**
Second I.S.T. Austria Symposium on Computer Vision and Machine Learning, October 2011 (talk)

**Submodularity beyond submodular energies: coupling edges in graph cuts**
In pages: 1897-1904, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2011 (inproceedings)

**Cooperative Cuts**
COSA Workshop: Combinatorial Optimization, Statistics, and Applications, March 2011 (talk)

**Multi-label cooperative cuts**
In pages: 1-4, CVPR Workshop on Inference in Graphical Models with Structured Potentials, June 2011 (inproceedings)

**On Fast Approximate Submodular Minimization**
In *Advances in Neural Information Processing Systems 24*, pages: 460-468, (Editors: J Shawe-Taylor and RS Zemel and P Bartlett and F Pereira and KQ Weinberger), Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS), 2011 (inproceedings)

**Approximation Bounds for Inference using Cooperative Cut**
In pages: 577-584, (Editors: Getoor, L. , T. Scheffer), International Machine Learning Society, Madison, WI, USA, 28th International Conference on Machine Learning (ICML), July 2011 (inproceedings)

**Cooperative Cuts: Graph Cuts with Submodular Edge Weights**
(189), Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, March 2010 (techreport)

**Cooperative Cuts: Graph Cuts with Submodular Edge Weights**
24th European Conference on Operational Research (EURO XXIV), July 2010 (talk)

**Cooperative Cuts for Image Segmentation**
(UWEETR-1020-0003), University of Washington, Washington DC, USA, August 2010 (techreport)

**Online algorithms for submodular minimization with combinatorial constraints**
In pages: 1-6, NIPS Workshop on Discrete Optimization in Machine Learning: Structures, Algorithms and Applications (DISCML), December 2010 (inproceedings)

**Generalized Clustering via Kernel Embeddings**
In *KI 2009: AI and Automation, Lecture Notes in Computer Science, Vol. 5803*, pages: 144-152, (Editors: B Mertsching and M Hund and Z Aziz), Springer, Berlin, Germany, 32nd Annual Conference on Artificial Intelligence (KI), September 2009 (inproceedings)

**Solution Stability in Linear Programming Relaxations: Graph Partitioning and Unsupervised Learning**
In *ICML 2009*, pages: 769-776, (Editors: Danyluk, A. , L. Bottou, M. Littman), ACM Press, New York, NY, USA, 26th International Conference on Machine Learning, June 2009 (inproceedings)

**Fast Kernel-Based Independent Component Analysis**
*IEEE Transactions on Signal Processing*, 57(9):3498-3511, September 2009 (article)

**Approximation Algorithms for Tensor Clustering**
In *Algorithmic Learning Theory: 20th International Conference*, pages: 368-383, (Editors: Gavalda, R. , G. Lugosi, T. Zeugmann, S. Zilles), Springer, Berlin, Germany, ALT, October 2009 (inproceedings)

**Notes on Graph Cuts with Submodular Edge Weights**
In pages: 1-6, NIPS Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra (DISCML), December 2009 (inproceedings)

**Consistent Minimization of Clustering Objective Functions**
In *Advances in neural information processing systems 20*, pages: 961-968, (Editors: Platt, J. C., D. Koller, Y. Singer, S. Roweis), Curran, Red Hook, NY, USA, Twenty-First Annual Conference on Neural Information Processing Systems (NIPS), September 2008 (inproceedings)

**Approximation Algorithms for Bregman
Clustering Co-clustering and Tensor Clustering**
(177), Max-Planck Institute for Biological Cybernetics, Tübingen, Germany, September 2008 (techreport)

My research is concerend with discrete optimization problems in machine learning. Most recently, I have been working on submodular functions and graph cuts.

I have also worked on theoretical aspects of clustering and density estimation, from the perspective of approximation as well as learning theory.

Below is a summary of projects, and links to code. I have also co-organized the NIPS workshop on Discrete Optimization in Machine Learning.

**Structured problems with submodular cost**

The generic setting here is as follows: take a combinatorial optimization problem, and replace the usual sum-of-weights cost by a nonlinear, submodular function. This makes the problem much harder but has nice applications. In particular, we worked on this setting with graph cuts, for which there are applications in inference and computer vision.

In addition, we derived online algorithms for structured decision problems (e.g., online spanning tree, or s-t cut) with submodular cost (in contrast, almost all previous results are for linear costs).

During our work, we came across several functions that are actually not submodular (but maybe some people hoped them to be), and thus I am collecting those functions in a bag of submodular non-examples.

**Clustering & Graph Cuts**

**Nearest Neighbor Clustering:** we developed a generic algorithm to minimize popular clustering objective functions, such as Normalized Cut or k-means. This method is consistent, thanks to pooling neighboring points. (with Ulrike von Luxburg, Sebastien Bubeck, Michael Kaufmann).

Download Demo Code

**Generalized Clustering via Kernel Embeddings**: The concept of clustering can be generalized to finding distributions that are maximally separated. The standard case is to measure separation by means of the distribution (k-means). MMD, the maximum mean discrepancy in a feature space, allows to also take higher-order moments into account. As a result, it is possible to e.g. separate two Gaussians with the same mean but different variance. Interestingly, maximizing MMD is closely related to kernel k-means and various other clustering criteria. (with Ulrike von Luxburg, Bernhard Schoelkopf, Arthur Gretton, Bharath Sriperumbudur)

**Approximation Algorithms for Tensor Clustering**: We prove an approximation factor for tensor clustering (i.e., "cutting a big cube into little cubes") for maybe the simplest possible algorithm. Various divergences are possible, such as Euclidean distance, Bregman divergences etc. (with Suvrit Sra, Arindam Banerjee)

**Solution Stability in Linear Programming Relaxations: Graph Partitioning and Unsupervised Learning. **Data is often noisy. If there are several good solutions to an optimization problem (here, graph partitioning), then the noise can be the determining part to make one solution optimal. However, how much do we want to trust this solution, if a small perturbation in the data suddenly makes another solution the best? this work proposes a method to test the stability of an optimal graph cut solution to perturbation of the edge weights (i.e., noise).We also show that several common clustering and graph partitioning objectives fall in our framework. (with Sebastian Nowozin)

**ICA**

**Fast Kernel ICA: **We developed a Newton-like method for kernel ICA that uses HSIC as the independence criterion (with Hao Shen, Arthur Gretton).

Download Code for Fast kernel ICA