Optimization lies at the heart of most machine learning algorithms. The applications in which these algorithms are deployed typically involve high-dimensional, noisy, and uncertain data; huge volumes of batch or streaming data; and models intractable enough that one must settle for low-accuracy solutions, distributed computation, or stochastic approximations. The success of most machine learning algorithms therefore depends on how well the underlying optimization techniques adapt to and exploit these facets. Our interests divide broadly into two categories: convex and non-convex methods.
Convex optimization In the realm of convex optimization, we have addressed research challenges in several different problem settings. Where scalability is the central concern, a summary overview of the large-scale aspects of convex optimization appears in our work [ ]. A theoretically optimal large-scale method for convex problems with linear constraints is presented in [ ], which develops a new stochastic alternating direction method of multipliers (ADMM) that combines Nesterov's accelerated gradient methods with ADMM; a baseline sketch follows below.
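For orientation (the notation and toy problem here are ours, not the cited paper's): ADMM splits a problem min_{x,z} f(x) + g(z) subject to a linear coupling constraint into alternating updates of x, z, and a scaled dual variable u; the stochastic variant referenced above replaces the exact x-minimization with a cheap stochastic-gradient step and adds Nesterov-style extrapolation. A minimal deterministic baseline on the lasso:

```python
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, n_iter=200):
    """Baseline (deterministic, unaccelerated) ADMM for the lasso
        min_x 0.5*||Ax - b||_2^2 + lam*||x||_1,
    rewritten as min f(x) + g(z) subject to x - z = 0."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA_rhoI = A.T @ A + rho * np.eye(n)   # reused by every x-update
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))   # exact x-minimization
        v = x + u                                            # prox of (lam/rho)*||.||_1
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u = u + x - z                                        # scaled dual update
    return z

# toy usage on a sparse regression instance
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20); x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(lasso_admm(A, b, lam=1.0), 2))
```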
For learning classifiers over extremely large output spaces, we have proposed a parallelizable mixed-norm regularization approach that leads to convex but non-smooth optimization problems [ ]. We show that the resulting models can be orders of magnitude smaller than those produced by most state-of-the-art methods while also generalizing better; a sketch of the key proximal step appears below.
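The mixed norms in question are of the group type, e.g., an l1,2-norm that sums the l2-norms of disjoint groups of coefficients and thereby zeroes out entire groups, which is what shrinks the models. A minimal sketch of the standard proximal operator of such a regularizer (the group structure and names are illustrative, not taken from the cited paper):

```python
import numpy as np

def prox_group_l2(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 over disjoint groups
    (group soft-thresholding): each group is shrunk toward zero and is
    eliminated entirely once its l2-norm drops below lam."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[g] = scale * w[g]
    return out

# toy usage: the second group is weak and gets zeroed out wholesale
w = np.array([3.0, -4.0, 0.1, -0.2])
print(prox_group_l2(w, [np.arange(2), np.arange(2, 4)], lam=1.0))
# -> [ 2.4 -3.2  0.   0. ]
```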
A contribution at the interface of combinatorial and convex optimization is presented in [ ]. General-purpose convex methods often rely on key subroutines such as projection and proximity operators. Continuing our effort from the previous SAB assessment towards a library of highly tuned subroutines, e.g., for total variation, we extend these to multivariate total variation in [ ] (a simple univariate sketch follows below). A flourishing sub-field of convex optimization is "robust optimization", which seeks to optimize models under parameter/data uncertainty. It intersects with the usual min-max theory in statistics and can be used to offer a different view of regularization in machine learning. In [ ], we introduced robust optimization for matrix completion/recovery problems; the concrete application was recovering correlation matrices under a simple bounded-uncertainty model.
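Returning to the total-variation subroutine mentioned above: the univariate TV proximity operator solves min_x 0.5*||x - y||_2^2 + lam * sum_i |x[i+1] - x[i]|. Below is a deliberately simple dual projected-gradient sketch (not the tuned Newton-type solvers of the cited work); the step size 1/4 is safe because the 1-D difference operator D satisfies ||D D^T|| <= 4:

```python
import numpy as np

def prox_tv1d(y, lam, n_iter=500):
    """Proximity operator of lam * TV(x) at the point y, via projected
    gradient ascent on the dual: maximize over u in [-lam, lam]^(n-1),
    where the primal point is x = y - D^T u and the gradient is D x."""
    u = np.zeros(len(y) - 1)   # one dual variable per difference
    for _ in range(n_iter):
        # x = y - D^T u; note D^T u == -np.diff(u, prepend=0, append=0)
        x = y + np.diff(u, prepend=0.0, append=0.0)
        u = np.clip(u + 0.25 * np.diff(x), -lam, lam)
    return y + np.diff(u, prepend=0.0, append=0.0)

# toy usage: recover a piecewise-constant step from noisy data
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * rng.standard_normal(100)
x = prox_tv1d(y, lam=1.0)
```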
Non-convex optimization In the domain of non-convex optimization for large-scale problems, our work [ ] presents a simplified analysis of what is, to our knowledge, the first non-convex, non-smooth incremental proximal method. This work started in 2011; interestingly, interest in incremental methods has since sky-rocketed, although the available analyses remain limited to the convex case. Finally, we mention a new direction in non-convex optimization offered by our recent work [ ], which introduces "geometric optimization" on the manifold of positive definite matrices. The underlying idea is to develop a theory of convexity along geodesics of the positive definite manifold. This work also identifies basic calculus rules for detecting and constructing geodesically convex functions on this manifold, and as an application presents new algorithms for maximum likelihood estimation of elliptically contoured distributions, which, despite non-convexity, remains tractable thanks to geodesic convexity (a small numerical illustration follows).
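To make the geometry concrete (standard facts, not specific to the cited paper): under the affine-invariant metric, the geodesic joining positive definite matrices A and B is A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}, and a function is geodesically convex if it is convex along all such curves. A small numpy sketch that evaluates this geodesic and verifies that log det is geodesically linear (det(A #_t B) = det(A)^{1-t} det(B)^t), hence both geodesically convex and concave:

```python
import numpy as np

def pd_power(X, p):
    """X**p for a symmetric positive definite X, via eigendecomposition."""
    w, V = np.linalg.eigh((X + X.T) / 2)   # symmetrize for numerical safety
    return (V * w**p) @ V.T

def pd_geodesic(A, B, t):
    """The point A #_t B on the geodesic joining A and B (t in [0, 1])
    on the positive definite manifold with the affine-invariant metric."""
    As, Ais = pd_power(A, 0.5), pd_power(A, -0.5)
    return As @ pd_power(Ais @ B @ Ais, t) @ As

# check: log det at the midpoint equals the average of the endpoint values
rng = np.random.default_rng(0)
Q1, Q2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
A, B = Q1 @ Q1.T + 4 * np.eye(4), Q2 @ Q2.T + 4 * np.eye(4)
mid = pd_geodesic(A, B, 0.5)
print(np.isclose(np.linalg.slogdet(mid)[1],
                 0.5 * (np.linalg.slogdet(A)[1] + np.linalg.slogdet(B)[1])))
# -> True
```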
Sra, S.
Positive definite matrices and the S-divergence
Proceedings of the American Mathematical Society, published electronically October 22, 2015 (article)
Cherian, A., Sra, S., Morellas, V., Papanikolopoulos, N.
Efficient nearest neighbors via robust sparse hashing
IEEE Transactions on Image Processing, 23(8):3646-3655, 2014 (article)
Barbero, A. J., Sra, S.
Modular proximal optimization with application to total variation regularization
2014 (article), in revision
Wytock, M., Sra, S., Kolter, J. Z.
Fast Newton methods for the group fused lasso
In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages: 888-897, (Editors: Zhang, N. L., Tian, J.), AUAI Press, UAI, 2014 (inproceedings)
Kim, D., Sra, S., Dhillon, I. S.
A non-monotonic method for large-scale non-negative least squares
Optimization Methods and Software, 28(5):1012-1039, February 2012 (article)
Sra, S.
Fast projection onto mixed-norm balls with applications
Data Mining and Knowledge Discovery (DMKD), 25(2):358-377, September 2012 (article)
Cayton, L.
Accelerating Nearest Neighbor Search on Manycore Systems
In 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), pages: 402-413, IPDPS, May 2012 (inproceedings)
Sra, S.
Scalable nonconvex inexact proximal splitting
In Advances in Neural Information Processing Systems 25, pages: 539-547, (Editors: Bartlett, P., Pereira, F. C. N., Burges, C. J. C., Bottou, L., Weinberger, K. Q.), Curran Associates Inc., 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012 (inproceedings)
Jegelka, S.
Combinatorial Problems with Submodular Coupling in Machine Learning and Computer Vision
ETH Zürich, Switzerland, 2012 (phdthesis)
Sra, S., Nowozin, S., Wright, S.
Optimization for Machine Learning
pages: 494, Neural information processing series, MIT Press, Cambridge, MA, USA, December 2011 (book)
Schmidt, M., Kim, D., Sra, S.
Projected Newton-type methods in machine learning
In Optimization for Machine Learning, pages: 305-330, (Editors: Sra, S., Nowozin, S., Wright, S. J.), MIT Press, Cambridge, MA, USA, December 2011 (inbook)
Dinuzzo, F.
Analysis of Fixed-Point and Coordinate Descent Algorithms for Regularized Kernel Methods
IEEE Transactions on Neural Networks, 22(10):1576-1587, October 2011 (article)
Barbero, A., Sra, S.
Fast Newton-type Methods for Total-Variation with Applications
In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pages: 313-320, (Editors: Getoor, L., Scheffer, T.), Omnipress, 28th International Conference on Machine Learning (ICML), 2011 (inproceedings)
Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N.
Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence
In IEEE International Conference on Computer Vision, ICCV 2011, pages: 2399-2406, (Editors: Metaxas, D. N., Quan, L., Sanfeliu, A., Van Gool, L. J.), IEEE, 13th International Conference on Computer Vision (ICCV), 2011 (inproceedings)
Jegelka, S., Bilmes, J.
Approximation Bounds for Inference using Cooperative Cut
In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pages: 577-584, (Editors: Getoor, L., Scheffer, T.), International Machine Learning Society, Madison, WI, USA, 28th International Conference on Machine Learning (ICML), July 2011 (inproceedings)
Jegelka, S., Bilmes, J.
Online submodular minimization for combinatorial structures
In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pages: 345-352, (Editors: Getoor, L., Scheffer, T.), International Machine Learning Society, Madison, WI, USA, 28th International Conference on Machine Learning (ICML), July 2011 (inproceedings)
Jegelka, S., Bilmes, J.
Submodularity beyond submodular energies: coupling edges in graph cuts
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), pages: 1897-1904, IEEE, Piscataway, NJ, USA, June 2011 (inproceedings)
Kim, D., Sra, S., Dhillon, I.
Tackling Box-Constrained Optimization via a New Projected Quasi-Newton Approach
SIAM Journal on Scientific Computing, 32(6):3548-3563, December 2010 (article)
Kim, D., Sra, S., Dhillon, I.
A scalable trust-region algorithm with application to mixed-norm regression
In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pages: 519-526, (Editors: Fürnkranz, J., Joachims, T.), International Machine Learning Society, Madison, WI, USA, 27th International Conference on Machine Learning (ICML), June 2010 (inproceedings)
Cayton, L.
Efficient Bregman Range Search
In Advances in Neural Information Processing Systems 22, pages: 243-251, (Editors: Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C., Culotta, A.), Curran, Red Hook, NY, USA, 23rd Annual Conference on Neural Information Processing Systems (NIPS), 2009 (inproceedings)
Kulis, B., Sra, S., Dhillon, I.
Convex Perturbations for Scalable Semidefinite Programming
In JMLR Workshop and Conference Proceedings Volume 5: AISTATS 2009, pages: 296-303, (Editors: van Dyk, D., Welling, M.), MIT Press, Cambridge, MA, USA, Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), April 2009 (inproceedings)