Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates. S Vaswani, A Mishkin, I Laradji, M Schmidt, G Gidel, S Lacoste-Julien. Advances in Neural Information Processing Systems 32, 2019. Cited by 182.
SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient. A Mishkin, F Kunstner, D Nielsen, M Schmidt, ME Khan. Advances in Neural Information Processing Systems 31, 2018. Cited by 62.
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions. A Mishkin, A Sahiner, M Pilanci. International Conference on Machine Learning, 15770-15816, 2022. Cited by 18.
To Each Optimizer a Norm, to Each Norm Its Generalization. S Vaswani, R Babanezhad, J Gallego-Posada, A Mishkin, ... arXiv preprint arXiv:2006.06821, 2020. Cited by 5.
Interpolation, Growth Conditions, and Stochastic Gradient Descent. A Mishkin. University of British Columbia, 2020. Cited by 4.
Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm. AV Ramesh, A Mishkin, M Schmidt, Y Zhou, JW Lavington, J She. arXiv preprint arXiv:2307.01169, 2023.
Optimal Sets and Solution Paths of ReLU Networks. A Mishkin, M Pilanci. arXiv preprint arXiv:2306.00119, 2023.
Fast Convergence of Greedy 2-Coordinate Updates for Optimizing with an Equality Constraint. AV Ramesh, A Mishkin, M Schmidt. OPT 2022: Optimization for Machine Learning (NeurIPS 2022 Workshop), 2022.
The Solution Path of the Group Lasso. A Mishkin, M Pilanci. OPT 2022: Optimization for Machine Learning (NeurIPS 2022 Workshop), 2022.
Web ValueCharts: Analyzing Individual and Group Preferences with Interactive, Web-based Visualizations. A Mishkin. 2017.
How to Make Your Optimizer Generalize Better. S Vaswani, R Babanezhad, J Gallego-Posada, A Mishkin, S Lacoste-Julien, ...