Frederik Kunstner
Verified email at cs.ubc.ca - Homepage
Title · Cited by · Year
Limitations of the empirical Fisher approximation for natural gradient descent
F Kunstner, L Balles, P Hennig
Advances in Neural Information Processing Systems 32, 4158--4169, 2019
Cited by 184 · 2019
BackPACK: Packing more into Backprop
F Dangel, F Kunstner, P Hennig
International Conference on Learning Representations, 2020
Cited by 101 · 2020
SLANG: Fast structured covariance approximations for Bayesian deep learning with natural gradient
A Mishkin, F Kunstner, D Nielsen, M Schmidt, ME Khan
Advances in Neural Information Processing Systems 31, 6248--6258, 2018
Cited by 65 · 2018
Heavy-tailed noise does not explain the gap between SGD and Adam, but sign descent might
F Kunstner, J Chen, JW Lavington, M Schmidt
International Conference on Learning Representations, 5, 2023
Cited by 31* · 2023
Adaptive gradient methods converge faster with over-parameterization (but you should do a line-search)
S Vaswani, I Laradji, F Kunstner, SY Meng, M Schmidt, S Lacoste-Julien
arXiv preprint arXiv:2006.06835, 2020
Cited by 30* · 2020
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
F Kunstner, R Kumar, M Schmidt
International Conference on Artificial Intelligence and Statistics 130, 3295 …, 2021
Cited by 26 · 2021
Fully Quantized Distributed Gradient Descent
F Künstner, SU Stich, M Jaggi
Technical report, EPFL, 2017
Cited by 8 · 2017
Searching for optimal per-coordinate step-sizes with multidimensional backtracking
F Kunstner, V Sanches Portella, M Schmidt, N Harvey
Advances in Neural Information Processing Systems 36, 2024
Cited by 2 · 2024
Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent--an Open Problem
R Le Priol, F Kunstner, D Scieur, S Lacoste-Julien
arXiv preprint arXiv:2111.06826, 2021
Cited by 1 · 2021
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
F Kunstner, R Yadav, A Milligan, M Schmidt, A Bietti
arXiv preprint arXiv:2402.19449, 2024
2024
Variance Reduced Model Based Methods: New rates and adaptive step sizes
RM Gower, F Kunstner, M Schmidt
OPT 2023: Optimization for Machine Learning, 2023
2023
Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem
R Yadav, F Kunstner, M Schmidt, A Bietti
NeurIPS workshop, Optimization for Machine Learning, 2023
2023