Noam Shazeer

Cited by

	All	Since 2019
Citations	160454	155604
h-index	58	53
i10-index	95	79

Citations per year
Year	2017	2018	2019	2020	2021	2022	2023	2024
Citations	643	2301	6815	13293	23047	36090	56516	19764

Public access (based on funding mandates): 1 article available, 0 articles not available


Character.ai

Verified email at character.ai

Deep Learning


Title	Cited by	Year
Attention is all you need. A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Advances in neural information processing systems 30, 2017	118940	2017
Exploring the limits of transfer learning with a unified text-to-text transformer. C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... Journal of machine learning research 21 (140), 1-67, 2020	14731	2020
PaLM: Scaling language modeling with pathways. A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... Journal of Machine Learning Research 24 (240), 1-113, 2023	3376	2023
Scheduled sampling for sequence prediction with recurrent neural networks. S Bengio, O Vinyals, N Jaitly, N Shazeer. Advances in neural information processing systems 28, 2015	2211	2015
Image transformer. N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran. International conference on machine learning, 4055-4064, 2018	1806	2018
Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. N Shazeer, A Mirhoseini, K Maziarz, A Davis, Q Le, G Hinton, J Dean. arXiv preprint arXiv:1701.06538, 2017	1787	2017
Exploring the limits of language modeling. R Jozefowicz, O Vinyals, M Schuster, N Shazeer, Y Wu. arXiv preprint arXiv:1602.02410, 2016	1352	2016
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. W Fedus, B Zoph, N Shazeer. Journal of Machine Learning Research 23 (120), 1-39, 2022	1283	2022
LaMDA: Language models for dialog applications. R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... arXiv preprint arXiv:2201.08239, 2022	1088	2022
Attention is all you need. arXiv 2017. A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... arXiv preprint arXiv:1706.03762 3762, 2023	1032	2023
Generating Wikipedia by summarizing long sequences. PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer. arXiv preprint arXiv:1801.10198, 2018	909	2018
Adafactor: Adaptive learning rates with sublinear memory cost. N Shazeer, M Stern. International Conference on Machine Learning, 4596-4604, 2018	775	2018
End-to-end text-dependent speaker verification. G Heigold, I Moreno, S Bengio, N Shazeer. 2016 IEEE International Conference on Acoustics, Speech and Signal …, 2016	751	2016
Gomez Aidan N., Kaiser Łukasz, and Polosukhin Illia. 2017. V Ashish, S Noam, P Niki, U Jakob, J Llion. Attention is all you need. In Advances in neural information processing …, 2017	728	2017
How much knowledge can you pack into the parameters of a language model? A Roberts, C Raffel, N Shazeer. arXiv preprint arXiv:2002.08910, 2020	694	2020
GShard: Scaling giant models with conditional computation and automatic sharding. D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... arXiv preprint arXiv:2006.16668, 2020	683	2020
Tensor2Tensor for neural machine translation. A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ... arXiv preprint arXiv:1803.07416, 2018	611	2018
Attention is all you need (2017). A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... arXiv preprint arXiv:1706.03762, 2019	505	2019
Serving content-relevant advertisements with client-side device support. D Anderson, P Buchheit, JA Dean, GR Harik, CL Gonsalves, N Shazeer, ... US Patent 8,086,559, 2011	402	2011
One model to learn them all. L Kaiser, AN Gomez, N Shazeer, A Vaswani, N Parmar, L Jones, ... arXiv preprint arXiv:1706.05137, 2017	381	2017
