Finetuned language models are zero-shot learners J Wei, M Bosma, VY Zhao, K Guu, AW Yu, B Lester, N Du, AM Dai, QV Le ICLR 2022, 2022 | 2733 | 2022 |
Scaling instruction-finetuned language models HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, Y Li, X Wang, ... Journal of Machine Learning Research 25 (70), 1-53, 2024 | 2570 | 2024 |
Gemini: a family of highly capable multimodal models Gemini Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 1564 | 2023 |
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension AW Yu, D Dohan, MT Luong, R Zhao, K Chen, M Norouzi, QV Le ICLR 2018, 2018 | 1347* | 2018 |
SimVLM: Simple visual language model pretraining with weak supervision Z Wang, J Yu, AW Yu, Z Dai, Y Tsvetkov, Y Cao ICLR 2022, 2022 | 738 | 2022 |
GLaM: Efficient scaling of language models with mixture-of-experts N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ... ICML 2022, 2022 | 602* | 2022 |
DeepFusion: Lidar-camera deep fusion for multi-modal 3D object detection Y Li, AW Yu, T Meng, B Caine, J Ngiam, D Peng, J Shen, Y Lu, D Zhou, ... CVPR 2022, 2022 | 338 | 2022 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 297 | 2024 |
Orthogonal weight normalization: Solution to optimization over multiple dependent Stiefel manifolds in deep neural networks L Huang, X Liu, B Lang, AW Yu, B Li AAAI 2018, 2017 | 239 | 2017 |
Combined scaling for zero-shot transfer learning H Pham, Z Dai, G Ghiasi, H Liu, AW Yu, MT Luong, M Tan, QV Le arXiv preprint arXiv:2111.10050, 2021 | 218* | 2021 |
Large language models cannot self-correct reasoning yet J Huang, X Chen, S Mishra, HS Zheng, AW Yu, X Song, D Zhou arXiv preprint arXiv:2310.01798, 2023 | 168 | 2023 |
Learning to skim text AW Yu, H Lee, QV Le ACL 2017, 2017 | 164 | 2017 |
Neural symbolic reader: Scalable integration of distributed and symbolic representations for reading comprehension X Chen, C Liang, AW Yu, D Zhou, D Song, QV Le ICLR 2020, 2019 | 119 | 2019 |
Compositional generalization via neural-symbolic stack machines X Chen, C Liang, AW Yu, D Song, D Zhou NeurIPS 2020, 2020 | 94 | 2020 |
AdaDelay: Delay adaptive distributed stochastic convex optimization S Sra, AW Yu, M Li, AJ Smola AISTATS 2016, 2016 | 92* | 2016 |
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining SM Xie, H Pham, X Dong, N Du, H Liu, Y Lu, P Liang, QV Le, T Ma, AW Yu NeurIPS 2023, 2023 | 82 | 2023 |
Towards zero-label language learning Z Wang, AW Yu, O Firat, Y Cao arXiv preprint arXiv:2109.09193, 2021 | 77 | 2021 |
AutoHAS: Efficient hyperparameter and architecture search X Dong, M Tan, AW Yu, D Peng, B Gabrys, QV Le arXiv preprint arXiv:2006.03656, 2020 | 69* | 2020 |
On computationally tractable selection of experiments in measurement-constrained regression models Y Wang, AW Yu, A Singh The Journal of Machine Learning Research 18 (1), 5238-5278, 2017 | 67* | 2017 |
An improved gap-dependency analysis of the noisy power method MF Balcan, SS Du, Y Wang, AW Yu COLT 2016, 2016 | 65 | 2016 |