LAION-400M: Open dataset of CLIP-filtered 400 million image-text pairs C Schuhmann, R Vencu, R Beaumont, R Kaczmarczyk, C Mullis, A Katta, ... arXiv preprint arXiv:2111.02114, 2021 | 825 | 2021 |
GPT-J-6B: A 6 billion parameter autoregressive language model B Wang, A Komatsuzaki | 655 | 2021 |
One epoch is all you need A Komatsuzaki arXiv preprint arXiv:1906.06669, 2019 | 36 | 2019 |
Sparse upcycling: Training mixture-of-experts from dense checkpoints A Komatsuzaki, J Puigcerver, J Lee-Thorp, CR Ruiz, B Mustafa, J Ainslie, ... arXiv preprint arXiv:2212.05055, 2022 | 29 | 2022 |
ARB: Advanced reasoning benchmark for large language models T Sawada, D Paleka, A Havrilla, P Tadepalli, P Vidas, A Kranias, JJ Nay, ... arXiv preprint arXiv:2307.13692, 2023 | 25 | 2023 |
Extractive summary as discrete latent variables A Komatsuzaki arXiv preprint arXiv:1811.05542, 2018 | 6 | 2018 |
Current Limitations of Language Models: What You Need is Retrieval A Komatsuzaki arXiv preprint arXiv:2009.06857, 2020 | 1 | 2020 |