Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 875 | 2024 |
Tagged back-translation I Caswell, C Chelba, D Grangier arXiv preprint arXiv:1906.06442, 2019 | 252 | 2019 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 214 | 2019 |
Quality at a glance: An audit of web-crawled multilingual datasets J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ... Transactions of the Association for Computational Linguistics 10, 50-72, 2022 | 167* | 2022 |
BLEU might be guilty but references are not innocent M Freitag, D Grangier, I Caswell arXiv preprint arXiv:2004.06063, 2020 | 144 | 2020 |
Investigating multilingual NMT representations at scale SR Kudugunta, A Bapna, I Caswell, N Arivazhagan, O Firat arXiv preprint arXiv:1909.02197, 2019 | 129 | 2019 |
Nisansa de Silva J Kreutzer, I Caswell, L Wang, A Wahab, D Van Esch, N Ulzii-Orshikh, ... Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur …, 2022 | 105 | 2022 |
MADLAD-400: A multilingual and document-level large audited dataset S Kudugunta, I Caswell, B Zhang, X Garcia, D Xin, A Kusupati, R Stella, ... Advances in Neural Information Processing Systems 36, 2024 | 89 | 2024 |
Language ID in the wild: Unexpected challenges on the path to a thousand-language web text corpus I Caswell, T Breiner, D van Esch, A Bapna arXiv preprint arXiv:2010.14571, 2020 | 87 | 2020 |
Building machine translation systems for the next thousand languages A Bapna, I Caswell, J Kreutzer, O Firat, D van Esch, A Siddhant, M Niu, ... arXiv preprint arXiv:2205.03983, 2022 | 86 | 2022 |
Dynamically composing domain-data selection with clean-data selection by "co-curricular learning" for neural machine translation W Wang, I Caswell, C Chelba arXiv preprint arXiv:1906.01130, 2019 | 69 | 2019 |
APE at scale and its implications on MT evaluation biases M Freitag, I Caswell, S Roy arXiv preprint arXiv:1904.04790, 2019 | 69 | 2019 |
Translationese as a Language in "Multilingual" NMT P Riley, I Caswell, M Freitag, D Grangier arXiv preprint arXiv:1911.03823, 2019 | 49 | 2019 |
Learning a multi-domain curriculum for neural machine translation W Wang, Y Tian, J Ngiam, Y Yang, I Caswell, Z Parekh arXiv preprint arXiv:1908.10940, 2019 | 48 | 2019 |
Quality at a glance: An audit of web-crawled multilingual datasets I Caswell, J Kreutzer, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ... arXiv e-prints, arXiv:2103.12028, 2021 | 37 | 2021 |
Towards the next 1000 languages in multilingual machine translation: Exploring the synergy between supervised and self-supervised learning A Siddhant, A Bapna, O Firat, Y Cao, MX Chen, I Caswell, X Garcia arXiv preprint arXiv:2201.03110, 2022 | 34 | 2022 |
Recent advances in Google Translate I Caswell, B Liang Google AI Blog 8, 2020 | 33 | 2020 |
XTREME-UP: A user-centric scarce-data benchmark for under-represented languages S Ruder, JH Clark, A Gutkin, M Kale, M Ma, M Nicosia, S Rijhwani, P Riley, ... arXiv preprint arXiv:2305.11938, 2023 | 24 | 2023 |
Writing system and speaker metadata for 2,800+ language varieties D van Esch, T Lucassen, S Ruder, I Caswell, C Rivera Proceedings of the Thirteenth Language Resources and Evaluation Conference …, 2022 | 22 | 2022 |
Loopy neural nets: Imitating feedback loops in the human brain I Caswell, C Shen, L Wang Tech. Report, 2016 | 17 | 2016 |