Tutel: Adaptive mixture-of-experts at scale. C Hwang, W Cui, Y Xiong, Z Yang, Z Liu, H Hu, Z Wang, R Salas, J Jose, ... Proceedings of Machine Learning and Systems 5, 2023 | 34 | 2023 |
FlexMoE: Scaling large-scale sparse pre-trained model training via dynamic device placement. X Nie, X Miao, Z Wang, Z Yang, J Xue, L Ma, G Cao, B Cui. Proceedings of the ACM on Management of Data 1 (1), 1-19, 2023 | 15 | 2023 |