Shigang Li
Postdoctoral Researcher, ETH Zurich, SPCL Lab
Automatic tuning of sparse matrix-vector multiplication on multicore clusters
SG Li, CJ Hu, JC Zhang, YQ Zhang
Science China Information Sciences 58 (9), 1-14, 2015
NUMA-aware shared-memory collective communication for MPI
S Li, T Hoefler, M Snir
Proceedings of the 22nd international symposium on High-performance parallel …, 2013
Parallel processing systems for big data: a survey
Y Zhang, T Cao, S Li, X Tian, L Yuan, H Jia, AV Vasilakos
Proceedings of the IEEE 104 (11), 2114-2136, 2016
Deep learning for post-processing ensemble weather forecasts
P Grönquist, C Yao, T Ben-Nun, N Dryden, P Dueben, S Li, T Hoefler
Philosophical Transactions of the Royal Society A 379 (2194), 20200092, 2021
Taming unbalanced training workloads in deep learning with partial collective operations
S Li, T Ben-Nun, SD Girolamo, D Alistarh, T Hoefler
Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of …, 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A Ivanov, N Dryden, T Ben-Nun, S Li, T Hoefler
Proceedings of Machine Learning and Systems 3, 2021
Improved MPI collectives for MPI processes in shared address spaces
S Li, T Hoefler, C Hu, M Snir
Cluster Computing 17 (4), 1139-1155, 2014
CAS‐ESM 2: Description and climate simulation performance of the Chinese Academy of Sciences (CAS) Earth System Model (ESM) version 2
H Zhang, M Zhang, J Jin, K Fei, D Ji, C Wu, J Zhu, J He, Z Chai, J Xie, ...
Journal of Advances in Modeling Earth Systems, e2020MS002210, 2020
Cache-oblivious MPI all-to-all communications based on Morton order
S Li, Y Zhang, T Hoefler
IEEE Transactions on Parallel and Distributed Systems, 2018
Kernel optimization for short-range molecular dynamics
C Hu, X Wang, J Li, X He, S Li, Y Feng, S Yang, H Bai
Computer Physics Communications, 2016
Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer
S Li, B Wu, Y Zhang, X Wang, J Li, C Hu, J Wang, Y Feng, N Nie
Proceedings of the 47th International Conference on Parallel Processing, 47, 2018
Asynchronous work stealing on distributed memory systems
S Li, J Hu, X Cheng, C Zhao
2013 21st Euromicro International Conference on Parallel, Distributed, and …, 2013
Efficient parallel optimizations of a high-performance SIFT on GPUs
Z Li, H Jia, Y Zhang, S Liu, S Li, X Wang, H Zhang
Journal of Parallel and Distributed Computing, 2018
Fast Convolution Operations on Many-Core Architectures
S Li, Y Zhang, C Xiang, L Shi
High Performance Computing and Communications (HPCC), 2015 IEEE 7th …, 2015
Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation
B Wu, S Li, Y Zhang, N Nie
Computer Physics Communications, 2016
Chimera: efficiently training large-scale neural networks with bidirectional pipelines
S Li, T Hoefler
Proceedings of the International Conference for High Performance Computing …, 2021
A Cross-Platform SpMV Framework on Many-Core Architectures
Y Zhang, S Li, S Yan, H Zhou
ACM Transactions on Architecture and Code Optimization (TACO) 13 (4), 33, 2016
Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations
X Zhu, J Zhang, K Yoshii, S Li, Y Zhang, P Balaji
Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International …, 2015
Asynchronous Decentralized SGD with Quantized and Local Updates
G Nadiradze, A Sabour, P Davies, S Li, D Alistarh
Advances in Neural Information Processing Systems 34, 2021
OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight
K Li, H Shang, Y Zhang, S Li, B Wu, D Wang, L Zhang, F Li, D Chen, ...
Proceedings of the International Conference for High Performance Computing …, 2019