Detailed modeling, design, and evaluation of a scalable multi-level checkpointing system AT Moody, G Bronevetsky, KM Mohror, BR de Supinski Lawrence Livermore National Laboratory (LLNL), Livermore, CA, 2010 | 739* | 2010 |
Design, modeling, and evaluation of a scalable multi-level checkpointing system A Moody, G Bronevetsky, K Mohror, BR De Supinski High Performance Computing, Networking, Storage and Analysis (SC), 2010 …, 2010 | 736 | 2010 |
The Spack package manager: bringing order to HPC software chaos T Gamblin, M LeGendre, MR Collette, GL Lee, A Moody, BR de Supinski, ... Proceedings of the International Conference for High Performance Computing …, 2015 | 189 | 2015 |
Design and modeling of a non-blocking checkpointing system K Sato, N Maruyama, K Mohror, A Moody, T Gamblin, BR de Supinski, ... Proceedings of the International Conference on High Performance Computing …, 2012 | 136 | 2012 |
The design, deployment, and evaluation of the CORAL pre-exascale systems SS Vazhkudai, BR de Supinski, AS Bland, A Geist, J Sexton, J Kahle, ... Proceedings of the International Conference for High Performance Computing …, 2018 | 130 | 2018 |
McrEngine: a scalable checkpointing system using data-aware aggregation and compression TZ Islam, K Mohror, S Bagchi, A Moody, BR De Supinski, R Eigenmann Scientific Programming 21 (3-4), 149-163, 2013 | 121 | 2013 |
An ephemeral burst-buffer file system for scientific applications T Wang, K Mohror, A Moody, K Sato, W Yu Proceedings of the International Conference for High Performance Computing …, 2016 | 96 | 2016 |
TrueNorth ecosystem for brain-inspired computing: scalable systems, software, and applications J Sawada, F Akopyan, AS Cassidy, B Taba, MV Debole, P Datta, ... High Performance Computing, Networking, Storage and Analysis, SC16 …, 2016 | 83 | 2016 |
Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes H Subramoni, S Potluri, K Kandalla, B Barth, J Vienne, J Keasler, ... Proceedings of the International Conference on High Performance Computing …, 2012 | 81 | 2012 |
A 1 PB/s file system to checkpoint three million MPI tasks R Rajachandrasekar, A Moody, K Mohror, DK Panda Proceedings of the 22nd international symposium on High-performance parallel …, 2013 | 78 | 2013 |
A user-level InfiniBand-based file system and checkpoint strategy for burst buffers K Sato, K Mohror, A Moody, T Gamblin, BR De Supinski, N Maruyama, ... Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International …, 2014 | 71 | 2014 |
Scalable NIC-based reduction on large-scale clusters A Moody, J Fernandez, F Petrini, DK Panda Proceedings of the 2003 ACM/IEEE conference on Supercomputing, 59, 2003 | 71 | 2003 |
Hot-spot avoidance with multi-pathing over InfiniBand: An MPI perspective A Vishnu, M Koop, A Moody, AR Mamidala, S Narravula, DK Panda Cluster Computing and the Grid, 2007. CCGRID 2007. Seventh IEEE …, 2007 | 58 | 2007 |
VeloC: Towards High Performance Adaptive Asynchronous Checkpointing at Large Scale B Nicolae, A Moody, E Gonsiorowski, K Mohror, F Cappello | 47 | 2019 |
Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems Y Zhu, F Chowdhury, H Fu, A Moody, K Mohror, K Sato, W Yu | 40* | |
Managing I/O interference in a shared burst buffer system S Thapaliya, P Bangalore, J Lofstead, K Mohror, A Moody Parallel Processing (ICPP), 2016 45th International Conference on, 416-425, 2016 | 39 | 2016 |
FMI: Fault tolerant messaging interface for fast and transparent recovery K Sato, A Moody, K Mohror, T Gamblin, BR de Supinski, N Maruyama, ... Parallel and Distributed Processing Symposium, 2014 IEEE 28th International …, 2014 | 35 | 2014 |
Machine Learning Predictions of Runtime and IO Traffic on High-End Clusters R McKenna, S Herbein, A Moody, T Gamblin, M Taufer Cluster Computing (CLUSTER), 2016 IEEE International Conference on, 255-258, 2016 | 34 | 2016 |
Designing non-blocking allreduce with collective offload on InfiniBand clusters: A case study with conjugate gradient solvers K Kandalla, U Yang, J Keasler, T Kolev, A Moody, H Subramoni, K Tomko, ... Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th …, 2012 | 34 | 2012 |
Detailed modeling and evaluation of a scalable multilevel checkpointing system K Mohror, A Moody, G Bronevetsky, BR de Supinski IEEE Transactions on Parallel and Distributed Systems 25 (9), 2255-2263, 2014 | 32 | 2014 |