Scott Levy
Title
Cited by
Year
Lessons learned from memory errors observed over the lifetime of Cielo
S Levy, KB Ferreira, N DeBardeleben, T Siddiqua, V Sridharan, ...
SC18: International Conference for High Performance Computing, Networking …, 2018
Cited by: 48; Year: 2018
Using simulation to evaluate the performance of resilience strategies at scale
S Levy, B Topp, KB Ferreira, D Arnold, T Hoefler, P Widener
High Performance Computing Systems. Performance Modeling, Benchmarking and …, 2014
Cited by: 41; Year: 2014
Understanding performance interference in next-generation HPC systems
OH Mondragon, PG Bridges, S Levy, KB Ferreira, P Widener
SC'16: Proceedings of the International Conference for High Performance …, 2016
Cited by: 38; Year: 2016
Understanding the effects of communication and coordination on checkpointing at scale
KB Ferreira, P Widener, S Levy, D Arnold, T Hoefler
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by: 36; Year: 2014
Lifetime memory reliability data from the field
T Siddiqua, V Sridharan, SE Raasch, N DeBardeleben, KB Ferreira, ...
2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and …, 2017
Cited by: 29; Year: 2017
Faodel: Data management for next-generation application workflows
C Ulmer, S Mukherjee, G Templet, S Levy, J Lofstead, P Widener, ...
Proceedings of the 9th Workshop on Scientific Cloud Computing, 1-6, 2018
Cited by: 23; Year: 2018
Characterizing MPI matching via trace-based simulation
KB Ferreira, S Levy, K Pedretti, RE Grant
Proceedings of the 24th European MPI Users' Group Meeting, 1-11, 2017
Cited by: 23; Year: 2017
Improving DRAM fault characterization through machine learning
E Baseman, N DeBardeleben, K Ferreira, S Levy, S Raasch, V Sridharan, ...
2016 46th Annual IEEE/IFIP International Conference on Dependable Systems …, 2016
Cited by: 23; Year: 2016
Empress: extensible metadata provider for extreme-scale scientific simulations
M Lawson, C Ulmer, S Mukherjee, G Templet, J Lofstead, S Levy, ...
Proceedings of the 2nd Joint International Workshop on Parallel Data Storage …, 2017
Cited by: 15; Year: 2017
Exploring the effect of noise on the performance benefit of nonblocking allreduce
P Widener, KB Ferreira, S Levy, T Hoefler
Proceedings of the 21st European MPI Users' Group Meeting, 77-82, 2014
Cited by: 15; Year: 2014
Using unreliable virtual hardware to inject errors in extreme-scale systems
S Levy, MGF Dosanjh, PG Bridges, KB Ferreira
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale …, 2013
Cited by: 13; Year: 2013
“Smarter” NICs for faster molecular dynamics: a case study
S Karamati, C Hughes, KS Hemmert, RE Grant, WW Schonbein, S Levy, ...
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2022
Cited by: 11; Year: 2022
Using simulation to examine the effect of MPI message matching costs on application performance
S Levy, KB Ferreira
Proceedings of the 25th European MPI Users' Group Meeting, 1-11, 2018
Cited by: 10; Year: 2018
An examination of the impact of failure distribution on coordinated checkpoint/restart
S Levy, KB Ferreira
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale …, 2016
Cited by: 10; Year: 2016
Scheduling in-situ analytics in next-generation applications
OH Mondragon, PG Bridges, S Levy, KB Ferreira, P Widener
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by: 10; Year: 2016
Evaluating the feasibility of using memory content similarity to improve system resilience
S Levy, PG Bridges, KB Ferreira, AP Thompson, C Trott
Proceedings of the 3rd International Workshop on Runtime and Operating …, 2013
Cited by: 10; Year: 2013
On noise and the performance benefit of nonblocking collectives
PM Widener, S Levy, KB Ferreira, T Hoefler
The International Journal of High Performance Computing Applications 30 (1 …, 2016
Cited by: 9; Year: 2016
Using simulation to evaluate the performance of resilience strategies and process failures
SN Levy, BE Topp, DC Arnold, KB Ferreira, P Widener, T Hoefler
Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2014
Cited by: 9; Year: 2014
Understanding memory failures on a petascale Arm system
KB Ferreira, S Levy, J Hemmert, K Pedretti
Proceedings of the 31st International Symposium on High-Performance Parallel …, 2022
Cited by: 8; Year: 2022
RaDD runtimes: Radical and different distributed runtimes with SmartNICs
RE Grant, W Schonbein, S Levy
2020 IEEE/ACM Fourth Annual Workshop on Emerging Parallel and Distributed …, 2020
Cited by: 8; Year: 2020