Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0 | O Fuhrer, T Chadha, T Hoefler, G Kwasniewski, X Lapillonne, D Leutwyler, ... | Geoscientific Model Development 11 (4), 1665-1681, 2018 | 71 | 2018 |

Using compiler techniques to improve automatic performance modeling | A Bhattacharyya, G Kwasniewski, T Hoefler | 2015 International Conference on Parallel Architecture and Compilation (PACT …, 2015 | 28 | 2015 |

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication | G Kwasniewski, M Kabić, M Besta, J VandeVondele, R Solcà, T Hoefler | Proceedings of the International Conference for High Performance Computing …, 2019 | 23 | 2019 |

A PCIe congestion-aware performance model for densely populated accelerator servers | M Martinasso, G Kwasniewski, SR Alam, TC Schulthess, T Hoefler | SC'16: Proceedings of the International Conference for High Performance …, 2016 | 18 | 2016 |

Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0, Geosci. Model Dev., 11, 1665–1681 | O Fuhrer, T Chadha, T Hoefler, G Kwasniewski, X Lapillonne, D Leutwyler, ... | gmd-11-1665-2018, 2018 | 15 | 2018 |

Extreme scale plasma turbulence simulations on top supercomputers worldwide | W Tang, B Wang, S Ethier, G Kwasniewski, T Hoefler, KZ Ibrahim, ... | SC'16: Proceedings of the International Conference for High Performance …, 2016 | 11 | 2016 |

Automatic complexity analysis of explicitly parallel programs | T Hoefler, G Kwasniewski | Proceedings of the 26th ACM symposium on Parallelism in algorithms and …, 2014 | 8 | 2014 |

Flexible communication avoiding matrix multiplication on FPGA with high-level synthesis | J de Fine Licht, G Kwasniewski, T Hoefler | Proceedings of the 2020 ACM/SIGDA International Symposium on Field …, 2020 | 7 | 2020 |

Automatic performance modeling of HPC applications | F Wolf, C Bischof, A Calotoiu, T Hoefler, C Iwainsky, G Kwasniewski, ... | Software for Exascale Computing-SPPEXA 2013-2015, 445-465, 2016 | 5 | 2016 |

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra | M Besta, Z Vonarburg-Shmaria, Y Schaffner, L Schwarz, G Kwasniewski, ... | arXiv preprint arXiv:2103.03653, 2021 | 1 | 2021 |

On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization | G Kwasniewski, T Ben-Nun, AN Ziogas, T Schneider, M Besta, T Hoefler | Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021 | 1 | 2021 |

SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing | M Copik, G Kwasniewski, M Besta, M Podstawski, T Hoefler | arXiv preprint arXiv:2012.14132, 2020 | 1 | 2020 |

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems | M Besta, R Kanakagiri, G Kwasniewski, R Ausavarungnirun, J Beránek, ... | arXiv preprint arXiv:2104.07582, 2021 | | 2021 |

High-performance distributed memory systems–from supercomputers to data centers | T Hoefler, A Barak, Z Drezner, A Shiloh, M Snir, W Gropp, M Besta, ... | 34th International Symposium on Distributed Computing (DISC 2020), 2020 | | 2020 |

Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis | JF Licht, G Kwasniewski, T Hoefler | arXiv preprint arXiv:1912.06526, 2019 | | 2019 |

A scalable weakly-synchronous algorithm for solving partial differential equations | K Aditya, T Gysi, G Kwasniewski, T Hoefler, DA Donzis, JH Chen | arXiv preprint arXiv:1911.05769, 2019 | | 2019 |

Scaling a Convection-Resolving RCM to Near-Global Scales | O Fuhrer, D Leutwyler, T Chadha, G Kwasniewski, T Hoefler, X Lapillonne, ... | 2017 AGU Fall Meeting, 2017 | | 2017 |

Scaling a Convection-Resolving RCM to Near-Global Scales | D Leutwyler, O Fuhrer, T Chadha, G Kwasniewski, T Hoefler, X Lapillonne, ... | AGU Fall Meeting Abstracts 2017, A24F-05, 2017 | | 2017 |