Pin: building customized program analysis tools with dynamic instrumentation CK Luk, R Cohn, R Muth, H Patil, A Klauser, G Lowney, S Wallace, ... ACM SIGPLAN Notices 40 (6), 190-200, 2005 | 5446 | 2005 |
Applied machine learning at Facebook: A datacenter infrastructure perspective K Hazelwood, S Bird, D Brooks, S Chintala, U Diril, D Dzhulgakov, ... 2018 IEEE International Symposium on High Performance Computer Architecture …, 2018 | 588 | 2018 |
Profiling a warehouse-scale computer S Kanev, JP Darago, K Hazelwood, P Ranganathan, T Moseley, GY Wei, ... Proceedings of the 42nd Annual International Symposium on Computer …, 2015 | 451 | 2015 |
Machine learning at Facebook: Understanding inference at the edge CJ Wu, D Brooks, K Chen, D Chen, S Choudhury, M Dukhan, ... 2019 IEEE International Symposium on High Performance Computer Architecture …, 2019 | 420 | 2019 |
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer C Gregg, K Hazelwood (IEEE ISPASS) IEEE International Symposium on Performance Analysis of …, 2011 | 394 | 2011 |
MLPerf training benchmark P Mattson, C Cheng, G Diamos, C Coleman, P Micikevicius, D Patterson, ... Proceedings of Machine Learning and Systems 2, 336-349, 2020 | 243 | 2020 |
The architectural implications of Facebook's DNN-based personalized recommendation U Gupta, CJ Wu, X Wang, M Naumov, B Reagen, D Brooks, B Cottel, ... 2020 IEEE International Symposium on High Performance Computer Architecture …, 2020 | 217 | 2020 |
Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications J Park, M Naumov, P Basu, S Deng, A Kalaiah, D Khudia, J Law, P Malani, ... arXiv preprint arXiv:1811.09886, 2018 | 165 | 2018 |
Reducing DRAM footprint with NVM in Facebook A Eisenman, D Gardner, I AbdelRahman, J Axboe, S Dong, K Hazelwood, ... Proceedings of the Thirteenth EuroSys Conference, 1-13, 2018 | 161 | 2018 |
Analyzing parallel programs with Pin M Bach, M Charney, R Cohn, E Demikhovsky, T Devor, K Hazelwood, ... Computer 43 (3), 34-41, 2010 | 156 | 2010 |
RecNMP: Accelerating personalized recommendation with near-memory processing L Ke, U Gupta, BY Cho, D Brooks, V Chandra, U Diril, A Firoozshahian, ... 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture …, 2020 | 131 | 2020 |
Fine-Grained Resource Sharing for Concurrent GPGPU Kernels. C Gregg, J Dorn, KM Hazelwood, K Skadron HotPar, 2012 | 123 | 2012 |
SuperPin: Parallelizing dynamic instrumentation for real-time performance S Wallace, K Hazelwood International Symposium on Code Generation and Optimization (CGO'07), 209-220, 2007 | 122 | 2007 |
Enabling task parallelism in the CUDA scheduler M Guevara, C Gregg, K Hazelwood, K Skadron Workshop on Programming Models for Emerging Architectures 9, 84, 2009 | 120 | 2009 |
A dynamic binary instrumentation engine for the ARM architecture K Hazelwood, A Klauser Proceedings of the 2006 International Conference on Compilers, Architecture …, 2006 | 108 | 2006 |
Tradeoffs between power management and tail latency in warehouse-scale applications S Kanev, K Hazelwood, GY Wei, D Brooks 2014 IEEE International Symposium on Workload Characterization (IISWC), 31-40, 2014 | 87 | 2014 |
Adaptive online context-sensitive inlining K Hazelwood, D Grove International Symposium on Code Generation and Optimization, 2003. CGO 2003 …, 2003 | 87 | 2003 |
Dynamic heterogeneous scheduling decisions using historical runtime data C Gregg, M Boyer, K Hazelwood, K Skadron Workshop on Applications for Multi-and Many-Core Processors (A4MMC), 1-12, 2011 | 82 | 2011 |
Sustainable AI: Environmental implications, challenges and opportunities CJ Wu, R Raghavendra, U Gupta, B Acun, N Ardalani, K Maeng, G Chang, ... Proceedings of Machine Learning and Systems 4, 795-813, 2022 | 73 | 2022 |
Bandana: Using non-volatile memory for storing deep learning models A Eisenman, M Naumov, D Gardner, M Smelyanskiy, S Pupyrev, ... Proceedings of Machine Learning and Systems 1, 40-52, 2019 | 71 | 2019 |