PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation J Ansel, E Yang, H He, N Gimelshein, A Jain, M Voznesensky, B Bao, ... Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 399 | 2024 |
Gist: Efficient data encoding for deep neural network training A Jain, A Phanishayee, J Mars, L Tang, G Pekhimenko 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture …, 2018 | 206 | 2018 |
DeftNN: Addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission P Hill, A Jain, M Hill, B Zamirai, CH Hsu, MA Laurenzano, S Mahlke, ... Proceedings of the 50th Annual IEEE/ACM International Symposium on …, 2017 | 86 | 2017 |
Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation A Jain, P Hill, SC Lin, M Khan, ME Haque, MA Laurenzano, S Mahlke, ... 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture …, 2016 | 63 | 2016 |
Efficient execution of quantized deep learning models: A compiler approach A Jain, S Bhattacharya, M Masuda, V Sharma, Y Wang arXiv preprint arXiv:2006.10226, 2020 | 41 | 2020 |
UNIT: Unifying tensorized instruction compilation J Weng, A Jain, J Wang, L Wang, Y Wang, T Nowatzki 2021 IEEE/ACM International Symposium on Code Generation and Optimization …, 2021 | 35 | 2021 |
Proctor: Detecting and investigating interference in shared datacenters RS Kannan, A Jain, MA Laurenzano, L Tang, J Mars 2018 IEEE International Symposium on Performance Analysis of Systems and …, 2018 | 26 | 2018 |
Continuous shape shifting: Enabling loop co-optimization via near-free dynamic code rewriting A Jain, MA Laurenzano, L Tang, J Mars 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture …, 2016 | 19 | 2016 |
High-radix on-chip networks with low-radix routers A Jain, R Parikh, V Bertacco 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 289-294, 2014 | 14 | 2014 |
Knowledge distillation via module replacing for automatic speech recognition with recurrent neural network transducer K Zhao, HD Nguyen, A Jain, N Susanj, A Mouchtaris, L Gupta, M Zhao 23rd Interspeech Conference, 2022 | 11 | 2022 |
Optimizing memory-access patterns for deep learning accelerators H Zheng, S Oh, H Wang, P Briggs, J Gai, A Jain, Y Liu, R Heaton, ... arXiv preprint arXiv:2002.12798, 2020 | 11 | 2020 |
Automatic attention pruning: Improving and automating model pruning using attentions K Zhao, A Jain, M Zhao International Conference on Artificial Intelligence and Statistics, 10470-10486, 2023 | 10 | 2023 |
Efficient utilization of processing element array JT Huynh, R Diamant, H Zheng, Y Liu, A Jain, Y Wang, V Sharma, ... US Patent 11,741,350, 2023 | 8 | 2023 |
Iterative activation-based structured pruning K Zhao, A Jain, M Zhao arXiv preprint arXiv:2201.09881, 2022 | 5 | 2022 |
Efficient data encoding for deep neural network training A Phanishayee, G Pekhimenko, A Jain US Patent 11,715,002, 2023 | 4 | 2023 |
Adaptive activation-based structured pruning K Zhao, A Jain, M Zhao arXiv preprint arXiv:2201.10520, 2022 | 4 | 2022 |
Automated backend-aware post-training quantization Z Jiang, A Jain, A Liu, J Fromm, C Ma, T Chen, L Ceze arXiv preprint arXiv:2103.14949, 2021 | 4 | 2021 |
High Radix On-Chip Networks at Incremental Reconfiguration Costs A Jain, R Parikh, V Bertacco 23rd International Workshop on Logic and Synthesis, 2014 | 4 | 2014 |
Architectural support for convolutional neural networks on modern CPUs A Jain, MA Laurenzano, GA Pokam, J Mars, L Tang Proceedings of the 27th International Conference on Parallel Architectures …, 2018 | 3 | 2018 |
CPSA: Compute precisely store approximately A Jain, P Hill, MA Laurenzano, ME Haque, M Khan, S Mahlke, L Tang, ... Workshop on Approximate Computing Across the Stack, 2016 | 3 | 2016 |