Xiuhong Li

Cited by

	All	Since 2019
Citations	394	324
h-index	12	10
i10-index	12	10

Citations per year
2015	2016	2017	2018	2019	2020	2021	2022	2023	2024
1	12	20	35	54	47	54	74	65	30

Public access: 15 articles available, 0 articles not available (based on funding mandates)

Co-authors

Yun (Eric) Liang, Professor of EECS, Peking University, ACM Distinguished Scientist. Verified email at pku.edu.cn
Shengen Yan, The Chinese University of Hong Kong. Verified email at ie.cuhk.edu.hk
Xiaolong Xie, Research Engineer, Damo Academy, Alibaba Group. Verified email at alibaba-inc.com
Size Zheng, Peking University. Verified email at pku.edu.cn
Xuechao Wei, Peking University. Verified email at pku.edu.cn

Affiliation: Peking University (verified email at pku.edu.cn)

Interests: GPGPU, Compiler, Deep Learning


Title	Cited by	Year
Enabling coordinated register allocation and thread-level parallelism optimization for GPUs X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan Proceedings of the 48th International Symposium on Microarchitecture, 395-406, 2015	81	2015
TGPA: tile-grained pipeline architecture for low latency CNN inference X Wei, Y Liang, X Li, CH Yu, P Zhang, J Cong 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1-8, 2018	73	2018
A coordinated tiling and batching framework for efficient GEMM on GPUs X Li, Y Liang, S Yan, L Jia, Y Li Proceedings of the 24th Symposium on Principles and Practice of Parallel …, 2019	53	2019
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang Proceedings of the 49th Annual International Symposium on Computer …, 2022	30	2022
Performance-centric register file design for GPUs using racetrack memory S Wang, Y Liang, C Zhang, X Xie, G Sun, Y Liu, Y Wang, X Li 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 25-30, 2016	21	2016
Enabling efficient fast convolution algorithms on GPUs via MegaKernels L Jia, Y Liang, X Li, L Lu, S Yan IEEE Transactions on Computers 69 (7), 986-997, 2020	18	2020
CRAT: Enabling coordinated register allocation and thread-level parallelism optimization for GPUs X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan IEEE Transactions on Computers 67 (6), 890-897, 2017	17	2017
Efficient kernel management on GPUs X Li, Y Liang 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 85-90, 2016	16	2016
FlashDecoding++: Faster large language model inference on GPUs K Hong, G Dai, J Xu, Q Mao, X Li, J Liu, K Chen, H Dong, Y Wang arXiv preprint arXiv:2311.01282, 2023	15	2023
cuMBIR: An efficient framework for low-dose X-ray CT image reconstruction on GPUs X Li, Y Liang, W Zhang, T Liu, H Li, G Luo, M Jiang Proceedings of the 2018 International Conference on Supercomputing, 184-194, 2018	13	2018
Exploring cache bypassing and partitioning for multi-tasking on GPUs Y Liang, X Li, X Xie 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 9-16, 2017	12	2017
Efficient kernel management on GPUs Y Liang, X Li ACM Transactions on Embedded Computing Systems (TECS) 16 (4), 1-24, 2017	12	2017
NeoFlow: A flexible framework for enabling efficient compilation for high performance DNN training S Zheng, R Chen, Y Jin, A Wei, B Wu, X Li, S Yan, Y Liang IEEE Transactions on Parallel and Distributed Systems 33 (11), 3220-3232, 2021	9	2021
Chimera: An analytical optimizing framework for effective compute-intensive operators fusion S Zheng, S Chen, P Song, R Chen, X Li, S Yan, D Lin, J Leng, Y Liang 2023 IEEE International Symposium on High-Performance Computer Architecture …, 2023	7	2023
CuLDA: Solving large-scale LDA problems on GPUs X Xie, Y Liang, X Li, W Tan Proceedings of the 28th International Symposium on High-Performance Parallel …, 2019	7	2019
CuLDA_CGS: Solving large-scale LDA problems on GPUs X Xie, Y Liang, X Li, W Tan Proceedings of the 24th Symposium on Principles and Practice of Parallel …, 2019	6	2019
Theoretical linear convergence of deep unfolding network for block-sparse signal recovery R Fu, Y Liu, X Li Third International Conference on Computer Science and Communication …, 2022	2	2022
LongTail-Bench: A Benchmark Suite for Domain-Specific Operators in Deep Learning X Li, S Yan, L Jiang, P Xu, J Ma, X Zhang, D Lin 2022 IEEE International Symposium on Workload Characterization (IISWC), 282-295, 2022	1	2022
EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers L Jiang, P Xu, Q Zhu, X Li, S Yan, X Zhang, D Lin, W Ma, Z Li, J Liu, J Ma, ... Proceedings of the 51st International Conference on Parallel Processing, 1-11, 2022	1	2022
Proteus: Simulating the Performance of Distributed DNN Training J Duan, X Li, P Xu, X Zhang, S Yan, Y Liang, D Lin arXiv preprint arXiv:2306.02267, 2023		2023
