Xiaolong Xie
Research & Development Engineer, Database BU, Alibaba Inc.
Verified email at alibaba-inc.com
Title
Cited by
Year
Coordinated static and dynamic cache bypassing for GPUs
X Xie, Y Liang, Y Wang, G Sun, T Wang
2015 IEEE 21st International Symposium on High Performance Computer …, 2015
123
2015
An efficient compiler framework for cache bypassing on GPUs
X Xie, Y Liang, G Sun, D Chen
2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 516-523, 2013
101
2013
Enabling Coordinated Register Allocation and Thread-level Parallelism Optimization for GPUs
X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan
IEEE/ACM International Symposium on Microarchitecture, 2015
66
2015
CuMF_SGD: Parallelized stochastic gradient descent for matrix factorization on GPUs
X Xie, W Tan, LL Fong, Y Liang
Proceedings of the 26th International Symposium on High-Performance Parallel …, 2017
23
2017
An Efficient Compiler Framework for Cache Bypassing on GPUs
Y Liang, X Xie, G Sun, D Chen
IEEE, 2015
22
2015
CuMF_SGD: Fast and scalable matrix factorization
X Xie, W Tan, LL Fong, Y Liang
arXiv preprint arXiv:1610.05838, 2016
15
2016
Performance-centric register file design for GPUs using racetrack memory
S Wang, Y Liang, C Zhang, X Xie, G Sun, Y Liu, Y Wang, X Li
2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 25-30, 2016
15
2016
Optimizing cache bypassing and warp scheduling for GPUs
Y Liang, X Xie, Y Wang, G Sun, T Wang
IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2017
6
2017
CRAT: Enabling coordinated register allocation and thread-level parallelism optimization for GPUs
X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan
IEEE Transactions on Computers 67 (6), 890-897, 2017
5
2017
Exploring cache bypassing and partitioning for multi-tasking on GPUs
Y Liang, X Li, X Xie
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 9-16, 2017
5
2017
CuLDA: solving large-scale LDA Problems on GPUs
X Xie, Y Liang, X Li, W Tan
Proceedings of the 28th International Symposium on High-Performance Parallel …, 2019
3
2019
CuLDA_CGS: Solving Large-scale LDA Problems on GPUs
X Xie, Y Liang, X Li, W Tan
arXiv preprint, 2018
3
2018
Adaptive parallelism of task execution on machines with accelerators
LL Fong, W Tan, X Xie, H Zhou
US Patent 10,203,988, 2019
2
2019
Efficient data-parallel primitives on heterogeneous systems
Z Lai, Q Luo, X Xie
Proceedings of the 48th International Conference on Parallel Processing, 1-10, 2019
1
2019
Matrix factorization with two-stage data block dispatch associated with graphics processing units
E Duesterwald, LL Fong, W Tan, X Xie
US Patent 10,380,222, 2019
2019