Zheng Ziming
Postdoctoral Scholar, The University of Chicago
System log pre-processing to improve failure prediction
Z Zheng, Z Lan, BH Park, A Geist
Dependable Systems & Networks, 2009. DSN'09. IEEE/IFIP International …, 2009
Toward automated anomaly identification in large-scale systems
Z Lan, Z Zheng, Y Li
IEEE Transactions on Parallel and Distributed Systems 21 (2), 174-187, 2010
Co-analysis of RAS log and job log on Blue Gene/P
Z Zheng, L Yu, W Tang, Z Lan, R Gupta, N Desai, S Coghlan, D Buettner
Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International …, 2011
Practical online failure prediction for Blue Gene/P: Period-based vs event-driven
L Yu, Z Zheng, Z Lan, S Coghlan
Dependable Systems and Networks Workshops (DSN-W), 2011 IEEE/IFIP 41st …, 2011
A practical failure prediction with location and lead time for Blue Gene/P
Z Zheng, Z Lan, R Gupta, S Coghlan, P Beckman
Dependable Systems and Networks Workshops (DSN-W), 2010 International …, 2010
Dynamic meta-learning for failure prediction in large-scale systems: A case study
J Gu, Z Zheng, Z Lan, J White, E Hocks, BH Park
Parallel Processing, 2008. ICPP'08. 37th International Conference on, 157-164, 2008
A study of dynamic meta-learning for failure prediction in large-scale systems
Z Lan, J Gu, Z Zheng, R Thakur, S Coghlan
Journal of Parallel and Distributed Computing 70 (6), 630-643, 2010
When is multi-version checkpointing needed?
G Lu, Z Zheng, AA Chien
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale …, 2013
Reliability-aware scalability models for high performance computing
Z Zheng, Z Lan
Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International …, 2009
3-Dimensional root cause diagnosis via co-analysis
Z Zheng, L Yu, Z Lan, T Jones
Proceedings of the 9th international conference on Autonomic computing, 181-190, 2012
Versioned distributed arrays for resilience in scientific applications: Global view resilience
A Chien, P Balaji, P Beckman, N Dun, A Fang, H Fujita, K Iskra, ...
Procedia Computer Science 51, 29-38, 2015
Anomaly localization in large-scale clusters
Z Zheng, Y Li, Z Lan
Cluster Computing, 2007 IEEE International Conference on, 322-330, 2007
Reliability-aware speedup models for parallel applications with coordinated checkpointing/restart
Z Zheng, L Yu, Z Lan
IEEE Transactions on Computers 64 (5), 1402-1415, 2015
Filtering log data: Finding the needles in the haystack
L Yu, Z Zheng, Z Lan, T Jones, JM Brandt, AC Gentile
Dependable Systems and Networks (DSN), 2012 42nd Annual IEEE/IFIP …, 2012
Performance under failures of DAG-based parallel computing
H Jin, XH Sun, Z Zheng, Z Lan, B Xie
Cluster Computing and the Grid, 2009. CCGRID'09. 9th IEEE/ACM International …, 2009
Fault tolerance in an inner-outer solver: a GVR-enabled case study
Z Zheng, AA Chien, K Teranishi
International Conference on High Performance Computing for Computational …, 2014
A fault diagnosis and prognosis service for TeraGrid clusters
Z Lan, Y Li, P Gujrati, Z Zheng, R Thakur, J White
Proc. of The 2nd TeraGrid Conference, 2007
Exploring versioned distributed arrays for resilience in scientific applications: global view resilience
A Chien, P Balaji, N Dun, A Fang, H Fujita, K Iskra, Z Rubenstein, Z Zheng, ...
The International Journal of High Performance Computing Applications …, 2016
Towards a fault-aware computing environment
XH Sun, Z Lan, Y Li, H Jin, Z Zheng
Proceedings of the High Availability and Performance Computing Workshop (HAPCW), 2008
Error checking and snapshot-based recovery in a preconditioned conjugate gradient solver
Z Rubenstein, H Fujita, Z Zheng, A Chien
Technical Report TR-2013-11, Department of Computer Science, University of …, 2013