Wenguang Chen
Fault tolerance
Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL
Fault tolerance is increasingly important in high performance computing due to the substantial growth of system scale and decreasing …
Xiongchao Tang, Jidong Zhai, Bowen Yu, Wenguang Chen, Weimin Zheng
CprFS: A User-Level File System to Support Consistent File States for Checkpoint and Restart
Checkpoint and Restart (CPR) is becoming critical to large scale parallel computers, whose Mean Time Between Failures (MTBF) may be …
Ruini Xue, Wenguang Chen, Weimin Zheng