Weimin Zheng

Latest

Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems
Chukonu: A Fully-Featured High-Performance Big Data Framework That Integrates a Native Compute Engine into Spark
BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores
ShenTu: Processing Multi-Trillion Edge Graphs on Millions of Cores in Seconds
An Efficient In-Memory Checkpoint Method and its Practice on Fault-Tolerant HPL
Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores
Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL
Characterizing and optimizing TPC-C workloads on large-scale systems using SSD arrays
Building Semi-Elastic Virtual Clusters for Cost-Effective HPC Cloud Resource Provisioning
Performance Prediction for Large-Scale Parallel Applications Using Representative Replay
Gemini: A Computation-Centric Distributed Graph Processing System
Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations
Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems
CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs
OpenMDSP: Extending OpenMP to Program Multi-Core DSP
RACEZ: a lightweight and non-invasive race detection tool for production applications
Do I Use the Wrong Definition? DeFuse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs
How OpenMP Applications Get More Benefit from Many-Core Era
MapCG: Writing Parallel Program Portable between CPU and GPU
PHANTOM: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node
Taming Hardware Event Samples for FDO Compilation
LogGPO: An accurate communication model for performance prediction of MPI programs
Cache Sharing Management for Performance Fairness in Chip Multiprocessors
FACT: Fast Communication Trace Collection for Parallel Applications through Program Slicing
MPIWiz: Subgroup Reproducible Replay of MPI Applications
Process Mapping for MPI Collective Communications
Exploring the Emerging Applications for Transactional Memory
Maotai: View-Oriented Parallel Programming on CMT Processors
Parallelization and Characterization of Probabilistic Latent Semantic Analysis
CprFS: A User-Level File System to Support Consistent File States for Checkpoint and Restart
OpenUH: An Optimizing, Portable OpenMP Compiler
VODCA: View-Oriented, Distributed, Cluster-Based Approach to Parallel Computing
Parallelization of module network structure learning and performance tuning on SMP
Tree partition based parallel frequent pattern mining on shared memory systems
Parallel module network learning on distributed memory multiprocessors
A Dynamic Energy Conservation Scheme for Clusters in Computing Centers
Hierarchical Parallel Simulated Annealing and Its Applications
Parallelization of Bayesian network based SNPs pattern analysis and performance characterization on SMP/HT
A Single Thread Discrete Event Simulation Toolkit for Java: STSimJ
On the Malicious Participants Problem in Computational Grid
Communication optimization for SMP clusters
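A Parallel Whole-Genome Assembly Algorithm Based on Graph Partitioning
A Study of Checkpointing Strategies for Parallel Programs Based on Memory Function Partitioning
An Interactive Data Dependence Analysis Technique Based on the Range Test
An Interactive Fortran77 Parallelization System
An Automatic Training-Set Generation Tool for Static Performance Analysis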

© 2025 Wenguang Chen. This work is licensed under CC BY NC ND 4.0