Jidong Zhai
Wenguang Chen
Latest
GLM-130B: An Open Bilingual Pre-trained Model
GraphSet: High Performance Graph Mining through Equivalent Set Transformations
BaGuaLu: targeting brain scale pretrained models with over 37 million cores
Detecting Performance Variance for Parallel Applications Without Source Code
Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems
Vapro: performance variance detection and diagnosis for production-run parallel applications
A Fast Lock for Explicit Message Passing Architectures
AIPerf: Automated machine learning as an AI-HPC benchmark
Automatic Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures
TADOC: Text analytics directly on compression
TADOC: Text Analytics Directly on Compression
HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations
pLock: A Fast Lock for Architectures with Explicit Inter-core Message Passing
Spread-n-share: improving application performance and cluster throughput with resource-aware job placement
An Efficient In-Memory Checkpoint Method and its Practice on Fault-Tolerant HPL
Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights
Spindle: Informed Memory Access Monitoring
vSensor: leveraging fixed-workload snippets of programs for performance variance detection
Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data
FinePar: irregularity-aware fine-grained workload partitioning on integrated architectures
Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores
Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL
Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures
Versapipe: a versatile programming framework for pipelined computing on GPU
A survey of cloud resource management for complex engineering applications
Building Semi-Elastic Virtual Clusters for Cost-Effective HPC Cloud Resource Provisioning
Characterizing and optimizing TPC-C workloads on large-scale systems using SSD arrays
Performance Prediction for Large-Scale Parallel Applications Using Representative Replay
A Power-Conserving Online Scheduling Scheme for Video Streaming Services
Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications
Optimizing seam carving on multi-GPU systems for real-time content-aware image resizing
To Co-run, or Not to Co-run: A Performance Study on Integrated Architectures
CYPRESS: Combining Static and Dynamic Analysis for Top-Down Communication Trace Compression
Optimizing Seam Carving on multi-GPU systems for real-time image resizing
ACIC: automatic cloud I/O configurator for HPC applications
ACIC: automatic cloud I/O configurator for parallel applications
Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters
Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems
Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications
Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications
One optimized I/O configuration per HPC application: leveraging the configurability of cloud
PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node
FACT: fast communication trace collection for parallel applications through program slicing
LogGPO: An accurate communication model for performance prediction of MPI programs
Process Mapping for MPI Collective Communications