Publications

Yan Liu, Jianxin Lai, Long Li, Tianxiang Sui, Linjie Xiao, Peng Yuan, Xiaojing Zhang, Qing Zhu, Wenguang Chen, Jingling Xue (2025). ReSBM: Region-based Scale and Minimal-Level Bootstrapping Management for FHE via Min-Cut. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1.

Zhenbo Sun, Shengqi Chen, Yuanwei Wang, Jian Sha, Guanyu Feng, Wenguang Chen (2025). MEPipe: Democratizing LLM Training with Memory-Efficient Slice-Level Pipeline Scheduling on Cost-Effective Accelerators. Proceedings of the Twentieth European Conference on Computer Systems.

Mingzhe Zhang, Xiaochen Hao, Hongbo Rong, Wenguang Chen (2025). MatFactory: A Framework for High-Performance Matrix Factorization on FPGAs. Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design.

Wen-jie Lu, Zhicong Huang, Zhen Gu, Jingyu Li, Jian Liu, Cheng Hong, Kui Ren, Tao Wei, Wenguang Chen (2025). BumbleBee: Secure Two-party Inference Framework for Large Transformers.

Long Li, Jianxin Lai, Peng Yuan, Tianxiang Sui, Yan Liu, Qing Zhu, Xiaojing Zhang, Linjie Xiao, Wenguang Chen, Jingling Xue (2025). ANT-ACE: An FHE Compiler Framework for Automating Neural Network Inference. Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization.

Xiaohui Duan, Yi Zhang, Kai Xu, Haohuan Fu, Bin Yang, Yiming Wang, Yilun Han, Siyuan Chen, Zhuangzhuang Zhou, Chenyu Wang, Dongqiang Huang, Huihai An, Xiting Ju, Haopeng Huang, Zhuang Liu, Wei Xue, Weiguo Liu, Bowen Yan, Jianye Hou, Maoxue Yu, Wenguang Chen, Jian Li, Zhao Jing, Hailong Liu, Lixin Wu (2025). An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores. Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming.

Kairong Luo, Haodong Wen, Shengding Hu, Zhenbo Sun, Maosong Sun, Zhiyuan Liu, Kaifeng Lyu, Wenguang Chen (2025). A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules. The Thirteenth International Conference on Learning Representations.

Zheng Chen, Feng Zhang, Yang Chen, Xiaokun Fang, Guanyu Feng, Xiaowei Zhu, Wenguang Chen, Xiaoyong Du (2024). Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent Queries. Proc. VLDB Endow.

Heng Lin, Zhiyong Wang, Shipeng Qi, Xiaowei Zhu, Chuntao Hong, Wenguang Chen, Yingwei Luo (2024). Building a High-Performance Graph Storage on Top of Tree-Structured Key-Value Stores. Big Data Mining and Analytics.

Zhicong Huang, Wen-jie Lu, Yuchen Wang, Cheng Hong, Tao Wei, Wenguang Chen (2024). Coral: Maliciously Secure Computation Framework for Packed and Mixed Circuits. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security.

Zhenbo Sun, Huanqi Cao, Yuanwei Wang, Guanyu Feng, Shengqi Chen, Haojie Wang, Wenguang Chen (2024). AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3.

Huanqi Cao, Shizhi Tang, Qianchao Zhu, Bowen Yu, Wenguang Chen (2023). Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid. Proc. ACM Program. Lang.

Ziheng Zhou, Ying Zhao, Yiyu Qing, Wenming Jiang, Yihan Wu, Wenguang Chen (2023). A Physics-guided NN-based Approach for Tropical Cyclone Intensity Estimation. Proceedings of the 2023 SIAM International Conference on Data Mining (SDM).

Jidong Zhai, Liyan Zheng, Jinghan Sun, Feng Zhang, Xiongchao Tang, Xuehai Qian, Bingsheng He, Wei Xue, Wenguang Chen, Weimin Zheng (2022). Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems. IEEE Transactions on Parallel and Distributed Systems.

Jianqiang Huang, Haojie Wang, Xiang Fei, Xiaoying Wang, Wenguang Chen (2022). TC-Stream: Large-Scale Graph Triangle Counting on a Single Machine Using GPUs. IEEE Transactions on Parallel and Distributed Systems.

Guanyu Feng, Huanqi Cao, Xiaowei Zhu, Bowen Yu, Yuanwei Wang, Zixuan Ma, Shengqi Chen, Wenguang Chen (2022). TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs. 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22).

Bowen Yu, Guanyu Feng, Huanqi Cao, Xiaohan Li, Zhenbo Sun, Haojie Wang, Xiaowei Zhu, Weimin Zheng, Wenguang Chen (2022). Chukonu: A Fully-Featured High-Performance Big Data Framework That Integrates a Native Compute Engine into Spark. Proc. VLDB Endow.

Liyan Zheng, Jidong Zhai, Xiongchao Tang, Haojie Wang, Teng Yu, Yuyang Jin, Shuaiwen Leon Song, Wenguang Chen (2022). Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.

Huanqi Cao, Yuanwei Wang, Haojie Wang, Heng Lin, Zixuan Ma, Wanwang Yin, Wenguang Chen (2022). Scaling Graph Traversal to 281 Trillion Edges with 40 Million Cores. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.

Zixuan Ma, Jiaao He, Jiezhong Qiu, Huanqi Cao, Yuanwei Wang, Zhenbo Sun, Liyan Zheng, Haojie Wang, Shizhi Tang, Tianyu Zheng, Junyang Lin, Guanyu Feng, Zeqiang Huang, Jie Gao, Aohan Zeng, Jianwei Zhang, Runxin Zhong, Tianhui Shi, Sha Liu, Weimin Zheng, Jie Tang, Hongxia Yang, Xin Liu, Jidong Zhai, Wenguang Chen (2022). BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.

Xiaohan Li, Bowen Yu, Guanyu Feng, Haojie Wang, Wenguang Chen (2021). LotusSQL: SQL engine for high-performance big data systems. Big Data Mining and Analytics.

Yiming Zhang, Kai Lu, Wenguang Chen (2021). Processing Extreme-Scale Graphs on China's Supercomputers. Commun. ACM.

Xiongchao Tang, Chen Zhang, Jidong Zhai, Xuehai Qian, Wenguang Chen, Yong Jiang (2021). A Fast Lock for Explicit Message Passing Architectures. IEEE Transactions on Computers.

Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, Wenguang Chen (2021). AIPerf: Automated machine learning as an AI-HPC benchmark. Big Data Mining and Analytics.

Xiaowei Zhu, Zhisong Fu, Zhenxuan Pan, Jin Jiang, Chuntao Hong, Yongchao Liu, Yang Fang, Wenguang Chen, Changhua He (2021). Taking the Pulse of Financial Activities with Online Graph Processing. SIGOPS Oper. Syst. Rev.

Feng Zhang, Jidong Zhai, Bo Wu, Bingsheng He, Wenguang Chen, Xiaoyong Du (2021). Automatic Irregularity-Aware Fine-Grained Workload Partitioning on Integrated Architectures. IEEE Transactions on Knowledge and Data Engineering.

Bowen Yu, Huanqi Cao, Tianyi Shan, Haojie Wang, Xiongchao Tang, Wenguang Chen (2021). Sparker: Efficient Reduction for More Scalable Machine Learning with Spark. 50th International Conference on Parallel Processing.

Guanyu Feng, Zixuan Ma, Daixuan Li, Shengqi Chen, Xiaowei Zhu, Wentao Han, Wenguang Chen (2021). RisGraph: A Real-Time Streaming System for Evolving Graphs to Support Sub-Millisecond Per-Update Analysis at Millions Ops/s. Proceedings of the 2021 International Conference on Management of Data.

Yu Zhang, Chunming Hu, Mingliang Zeng, Yitong Huang, Wenguang Chen, Yuanwei Wang (2021). Encouraging Compiler Optimization Practice for Undergraduate Students through Competition. Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1.

Jiping Yu, Wei Qin, Xiaowei Zhu, Zhenbo Sun, Jianqiang Huang, Xiaohan Li, Wenguang Chen (2021). DFOGraph: An I/O- and Communication-Efficient System for Distributed Fully-out-of-Core Graph Processing. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.

Feng Zhang, Jidong Zhai, Xipeng Shen, Dalin Wang, Zheng Chen, Onur Mutlu, Wenguang Chen, Xiaoyong Du (2020). TADOC: Text Analytics Directly on Compression. The VLDB Journal.

Xiaowei Zhu, Guanyu Feng, Marco Serafini, Xiaosong Ma, Jiping Yu, Lei Xie, Ashraf Aboulnaga, Wenguang Chen (2020). LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans. Proc. VLDB Endow.

Jianqiang Huang, Wei Qin, Xiaoying Wang, Wenguang Chen (2020). Survey of External Memory Large-Scale Graph Processing on a Multi-Core System. J. Supercomput.

Nitish Srivastava, Hongbo Rong, Prithayan Barua, Guanyu Feng, Huanqi Cao, Zhiru Zhang, David Albonesi, Vivek Sarkar, Wenguang Chen, Paul Petersen, Geoff Lowney, Adam Herr, Christopher Hughes, Timothy Mattson, Pradeep Dubey (2019). T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations. 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

Wentao Han, Kaiwei Li, Shimin Chen, Wenguang Chen (2019). Auxo: a temporal graph management system. Big Data Mining and Analytics.

Rui Zhang, Wenguang Chen, Tse-Chuan Hsu, Hongji Yang, Yeh-Ching Chung (2019). ANG: A Combination of Apriori and Graph Computing Techniques for Frequent Itemsets Mining. J. Supercomput.

Xiongchao Tang, Jidong Zhai, Xuehai Qian, Wenguang Chen (2019). PLock: A Fast Lock for Architectures with Explicit Inter-Core Message Passing. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.

Zhen Zheng, Chanyoung Oh, Jidong Zhai, Xipeng Shen, Youngmin Yi, Wenguang Chen (2019). HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.

Heng Lin, Xiaowei Zhu, Bowen Yu, Xiongchao Tang, Wei Xue, Wenguang Chen, Lufei Zhang, Torsten Hoefler, Xiaosong Ma, Xin Liu, Weimin Zheng, Jingfang Xu (2018). ShenTu: Processing Multi-Trillion Edge Graphs on Millions of Cores in Seconds. SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.

Feng Zhang, Jidong Zhai, Xipeng Shen, Onur Mutlu, Wenguang Chen (2018). Efficient Document Analytics on Compressed Data: Method, Challenges, Algorithms, Insights. Proc. VLDB Endow.

Xiongchao Tang, Jidong Zhai, Bowen Yu, Wenguang Chen, Weimin Zheng, Keqin Li (2018). An Efficient In-Memory Checkpoint Method and its Practice on Fault-Tolerant HPL. IEEE Transactions on Parallel and Distributed Systems.

Feng Zhang, Jidong Zhai, Xipeng Shen, Onur Mutlu, Wenguang Chen (2018). Zwift: A Programming Framework for High Performance Text Analytics on Compressed Data. Proceedings of the 2018 International Conference on Supercomputing.

Xiongchao Tang, Jidong Zhai, Xuehai Qian, Bingsheng He, Wei Xue, Wenguang Chen (2018). VSensor: Leveraging Fixed-Workload Snippets of Programs for Performance Variance Detection. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.

Haojie Wang, Jidong Zhai, Xiongchao Tang, Bowen Yu, Xiaosong Ma, Wenguang Chen (2018). Spindle: Informed Memory Access Monitoring. Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference.

Yu Ji, Youhui Zhang, Wenguang Chen, Yuan Xie (2018). Bridge the Gap between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems.

Zhen Zheng, Chanyoung Oh, Jidong Zhai, Xipeng Shen, Youngmin Yi, Wenguang Chen (2017). VersaPipe: A Versatile Programming Framework for Pipelined Computing on GPU. 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

Heng Lin, Xiongchao Tang, Bowen Yu, Youwei Zhuo, Wenguang Chen, Jidong Zhai, Wanwang Yin, Weimin Zheng (2017). Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

Feng Zhang, Jidong Zhai, Bingsheng He, Shuhao Zhang, Wenguang Chen (2017). Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures. IEEE Transactions on Parallel and Distributed Systems.

Maohua Zhu, Youwei Zhuo, Chao Wang, Wenguang Chen, Yuan Xie (2017). Performance evaluation and optimization of HBM-Enabled GPU for data-intensive applications. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

Feng Zhang, Bo Wu, Jidong Zhai, Bingsheng He, Wenguang Chen (2017). FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures. 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

Xiongchao Tang, Jidong Zhai, Bowen Yu, Wenguang Chen, Weimin Zheng (2017). Self-Checkpoint: An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.

Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu (2017). SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs. Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems.

Wenguang Chen, Yugang Niu, Yuanyuan Zou (2017). Congestion control and energy-balanced scheme based on the hierarchy for WSNs. IET Wireless Sensor Systems.

Haohuan Fu, Junfeng Liao, Wei Xue, Lanning Wang, Dexun Chen, Long Gu, Jinxiu Xu, Nan Ding, Xinliang Wang, Conghui He, Shizhen Xu, Yishuang Liang, Jiarui Fang, Yuanchao Xu, Weijie Zheng, Jingheng Xu, Zhen Zheng, Wanjing Wei, Xu Ji, He Zhang, Bingwei Chen, Kaiwei Li, Xiaomeng Huang, Wenguang Chen, Guangwen Yang (2016). Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer. SC ’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis.

Yu Ji, Youhui Zhang, ShuangChen Li, Ping Chi, CiHang Jiang, Peng Qu, Yuan Xie, Wenguang Chen (2016). NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

Youhui Zhang, Yu Ji, Wenguang Chen, Yuan Xie (2016). Neural network transformation under hardware constraints. 2016 International Conference on Compilers, Architectures, and Synthesis of Embedded Systems (CASES).

Jidong Zhai, Feng Zhang, Qingwen Li, Wenguang Chen, Weimin Zheng (2016). Characterizing and optimizing TPC-C workloads on large-scale systems using SSD arrays. Science China Information Sciences.

Jidong Zhai, Wenguang Chen, Weimin Zheng, Keqin Li (2016). Performance Prediction for Large-Scale Parallel Applications Using Representative Replay. IEEE Transactions on Computers.

Shuangcheng Niu, Jidong Zhai, Xiaosong Ma, Xiongchao Tang, Wenguang Chen, Weimin Zheng (2016). Building Semi-Elastic Virtual Clusters for Cost-Effective HPC Cloud Resource Provisioning. IEEE Transactions on Parallel and Distributed Systems.

Jianfei Chen, Kaiwei Li, Jun Zhu, Wenguang Chen (2016). WarpLDA: A Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation. Proc. VLDB Endow.

Haibao Chen, Song Wu, Hai Jin, Wenguang Chen, Jidong Zhai, Yingwei Luo, Xiaolin Wang (2016). A survey of cloud resource management for complex engineering applications. Frontiers of Computer Science.

Yunyun Jiang, Yi Yang, Tian Xiao, Tianwei Sheng, Wenguang Chen (2016). DRDDR: a lightweight method to detect data races in Linux kernel. The Journal of Supercomputing.

Jiangzhou He, Wenguang Chen, Zhizhong Tang (2016). NestedMP: Enabling cache-aware thread mapping for nested parallel shared memory applications. Parallel Computing.

Xiaowei Zhu, Wenguang Chen, Weimin Zheng, Xiaosong Ma (2016). Gemini: A Computation-Centric Distributed Graph Processing System. Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation.

Jidong Zhai, Mingliang Liu, Ye Jin, Xiaosong Ma, Wenguang Chen (2015). Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications. IEEE Transactions on Parallel and Distributed Systems.

Feng Zhang, Jidong Zhai, Wenguang Chen, Bingsheng He, Shuhao Zhang (2015). To Co-run, or Not to Co-run: A Performance Study on Integrated Architectures. 2015 IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.

Xiaowei Zhu, Wentao Han, Wenguang Chen (2015). GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference.

Ikjoon Kim, Jidong Zhai, Yan Li, Wenguang Chen (2014). Optimizing Seam Carving on multi-GPU systems for real-time image resizing. 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).

Jidong Zhai, Jianfei Hu, Xiongchao Tang, Xiaosong Ma, Wenguang Chen (2014). CYPRESS: Combining Static and Dynamic Analysis for Top-Down Communication Trace Compression. SC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis.

Yunyun Jiang, Yi Yang, Tian Xiao, Tianwei Sheng, Wenguang Chen (2014). Kernel data race detection using debug register in Linux. 2014 IEEE COOL Chips XVII.

Tian Xiao, Jiaxing Zhang, Hucheng Zhou, Zhenyu Guo, Sean McDirmid, Wei Lin, Wenguang Chen, Lidong Zhou (2014). Nondeterminism in MapReduce Considered Harmful? An Empirical Study on Non-Commutative Aggregators in MapReduce Programs. Companion Proceedings of the 36th International Conference on Software Engineering.

Tian Xiao, Zhenyu Guo, Hucheng Zhou, Jiaxing Zhang, Xu Zhao, Chencheng Ye, Xi Wang, Wei Lin, Wenguang Chen, Lidong Zhou (2014). Cybertron: Pushing the Limit on I/O Reduction in Data-Parallel Programs. Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications.

Wentao Han, Youshan Miao, Kaiwei Li, Ming Wu, Fan Yang, Lidong Zhou, Vijayan Prabhakaran, Wenguang Chen, Enhong Chen (2014). Chronos: A Graph Engine for Temporal Graph Analysis. Proceedings of the Ninth European Conference on Computer Systems.

Shi Feng, Hong Zhang, Wenguang Chen (2013). Shall I Use Heterogeneous Data Centers? - A Case Study on Video on Demand Systems. 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing.

Shuangcheng Niu, Jidong Zhai, Xiaosong Ma, Xiongchao Tang, Wenguang Chen (2013). Cost-effective cloud HPC resource provisioning by building Semi-Elastic virtual clusters. SC ’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.

Mingliang Liu, Ye Jin, Jidong Zhai, Yan Zhai, Qianqian Shi, Xiaosong Ma, Wenguang Chen (2013). ACIC: Automatic cloud I/O configurator for HPC applications. SC ’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.

Dehao Chen, Neil Vachharajani, Robert Hundt, Xinliang Li, Stephane Eranian, Wenguang Chen, Weimin Zheng (2013). Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations. IEEE Transactions on Computers.

Shuangcheng Niu, Jidong Zhai, Xiaosong Ma, Mingliang Liu, Yan Zhai, Wenguang Chen, Weimin Zheng (2013). Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems. Job Scheduling Strategies for Parallel Processing.

Dehao Chen, Wenguang Chen, Weimin Zheng (2012). CUDA-Zero: a framework for porting shared memory GPU applications to multi-GPUs. Science China Information Sciences.

Yangyang Zhao, Wentao Han, Ruini Xue, Wenguang Chen (2012). SMILE: streaming management of applications and data for mobile terminals. International Journal of Cloud Computing.

Ze Tang, Heng Lin, Kaiwei Li, Wentao Han, Wenguang Chen (2012). Acolyte: An In-Memory Social Network Query System. Web Information Systems Engineering - WISE 2012.

Jidong Zhai, Tianwei Sheng, Jiangzhou He, Wenguang Chen, Weimin Zheng (2011). Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications. IEEE Transactions on Parallel and Distributed Systems.

Jiangzhou He, Wenguang Chen, Guangri Chen, Weimin Zheng, Zhizhong Tang, Handong Ye (2011). OpenMDSP: Extending OpenMP to Program Multi-Core DSP. 2011 International Conference on Parallel Architectures and Compilation Techniques.

Tianwei Sheng, Neil Vachharajani, Stephane Eranian, Robert Hundt, Wenguang Chen, Weimin Zheng (2011). RACEZ: a lightweight and non-invasive race detection tool for production applications. 2011 33rd International Conference on Software Engineering (ICSE).

Mingliang Liu, Jidong Zhai, Yan Zhai, Xiaosong Ma, Wenguang Chen (2011). One Optimized I/O Configuration per HPC Application: Leveraging the Configurability of Cloud. Proceedings of the Second Asia-Pacific Workshop on Systems.

Yan Zhai, Mingliang Liu, Jidong Zhai, Xiaosong Ma, Wenguang Chen (2011). Cloud versus In-House Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications. State of the Practice Reports.

Hucheng Zhou, Wenguang Chen, Fred Chow (2011). An SSA-Based Algorithm for Optimal Speculative Code Motion under an Execution Profile. Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation.

Dehao Chen, Neil Vachharajani, Robert Hundt, Shih-wei Liao, Vinodha Ramasamy, Paul Yuan, Wenguang Chen, Weimin Zheng (2010). Taming Hardware Event Samples for FDO Compilation. Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization.

Jidong Zhai, Wenguang Chen, Weimin Zheng (2010). PHANTOM: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node. Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.

Chuntao Hong, Dehao Chen, Wenguang Chen, Weimin Zheng, Haibo Lin (2010). MapCG: Writing Parallel Program Portable between CPU and GPU. Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques.

Jianian Yan, Jiangzhou He, Wentao Han, Wenguang Chen, Weimin Zheng (2010). How OpenMP Applications Get More Benefit from Many-Core Era. Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More.

Yao Shi, Soyeon Park, Zuoning Yin, Shan Lu, Yuanyuan Zhou, Wenguang Chen, Weimin Zheng (2010). Do I Use the Wrong Definition? DeFuse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs. Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications.

Wenguang Chen, Jidong Zhai, Jin Zhang, Weimin Zheng (2009). LogGPO: An accurate communication model for performance prediction of MPI programs. Science in China Series F: Information Sciences.

Xing Zhou, Wenguang Chen, Weimin Zheng (2009). Cache Sharing Management for Performance Fairness in Chip Multiprocessors. 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

Jin Zhang, Jidong Zhai, Wenguang Chen, Weimin Zheng (2009). Process Mapping for MPI Collective Communications. Euro-Par 2009 Parallel Processing.

Ruini Xue, Xuezheng Liu, Ming Wu, Zhenyu Guo, Wenguang Chen, Weimin Zheng, Zheng Zhang, Geoffrey Voelker (2009). MPIWiz: Subgroup Reproducible Replay of MPI Applications. Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.

Jidong Zhai, Tianwei Sheng, Jiangzhou He, Wenguang Chen, Weimin Zheng (2009). FACT: Fast Communication Trace Collection for Parallel Applications through Program Slicing. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

Jiaqi Zhang, Wenguang Chen, Xinmin Tian, Weimin Zheng (2008). Exploring the Emerging Applications for Transactional Memory. 2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies.

Chuntao Hong, Wenguang Chen, Weimin Zheng, Jiulong Shan, Yurong Chen, Yimin Zhang (2008). Parallelization and Characterization of Probabilistic Latent Semantic Analysis. 2008 37th International Conference on Parallel Processing.

Jiaqi Zhang, Zhiyi Huang, Wenguang Chen, Qihang Huang, Weimin Zheng (2008). Maotai: View-Oriented Parallel Programming on CMT Processors. 2008 37th International Conference on Parallel Processing.

Ruini Xue, Wenguang Chen, Weimin Zheng (2008). CprFS: A User-Level File System to Support Consistent File States for Checkpoint and Restart. Proceedings of the 22nd Annual International Conference on Supercomputing.

Chunhua Liao, Oscar Hernandez, Barbara Chapman, Wenguang Chen, Weimin Zheng (2007). OpenUH: An Optimizing, Portable OpenMP Compiler. Concurr. Comput.: Pract. Exper.

Wenguang Chen, Chuntao Hong (2007). PBB: A Parallel Bioinformatics Benchmark Suite for Shared Memory Multiprocessors. Proceedings of the 2007 Asian Technology Information Program’s (ATIP’s) 3rd Workshop on High Performance Computing in China: Solution Approaches to Impediments for High Performance Computing.
