Bio
I'm a GPU researcher on Qualcomm's Graphics Research team, where I focus on AI inference and architectural analysis for Adreno GPUs. Over the past decade I've worked on making GPUs faster and more useful — from embedded SoCs to HPC clusters, across speech recognition, X-ray diffraction, photon transport, and CT imaging.
Before Qualcomm, I worked on GPU optimization for CT imaging at Analogic, including a 12× speed-up on Metal Artifact Reduction and deep-learning models for low-dose CT denoising. I earned my Ph.D. in Computer Engineering at Northeastern University with NUCAR (advisor: Dr. David Kaeli) and the COTI Lab (Dr. Qianqian Fang), where I extended Monte Carlo eXtreme (MCX) to heterogeneous CPU/GPU platforms and developed Moka, a model-based analysis of concurrent kernel execution.
My current interests sit at the intersection of modern GPU architecture and AI workloads — LLM inference, kernel fusion and quantization, tensor-core utilization, and neural rendering (Gaussian splatting, NeRF). I'm drawn to work that's both rigorous and useful: understanding why a system behaves the way it does, then making it tangibly better.
CV
Experience
GPU Researcher · Qualcomm
Graphics Research Team: AI inference on Adreno GPUs, accelerator architectural analysis, GPU benchmarking.
R&D Engineer · Analogic
CT reconstruction on GPUs (12× MAR speed-up); deep-learning models for low-dose CT denoising.
Research Intern · MERL
GPU-accelerated Model Predictive Control (MPC) on NVIDIA Jetson TX1; mpcCUDA solvers.
Engineering Intern · MathWorks
GPU-accelerated PSK modulation, LDPC decoding, and Turbo decoding on MATLAB Distributed Computing Server.
Education
Ph.D., Computer Engineering
Northeastern University · Advisor: Dr. David Kaeli (NUCAR) · Co-advisor: Dr. Qianqian Fang (COTI)
M.S., Electrical Engineering
University of Bridgeport, CT, USA
B.S., Electrical Engineering
Shanghai Maritime University, China
News
-
Aug 2024
Service
Serving as a reviewer for the MDPI journals AI and Journal of Imaging.
-
Sep 2022
Position
Joined Qualcomm's Graphics Research Team as a GPU researcher.
-
Feb 2022
Publication
Paper on a 3D residual CNN for low-dose CT denoising presented at SPIE Medical Imaging.
-
2022
Publication
Co-authored work on deep-learning-based denoising for Monte Carlo photon transport published in the Journal of Biomedical Optics.
-
2019
Milestone
Defended Ph.D. at Northeastern University. Advisor: Dr. David Kaeli.
Awards & Service
Awards
- Best Poster, HPC Day 2017
- Best Poster, HPC Day 2016
- "Most Robust" award, GE Hackathon Challenge, 2017
- Student Travel Grant, IISWC 2017
- Student Travel Grant, PPoPP 2015
Peer Review
- AI, MDPI (2024)
- Journal of Imaging, MDPI (2024)
- Journal of Parallel and Distributed Computing
- IEEE Transactions on Computers
- Simulation Modelling Practice and Theory
- PDP Conference (2016)
Publications
-
2022
3D residual convolutional neural network for low dose CT denoising
SPIE Medical Imaging 2022: Physics of Medical Imaging, vol. 12031, pp. 634–645.
-
2022
Framework for denoising Monte Carlo photon transport simulations using deep learning
Journal of Biomedical Optics, vol. 27, no. 8, p. 083019.
-
2018
An efficient data management framework for the Puerto Rico Testsite for Exploring Contamination Threats (PROTECT)
2018 IEEE International Conference on Big Data, pp. 5316–5318.
-
2018
Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms
Journal of Biomedical Optics, vol. 23, no. 1, p. 010504.
-
2018
Fast Monte Carlo photon transport simulations for heterogeneous computing systems
Clinical and Translational Biophotonics, pp. JTh3A-38, OSA.
-
2018
GPU-accelerated adaptive nonlocal means filter for denoising 3D Monte Carlo photon transport simulations
Journal of Biomedical Optics, vol. 23, no. 12, p. 121618.
-
2018
Denoising in Monte Carlo photon transport simulations using GPU-accelerated adaptive non-local mean filter
Optical Tomography and Spectroscopy, pp. JTh3A-41, OSA.
-
2017
Efficient convex optimization on GPUs for embedded model predictive control
Proceedings of the General Purpose GPUs (GPGPU), pp. 12–21, ACM.
-
2017
Moka: Model-based concurrent kernel analysis
2017 IEEE International Symposium on Workload Characterization (IISWC), pp. 197–206.
-
2017
Accelerating machine learning algorithms in Python
Boston Area Architecture Workshop.
-
2017
High-performance Monte Carlo simulations for photon migration and applications in optical brain functional imaging
Handbook of Large-Scale Distributed Computing in Smart Healthcare, pp. 67–85, Springer.
-
2016
A framework for big metabolomic data management and analysis
IARIA Journal.
-
2016
Hetero-Mark: A benchmark suite for CPU-GPU collaborative computing
2016 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10.
-
2016
Diffraction pattern simulation of cellulose fibrils using distributed and quantized pair distances
Journal of Applied Crystallography, vol. 49, no. 6, pp. 2244–2248.
-
2015
Speech recognition on modern graphic processing units
6th Annual Boston Area Architecture Workshop.
-
2015
Big data analysis on Puerto Rico Testsite for Exploring Contamination Threats
ALLDATA.
-
2015
Exploring the features of OpenCL 2.0
3rd International Workshop on OpenCL, p. 5, ACM.
-
2015
NuPar: A benchmark suite for modern GPU architectures
6th ACM/SPEC International Conference on Performance Engineering, pp. 253–264.
-
2015
High performance computing of fiber scattering simulation
8th Workshop on General Purpose Processing using GPUs (GPGPU), pp. 90–98, ACM.
-
2014
GPU-accelerated HMM for speech recognition
43rd International Conference on Parallel Processing Workshops, pp. 395–402, IEEE.
-
2014
Fast simulation of X-ray diffraction patterns from cellulose fibrils using GPUs
40th Annual Northeast Bioengineering Conference (NEBEC), pp. 1–2, IEEE.
-
2009
Speech disorders: An analysis of hypernasal speech using signal processing techniques
2009 ASEE NE American Society for Engineering Education Conference.
-
2009
Classifying hypernasality using the pitch and formants
6th International Conference on Information Technology — New Generations (ITNG 2009).
Projects
-
Monte Carlo eXtreme (MCX)
GPU-accelerated Monte Carlo simulation of photon migration in 3D turbid media. Scaled across heterogeneous CPU/GPU platforms for biomedical optics.
-
Fiber Scattering Simulation on GPUs
High-performance GPU implementation of fiber scattering / X-ray diffraction pattern simulation for cellulose fibrils.
-
PROTECT — Data Management & Modeling
Scalable data management framework for the Puerto Rico Testsite for Exploring Contamination Threats (PROTECT).
-
GPU-accelerated HMM
GPU-optimized Hidden Markov Model (HMM) implementation accelerating speech-recognition workloads.
Talks & Posters
-
2022
3D residual convolutional neural network for low dose CT denoising
Poster · SPIE Medical Imaging 2022 · San Diego, CA
-
2019
Neural network denoiser for Monte Carlo photon transport simulations
Poster · SPIE Photonics West 2019 · San Francisco, CA
-
2018
Denoising in Monte Carlo photon transport simulation using neural networks
Poster · HPC Day 2018 · Northeastern University, Boston, MA
-
2018
Fast MCX for heterogeneous computing systems
Poster · COE Ph.D. Research Expo · Northeastern University
-
2017
Model-based concurrent kernel execution on GPU
Poster · HPC Day 2017 · UMass Dartmouth, MA
-
2016
Portable performance for Monte Carlo simulation of photon migration in 3D turbid media for single and multiple GPUs
Talk · GTC 2016 · Silicon Valley, CA
-
2016
Monte Carlo simulation of photon migration in 3D turbid media
Poster · HPC Day 2016 · UMass Dartmouth, MA
Teaching
-
2017
GPU Programming
Invited Lecturer · Philips
-
2015 – 2017
GPU Class
Lecturer · Northeastern University
-
2013
GPU Class
Teaching Assistant · Northeastern University
-
2008 – 2010
Audio Processing Lab & Digital Processing Lab
Teaching Assistant · University of Bridgeport
Contact
The best way to reach me is by email. For longer-form work, see my CV (PDF) or the publication list on Google Scholar.
Research Interests
AI Workloads on GPUs
LLM inference, ONNX optimization, fusion and quantization, tensor-core utilization.
GPU Architecture
NVIDIA tensor cores, AMD RDNA 4, Intel Xe, Apple silicon, Google TPU v7 (Ironwood).
Ray Tracing & Neural Rendering
Gaussian splatting, NeRF, real-time rendering on modern GPUs.
Medical Imaging
Monte Carlo photon transport, low-dose CT denoising, deep-learning reconstruction.
Performance Engineering
Concurrent kernel analysis, benchmarking, modeling, and tuning across heterogeneous systems.
Tools & Systems
CUDA, OpenCL, Python, C, profiling tools, and developer-experience tooling.