Bio

I'm a GPU researcher on Qualcomm's Graphics Research team, where I focus on AI inference and architectural analysis for Adreno GPUs. Over the past decade I've worked on making GPUs faster and more useful — from embedded SoCs to HPC clusters, across speech recognition, X-ray diffraction, photon transport, and CT imaging.

Before Qualcomm, I worked on GPU optimization for CT imaging at Analogic, including a 12× speed-up of Metal Artifact Reduction (MAR) and deep-learning models for low-dose CT denoising. I earned my Ph.D. in Computer Engineering at Northeastern University, working with NUCAR (advisor: Dr. David Kaeli) and the COTI Lab (co-advisor: Dr. Qianqian Fang), where I extended Monte Carlo eXtreme (MCX) to heterogeneous CPU/GPU platforms and developed Moka, a model-based analysis framework for concurrent kernel execution.

My current interests sit at the intersection of modern GPU architecture and AI workloads — LLM inference, kernel fusion and quantization, tensor-core utilization, and neural rendering (Gaussian splatting, NeRF). I'm drawn to work that's both rigorous and useful: understanding why a system behaves the way it does, then making it tangibly better.

CV

Experience

2022 – Present

GPU Researcher · Qualcomm

Graphics Research Team: AI inference on Adreno GPUs, accelerator architectural analysis, GPU benchmarking.

2019 – 2022

R&D Engineer · Analogic

CT reconstruction on GPUs (12× MAR speed-up); deep-learning models for low-dose CT denoising.

Jul – Dec 2016

Research Intern · MERL

GPU-accelerated Model Predictive Control (MPC) on NVIDIA Jetson TX1; mpcCUDA solvers.

Summer 2012

Engineering Intern · MathWorks

GPU-accelerated PSK modulation, LDPC decoding, and Turbo decoding on MATLAB Distributed Computing Server.

Education

2019

Ph.D., Computer Engineering

Northeastern University · Advisor: Dr. David Kaeli (NUCAR) · Co-advisor: Dr. Qianqian Fang (COTI)

2010

M.S., Electrical Engineering

University of Bridgeport, CT, USA

2006

B.S., Electrical Engineering

Shanghai Maritime University, China

Full CV (PDF) →

News

Awards & Service

Awards

  • Best Poster, HPC Day 2017
  • Best Poster, HPC Day 2016
  • "Most Robust" award, GE Hackathon Challenge, 2017
  • Student Travel Grant, IISWC 2017
  • Student Travel Grant, PPoPP 2015

Peer Review

  • AI, MDPI (2024)
  • Journal of Imaging, MDPI (2024)
  • Journal of Parallel and Distributed Computing
  • IEEE Transactions on Computers
  • Simulation Modelling Practice and Theory
  • PDP Conference (2016)

Publications

Google Scholar

  1. 2022

    3D residual convolutional neural network for low dose CT denoising

    Alexander A. Zamyatin, Leiming Yu, David Rozas

    SPIE Medical Imaging 2022: Physics of Medical Imaging, vol. 12031, pp. 634–645.

  2. 2022

    Framework for denoising Monte Carlo photon transport simulations using deep learning

    Matin Raayai Ardakani, Leiming Yu, David R. Kaeli, Qianqian Fang

    Journal of Biomedical Optics, vol. 27, no. 8, p. 083019.

  3. 2018

    An efficient data management framework for the Puerto Rico Testsite for Exploring Contamination Threats (PROTECT)

    Shi Dong, Zlatan Feric, Leiming Yu, David Kaeli, John Meeker, Ingrid Y. Padilla, Jose Cordero, Carmen Velez Vega, Zaira Rosario, Akram Alshawabkeh

    2018 IEEE International Conference on Big Data, pp. 5316–5318.

  4. 2018

    Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

    Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang

    Journal of Biomedical Optics, vol. 23, no. 1, p. 010504.

  5. 2018

    Fast Monte Carlo photon transport simulations for heterogeneous computing systems

    Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang

    Clinical and Translational Biophotonics, pp. JTh3A-38, OSA.

  6. 2018

    GPU-accelerated adaptive nonlocal means filter for denoising 3D Monte Carlo photon transport simulations

    Yaoshen Yuan, Leiming Yu, Zafer Doğan, Qianqian Fang

    Journal of Biomedical Optics, vol. 23, no. 12, p. 121618.

  7. 2018

    Denoising in Monte Carlo photon transport simulations using GPU-accelerated adaptive non-local mean filter

    Yaoshen Yuan, Leiming Yu, Qianqian Fang

    Optical Tomography and Spectroscopy, pp. JTh3A-41, OSA.

  8. 2017

    Efficient convex optimization on GPUs for embedded model predictive control

    Leiming Yu, Abraham Goldsmith, Stefano Di Cairano

    Proceedings of the General Purpose GPUs (GPGPU), pp. 12–21, ACM.

  9. 2017

    Moka: Model-based concurrent kernel analysis

    Leiming Yu, Xun Gong, Yifan Sun, Qianqian Fang, Norm Rubin, David Kaeli

    2017 IEEE International Symposium on Workload Characterization (IISWC), pp. 197–206.

  10. 2017

    Accelerating machine learning algorithms in Python

    Leiming Yu

    Boston Area Architecture Workshop.

  11. 2017

    High-performance Monte Carlo simulations for photon migration and applications in optical brain functional imaging

    Fanny Nina-Paravecino, Leiming Yu, Qianqian Fang, David Kaeli

    Handbook of Large-Scale Distributed Computing in Smart Healthcare, pp. 67–85, Springer.

  12. 2016

    A framework for big metabolomic data management and analysis

    Leiming Yu

    IARIA Journal.

  13. 2016

    Hetero-Mark: A benchmark suite for CPU-GPU collaborative computing

    Yifan Sun, Xiang Gong, Amir Kavyan Ziabari, Leiming Yu, Xiangyu Li, Saoni Mukherjee, Carter McCardwell, Alejandro Villegas, David Kaeli

    2016 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10.

  14. 2016

    Diffraction pattern simulation of cellulose fibrils using distributed and quantized pair distances

    Yan Zhang, Hideyo Inouye, Michael Crowley, Leiming Yu, David Kaeli, Lee Makowski

    Journal of Applied Crystallography, vol. 49, no. 6, pp. 2244–2248.

  15. 2015

    Speech recognition on modern graphic processing units

    Leiming Yu

    6th Annual Boston Area Architecture Workshop.

  16. 2015

    Big data analysis on Puerto Rico Testsite for Exploring Contamination Threats

    Xiangyu Li, Leiming Yu, David Kaeli, Yuanyuan Yao, Poguang Wang, Roger Giese, Akram Alshawabkeh

    ALLDATA.

  17. 2015

    Exploring the features of OpenCL 2.0

    Saoni Mukherjee, Xiang Gong, Leiming Yu, Carter McCardwell, Yash Ukidave, Tuan Dao, Fanny Nina Paravecino, David Kaeli

    3rd International Workshop on OpenCL, p. 5, ACM.

  18. 2015

    NuPar: A benchmark suite for modern GPU architectures

    Yash Ukidave, Fanny Nina Paravecino, Leiming Yu, Charu Kalra, Amir Momeni, Zhongliang Chen, Nick Materise, Brett Daley, Perhaad Mistry, David Kaeli

    6th ACM/SPEC International Conference on Performance Engineering, pp. 253–264.

  19. 2015

    High performance computing of fiber scattering simulation

    Leiming Yu, Yan Zhang, Xiang Gong, Nilay Roy, Lee Makowski, David Kaeli

    8th Workshop on General Purpose Processing using GPUs (GPGPU), pp. 90–98, ACM.

  20. 2014

    GPU-accelerated HMM for speech recognition

    Leiming Yu, Yash Ukidave, David Kaeli

    43rd International Conference on Parallel Processing Workshops, pp. 395–402, IEEE.

  21. 2014

    Fast simulation of X-ray diffraction patterns from cellulose fibrils using GPUs

    Yan Zhang, Leiming Yu, David Kaeli, Lee Makowski

    40th Annual Northeast Bioengineering Conference (NEBEC), pp. 1–2, IEEE.

  22. 2009

    Speech disorders: An analysis of hypernasal speech using signal processing techniques

    Leiming Yu

    2009 ASEE NE American Society for Engineering Education Conference.

  23. 2009

    Classifying hypernasality using the pitch and formants

    Leiming Yu

    6th International Conference on Information Technology — New Generations (ITNG 2009).

Projects

Talks & Posters

Teaching

Contact

The best way to reach me is by . For longer-form work, see my CV (PDF) or the publication list on Google Scholar.
