PhD Student at the University of Edinburgh
I'm a PhD student at the University of Edinburgh, working on agentic and distributed AI systems, supervised by Dr. Luo Mai (primary) and Dr. Wenda Li (assistant). My research focuses on building efficient AI systems through algorithm-system co-design for agentic workloads, including context management and model memory (KV cache) sharing across both edge and cloud environments.
I am also interested in serverless AI systems and have joined ServerlessLLM as a core developer. In addition, I am collaborating with UK ARIA to build a benchmark that reveals the fundamental hardware capabilities required by emerging AI workloads such as Mixture-of-Experts, test-time scaling, and agentic workflows.
My goal is to make agentic AI workflows both more accurate and more efficient, for personal use as well as practical industrial applications.
Before moving to machine learning systems, I worked in computer vision, particularly for autonomous driving and medical imaging, so I have a solid understanding of visual data and its characteristics in both image and point cloud form. As multi-modality becomes increasingly important in agentic AI workflows that involve visual data, building high-performance systems requires algorithm-system co-design. VLMs and LLMs continue to evolve, and understanding how to adopt their advances and use them to accelerate the overall system is both interesting and important. In my view, because agentic AI workloads are highly dynamic and complex, systems and algorithms need to be designed together rather than developed in isolation, as they were before.
I also have research experience with 2D-mesh accelerators such as Cerebras and Tenstorrent. My undergraduate thesis focused on designing high-performance KNN algorithms for 2D-mesh accelerators.
Outside of research, I enjoy singing and playing music, and I am currently working on my first album. I won first prize in a singing competition during high school. The instruments closest to my heart are guitar and erhu. I have long been inspired by Kotaro Oshio, a fingerstyle guitarist whose pieces I am always learning and playing.
Musically, I grew up with Jay Chou — his album Ye Hui Mei has stayed with me since 2006, when my parents first played it in the car. On the Western side, Roger Waters and Pink Floyd shaped my taste deeply — The Dark Side of the Moon remains the album I return to most. I am also drawn to Coldplay, especially Viva la Vida or Death and All His Friends.
PhD
BSc Computer Science
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems [GitHub]
NeurIPS 2025 Dataset and Benchmark Track
Fast long-context inference via context reuse. 4–13× cache hits, 1.5–3× faster prefill, and ~36% token savings across vLLM, SGLang, RAG, AI Agents, and more.
A serverless inference framework for LLMs. Load models 10x faster and serve 10 models with 1 GPU through fast, locality-optimized checkpoint loading and live migration of LLM inference across GPU clusters.
A benchmarking method for evaluating sparse Mixture-of-Experts systems jointly along three dimensions: Cost, Accuracy, and Performance.
An open-source textbook on machine learning systems design and implementation, covering computational graphs, model training, inference, and deployment.
San Diego, US
Rotterdam, Netherlands
Newcastle, UK