His research focuses on the development of high-performance and efficient hardware architectures and software systems for deep learning. Zhekai leads the low-level architecture design of multiple hardware projects, including SpArch (HPCA'20), SpAtten (HPCA'21), PointAcc (MICRO'21), and LEGO (HPCA'25), which have received over 700 citations. Zhekai also leads the system and CUDA kernel development of Nunchaku, an efficient inference engine used by software projects including SVDQuant and SANA.
Nunchaku: An Efficient Inference Engine for Diffusion Models
Nunchaku uses W4A4 quantization (4-bit weights and 4-bit activations) to accelerate DiT-based models on NVIDIA GPUs. By applying several kernel-fusion techniques, Nunchaku eliminates the memory-access overhead of full-precision activations and achieves a 3x speedup on models including FLUX.1 and SANA-1.6B.
Zhekai Zhang is a fifth-year Ph.D. student at MIT EECS, advised by Professor Song Han. His research focuses on the development of high-performance and efficient systems and hardware architectures for deep learning and sparse linear algebra. Zhekai has published several papers in computer architecture, which together have received over 700 citations.
His notable contributions include SpArch, an accelerator for sparse matrix multiplication presented at HPCA 2020; SpAtten, a hardware architecture for efficient natural language processing presented at HPCA 2021; PointAcc, a hardware accelerator for 3D point-cloud neural networks presented at MICRO 2021; and LEGO, an automatic hardware accelerator generation framework presented at HPCA 2025.
Zhekai also works on building efficient deep learning systems on GPUs. He leads the system development of Nunchaku, an efficient inference engine for diffusion models that has received over 500 stars on GitHub.
Zhekai also leads the FPGA implementation of the Once-for-All network, which was presented at ICLR 2020 and won first place in the FPGA track of the Low-Power Computer Vision Challenge in both 2020 and 2021.