Team

Principal Investigator

Song Han

Song Han is an associate professor at MIT EECS and a distinguished scientist at NVIDIA. He received his PhD from Stanford University. He proposed the “Deep Compression” technique, including pruning and quantization, that is widely used for efficient AI computing, and the “Efficient Inference Engine” that first brought weight sparsity to modern AI chips. He pioneered TinyML research that brings deep learning to IoT devices, enabling learning on the edge (featured on the MIT home page). His team’s work on hardware-aware neural architecture search (once-for-all network) enables users to design, optimize, shrink and deploy AI models to resource-constrained hardware devices, and has received first place in many low-power computer vision contests at flagship AI conferences. His team’s recent work on large language model quantization/acceleration (SmoothQuant, AWQ, StreamingLLM) has improved the efficiency of LLM inference and has been adopted by NVIDIA TensorRT-LLM. Song received best paper awards at ICLR and FPGA, and faculty awards from Amazon, Facebook, NVIDIA, Samsung and SONY. Song was named one of the “35 Innovators Under 35” by MIT Technology Review for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” Song received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning”, the IEEE “AI’s 10 to Watch: The Future of AI” award, and the Sloan Research Fellowship. Song’s research in efficient AI computing has been successfully commercialized and has influenced the industry. He was a cofounder of DeePhi (now part of AMD) and a cofounder of OmniML (now part of NVIDIA). Song developed the EfficientML.ai course to disseminate this line of research.

Team Members

Postdoctoral

Qinghao Hu

His research focuses on building efficient machine learning systems, particularly in the areas of training, serving, and scheduling for foundation models. His work received the Distinguished Paper Award at ASPLOS. He is also a recipient of the Google PhD Fellowship and has been recognized as one of the ML and Systems Rising Stars by MLCommons.

Ph.D

Guangxuan Xiao

His research interests focus on the development of efficient algorithms and systems for deep learning, specifically large foundation models. His work has received over 8000 stars on GitHub and has had real-world impact: SmoothQuant has been integrated into NVIDIA's TensorRT-LLM and FasterTransformer and Intel's NeuralCompressor, and is used in the LLMs of companies such as Amazon, Meta, and Huggingface. StreamingLLM has been integrated into NVIDIA's TensorRT-LLM, Huggingface's transformers, and Intel's Extension for Transformers.

Jiaming Tang

Jiaming Tang is a first-year Ph.D. student at MIT, advised by Prof. Song Han. He was a member of the ACM Honors Class at Shanghai Jiao Tong University. His research interests lie in efficient systems and algorithms for large language models. His work AWQ received the Best Paper Award at MLSys 2024 and has been integrated into Transformers, vLLM, FastChat, TensorRT-LLM, and TGI.

Muyang Li

His research interest is in the intersection of machine learning, systems, and computer graphics. He is currently working on building efficient and hardware-friendly generative models with applications in computer vision and graphics. His work GAN Compression has received 1.1K stars on GitHub.

Shang Yang

Shang Yang is a second-year Ph.D. student at MIT, advised by Prof. Song Han. He received his B.Eng. degree from Tsinghua University. His research focuses on efficient machine learning systems. He has (co-)led system development for projects including QServe, AWQ (MLSys'24), and TorchSparse++ (MICRO'23). His projects have received over 4k stars on GitHub.

Zhekai Zhang

His research focuses on the development of high-performance and efficient hardware architectures and software systems for deep learning. Zhekai leads the low-level architecture design of multiple hardware projects, including SpArch (HPCA'20), SpAtten (HPCA'21), PointAcc (MICRO'21), and LEGO (HPCA'25), which have received over 700 citations. Zhekai also leads the system and CUDA kernel development of Nunchaku, an efficient inference engine used by software projects including SVDQuant and SANA.

Zhuoyang Zhang

Zhuoyang Zhang is a first-year Ph.D. student at MIT, advised by Prof. Song Han. He received his Bachelor’s degree from the Yao Class at Tsinghua University. His research interests lie in machine learning and computer vision. He aims to accelerate large foundation models for modern AI computing.

Master

Nicole Stiles

She completed her undergrad in CS at MIT in 2023, and is currently an MEng student interested in performant systems and machine learning.

Undergraduate

Maggie Liu

Maggie Liu is a first-year undergraduate student who loves competitive programming and hackathons and is interested in machine learning.

Shreya Chaudhury

Shreya Chaudhury is a second-year undergraduate student in the MIT EECS department. Her research interest is quantum computing, especially variational quantum algorithms and quantum machine learning. She is the core developer of the TorchQuantum library.

Graduated

Yujun Lin

Ph.D

Yujun Lin graduated from MIT HAN Lab in May 2025. He joined NVIDIA Research as a research scientist after graduation. His research focuses on the intersection of computer architecture and machine learning, particularly the co-design of software and hardware for deep learning and its applications. Yujun was awarded the 2021 Qualcomm Innovation Fellowship, and he is a founding member of the teaching crew for the new course on TinyML and efficient deep learning computing (MIT 6.S965), which received 12k views on YouTube.

Haotian Tang

Ph.D

Haotian is currently a research scientist at the GenAI org of Google DeepMind. He received his PhD at MIT EECS with Prof. Song Han in January 2025. His research interests lie at the intersection of computer systems and machine learning. He is currently working on efficient multi-modal generation with foundation models. He has authored multiple papers with over 3,800 citations. Haotian has successfully advised several undergraduate students, and the intern students he mentored have continued as PhD students at MIT and UC Berkeley.

Wei-Chen Wang

Postdoctoral

His research focuses on efficient deep learning, TinyML, embedded systems, and memory/storage systems. Wei-Chen has received several accolades for his work, including the MLSys Best Paper Award, the Best Poster Award at the NSF Athena AI Institute, the ACM/IEEE CODES+ISSS Best Paper Award, and the IEEE NVMSA Best Paper Award. In addition, he received first place (among 150 teams) in the flash consumption track of the ACM/IEEE TinyML Design Contest at ICCAD 2022. His research has received over 4,000 stars on GitHub, and his work "On-device training under 256KB memory" (MCUNetV3) was highlighted by the MIT homepage. He will join Amazon as an Applied Scientist.

Hanrui Wang

Ph.D

Hanrui Wang graduated from MIT HAN Lab in 2024. His research focuses on efficient AI and emerging hardware (e.g., quantum architecture). His research has been recognized with the ACM Student Research Competition 1st place award, the best poster award at the NSF AI Institute, and the Best Presentation Award as a DAC Young Fellow, and has appeared in top conferences such as NeurIPS, ISCA, MICRO, HPCA, and DAC. His co-authored papers received the ICML RL4RL Best Paper Award and the QCE Best Paper Award. He is a recipient of the Qualcomm Fellowship and the Unitary Fund, and an NVIDIA Fellowship Finalist. He is the creator of the TorchQuantum library, which has been adopted by the IBM and PyTorch ecosystems. He is also a co-founder of the QuCS lecture series for quantum education. Hanrui received his B.Eng. degree from Fudan University.

Han Cai

Ph.D

Han Cai graduated from MIT HAN Lab in May 2024. He joined NVIDIA Research as a research scientist after graduation. His research focuses on algorithms and acceleration of efficient deep learning computing. Han has made significant contributions to the field, including his work on hardware-aware neural architecture search (ProxylessNAS, Once-for-All), which has been integrated into PytorchHub@Meta, AutoGluon@Amazon, NNI@Microsoft, SONY Neural Architecture Search Library, SONY Model Compression Toolkit, and ADI Model Training and Synthesis Tool. His research has received 6.9K+ citations on Google Scholar and 5.2K+ stars on GitHub.

Zhijian Liu

Ph.D

Zhijian graduated from MIT HAN Lab in May 2024. Zhijian will join UCSD as a tenure-track assistant professor, after a gap year at NVIDIA Research. His research focuses on efficient machine learning and systems. His work has been featured in oral and spotlight presentations at conferences such as NeurIPS, ICLR, and CVPR. He received the Qualcomm Innovation Fellowship. He was recognized as a Rising Star in ML and Systems by MLCommons and a Rising Star in Data Science by UChicago and UCSD. His work has received over 9000 citations on Google Scholar and over 11000 stars on GitHub. He received his B.Eng. degree from Shanghai Jiao Tong University.

Ji Lin

Ph.D

Ji Lin graduated from MIT HAN Lab in Dec. 2023 and joined OpenAI as a research scientist. His research focuses on efficient deep learning computing, systems for ML, and, more recently, accelerating large language models (LLMs). Ji pioneered research in the field of TinyML. His research has received over 10,000 citations on Google Scholar and over 8,000 stars on GitHub. His work on LLM quantization (AWQ) received the best paper award at MLSys'24. AWQ has been widely adopted by NVIDIA, Intel, Microsoft, AMD, HuggingFace, and Berkeley to accelerate LLM inference. AWQ-quantized LLMs have been downloaded more than 6 million times on HuggingFace. Ji was an NVIDIA Graduate Fellowship Finalist in 2020 and a Qualcomm Innovation Fellowship recipient in 2022. His work has been covered by MIT Tech Review, MIT News (twice on MIT homepage and four times on MIT News), WIRED, Engadget, VentureBeat, etc.

Wei-Ming Chen

Postdoctoral

Wei-Ming Chen is a Postdoctoral Associate at MIT EECS advised by Professor Song Han. His research focuses on TinyML, embedded systems, and real-time systems, with a particular emphasis on enabling efficient deep learning on Internet of Things (IoT) devices, such as microcontrollers. Chen's recent work on the MCUNet series (MCUNet, MCUNetV2, and MCUNetV3) has enabled efficient inference and training on devices with limited memory through the co-design of systems and algorithms. He is also a key contributor and maintainer of TinyEngine, an open-source library for high-performance and memory-efficient deep learning on microcontrollers. His work "On-device training under 256KB memory" (MCUNetV3) was highlighted by the MIT homepage in fall 2022. He received first place (among 150 teams) in the flash consumption track of the ACM/IEEE TinyML Design Contest at ICCAD 2022. He developed TinyChatEngine, which enables LLM inference on the edge (laptop, Raspberry Pi). His research has received more than 1,000 stars on GitHub. After graduation, he joined NVIDIA as a senior deep learning engineer working on large language model acceleration.

Jessica Zheng

Undergraduate

Jessica Zheng was an M.Eng student at MIT HAN Lab. Her research focused on efficient deep learning, anomaly detection, and machine learning for healthcare. She received the 1st place award in the memory size track of the ACM TinyML Design Contest in 2022.

Kevin Shao

Master

Kevin Shao was an M.Eng student at MIT HAN Lab, working on autonomous driving and efficient 3D deep learning. After graduation, he joined Two Sigma as a Quantitative Researcher.

Driss Hafdi

Master

Driss Hafdi was an M.Eng student at MIT EECS, working on specialized hardware for mixed-precision quantization. After graduation, he joined Hudson River Trading as an FPGA developer.

Openings

If you work on efficient LLMs, VLMs, or GenAI and are interested in joining us, please fill in the recruiting form. Inquiry emails will not receive a reply if the recruiting form is incomplete. PhD applicants: select the "ML+System" track in the MIT PhD application system.

Efficient AI Computing,
Transforming the Future.
