Efficient AI Computing,
Transforming the Future.

Who We Are

Welcome to MIT HAN Lab! We focus on making AI faster, smarter, and more efficient. Our research covers a broad spectrum, including generative AI (e.g., LLMs and diffusion models), TinyML, system optimization and hardware design. By integrating algorithm and hardware expertise, we strive to push the frontiers of AI efficiency and performance.

Graduated PhD students: Ji Lin (OpenAI), Hanrui Wang (assistant professor @UCLA), Zhijian Liu (assistant professor @UCSD), Han Cai (NVIDIA Research), Haotian Tang (Google DeepMind), Yujun Lin (NVIDIA Research).

Highlights

Accelerating LLM and Generative AI [slides]:

  • LLM Quantization: AWQ, TinyChat enables on-device LLM inference with 4bit quantization (best paper award at MLSys'24), with 19 million downloads on HuggingFace. SmoothQuant is a training-free and accuracy-preserving 8-bit post-training quantization (PTQ) solution for LLMs. QServe speeds up the large scale LLM serving with W4A8KV4 quantization (4-bit weights, 8-bit activations, and 4-bit KV cache). COAT enables memory efficient FP8 training.
  • Long Context LLM: StreamingLLM enables LLMs to generate infinite-length texts with a fixed memory budget by preserving the "attention sinks" in the KV-cache. Quest leverages query-aware sparsity in long-context KV cache to boost inference throughput. DuoAttention reduces both LLM's decoding and pre-filling memory and latency with retrieval and streaming heads. LServe accelerates long-context LLM serving with hardware-aware unified sparse attention framework.
  • Efficient Visual Generation: HART is an autoregressive visual generation model capable of directly generating 1024×1024 images on a laptop. SANA enables 4K image synthesis under low computation, using deep compression auto-encoder (DC-AE) and linear diffusion transformer. SVDQuant further enables 4-bit diffusion models (W4A4) by absorbing the outliers with low-rank components.
  • Efficient Visual Language Models: VILA, VILA-U, LongVILA are a family of efficient visual language models for both understanding and generation. LongVILA efficiently scales to 6K frames of video.

We Work On

The incredible potential of large models in Artificial Intelligence Generated Content (AIGC), including cutting-edge technologies like Large Language Models (LLMs) and Diffusion Models, have revolutionized a wide range of applications, spanning natural language processing, content generation, creative arts, and more. However, large model size, and high memory and computational requirements present formidable challenges. We aim to tackle these hurdles head-on and make these advanced AI technologies more practical, democratizing access to these future-changing technologies for everyone.

Efficient AI Algorithm
1
2

News

  • Feb 2025
    A new blog post
    RTX 5090 Workstation Configuration Journey
     is published.
    With the arrival of the RTX 5090, we built a high-performance workstation to maximize its AI computing potential. In this blog post, we share our experience—from overcoming setup challenges to testing its performance.
  • Dec 2024

    🔥⚡ We release TinyChat 2.0, the latest version with significant advancements in prefilling speed of Edge LLMs and VLMs, 1.5-1.7x faster than the previous version of TinyChat. Please refer to our blog for more details.

    AWQ
  • Dec 2024
    A new blog post
    TinyChat 2.0: Accelerating Edge AI with Efficient LLM and VLM Deployment
     is published.
    Explore the latest advancement in TinyChat – the 2.0 version with significant advancements in prefilling speed of Edge LLMs and VLMs. Apart from the 3-4x decoding speedups achieved with AWQ quantization, TinyChat 2.0 now delivers state-of-the-art Time-To-First-Token, which is 1.5-1.7x faster than the legacy version of TinyChat.
  • Dec 2024

    DistriFusion is integrated in NVIDIA's TensorRT-LLM for distributed inference on high-resolution image generation.

    DistriFusion
  • Oct 2024
    A new blog post
    Block Sparse Attention
     is published.
    We introduce Block Sparse Attention, a library of sparse attention kernels that supports various sparse patterns, including streaming attention with token granularity, streaming attention with block granularity, and block-sparse attention. By incorporating these patterns, Block Sparse Attention can significantly reduce the computational costs of LLMs, thereby enhancing their efficiency and scalability. We release the implementation of Block Sparse Attention, which is modified based on FlashAttention 2.4.2.
  • Aug 2024

    The TinyML and Efficient Deep Learning Computing course will be returning in Fall, with recorded sessions on YouTube!

    6.5940
  • Jun 2024
    Congrats
    AWQ
     team
     on
    Best Paper Award
     of
    MLSys 2024
     
    .
    AWQ
  • Jun 2024

    AWQ is presented at MLSys 2024. Talk video has been released!

    AWQ
  • May 2024

    🏆 AWQ receives the Best Paper Award at MLSys 2024. 🎉

    AWQ
  • May 2024
    Congrats
    Qinghao Hu
     on
    2024 Rising Stars in ML and Systems
    .
  • Mar 2024

    We show SmoothQuant can enable W8A8 quantization for Llama-1/2, Falcon, Mistral, and Mixtral models with negligible loss.

    SmoothQuant
  • Mar 2024
    A new blog post
    Patch Conv: Patch Convolution to Avoid Large GPU Memory Usage of Conv2D
     is published.
    In this blog, we introduce Patch Conv to reduce memory footprint when generating high-resolution images. PatchConv significantly cuts down the memory usage by over 2.4× compared to existing PyTorch implementation. Code: https://github.com/mit-han-lab/patch_conv
  • Mar 2024
    A new blog post
    TinyChat: Visual Language Models & Edge AI 2.0
     is published.
    Explore the latest advancement in TinyChat and AWQ – the integration of Visual Language Models (VLM) on the edge! The exciting advancements in VLM allows LLMs to comprehend visual inputs, enabling seamless image understanding tasks like caption generation, question answering, and more. With the latest release, TinyChat now supports leading VLMs such as VILA, which can be easily quantized with AWQ, empowering users with seamless experience for image understanding tasks.
  • Feb 2024
    A new blog post
    DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
     is published.
    In this blog, we introduce DistriFusion, a training-free algorithm to harness multiple GPUs to accelerate diffusion model inference without sacrificing image quality. It can reduce SDXL latency by up to 6.1× on 8 A100s. Our work has been accepted by CVPR 2024 as a highlight. Code: https://github.com/mit-han-lab/distrifusion
  • Feb 2024

    AWQ has been accepted to MLSys 2024!

    AWQ
  • Feb 2024
    Congrats
    Hanrui Wang
     on
    Rising Star in Solid-State Circuits at ISSCC
    .
  • Feb 2024

    We released new version of quantized GEMM/GEMV kernels in TinyChat, leading to 38 tokens/second inference speed on NVIDIA Jetson Orin!

    AWQ
  • Jan 2024

    StreamingLLM is integrated by HPC-AI Tech SwiftInfer to support infinite input length for LLM inference.

    StreamingLLM
  • Dec 2023

    StreamingLLM is integrated by CMU, UW, and OctoAI, enabling endless and efficient LLM generation on iPhone!

    StreamingLLM
  • Dec 2023

    Congrats Ji Lin completed and defended his PhD thesis: "Efficient Deep Learning Computing: From TinyML to Large Language Model". Ji joined OpenAI after graduation.

  • Dec 2023

    SmoothQuant is adopted by NVIDIA TensorRT-LLM.

    SmoothQuant
  • Dec 2023

    AWQ is integrated by HuggingFace Transformers' main branch.

    AWQ
  • Nov 2023

    TorchSparse++ has been adopted by One-2-3-45++ from Prof. Hao Su's lab (UCSD) for 3D object generation!

    TorchSparse++
  • Nov 2023
    Congrats
    Zhijian Liu
     on
    2023 Rising Stars in Data Science
    .
  • Nov 2023

    SmoothQuant is adopted by Amazon SageMaker.

    SmoothQuant
  • Nov 2023

    🔥 AWQ is now integrated natively in Hugging Face transformers through from_pretrained. You can either load quantized models from the Hub or your own HF quantized models.

    AWQ
  • Oct 2023

    TorchQuantum is used in winning team for ACM Quantum Computing for Drug Discovery.

    QuantumNAS
  • Oct 2023
    Congrats
    QuantumNAS
     team on
    1st Place Award
     of
    ACM Quantum Computing for Drug Discovery Contest
     on
     @
    ICCAD 2023
     
    2023
    .
    QuantumNAS
  • Oct 2023
    Song Han
     presented "
    Quantization for Foundation Models
    " at
    the ICCV 2023 Workshop on Low-Bit Quantized Neural Networks
    .
    VideoSlidesMediaEvent
  • Oct 2023
    Song Han
     presented "
    Efficient Vision Transformer
    " at
    the ICCV 2023 Workshop on Resource-Efficient Deep Learning for Computer Vision (RCV'23)
    .
    VideoSlidesMediaEvent
  • Oct 2023
    Congrats
    Qinghao Hu
     on
    2023 Google PhD Fellowship
    .
  • Sep 2023
    Song Han
     presented "
    TinyChat for On-device LLM
    " at
    the IAP MIT Workshop on the Future of AI and Cloud Computing Applications and Infrastructure
    .
    VideoSlidesMediaEvent
  • Sep 2023
    A new blog post
    TinyChat: Large Language Model on the Edge
     is published.
    Running large language models (LLMs) on the edge is of great importance. In this blog, we introduce TinyChat, an efficient and lightweight system for LLM deployment on the edge. It runs Meta's latest LLaMA-2 model at 30 tokens / second on NVIDIA Jetson Orin and can easily support different models and hardware.
  • Aug 2023
    Congrats
    Hanrui Wang
     on
    2023 Rising Stars in ML and Systems
    .
  • Aug 2023
    Congrats
    Zhijian Liu
     on
    2023 Rising Stars in ML and Systems
    .
  • Jul 2023

    The TinyML and Efficient Deep Learning Computing course will be returning in Fall, with live sessions on YouTube!

    6.5940
  • Jul 2023
    Congrats
    SpAtten
     team
     on
    Best Demo Award
     of
    DAC University Demo
     
    .
    SpAtten
  • Jul 2023

    SpAtten and SpAtten-Chip won the 1st Place Award at 2023 DAC University Demo.

    SpAtten
  • Jul 2023

    We released TinyChat, an efficient and lightweight chatbot interface based on AWQ. TinyChat enables efficient LLM inference on both cloud and edge GPUs. Llama-2-chat models are supported! Check out our implementation here.

    AWQ
  • Jun 2023

    TorchSparse++ has been adopted by One-2-3-45 from Prof. Hao Su's lab (UCSD) for 3D mesh reconstruction!

    TorchSparse++
  • Jun 2023
    Song Han
     presented "
    Efficient Deep Learning Computing with Sparsity
    " at
    CVPR Workshop on Efficient Computer Vision
    .
    VideoSlidesMediaEvent
  • May 2023
    Congrats
    Wei-Chen Wang
     team
     on
    2023 NSF Athena AI Institute Best Poster Award
     of
     
    .
  • May 2023
    Congrats
    Song Han
     on
    2023 Sloan Research Fellowship
    .
  • Jan 2023
    Congrats
    Hanrui Wang
     on
    MARC 2023 Best Pitch Award
    .
  • Nov 2022
    A new blog post
    On-Device Training Under 256KB Memory
     is published.
    In MCUNetV3, we enable on-device training under 256KB SRAM and 1MB Flash, using less than 1/1000 memory of PyTorch while matching the accuracy on the visual wake words application. It enables the model to adapt to newly collected sensor data and users can enjoy customized services without uploading the data to the cloud thus protecting privacy.
  • Nov 2022
    Congrats
    HAT
     team on
    First Place (1/150)
     of
    ACM/IEEE TinyML Design Contest
     on
    Memory Occupation Track
     @
    ICCAD
     
    2022
    .
    HAT
  • Nov 2022
    Congrats
    Hanrui Wang
     on
    Gold Medal of ACM Student Research Competition
    .
  • Sep 2022
    Congrats
    Hanrui Wang
     team
     on
    Best Paper Award
     of
    IEEE International Conference on Quantum Computing and Engineering (QCE)
     
    .
  • Aug 2022
    Congrats
    Zhijian Liu
     on
    the 2022 MIT Ho-Ching and Han-Ching Fund Award
    .
  • Aug 2022
    Congrats
    Ji Lin
     on
    the 2022 Qualcomm Innovation Fellowship
    .
  • Jun 2022

    TorchSparse has been adopted by SparseNeuS for neural surface reconstruction.

    TorchSparse
  • May 2022
    Congrats
    Song Han
     on
    2022 Red Dot Award
    .
  • May 2022
    Congrats
    Hanrui Wang
     on
    2022 ACM Student Research Competition Award 1st Place
    .
  • Nov 2021
    Song Han
     presented "
    Plenary: Putting AI on a Diet: TinyML and Efficient Deep Learning
    " at
    TinyML Technical Forum Asia
    .
    VideoSlidesMediaEvent
  • Nov 2021
    Song Han
     presented "
    TinyML and Efficient Deep Learning for Automotive Applications
    " at
    Hyundai Motor Group Developers Conference
    .
    VideoSlidesMediaEvent
  • Oct 2021
    Song Han
     presented "
    Challenges and Directions of Low-Power Computer Vision
    " at
    International Conference on Computer Vision (ICCV) Workshop Panel
    .
    VideoSlidesMediaEvent
  • Oct 2021
    Song Han
     presented "
    TinyML Techniques for Greener, Faster and Sustainable AI
    " at
    IBM IEEE CAS/EDS – AI Compute Symposium
    .
    VideoSlidesMediaEvent
  • Aug 2021
    Song Han
     presented "
    Putting AI On A Diet: TinyML and Efficient Deep Learning
    " at
    Silicon Research Cooperation (SRC) AI Hardware E-Workshops
    .
    VideoSlidesMediaEvent
  • Aug 2021
    Song Han
     presented "
    Frontiers of AI Accelerators: Technologies, Circuits and Applications
    " at
    Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems
    .
    VideoSlidesMediaEvent
  • Aug 2021
    Song Han
     presented "
    AutoML for Tiny Machine Learning
    " at
    AutoML Workshop at Knowledge Discovery and Data Mining (KDD) Conference
    .
    VideoSlidesMediaEvent
  • Jun 2021
    Congrats
    SPVNAS
     team on
    First Price
     of
    6th AI Driving Olympics
     on
    nuScenes Semantic Segmentation
     @
    ICRA
     
    2021
    .
    SPVNAS
  • Jun 2021
    Song Han
     presented "
    Putting AI on a Diet: TinyML and Efficient Deep Learning
    " at
    Shanghai Jiaotong University
    .
    VideoSlidesMediaEvent
  • Jun 2021
    Song Han
     presented "
    Putting AI on a Diet: TinyML and Efficient Deep Learning
    " at
    MLOps World – Machine Learning in Production
    .
    VideoSlidesMediaEvent
  • Jun 2021
    Song Han
     presented "
    Putting AI on a Diet: TinyML and Efficient Deep Learning
    " at
    Efficient Deep Learning for Computer Vision Workshop at CVPR
    .
    VideoSlidesMediaEvent
  • Jun 2021
    Song Han
     presented "
    Machine Learning for Analog and Digital Design
    " at
    VLSI symposia workshop on AI/Machine Learning for Circuit Design and Optimization
    .
    VideoSlidesMediaEvent
  • Jun 2021
    Song Han
     presented "
    NAAS: Neural-Accelerator Architecture Search
    " at
    4th International Workshop on AI-assisted Design for Architecture at ISCA
    .
    VideoSlidesMediaEvent
  • May 2021
    Congrats
    Song Han
     on
    2021 Samsung Global Research Outreach (GRO) Award
    .
  • May 2021
    Congrats
    Song Han
     on
    2021 NVIDIA Academic Partnership Award
    .
  • May 2021
    Congrats
    Hanrui Wang
     on
    2021 Qualcomm Innovation Fellowship
    .
  • May 2021
    Congrats
    Han Cai
     on
    the 2021 Qualcomm Innovation Fellowship
    .
  • May 2021
    Congrats
    Zhijian Liu
     on
    the 2021 Qualcomm Innovation Fellowship
    .
  • May 2021
    Congrats
    Yujun Lin
     on
    the 2021 DAC Young Fellowship
    .
  • May 2021
    Congrats
    Yujun Lin
     on
    the 2021 Qualcomm Innovation Fellowship
    .
  • May 2021
    Song Han
     presented "
    Putting AI on a Diet: TinyML and Efficient Deep Learning
    " at
    Apple’s On-Device ML Workshop
    .
    VideoSlidesMediaEvent
  • Apr 2021
    Song Han
     presented "
    Putting AI on a Diet: TinyML and Efficient Deep Learning
    " at
    ISQED’21 Embedded Tutorials
    .
    VideoSlidesMediaEvent
  • Apr 2021
    Song Han
     presented "
    Putting AI on a Diet: TinyML and Efficient Deep Learning
    " at
    MLSys’21 On-Device Intelligence Workshop
    .
    VideoSlidesMediaEvent
  • Jan 2021
    Song Han
     presented "
    Efficient AI: Reducing the Carbon Footprint of AI in the Internet of Things (IoT)
    " at
    MIT ILP Japan conference
    .
    VideoSlidesMediaEvent
  • Dec 2020
    Congrats
    Hanrui Wang
     team
     on
    Best Presentation Award
     of
    DAC 2020 Young Fellow
     
    .
  • Nov 2020
    Song Han
     presented "
    Putting AI on a Diet: TinyML and Efficient Deep Learning
    " at
    MIT ILP webinar session on low power/edge/efficient computing
    .
    VideoSlidesMediaEvent
  • Jul 2020
    Congrats
    SPVNAS
     team on
    First Place
     of
    SemanticKITTI leaderboard
     on
    3D semantic segmentation
     @
    ECCV
     
    2020
    .
    SPVNAS
  • Jul 2020
    A new blog post
    Reducing the carbon footprint of AI using the Once-for-All network
     is published.
    “The aim is smaller, greener neural networks,” says Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science. “Searching efficient neural network architectures has until now had a huge carbon footprint. But we reduced that footprint by orders of magnitude with these new methods.”
  • Jun 2020
    Congrats
    OFA
     team on
    First Place
     of
    Low-Power Computer Vision Challenge
     on
    CPU Detection, FPGA
     @
    CVPR
     
    2020
    .
    OFA
  • May 2020
    A new blog post
    Efficiently Understanding Videos, Point Cloud and Natural Language on NVIDIA Jetson Xavier NX
     is published.
    Thanks to NVIDIA’s amazing deep learning eco-system, we are able to deploy three applications on Jetson Xavier NX soon after we receive the kit, including efficient video understanding with Temporal Shift Module (TSM, ICCV’19), efficient 3D deep learning with Point-Voxel CNN (PVCNN, NeurIPS’19), and efficient machine translation with hardware-aware transformer (HAT, ACL’20).
  • May 2020
    Congrats
    Song Han
     on
    2020 NVIDIA Academic Partnership Award
    .
  • May 2020
    Congrats
    Song Han
     on
    2020 IEEE "AIs 10 to Watch: The Future of AI" Award
    .
  • May 2020
    Congrats
    Song Han
     on
    2020 NSF CAREER Award
    .
  • May 2020
    Congrats
    Song Han
     on
    2020 SONY Faculty Award
    .
  • May 2020
    Congrats
    Ji Lin
     on
    the 2020 Nvidia Graduate Fellowship Finalist
    .
  • May 2020
    Congrats
    Hanrui Wang
     on
    the 2020 Nvidia Graduate Fellowship Finalist
    .
  • May 2020
    Congrats
    Hanrui Wang
     on
    the 2021 Analog Devices Outstanding Student Designer Award
    .
  • May 2020
    Congrats
    Hanrui Wang
     on
    the 2020 DAC Young Fellowship
    .
  • Apr 2020
    Song Han
     presented "
    Once-for-All: Train One Network and Specialize it for Efficient Deployment
    " at
    TinyML Webinar
    .
    VideoSlidesMediaEvent
  • Oct 2019
    Congrats
    OFA
     team on
    First Place
     of
    Low-Power Computer Vision Workshop at ICCV 2019
     on
    DSP
     @
    ICCV
     
    2019
    .
    OFA
  • Jun 2019
    Congrats
    Hanrui WangPark
     team
     on
    Best Paper Award
     of
    ICML 2019 Reinforcement Learning for Real Life Workshop
     
    .
    Park
  • Jun 2019
    Congrats
    OFA
     team on
    First Place
     of
    Low-Power Image Recognition Challenge
     on
    classification, detection
     @
    IEEE
     
    2019
    .
    OFA
  • Jun 2019
    Congrats
    ProxylessNAS
     team on
    First Place
     of
    Visual Wake Words Challenge
     on
    TF-lite track
     @
    CVPR
     
    2019
    .
    ProxylessNAS
  • May 2019
    Congrats
    Song Han
     on
    2019 MIT Technology Review list of 35 Innovators Under 35
    .
  • May 2019
    Congrats
    Song Han
     on
    2019 Amazon Machine Learning Research Award
    .
  • May 2019
    Congrats
    Song Han
     on
    2019 Facebook Research Award
    .
  • Aug 2018
    Congrats
    Yujun Lin
     on
    the 2018 Robert J. Shillman Fellowship
    .
  • May 2018
    Congrats
    Song Han
     on
    2018 SONY Faculty Award
    .
  • May 2018
    Congrats
    Song Han
     on
    2018 Amazon Machine Learning Research Award
    .
  • May 2017
    Congrats
    Song Han
     team
     on
    Best Paper Award
     of
    FPGA 2017
     
    .
  • May 2017
    Congrats
    Song Han
     on
    2017 SONY Faculty Award
    .
  • May 2016
    Congrats
    Song Han
     team
     on
    Best Paper Award
     of
    ICLR 2016
     
    .

Our Full-Stack Projects

To choose projects, simply check the boxes of the categories, topics and techniques.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Our Impacts

We actively collaborate with industry partners on efficient AI, model compression and acceleration. Our research has influenced and landed in many industrial products: Intel OpenVino, Intel Neural Network Distiller, Intel Neural Compressor, Apple Neural Engine, NVIDIA Sparse Tensor Core, NVIDIA TensorRT LLM, AMD-Xilinx Vitis AI, Qualcomm AI Model Efficiency Toolkit (AIMET), Amazon AutoGluon, Facebook PyTorch, Microsoft NNI, SONY Neural Architecture Search Library, SONY Model Compression Toolkit,  ADI MAX78000/MAX78002 Model Training and Synthesis Tool.