Efficient AI Computing,
Transforming the Future.

Publications

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

Enze Xie¹*, Junsong Chen¹*, Junyu Chen²³, Han Cai¹, Haotian Tang², Yujun Lin², Zhekai Zhang², Muyang Li², Ligeng Zhu¹, Yao Lu¹, Song Han¹²
2024

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Muyang Li*, Yujun Lin*, Zhekai Zhang*, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, Song Han
arXiv 2024

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Yecheng Wu*, Zhuoyang Zhang*, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao Lu
arXiv 2024

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Junyu Chen*, Han Cai*, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han
Preprint

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Guangxuan Xiao¹, Jiaming Tang¹, Jingwei Zuo², Junxian Guo¹³, Shang Yang¹, Haotian Tang¹, Yao Fu⁴, Song Han¹⁵
arXiv

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Haotian Tang*, Yecheng Wu*, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, Song Han
arXiv 2024

Sparse Refinement for Efficient High-Resolution Semantic Segmentation

Zhijian Liu*, Zhuoyang Zhang*, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han
ECCV 2024

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

Guangxuan Xiao*¹, Tianwei Yin*¹, William T. Freeman¹, Frédo Durand¹, Song Han¹
International Journal of Computer Vision 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Yujun Lin*¹, Haotian Tang*¹, Shang Yang*¹, Zhekai Zhang¹, Guangxuan Xiao¹, Chuang Gan³⁴, Song Han¹²
arXiv 2024

Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays

Hanrui Wang, Pengyu Liu, Daniel Bochen Tan, Yilian Liu, Jiaqi Gu, David Z. Pan, Jason Cong, Umut A. Acar, Song Han
ISCA 2024

Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas

Hanrui Wang, Daniel Bochen Tan, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jason Cong, Song Han
DAC 2024

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Muyang Li*, Tianle Cai*, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, and Song Han
CVPR 2024

VILA: On Pre-training for Visual Language Models

Ji Lin*, Hongxu Yin*, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han
CVPR 2024

Condition-Aware Neural Network for Controlled Image Generation

Han Cai, Muyang Li, Zhuoyang Zhang, Qinsheng Zhang, Ming-Yu Liu, Song Han
CVPR 2024

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jiaming Tang*, Yilong Zhao*, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han
ICML 2024

Efficient Streaming Language Models with Attention Sinks

Guangxuan Xiao¹, Yuandong Tian², Beidi Chen³, Song Han¹⁴, Mike Lewis²
ICLR 2024

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Ji Lin*, Jiaming Tang*, Haotian Tang*, Shang Yang*, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han
MLSys 2024

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Yukang Chen¹, Shengju Qian¹, Haotian Tang², Xin Lai¹, Zhijian Liu², Song Han², Jiaya Jia¹
ICLR 2024

Tiny Machine Learning Projects

Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Han Cai, Guangxuan Xiao, Haotian Tang, Shang Yang, Yujun Lin, and Song Han
NeurIPS 2020/2021/2022, MICRO 2023, ICML 2023, MLSys 2024, IEEE CAS Magazine 2023

Tiny Machine Learning: Progress and Futures [Feature]

Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, and Song Han
IEEE CAS Magazine

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

Haotian Tang*¹, Shang Yang*¹², Zhijian Liu¹, Ke Hong², Zhongming Yu³, Xiuyu Li⁴, Guohao Dai⁵, Yu Wang², Song Han¹
MICRO 2023

PockEngine: Sparse and Efficient Fine-tuning in a Pocket

Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
MICRO 2023

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han
ICCV 2023

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Guangxuan Xiao*¹, Ji Lin*¹, Mickael Seznec², Hao Wu², Julien Demouth², Song Han¹
ICML 2023

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Zhijian Liu*, Xinyu Yang*, Haotian Tang, Shang Yang, Song Han
CVPR 2023

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

Xuanyao Chen*¹, Zhijian Liu*², Haotian Tang², Li Yi¹, Hang Zhao¹, Song Han²
CVPR 2023

Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network

Song Han¹³, Xingyu Liu⁴, Huizi Mao³, Jing Pu⁵, Ardavan Pedram²⁶, Mark A. Horowitz², William J. Dally²³
ISCA 2023

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Zhijian Liu*, Haotian Tang*, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela L. Rus, Song Han
ICRA 2023

QuEst: Graph Transformer for Quantum Circuit Reliability Estimation

Hanrui Wang, Pengyu Liu, Jinglei Cheng, Zhiding Liang, Jiaqi Gu, Zirui Li, Yongshan Ding, Weiwen Jiang, Yiyu Shi, Xuehai Qian, David Z. Pan, Frederic T. Chong, Song Han
ICCAD 2022

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

Muyang Li¹, Ji Lin¹, Chenlin Meng³, Stefano Ermon³, Song Han¹, and Jun-Yan Zhu²
NeurIPS 2022 & TPAMI

TorchSparse: Efficient Point Cloud Inference Engine

Haotian Tang*, Zhijian Liu*, Xiuyu Li*, Yujun Lin, Song Han
MLSys 2022

QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization

Hanrui Wang¹, Jiaqi Gu², Yongshan Ding³, Zirui Li⁴, David Z. Pan³, Frederic T. Chong⁵, Song Han¹
DAC 2022

QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning

Hanrui Wang¹, Zirui Li², Jiaqi Gu³, Yongshan Ding⁴, Yujun Lin¹, David Z. Pan³, Frederic T. Chong⁵, Song Han¹
DAC 2022

On-Device Training Under 256KB Memory

Ji Lin*, Ligeng Zhu*, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han
NeurIPS 2022

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, Song Han
CVPR 2022

QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits

Hanrui Wang¹, Yongshan Ding², Jiaqi Gu³, Zirui Li⁴, Yujun Lin¹, David Z. Pan³, Frederic T. Chong⁵, Song Han¹
HPCA 2022

Network Augmentation for Tiny Deep Learning

Han Cai, Chuang Gan, Ji Lin, Song Han
ICLR 2022

NAAS: Neural Accelerator Architecture Search

Yujun Lin*¹, Mengtian Yang*², Song Han¹
DAC 2021

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han
NeurIPS 2021

Delayed Gradient Averaging: Tolerate the Communication Latency in Federated Learning

Ligeng Zhu¹, Hongzhou Lin², Yao Lu³, Yujun Lin¹, Song Han¹
NeurIPS 2021

PointAcc: Efficient Point Cloud Accelerator

Yujun Lin, Zhekai Zhang, Haotian Tang, Hanrui Wang, Song Han
MICRO 2021

SemAlign: Annotation-Free Camera-LiDAR Calibration with Semantic Alignment Loss

Zhijian Liu*, Haotian Tang*, Sibo Zhu*, Song Han
IROS 2021

Anycost GANs for Interactive Image Synthesis and Editing

Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu
CVPR 2021

Differentiable Augmentation for Data-Efficient GAN Training

Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, Song Han
NeurIPS 2020

Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

Haotian Tang*, Zhijian Liu*, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, Song Han
ECCV 2020

MCUNet: Tiny Deep Learning on IoT Devices

Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han
NeurIPS 2020

GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning

Hanrui Wang, Kuan Wang, Jiacheng Yang, Linxiao Shen, Nan Sun, Hae-Seung Lee, Song Han
DAC 2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang, Yujun Lin, Song Han
CVPR 2020

GAN Compression: Efficient Architectures for Interactive Conditional GANs

Muyang Li¹, Ji Lin², Yaoyao Ding³, Zhijian Liu², Jun-Yan Zhu¹ and Song Han²
CVPR 2020 & TPAMI

SpArch: Efficient Architecture for Sparse Matrix Multiplication

Zhekai Zhang*, Hanrui Wang*, Song Han, William J. Dally
HPCA 2020

Once-for-All: Train One Network and Specialize it for Efficient Deployment

Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
ICLR 2020

Lite Transformer with Long-Short Range Attention

Zhanghao Wu*, Zhijian Liu*, Ji Lin, Yujun Lin, Song Han
ICLR 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Hanrui Wang¹, Zhanghao Wu¹, Zhijian Liu¹, Han Cai¹, Ligeng Zhu¹, Chuang Gan², Song Han¹
ACL 2020

Point-Voxel CNN for Efficient 3D Deep Learning

Zhijian Liu*, Haotian Tang*, Yujun Lin, Song Han
NeurIPS 2019

Park: An Open Platform for Learning-Augmented Computer Systems

Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh
NeurIPS 2019

TSM: Temporal Shift Module for Efficient Video Understanding

Ji Lin¹, Chuang Gan², Song Han¹
ICCV 2019

HAQ: Hardware-Aware Automated Quantization

Kuan Wang*, Zhijian Liu*, Yujun Lin*, Ji Lin, and Song Han
CVPR 2019

Deep Gradient Compression: Reducing the Communication Bandwidth in Distributed Training

Yujun Lin¹, Song Han², Huizi Mao², Yu Wang¹, William J. Dally²³
ICLR 2018

Learning to Design Circuits

Hanrui Wang, Jiacheng Yang, Hae-Seung Lee, Song Han
NIPS 2019 MLSys Workshop

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

Yihui He*, Ji Lin*, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han
ECCV 2018

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
ISCA 2016

Learning both Weights and Connections for Efficient Neural Network

Song Han, Jeff Pool, John Tran, William Dally
NIPS 2015