Efficient AI Computing,
Transforming the Future.

Publications

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Yujun Lin*¹, Haotian Tang*¹, Shang Yang*¹, Zhekai Zhang¹, Guangxuan Xiao¹, Chuang Gan³⁴, Song Han¹²
arXiv 2024

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Muyang Li*, Tianle Cai*, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, and Song Han
CVPR 2024

VILA: On Pre-training for Visual Language Models

Ji Lin*, Hongxu Yin*, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han
CVPR 2024

Condition-Aware Neural Network for Controlled Image Generation

Han Cai, Muyang Li, Zhuoyang Zhang, Qinsheng Zhang, Ming-Yu Liu, Song Han
CVPR 2024

Efficient Streaming Language Models with Attention Sinks

Guangxuan Xiao¹, Yuandong Tian², Beidi Chen³, Song Han¹⁴, Mike Lewis²
ICLR 2024

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Ji Lin*, Jiaming Tang*, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han
MLSys 2024

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Yukang Chen¹, Shengju Qian¹, Haotian Tang², Xin Lai¹, Zhijian Liu², Song Han², Jiaya Jia¹
ICLR 2024

Tiny Machine Learning Projects

Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Han Cai, Guangxuan Xiao, Haotian Tang, Shang Yang, Yujun Lin, and Song Han
NeurIPS 2020/2021/2022, MICRO 2023, ICML 2023, MLSys 2024, IEEE CAS Magazine 2023

Tiny Machine Learning: Progress and Futures [Feature]

Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, and Song Han
IEEE CAS Magazine 2023

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

Haotian Tang*¹, Shang Yang*¹², Zhijian Liu¹, Ke Hong², Zhongming Yu³, Xiuyu Li⁴, Guohao Dai⁵, Yu Wang², Song Han¹
MICRO 2023

PockEngine: Sparse and Efficient Fine-tuning in a Pocket

Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
MICRO 2023

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han
ICCV 2023

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Guangxuan Xiao*¹, Ji Lin*¹, Mickael Seznec², Hao Wu², Julien Demouth², Song Han¹
ICML 2023

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Zhijian Liu*, Xinyu Yang*, Haotian Tang, Shang Yang, Song Han
CVPR 2023

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

Xuanyao Chen*¹, Zhijian Liu*², Haotian Tang², Li Yi¹, Hang Zhao¹, Song Han²
CVPR 2023

Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network

Song Han¹³, Xingyu Liu⁴, Huizi Mao³, Jing Pu⁵, Ardavan Pedram²⁶, Mark A. Horowitz², William J. Dally²³
ISCA 2023

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Zhijian Liu*, Haotian Tang*, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela L. Rus, Song Han
ICRA 2023

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

Muyang Li¹, Ji Lin¹, Chenlin Meng³, Stefano Ermon³, Song Han¹, and Jun-Yan Zhu²
NeurIPS 2022 & TPAMI

TorchSparse: Efficient Point Cloud Inference Engine

Haotian Tang*, Zhijian Liu*, Xiuyu Li*, Yujun Lin, Song Han
MLSys 2022

QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization

Hanrui Wang¹, Jiaqi Gu², Yongshan Ding³, Zirui Li⁴, David Z. Pan³, Frederic T. Chong⁵, Song Han¹
DAC 2022

QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning

Hanrui Wang¹, Zirui Li², Jiaqi Gu³, Yongshan Ding⁴, Yujun Lin¹, David Z. Pan³, Frederic T. Chong⁵, Song Han¹
DAC 2022

On-Device Training Under 256KB Memory

Ji Lin*, Ligeng Zhu*, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han
NeurIPS 2022

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, Song Han
CVPR 2022

QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits

Hanrui Wang¹, Yongshan Ding², Jiaqi Gu³, Zirui Li⁴, Yujun Lin¹, David Z. Pan³, Frederic T. Chong⁵, Song Han¹
HPCA 2022

Network Augmentation for Tiny Deep Learning

Han Cai, Chuang Gan, Ji Lin, Song Han
ICLR 2022

NAAS: Neural Accelerator Architecture Search

Yujun Lin*¹, Mengtian Yang*², Song Han¹
DAC 2021

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han
NeurIPS 2021

PointAcc: Efficient Point Cloud Accelerator

Yujun Lin, Zhekai Zhang, Haotian Tang, Hanrui Wang, Song Han
MICRO 2021

SemAlign: Annotation-Free Camera-LiDAR Calibration with Semantic Alignment Loss

Zhijian Liu*, Haotian Tang*, Sibo Zhu*, Song Han
IROS 2021

Anycost GANs for Interactive Image Synthesis and Editing

Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu
CVPR 2021

Differentiable Augmentation for Data-Efficient GAN Training

Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, Song Han
NeurIPS 2020

Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

Haotian Tang*, Zhijian Liu*, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, Song Han
ECCV 2020

MCUNet: Tiny Deep Learning on IoT Devices

Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han
NeurIPS 2020

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Song Han
CVPR 2020

GAN Compression: Efficient Architectures for Interactive Conditional GANs

Muyang Li¹, Ji Lin², Yaoyao Ding³, Zhijian Liu², Jun-Yan Zhu¹ and Song Han²
CVPR 2020 & TPAMI

SpArch: Efficient Architecture for Sparse Matrix Multiplication

Zhekai Zhang*, Hanrui Wang*, Song Han, William J. Dally
HPCA 2020

Once-for-All: Train One Network and Specialize it for Efficient Deployment

Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
ICLR 2020

Lite Transformer with Long-Short Range Attention

Zhanghao Wu*, Zhijian Liu*, Ji Lin, Yujun Lin, Song Han
ICLR 2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Hanrui Wang¹, Zhanghao Wu¹, Zhijian Liu¹, Han Cai¹, Ligeng Zhu¹, Chuang Gan², Song Han¹
ACL 2020

Point-Voxel CNN for Efficient 3D Deep Learning

Zhijian Liu*, Haotian Tang*, Yujun Lin, Song Han
NeurIPS 2019

TSM: Temporal Shift Module for Efficient Video Understanding

Ji Lin¹, Chuang Gan², Song Han¹
ICCV 2019

HAQ: Hardware-Aware Automated Quantization

Kuan Wang*, Zhijian Liu*, Yujun Lin*, Ji Lin, and Song Han
CVPR 2019

Deep Gradient Compression: Reducing the Communication Bandwidth in Distributed Training

Yujun Lin¹, Song Han², Huizi Mao², Yu Wang¹, William J. Dally²³
ICLR 2018

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

Yihui He*, Ji Lin*, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han
ECCV 2018

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
ISCA 2016

Learning both Weights and Connections for Efficient Neural Networks

Song Han, Jeff Pool, John Tran, William J. Dally
NIPS 2015