News

Jun 2024
6/21/2024
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
appears at
CVPR 2024
.
A training-free algorithm to harness multiple GPUs to accelerate diffusion model inference without sacrificing image quality.
DistriFusion Paper Code Slides Video

Jun 2024
6/17/2024
VILA: On Pre-training for Visual Language Models
appears at
CVPR 2024
.
VILA is a visual language model (VLM) pre-trained with interleaved image-text data at scale, enabling multi-image VLM. VILA is deployable on the edge.
VILA Paper Code Slides Video

Feb 2020
2/1/2020
Lite Transformer with Long-Short Range Attention
appears at
ICLR 2020
.
Lite Transformer is an efficient mobile NLP architecture. The key primitive is Long-Short Range Attention (LSRA), where one group of heads specializes in local context modeling (by convolution) while another group specializes in long-distance relationship modeling (by attention).
Lite Transformer Paper Code Slides Video
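The two-branch idea behind LSRA can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not the Lite Transformer implementation: the feature channels are split in half, one half goes through plain self-attention with identity projections, the other half goes through a 1-D convolution along the sequence, and the two halves are concatenated. The function names (lsra, attention_branch, conv_branch), the single shared kernel, and the odd kernel length are all hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_branch(x):
    # Global branch: scaled dot-product self-attention over all positions.
    # Assumption: a single head with identity Q/K/V projections.
    dim = len(x[0])
    out = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in x]
        weights = softmax(scores)
        out.append([sum(w * row[c] for w, row in zip(weights, x)) for c in range(dim)])
    return out

def conv_branch(x, kernel):
    # Local branch: 1-D convolution along the sequence with zero padding.
    # Assumption: one odd-length kernel shared by every channel.
    n, dim, pad = len(x), len(x[0]), len(kernel) // 2
    out = []
    for t in range(n):
        row = []
        for c in range(dim):
            acc = 0.0
            for i, w in enumerate(kernel):
                s = t + i - pad
                if 0 <= s < n:
                    acc += w * x[s][c]
            row.append(acc)
        out.append(row)
    return out

def lsra(x, kernel):
    # Split channels in half, run one half per branch, then concatenate.
    half = len(x[0]) // 2
    global_part = attention_branch([row[:half] for row in x])
    local_part = conv_branch([row[half:] for row in x], kernel)
    return [g + l for g, l in zip(global_part, local_part)]

# Three tokens with four channels each (made-up values).
tokens = [[1.0, 0.0, 0.5, 2.0], [0.0, 1.0, 1.5, 0.0], [1.0, 1.0, 0.0, 1.0]]
mixed = lsra(tokens, kernel=[0.25, 0.5, 0.25])
```

The output keeps the input shape: the first half of each row mixes information from every position, while the second half only sees a window of neighbouring positions set by the kernel width.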

Dec 2023
12/1/2023
Tiny Machine Learning: Progress and Futures [Feature]
appears at
IEEE CAS magazine
.
We discuss the definition, challenges, and applications of TinyML.
TinyML-Magazine Paper Code Slides Video

Jan 2024
1/17/2024
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
appears at
ICLR 2024
.
LongLoRA takes advantage of shifted sparse attention to greatly reduce the finetuning cost of long context LLMs.
LongLoRA Paper Code Slides Video

Jun 2023
6/15/2023
Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network
appears at
ISCA 2023
.
EIE proposed to accelerate pruned and compressed neural networks, exploiting weight sparsity, activation sparsity, and 4-bit weight-sharing in neural network accelerators.
EIE Retrospective Paper Code Slides Video

Dec 2020
12/6/2020
Differentiable Augmentation for Data-Efficient GAN Training
appears at
NeurIPS 2020
.
Differentiable augmentation to improve the data efficiency of GAN training.
DiffAugment Paper Code Slides Video

May 2024
5/10/2024
Efficient Streaming Language Models with Attention Sinks
appears at
ICLR 2024
.
We enable LLMs to work on infinite-length texts without compromising efficiency and performance.
StreamingLLM Paper Code Slides Video

Feb 2021
2/17/2021
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
appears at
HPCA 2021
.
SpAtten accelerates attention in Transformer models such as BERT and GPT through cascade token and head pruning and quantization.
SpAtten Paper Code Slides Video

Jan 2020
1/1/2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
appears at
ACL 2020
.
The HAT NAS framework uses hardware feedback in the neural architecture search loop to find the most suitable model for the target hardware platform. Results on different hardware platforms and datasets show that HAT-searched models have better accuracy-efficiency trade-offs.
HAT Paper Code Slides Video

May 2024
5/13/2024
AWQ has been awarded the Best Paper of MLSys 2024!
AWQ

Mar 2024
3/29/2024
We show SmoothQuant can enable W8A8 quantization for Llama-1/2, Falcon, Mistral, and Mixtral models with negligible loss.
SmoothQuant

Feb 2024
2/24/2024
AWQ has been accepted to MLSys 2024!
AWQ

Feb 2024
2/13/2024
Our work StreamingLLM is covered by MIT News as a spotlight!
StreamingLLM

Feb 2024
2/1/2024
We now support VILA Vision Language Models in AWQ & TinyChat! Check out our latest demos with multi-image inputs!
AWQ

Feb 2024
2/1/2024
We released a new version of the quantized GEMM/GEMV kernels in TinyChat, leading to 38 tokens/second inference speed on NVIDIA Jetson Orin!
AWQ

Jan 2024
1/7/2024
SwiftInfer, a TensorRT-based implementation, makes StreamingLLM more production-grade.
StreamingLLM

Jan 2024
1/2/2024
StreamingLLM is integrated into NVIDIA TensorRT-LLM!
StreamingLLM

Dec 2023
12/15/2023
Congrats to Ji Lin on completing and defending his PhD thesis: "Efficient Deep Learning Computing: From TinyML to Large Language Model". Ji joined OpenAI after graduation.

Dec 2023
12/15/2023
StreamingLLM enables endless and efficient LLM generation on iPhone!
StreamingLLM

Dec 2023
12/5/2023
Attention Sink is integrated into HuggingFace Transformers' main branch.
StreamingLLM

Dec 2023
12/5/2023
AWQ is integrated into HuggingFace Transformers' main branch.
AWQ

Dec 2023
12/5/2023
AWQ is integrated into NVIDIA TensorRT-LLM: Falcon-180B fits on a single H200 GPU with INT4 AWQ, and Llama-70B runs 6.7x faster than on A100.
AWQ

Dec 2023
12/5/2023
SmoothQuant is integrated into NVIDIA TensorRT-LLM.
SmoothQuant

Oct 2023
10/20/2023
StreamingLLM is integrated into Intel Extension for Transformers.
StreamingLLM

Oct 2023
10/9/2023
Attention Sinks, a library from the community, enables StreamingLLM on more HuggingFace LLMs. blog.
StreamingLLM

Sep 2023
9/1/2023
AWQ is integrated into FastChat, vLLM, HuggingFace TGI, and LMDeploy.
AWQ

Jul 2023
7/30/2023
The TinyML and Efficient Deep Learning Computing course will be returning in Fall, with live sessions on YouTube!
6.5940

Jul 2023
7/1/2023
We released TinyChat, an efficient and lightweight chatbot interface based on AWQ. TinyChat enables efficient LLM inference on both cloud and edge GPUs. Llama-2-chat models are supported! Check out our implementation here.
AWQ

Nov 2022
11/1/2022
Congrats to the On-Device Training team on First Place (1/150) of the ACM/IEEE TinyML Design Contest on the Memory Occupation Track @ ICCAD 2022.
On-Device Training

Jul 2020
7/30/2020
Congrats to the SPVNAS team on First Place of the SemanticKITTI leaderboard on 3D semantic segmentation @ ECCV 2020.
SPVNAS

Jun 2021
6/1/2021
Congrats to the SPVNAS team on First Place of the 6th AI Driving Olympics on nuScenes Semantic Segmentation @ ICRA 2021.
SPVNAS

Oct 2019
10/1/2019
Congrats to the OFA team on First Place of the Low-Power Computer Vision Workshop at ICCV 2019 on DSP @ ICCV 2019.
OFA

Jun 2019
6/1/2019
Congrats to the OFA team on First Place of the Low-Power Image Recognition Challenge on classification, detection @ IEEE 2019.
OFA

Jun 2020
6/1/2020
Congrats to the OFA team on First Place of the Low-Power Computer Vision Challenge on CPU Detection, FPGA @ CVPR 2020.
OFA

Jun 2019
6/1/2019
Congrats to the ProxylessNAS team on First Place of the Visual Wake Words Challenge on the TF-lite track @ CVPR 2019.
ProxylessNAS

Nov 2023
11/12/2023
Congrats
Zhijian Liu
on
2023 Rising Stars in Data Science
.

Jan 2023
1/25/2023
Congrats
Hanrui Wang
on
MARC 2023 Best Pitch Award
.

Nov 2022
11/1/2022
Congrats
Hanrui Wang
on
Gold Medal of ACM Student Research Competition
.

Aug 2023
8/17/2023
Congrats
Hanrui Wang
on
2023 Rising Stars in ML and Systems
.

May 2023
5/1/2023
Congrats
Song Han
on
2023 Sloan Research Fellowship
.

May 2022
5/1/2022
Congrats
Song Han
on
2022 Red Dot Award
.

May 2021
5/1/2021
Congrats
Song Han
on
2021 Samsung Global Research Outreach (GRO) Award
.

May 2021
5/1/2021
Congrats
Song Han
on
2021 NVIDIA Academic Partnership Award
.

May 2020
5/1/2020
Congrats
Song Han
on
2020 NVIDIA Academic Partnership Award
.

May 2020
5/1/2020
Congrats
Song Han
on
2020 IEEE "AIs 10 to Watch: The Future of AI" Award
.

May 2020
5/1/2020
Congrats
Song Han
on
2020 NSF CAREER Award
.

May 2019
5/1/2019
Congrats
Song Han
on
2019 MIT Technology Review list of 35 Innovators Under 35
.

May 2020
5/1/2020
Congrats
Song Han
on
2020 SONY Faculty Award
.

May 2017
5/1/2017
Congrats
Song Han
on
2017 SONY Faculty Award
.

May 2018
5/1/2018
Congrats
Song Han
on
2018 SONY Faculty Award
.

May 2018
5/1/2018
Congrats
Song Han
on
2018 Amazon Machine Learning Research Award
.

May 2019
5/1/2019
Congrats
Song Han
on
2019 Amazon Machine Learning Research Award
.

May 2019
5/1/2019
Congrats
Song Han
on
2019 Facebook Research Award
.

Aug 2022
8/1/2022
Congrats
on
the 2022 Qualcomm Innovation Fellowship
.

Aug 2022
8/1/2022
Congrats
Ji Lin
on
the 2022 Qualcomm Innovation Fellowship
.

Aug 2023
8/17/2023
Congrats
Zhijian Liu
on
2023 Rising Stars in ML and Systems
.

May 2021
5/1/2021
Congrats
Hanrui Wang
on
the 2021 Qualcomm Innovation Fellowship
.

May 2021
5/1/2021
Congrats
Han Cai
on
the 2021 Qualcomm Innovation Fellowship
.

May 2021
5/1/2021
Congrats
Zhijian Liu
on
the 2021 Qualcomm Innovation Fellowship
.

May 2020
5/1/2020
Congrats
Ji Lin
on
the 2020 Nvidia Graduate Fellowship Finalist
.

May 2021
5/1/2021
Congrats
Yujun Lin
on
the 2021 DAC Young Fellowship
.

May 2022
5/1/2022
Congrats
Hanrui Wang
on
2022 ACM Student Research Competition Award 1st Place
.

Aug 2022
8/24/2022
Congrats
Zhijian Liu
on
the 2022 MIT Ho-Ching and Han-Ching Fund Award
.

May 2021
5/1/2021
Congrats
Yujun Lin
on
the 2021 Qualcomm Innovation Fellowship
.

May 2020
5/1/2020
Congrats
Hanrui Wang
on
the 2020 Nvidia Graduate Fellowship Finalist
.

May 2020
5/1/2020
Congrats
Hanrui Wang
on
the 2021 Analog Devices Outstanding Student Designer Award
.

May 2020
5/1/2020
Congrats
Hanrui Wang
on
the 2020 DAC Young Fellowship
.

Aug 2018
8/24/2018
Congrats
Yujun Lin
on
the 2018 Robert J. Shillman Fellowship
.

Jun 2023
6/15/2023
Congrats to Song Han and the EIE Retrospective team on Top 5 cited papers in 50 years of ISCA.
EIE Retrospective

May 2017
5/15/2017
Congrats to Song Han and team on the Best Paper Award of FPGA 2017.

May 2016
5/15/2016
Congrats to Song Han and team on the Best Paper Award of ICLR 2016.

Jul 2023
7/15/2023
Congrats to Hanrui Wang and the SpAtten team on the Best University Demo Award of DAC 2023 for “An Energy-Scalable Transformer Accelerator Supporting Adaptive Model Configuration and Word Elimination” in collaboration with Anantha Chandrakasan’s team.
SpAtten

May 2023
5/3/2023
Congrats to Wei-Chen Wang and team on the 2023 NSF Athena AI Institute Best Poster Award (rank #1).

May 2022
5/3/2022
Congrats to Hanrui Wang and team on the 2022 NSF AI Institute Best Poster Award (rank #1).

Dec 2020
12/15/2020
Congrats to Hanrui Wang and team on the Young Fellow Best Presentation Award of DAC 2020.

Oct 2021
10/1/2021
Congrats to Wei-Chen Wang and team on the Best Paper Award of IEEE NVMSA 2021.

Oct 2019
10/1/2019
Congrats to Wei-Chen Wang and team on the Best Paper Award of ACM/IEEE CODES+ISSS 2019.

Mar 2024
3/10/2024
A new blog post
Patch Conv: Patch Convolution to Avoid Large GPU Memory Usage of Conv2D
is published.
In this blog, we introduce Patch Conv to reduce the memory footprint when generating high-resolution images. Patch Conv cuts memory usage by over 2.4× compared to the existing PyTorch implementation. Code: https://github.com/mit-han-lab/patch_conv

Feb 2024
2/29/2024
A new blog post
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
is published.
In this blog, we introduce DistriFusion, a training-free algorithm to harness multiple GPUs to accelerate diffusion model inference without sacrificing image quality. It can reduce SDXL latency by up to 6.1× on 8 A100s. Our work has been accepted by CVPR 2024 as a highlight.

Mar 2024
3/3/2024
A new blog post
TinyChat: Visual Language Models & Edge AI 2.0
is published.
Explore the latest advancement in TinyChat and AWQ: the integration of Visual Language Models (VLMs) on the edge! Exciting advancements in VLMs allow LLMs to comprehend visual inputs, enabling image understanding tasks like caption generation, question answering, and more. With the latest release, TinyChat now supports leading VLMs such as VILA, which can be easily quantized with AWQ, giving users a seamless experience for image understanding tasks.

Nov 2022
11/28/2022
A new blog post
On-Device Training Under 256KB Memory
is published.
In MCUNetV3, we enable on-device training under 256KB SRAM and 1MB Flash, using less than 1/1000 of the memory of PyTorch while matching the accuracy on the visual wake words application. It enables the model to adapt to newly collected sensor data, and users can enjoy customized services without uploading the data to the cloud, thus protecting privacy.

May 2020
5/22/2020
A new blog post
Efficiently Understanding Videos, Point Cloud and Natural Language on NVIDIA Jetson Xavier NX
is published.
Thanks to NVIDIA’s amazing deep learning ecosystem, we were able to deploy three applications on Jetson Xavier NX soon after we received the kit, including efficient video understanding with Temporal Shift Module (TSM, ICCV’19), efficient 3D deep learning with Point-Voxel CNN (PVCNN, NeurIPS’19), and efficient machine translation with hardware-aware transformer (HAT, ACL’20).

Jul 2020
7/2/2020
A new blog post
Auto Hardware-Aware Neural Network Specialization on ImageNet in Minutes
is published.
This tutorial introduces how to use the Once-for-All (OFA) Network to get specialized ImageNet models for the target hardware in minutes with only your laptop.

Jul 2020
7/3/2020
A new blog post
Reducing the carbon footprint of AI using the Once-for-All network
is published.
“The aim is smaller, greener neural networks,” says Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science. “Searching efficient neural network architectures has until now had a huge carbon footprint. But we reduced that footprint by orders of magnitude with these new methods.”

Sep 2023
9/6/2023
A new blog post
TinyChat: Large Language Model on the Edge
is published.
Running large language models (LLMs) on the edge is of great importance. In this blog, we introduce TinyChat, an efficient and lightweight system for LLM deployment on the edge. It runs Meta's latest LLaMA-2 model at 30 tokens / second on NVIDIA Jetson Orin and can easily support different models and hardware.

Dec 2023
12/5/2023
Song Han
presented "
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
" at
Google
.
EfficientViT Video Slides Media Event

Dec 2023
12/4/2023
Song Han
presented "
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
" at
Apple
.
AWQ Video Slides Media Event

Oct 2023
10/18/2023
Song Han
presented "
TinyML: Enable Efficient Deep Learning on Mobile Devices
" at
2023 MIT AI Hardware Fall Research Update
.
On-Device Training Video Slides Media Event

Oct 2023
10/18/2023
Song Han
presented "
Efficient Large Language Model
" at
2023 MIT AI Hardware Fall Research Update
.
SmoothQuant Video Slides Media Event

Oct 2023
10/2/2023
Song Han
presented "
Efficient Vision Transformer
" at
the ICCV 2023 Workshop on Resource-Efficient Deep Learning for Computer Vision (RCV'23)
.
Video Slides Media Event

Oct 2023
10/2/2023
Song Han
presented "
Quantization for Foundation Models
" at
the ICCV 2023 Workshop on Low-Bit Quantized Neural Networks
.
Video Slides Media Event

Sep 2023
9/29/2023
Song Han
presented "
TinyChat for On-device LLM
" at
the IAP MIT Workshop on the Future of AI and Cloud Computing Applications and Infrastructure
.
Video Slides Media Event

Aug 2023
8/1/2023
Ji Lin
presented "
SmoothQuant, AWQ, TinyChat
" at
UC Berkeley SkyLab
.
Video Slides Media Event

Jun 2023
6/1/2023
Song Han
presented "
Efficient Deep Learning Computing with Sparsity
" at
CVPR Workshop on Efficient Computer Vision
.
Video Slides Media Event

Jun 2023
6/1/2023
Zhijian Liu
presented "
Efficient 3D Perception for Autonomous Vehicles
" at
CVPR Workshop on Efficient Computer Vision
.
Video Slides Media Event

Jun 2023
6/1/2023
Ji Lin
presented "
SmoothQuant, AWQ
" at
NVIDIA
.
AWQ Video Slides Media Event

Nov 2021
11/1/2021
Song Han
presented "
TinyML and Efficient Deep Learning for Automotive Applications
" at
Hyundai Motor Group Developers Conference
.
Video Slides Media Event

Nov 2021
11/1/2021
Song Han
presented "
Plenary: Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
TinyML Technical Forum Asia
.
Video Slides Media Event

Oct 2021
10/1/2021
Song Han
presented "
Efficient Methods & Hardware for TinyML
" at
Sony Professor Lecture Series
.
Video Slides Media Event

Oct 2021
10/1/2021
Song Han
presented "
Computationally Efficient Large-Scale AI
" at
Microsoft Research Summit
.
Video Slides Media Event

Oct 2021
10/1/2021
Song Han
presented "
TinyML Techniques for Greener, Faster and Sustainable AI
" at
IBM IEEE CAS/EDS – AI Compute Symposium
.
Video Slides Media Event

Oct 2021
10/1/2021
Song Han
presented "
Challenges and Directions of Low-Power Computer Vision
" at
International Conference on Computer Vision (ICCV) Workshop Panel
.
Video Slides Media Event

Oct 2021
10/1/2021
Song Han
presented "
Today’s AI is Too Big
" at
Industry-Academia Partnership
.
Video Slides Media Event

Sep 2021
9/1/2021
Song Han
presented "
TinyML and Efficient Deep Learning
" at
Synopsys ARC Processor Summit
.
Video Slides Media Event

Sep 2021
9/1/2021
Song Han
presented "
Once-for-All Network on FPGAs
" at
Xilinx Adaptive Computing Conference
.
Video Slides Media Event

Aug 2021
8/1/2021
Song Han
presented "
AutoML for Tiny Machine Learning
" at
AutoML Workshop at Knowledge Discovery and Data Mining (KDD) Conference
.
Video Slides Media Event

Aug 2021
8/1/2021
Song Han
presented "
Frontiers of AI Accelerators: Technologies, Circuits and Applications
" at
Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems
.
Video Slides Media Event

Aug 2021
8/1/2021
Song Han
presented "
Putting AI On A Diet: TinyML and Efficient Deep Learning
" at
Semiconductor Research Corporation (SRC) AI Hardware E-Workshops
.
Video Slides Media Event

Aug 2021
8/1/2021
Song Han
presented "
TinyML and Efficient Deep Learning
" at
Machine Learning Summer School 2021 Taiwan
.
Video Slides Media Event

Jul 2021
7/1/2021
Song Han
presented "
TinyML and Efficient Deep Learning
" at
Alibaba
.
Video Slides Media Event

Jul 2021
7/1/2021
Song Han
presented "
MCUNet and Tiny Machine Learning for Mobile Devices
" at
Apple
.
Video Slides Media Event

Jun 2021
6/1/2021
Song Han
presented "
NAAS: Neural-Accelerator Architecture Search
" at
4th International Workshop on AI-assisted Design for Architecture at ISCA
.
Video Slides Media Event

Jun 2021
6/1/2021
Song Han
presented "
Machine Learning for Analog and Digital Design
" at
VLSI symposia workshop on AI/Machine Learning for Circuit Design and Optimization
.
Video Slides Media Event

Jun 2021
6/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Efficient Deep Learning for Computer Vision Workshop at CVPR
.
Video Slides Media Event

Jun 2021
6/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
MLOps World – Machine Learning in Production
.
Video Slides Media Event

Jun 2021
6/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Samsung
.
Video Slides Media Event

Jun 2021
6/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Ford
.
Video Slides Media Event

Jun 2021
6/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Princeton University
.
Video Slides Media Event

Jun 2021
6/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Shanghai Jiaotong University
.
Video Slides Media Event

May 2021
5/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Apple’s On-Device ML Workshop
.
Video Slides Media Event

Apr 2021
4/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
MLSys’21 On-Device Intelligence Workshop
.
Video Slides Media Event

Apr 2021
4/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
ISQED’21 Embedded Tutorials
.
Video Slides Media Event

Mar 2021
3/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
TinyML Summit
.
Video Slides Media Event

Jan 2021
1/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Boeing
.
Video Slides Media Event

Jan 2021
1/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Stanford MLSys seminar
.
Video Slides Media Event

Jan 2021
1/1/2021
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
Microsoft
.
Video Slides Media Event

Jan 2021
1/1/2021
Song Han
presented "
Efficient AI: Reducing the Carbon Footprint of AI in the Internet of Things (IoT)
" at
MIT ILP Japan conference
.
Video Slides Media Event

Nov 2020
11/1/2020
Song Han
presented "
Putting AI on a Diet: TinyML and Efficient Deep Learning
" at
MIT ILP webinar session on low power/edge/efficient computing
.
Video Slides Media Event

Apr 2020
4/1/2020
Song Han
presented "
Once-for-All: Train One Network and Specialize it for Efficient Deployment
" at
TinyML Webinar
.
Video Slides Media Event

Apr 2020
4/1/2020
Song Han
presented "
AutoML for TinyML with Once-for-all-Network
" at
ICLR’20 NAS workshop
.
Video Slides Media Event

Mar 2020
3/1/2020
Song Han
presented "
Faster, Power-Efficient Video Recognition
" at
EmTech Digital
.
Video Slides Media Event

Feb 2024
2/13/2024
Our work
StreamingLLM
is covered by
MIT News, MIT Homepage
: "
A new way to let AI chatbots converse all day without crashing
".

Sep 2023
9/15/2023
Our work
EfficientViT
is covered by
marktechpost
: "
MIT Researchers Introduce A Novel Lightweight Multi-Scale Attention For On-Device Semantic Segmentation
".

Nov 2023
11/16/2023
Our work
PockEngine
is covered by
MIT News
: "
Technique enables AI on edge devices to keep learning over time
".

Oct 2023
10/5/2023
Our work
StreamingLLM
is covered by
VentureBeat
: "
StreamingLLM shows how one token can keep AI models running smoothly indefinitely
".

Oct 2022
10/4/2022
Our work
On-Device Training
is covered by
MIT News, MIT Homepage
: "
Learning on the edge
".

Dec 2021
12/8/2021
Our work
MCUNet-v2
is covered by
MIT News
: "
Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices
".

Dec 2020
12/13/2020
Our work
MCUNet
is covered by
WIRED
: "
AI Algorithms Are Slimming Down to Fit in Your Fridge
".

Nov 2020
11/13/2020
Our work
MCUNet
is covered by
MIT News, MIT Homepage
: "
System brings deep learning to “internet of things” devices
".

Sep 2023
9/13/2023
Our work
EfficientViT
is covered by
MIT News, MIT Homepage
: "
AI model speeds up high-resolution computer vision
".

Apr 2020
4/23/2020
Our work
OFA
is covered by
VentureBeat
: "
MIT aims for energy efficiency in AI model training
".

Jul 2021
7/13/2021
Our work
OFA
is covered by
Xilinx News
: "
Bringing OFA (Once-for-All) to FPGA
".

Jun 2020
6/8/2020
Our work
OFA
is covered by
Qualcomm News
: "
Research from MIT shows promising results for on-device AI
".

Apr 2020
4/23/2020
Our work
OFA
is covered by
MIT News
: "
Reducing the carbon footprint of artificial intelligence
".

Apr 2019
4/2/2019
Our work
ProxylessNAS
is covered by
IEEE Spectrum
: "
Using AI to Make Better AI: New approach brings faster, AI-optimized AI within reach for image recognition and other applications
".

Mar 2019
3/21/2019
Our work
ProxylessNAS
is covered by
MIT News
: "
Kicking neural network design automation into high gear
".

Aug 2023
8/7/2023
Our work
SmoothQuant
is covered by
Intel News
: "
Smaller is Better: Q8-Chat LLM is an Efficient Generative AI Experience on Intel® Xeon® Processors
".

Mar 2020
3/25/2020
Our work
PVCNN
is covered by
NVIDIA News
: "
NVIDIA Jetson Community Project Spotlight: Point-Voxel CNN for Efficient 3D Deep Learning
".

Efficient Streaming Language Models with Attention Sinks

ICLR 2024

We enable LLMs to work on infinite-length texts without compromising efficiency and performance.

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

MLSys 2024

Low-bit weight-only quantization for LLMs.

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

ICLR 2024

LongLoRA takes advantage of shifted sparse attention to greatly reduce the finetuning cost of long context LLMs.

Tiny Machine Learning Projects

NeurIPS 2020/2021/2022, MICRO 2023, ICML 2023, MLSys 2024, IEEE CAS Magazine 2023 (Feature)

This TinyML project aims to enable efficient AI computing on the edge by innovating model compression techniques as well as high-performance system design.

Efficient AI Computing,
Transforming the Future.
