This TinyML project aims to enable efficient AI computing on the edge through innovations in model compression and high-performance system design.
We discuss the definition, challenges, and applications of TinyML.
We propose SmoothQuant, a training-free, accuracy-preserving, general-purpose post-training quantization (PTQ) solution that enables 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.
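At its core, SmoothQuant migrates the quantization difficulty from activations (which contain outlier channels) to weights via a per-channel smoothing factor s, rewriting Y = XW as Y = (X diag(s)^-1)(diag(s) W). Below is a minimal NumPy sketch of this idea; the function names and the simple per-tensor INT8 quantizer are illustrative simplifications, not the released implementation.

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """Migrate quantization difficulty from activations to weights.

    X: activations, shape (tokens, in_features)
    W: weights, shape (in_features, out_features)
    Returns (X', W') with X' @ W' == X @ W up to float error.
    """
    act_max = np.abs(X).max(axis=0)              # per-channel activation range
    w_max = np.abs(W).max(axis=1)                # per-channel weight range
    # Smoothing factor: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    s = np.clip(act_max**alpha / w_max**(1 - alpha), 1e-5, None)
    return X / s, W * s[:, None]

def quantize_int8(A):
    """Symmetric per-tensor INT8 quantization (a simplification; the
    paper also uses per-token and per-channel variants)."""
    scale = np.abs(A).max() / 127.0
    return np.clip(np.round(A / scale), -127, 127).astype(np.int8), scale

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 64))
X[:, 3] *= 30.0                                  # simulate an outlier channel
W = rng.normal(size=(64, 32))

Xs, Ws = smooth(X, W)
qx, sx = quantize_int8(Xs)
qw, sw = quantize_int8(Ws)
Y = (qx.astype(np.int32) @ qw.astype(np.int32)) * (sx * sw)  # W8A8 matmul, dequantize
print(np.abs(Y - X @ W).max())                   # small error despite the outlier
```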
EIE is a hardware accelerator for pruned and compressed neural networks that exploits weight sparsity, activation sparsity, and 4-bit weight sharing.
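To make the dataflow concrete, here is a software sketch of the kind of sparse matrix-vector product EIE performs in hardware: weights are stored compressed, column by column, as 4-bit indices into a shared 16-entry codebook, and any column whose input activation is zero is skipped entirely. The encoding and the function name are illustrative assumptions; EIE itself is a custom accelerator, not software.

```python
import numpy as np

def eie_matvec(n_rows, col_ptr, row_idx, w_idx, codebook, a):
    """Sparse matrix-vector product in the spirit of EIE.

    Weights are stored column-wise (CSC-like): the nonzeros of input
    column j live in [col_ptr[j], col_ptr[j+1]) as 4-bit indices
    (w_idx) into a 16-entry shared codebook. Columns whose activation
    is zero are skipped, so weight sparsity and activation sparsity
    both cut the work.
    """
    y = np.zeros(n_rows)
    for j, aj in enumerate(a):
        if aj == 0.0:                    # activation sparsity: skip column
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += codebook[w_idx[k]] * aj  # 4-bit weight lookup
    return y

# Toy example: a pruned 4x3 matrix with a 16-entry codebook.
codebook = np.linspace(-1.0, 1.0, 16)    # shared weight values (4-bit indices)
col_ptr = np.array([0, 2, 2, 4])         # column 1 was fully pruned away
row_idx = np.array([0, 3, 1, 2])
w_idx   = np.array([15, 0, 8, 4])
a = np.array([2.0, 0.0, 1.0])            # a[1] == 0: that column is skipped
print(eie_matvec(4, col_ptr, row_idx, w_idx, codebook, a))
```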
In MCUNetV2, we propose a generic patch-by-patch inference scheduling, which operates on only a small spatial region of the feature map at a time and significantly cuts down the peak memory. Because neighboring patches must overlap, a naive schedule recomputes the overlapping regions; we therefore further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce this computation overhead.
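The sketch below illustrates the patch-based idea in PyTorch under a simplifying assumption: the memory-heavy stage is a single 3x3 convolution, and the feature map is split into a fixed 2x2 patch grid (both choices are hypothetical; the real scheduler covers a whole initial stage). Each patch is computed with a small halo of overlapping input pixels, so only one patch's activations are live at a time, and the halo is exactly the recomputation that redistribution shrinks.

```python
import torch
import torch.nn.functional as F

def per_patch_stage(x, weight, n_patch=2):
    """Run a conv stage patch-by-patch to cut peak activation memory.

    Each spatial patch is computed independently from the input crop
    plus a halo equal to the stage's receptive-field radius, then the
    outputs are stitched together. Peak memory drops roughly by
    n_patch**2 at the cost of recomputing the overlapping halos.
    """
    k = weight.shape[-1]
    halo = k // 2                         # receptive-field overlap radius
    _, _, h, w = x.shape
    ph, pw = h // n_patch, w // n_patch
    rows = []
    for i in range(n_patch):
        cols = []
        for j in range(n_patch):
            t, l = i * ph, j * pw
            # Crop the input patch plus its halo, clamped at the border.
            crop = x[:, :, max(t - halo, 0):min(t + ph + halo, h),
                           max(l - halo, 0):min(l + pw + halo, w)]
            y = F.conv2d(crop, weight, padding=halo)
            # Drop the halo region from the output before stitching.
            ot, ol = t - max(t - halo, 0), l - max(l - halo, 0)
            cols.append(y[:, :, ot:ot + ph, ol:ol + pw])
        rows.append(torch.cat(cols, dim=3))
    return torch.cat(rows, dim=2)

x = torch.randn(1, 3, 32, 32)
w = torch.randn(8, 3, 3, 3)
full = F.conv2d(x, w, padding=1)          # whole feature map at once
patched = per_patch_stage(x, w)           # one patch at a time
print(torch.allclose(full, patched, atol=1e-5))
```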
In MCUNetV3, we enable on-device training under 256KB SRAM and 1MB Flash through system-algorithm co-design, using less than 1/1000 of the memory of PyTorch while matching the accuracy on the visual wake words application. On-device training lets the model adapt to newly collected sensor data, so users can enjoy customized services without uploading their data to the cloud, thus protecting privacy.
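MCUNetV3's actual system (the Tiny Training Engine, with quantization-aware scaling and compile-time autodiff) goes well beyond a framework snippet; the sketch below only illustrates the sparse-update principle in PyTorch, using a hypothetical helper apply_sparse_update that freezes all weights except biases and a chosen subset of layers.

```python
import torch
import torch.nn as nn

def apply_sparse_update(model, trainable_prefixes):
    """Hypothetical helper: enable gradients only for biases and for
    layers whose parameter name starts with one of the given prefixes.
    Frozen weights need no optimizer state, and a memory-aware runtime
    can also skip saving their input activations; plain PyTorch does
    not fully exploit that, while the real system does at compile time."""
    for name, p in model.named_parameters():
        p.requires_grad_(
            name.endswith(".bias")
            or any(name.startswith(pre) for pre in trainable_prefixes)
        )

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 8 * 8, 2),
)
apply_sparse_update(model, trainable_prefixes=["6"])  # update only the classifier
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.01)

x, y = torch.randn(4, 3, 8, 8), torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                                       # gradients only where enabled
opt.step()
```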