Tiny-Transfer-Learning (TinyTL) enables memory-efficient on-device learning by freezing the weights and updating only the bias modules, which removes the need to store intermediate activations, and by introducing a lite residual module to maintain the adaptation capacity.
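A minimal PyTorch sketch of the bias-only update idea is shown below; the backbone and layer choices are illustrative stand-ins, not TinyTL's actual architecture or lite residual module.

```python
import torch
import torch.nn as nn

# A small convolutional backbone standing in for a pre-trained model
# (illustrative only; TinyTL's real backbone and lite residual design differ).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

# Freeze all weights; keep only the bias terms trainable.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

# Only bias parameters reach the optimizer. Bias gradients do not depend on
# the layer inputs, so the intermediate activations need not be stored.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
```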
Deep Gradient Compression (DGC) reduces the communication bandwidth in large-scale distributed training via four techniques: momentum correction, local gradient clipping, momentum factor masking, and warm-up training.
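A simplified single-tensor sketch of the momentum correction and momentum factor masking steps follows; the function name, arguments, and sparsity value are assumptions for illustration, and local gradient clipping and warm-up training are omitted.

```python
import torch

def dgc_compress_step(grad, velocity, residual, sparsity=0.99, momentum=0.9):
    """One DGC-style step for a single gradient tensor (simplified sketch)."""
    velocity = momentum * velocity + grad           # momentum correction
    residual = residual + velocity                  # accumulate locally
    # Select the largest-magnitude entries to communicate this round.
    flat = residual.abs().flatten()
    k = max(1, int(flat.numel() * (1 - sparsity)))
    threshold = torch.topk(flat, k).values.min()
    mask = residual.abs() >= threshold
    sparse_update = residual * mask                 # sent over the network
    residual = residual * ~mask                     # the rest stays local
    velocity = velocity * ~mask                     # momentum factor masking
    return sparse_update, velocity, residual
```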
In MCUNetV3, we enable on-device training under 256KB SRAM and 1MB Flash, using less than 1/1000 of the memory of PyTorch while matching the accuracy on the visual wake words application. This allows the model to adapt to newly collected sensor data, so users can enjoy customized services without uploading their data to the cloud, thus protecting privacy.