Efficient AI Computing,
Transforming the Future.

Projects

TSM: Temporal Shift Module for Efficient Video Understanding

ICCV 2019

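We introduce the Temporal Shift Module (TSM), a novel solution for efficient video understanding. TSM shifts part of the channels along the temporal dimension so that neighboring frames exchange information. This adds temporal modeling to a 2D CNN at zero extra computation and zero extra parameters, so TSM achieves the performance of 3D CNNs at the computational cost of 2D CNNs, enabling real-time online video recognition and object detection.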
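The shift itself is simple enough to sketch in plain Python. This is a minimal illustration, not the authors' implementation: it assumes a single clip stored as a T x C list of lists (spatial dimensions omitted), and the split of 1/8 of the channels shifted each way (`fold_div=8`) is an assumed default. Vacated positions at the clip boundaries are zero-padded.

```python
# Minimal sketch of a bi-directional temporal shift on one clip.
# Assumptions: features are a T x C list of lists (spatial dims omitted),
# and fold_div=8 is an assumed default (1/8 of channels shift each way).

def temporal_shift(x, fold_div=8):
    """Shift part of the channels along time; pad vacated slots with zeros."""
    num_frames = len(x)
    num_channels = len(x[0])
    fold = num_channels // fold_div
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                # First fold: frame t borrows from frame t + 1.
                out[t][c] = x[t + 1][c] if t + 1 < num_frames else 0.0
            elif c < 2 * fold:
                # Second fold: frame t borrows from frame t - 1.
                out[t][c] = x[t - 1][c] if t > 0 else 0.0
            else:
                # Remaining channels stay in place.
                out[t][c] = x[t][c]
    return out


clip = [[10.0 * t + c for c in range(8)] for t in range(3)]
shifted = temporal_shift(clip)
print(shifted[1])  # [20.0, 1.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
```

The shift only moves data and has no weights; the 2D convolution applied afterwards mixes the borrowed channels with the current frame. For online recognition, a variant would borrow only from past frames, since future frames are not yet available.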

HAQ: Hardware-Aware Automated Quantization

CVPR 2019

In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy (the bitwidth of each layer) and takes the hardware accelerator's feedback into the design loop.
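To make "quantization policy" concrete, the sketch below applies a per-layer bitwidth policy using simple symmetric linear quantization and computes one possible cost of that policy (total weight storage in bits). The layer names, weights, clip value, and policy are hypothetical, and the quantizer is a generic one chosen for illustration. In HAQ itself, an RL agent proposes the bitwidths and hardware feedback evaluates them; neither part is shown here.

```python
# Sketch: apply a per-layer bitwidth policy with symmetric linear quantization.
# Layer names, weights, clip value, and the policy are hypothetical; the RL
# agent and the hardware-feedback loop are not modeled here.

def linear_quantize(weights, bits, clip):
    """Clamp to [-clip, clip], then round onto a uniform grid.

    Assumes bits >= 2, giving 2**(bits-1) - 1 positive levels per sign.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    return [round(max(-clip, min(clip, w)) / scale) * scale for w in weights]


def quantize_model(layers, policy, clip=1.0):
    """layers: name -> weight list; policy: name -> bitwidth (hypothetical format)."""
    return {name: linear_quantize(w, policy[name], clip) for name, w in layers.items()}


def weight_storage_bits(layers, policy):
    """Model-size cost of a policy: number of parameters times assigned bitwidth."""
    return sum(len(w) * policy[name] for name, w in layers.items())


layers = {"conv1": [0.9, -0.2, 0.05], "fc": [1.7, -0.6, 0.3, 0.0]}
policy = {"conv1": 8, "fc": 3}
q = quantize_model(layers, policy)
print(q["fc"])
print(weight_storage_bits(layers, policy))  # 3*8 + 4*3 = 36
```

A search over such policies trades accuracy against a hardware cost; the storage count above is only a stand-in for the latency or energy measurements that a real accelerator would report.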

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

ICLR 2019

ProxylessNAS is an efficient hardware-aware neural architecture search method that can search directly on large-scale datasets. It can design specialized neural network architectures for different hardware platforms. With over 74.5% top-1 accuracy on ImageNet, the resulting model runs 1.8x faster than MobileNetV2.
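Two ideas make hardware-aware search on the target task tractable: latency can be written as an expected value over candidate operations, which is differentiable in the architecture parameters, and sampling a single active path per step keeps memory close to that of training one model. The sketch below illustrates both for one searchable block. The candidate names and latency lookup-table values are hypothetical, and the search loss (which would add a weighted expected-latency term to the training loss) is not shown.

```python
import math
import random

# Sketch of two ideas for one searchable block:
# (1) expected latency as a differentiable function of architecture parameters,
# (2) binarized path sampling so only one candidate op is active per step.
# Candidate ops and their latencies (ms) are hypothetical lookup-table values.


def softmax(alphas):
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def expected_latency(alphas, latency_table):
    """E[latency] = sum_j p_j * F(o_j), with p = softmax(alphas)."""
    return sum(p * lat for p, lat in zip(softmax(alphas), latency_table))


def sample_binary_gates(alphas, rng):
    """Sample one path according to p and return a one-hot gate vector."""
    probs = softmax(alphas)
    chosen = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1 if j == chosen else 0 for j in range(len(probs))]


candidates = ["mbconv3x3_e3", "mbconv5x5_e6", "mbconv7x7_e6"]  # hypothetical
latency_ms = [1.2, 2.5, 3.8]                                     # hypothetical
alphas = [0.0, 0.0, 0.0]

print(expected_latency(alphas, latency_ms))  # uniform p gives the mean latency
gates = sample_binary_gates(alphas, random.Random(0))
print(candidates[gates.index(1)])
```

Raising one architecture parameter shifts probability mass toward that candidate, and the expected latency moves toward its lookup-table value; this is what lets a latency penalty steer the search toward faster operations on a given platform.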

Deep Gradient Compression: Reducing the Communication Bandwidth in Distributed Training

ICLR 2018

Deep Gradient Compression (DGC) reduces the communication bandwidth in large-scale distributed training by transmitting only the important (large) gradients, and it preserves accuracy via four techniques: momentum correction, local gradient clipping, momentum factor masking, and warm-up training.
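A minimal sketch of one worker's sparsification step with momentum correction and momentum factor masking follows. It is a simplification under stated assumptions: it selects an exact top-k by magnitude (the `ratio` parameter is hypothetical), and it omits local gradient clipping and the warm-up schedule. Values that are not sent stay in a local accumulator and keep growing until they become large enough to transmit.

```python
# Sketch of DGC-style gradient sparsification for one worker.
# Simplifications (assumptions): exact top-k selection, no local gradient
# clipping, no warm-up schedule.


def dgc_step(grad, u, v, momentum=0.9, ratio=0.01):
    """Update local buffers in place and return {index: value} to transmit.

    u: local momentum buffer; v: locally accumulated (unsent) velocity.
    """
    for i, g in enumerate(grad):
        # Momentum correction: accumulate the velocity, not the raw gradient.
        u[i] = momentum * u[i] + g
        v[i] += u[i]
    k = max(1, int(len(v) * ratio))
    top = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    sparse = {i: v[i] for i in top}
    for i in top:
        v[i] = 0.0  # sent values leave the local accumulator
        u[i] = 0.0  # momentum factor masking: drop stale momentum on sent coords
    return sparse


u = [0.0] * 4
v = [0.0] * 4
sent = dgc_step([0.1, -3.0, 0.5, 2.0], u, v, ratio=0.5)
print(sent)  # {1: -3.0, 3: 2.0}
print(v)     # [0.1, 0.0, 0.5, 0.0]: unsent values keep accumulating locally
```

Masking the momentum at transmitted coordinates prevents momentum that was already applied from pushing those weights again in later steps, which is the staleness problem that momentum factor masking addresses.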