Projects

Conditional Generative Adversarial Networks (cGANs) have enabled controllable image synthesis for many computer vision and graphics applications. However, recent cGANs are 1-2 orders of magnitude more computationally intensive than modern recognition CNNs. For example, GauGAN consumes 281G MACs per image, compared to 0.44G MACs for MobileNet-v3, making interactive deployment difficult. In this work, we propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs. Directly applying existing CNN compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures. We address these challenges in two ways. First, to stabilize GAN training, we transfer knowledge of multiple intermediate representations of the original model to its compressed model, and unify unpaired and paired learning. Second, instead of reusing existing CNN designs, our method automatically finds efficient architectures via neural architecture search (NAS). To accelerate the search process, we decouple the model training and architecture search via weight sharing. Experiments demonstrate the effectiveness of our method across different supervision settings (paired and unpaired), model architectures, and learning methods (e.g., pix2pix, GauGAN, CycleGAN). Without losing image quality, we reduce the computation of CycleGAN by more than 20× and GauGAN by 9×, paving the way for interactive image synthesis.

GAN Compression: Efficient Architectures for Interactive Conditional GANs

CVPR 2020 & TPAMI

A general-purpose compression framework for reducing the inference time and model size of the generator in conditional GANs.
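The abstract mentions transferring knowledge from multiple intermediate representations of the original generator to the compressed one. A minimal sketch of what such a feature-matching loss might look like follows. It is not the authors' implementation: the per-layer linear map (standing in for a learnable 1x1 convolution that widens the student's channels), the mean-squared-error choice, and all function names are assumptions made for illustration.

```python
def project(feature, weight):
    """Apply a per-position linear map (the effect of a 1x1 convolution).

    feature: list of positions, each a list of student channels.
    weight:  matrix with one row per teacher channel (hypothetical, learnable).
    """
    return [[sum(w * x for w, x in zip(w_row, pos)) for w_row in weight]
            for pos in feature]


def mse(a, b):
    """Mean squared error between two equally shaped feature maps."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def distillation_loss(teacher_feats, student_feats, weights):
    """Sum the per-layer MSE between teacher features and projected student features."""
    return sum(mse(t, project(s, w))
               for t, s, w in zip(teacher_feats, student_feats, weights))


# One layer, two spatial positions: the teacher has 2 channels, the student 1.
teacher_feats = [[[1.0, 2.0], [3.0, 4.0]]]
student_feats = [[[1.0], [2.0]]]
weights = [[[1.0], [2.0]]]  # assumed 2x1 map from student to teacher channels

loss = distillation_loss(teacher_feats, student_feats, weights)
print(loss)
```

In a real training loop this term would be added to the GAN objective and the projection weights would be learned together with the student; here they are fixed so the arithmetic is easy to follow.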

SpArch: Efficient Architecture for Sparse Matrix Multiplication

HPCA 2020

A hardware accelerator for sparse matrix-matrix multiplication (SpGEMM).
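For readers unfamiliar with SpGEMM, the operation itself can be sketched in a few lines of software. The example below is only a reference for what is being computed (both operands and the result are sparse); it makes no claim about the dataflow SpArch uses in hardware. The dictionary-of-rows storage format and the function name `spgemm` are assumptions chosen for brevity.

```python
from collections import defaultdict


def spgemm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}}.

    Only stored (nonzero) entries are visited, so the work scales with the
    number of matching nonzero pairs rather than with the dense dimensions.
    """
    result = {}
    for i, a_row in a.items():
        acc = defaultdict(float)  # sparse accumulator for output row i
        for k, a_ik in a_row.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        row = {j: v for j, v in acc.items() if v != 0.0}
        if row:
            result[i] = row
    return result


a = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
b = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
c = spgemm(a, b)
print(c)
```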

Lite Transformer with Long-Short Range Attention

ICLR 2020

Lite Transformer is an efficient mobile NLP architecture. Its key primitive is Long-Short Range Attention (LSRA), in which one group of heads specializes in local context modeling (by convolution) while another group specializes in long-distance relationship modeling (by attention).
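The two-branch idea can be illustrated with a toy sketch. It assumes the token features are split in half along the channel dimension, with one half passed through self-attention and the other through a 1-D convolution over the sequence, then concatenated. The identity query/key/value projections, the single attention head, the shared convolution kernel, and every function name here are simplifications for illustration and not the published architecture.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_branch(seq):
    """Single-head dot-product self-attention (global context), identity projections."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def conv_branch(seq, kernel):
    """1-D convolution along the sequence with zero padding (local context)."""
    n, d, half = len(seq), len(seq[0]), len(kernel) // 2
    out = []
    for t in range(n):
        row = [0.0] * d
        for offset, w in enumerate(kernel):
            src = t + offset - half
            if 0 <= src < n:
                for j in range(d):
                    row[j] += w * seq[src][j]
        out.append(row)
    return out


def lsra(seq, kernel):
    """Split channels in two, run one branch per half, and concatenate the results."""
    split = len(seq[0]) // 2
    global_out = attention_branch([tok[:split] for tok in seq])
    local_out = conv_branch([tok[split:] for tok in seq], kernel)
    return [g + l for g, l in zip(global_out, local_out)]


tokens = [[1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 3.0, 4.0], [1.0, 1.0, 5.0, 6.0]]
out = lsra(tokens, kernel=[0.25, 0.5, 0.25])
print(len(out), len(out[0]))
```

The output keeps the input shape: the first half of each token's channels mixes information from every position, while the second half only sees a window of neighbouring tokens.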

Once-for-All: Train One Network and Specialize it for Efficient Deployment

ICLR 2020

OFA is an efficient AutoML technique that decouples model training from architecture search: the network is trained once and then specialized for many hardware platforms, from CPUs and GPUs to hardware accelerators. OFA achieves a new state-of-the-art 80.0% ImageNet top-1 accuracy under the mobile setting (<600M FLOPs).

Efficient AI Computing, Transforming the Future.
