Conditional Generative Adversarial Networks (cGANs) have enabled controllable image synthesis for many computer vision and graphics applications. However, recent cGANs are 1-2 orders of magnitude more computationally intensive than modern recognition CNNs. For example, GauGAN consumes 281G MACs per image, compared to 0.44G MACs for MobileNet-v3, which makes interactive deployment difficult. In this work, we propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs. Directly applying existing CNN compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures. We address these challenges in two ways. First, to stabilize GAN training, we transfer knowledge from multiple intermediate representations of the original model to its compressed model, and unify unpaired and paired learning. Second, instead of reusing existing CNN designs, our method automatically finds efficient architectures via neural architecture search (NAS). To accelerate the search process, we decouple model training and architecture search via weight sharing. Experiments demonstrate the effectiveness of our method across different supervision settings (paired and unpaired), model architectures, and learning methods (e.g., pix2pix, GauGAN, CycleGAN). Without losing image quality, we reduce the computation of CycleGAN by more than 20× and GauGAN by 9×, paving the way for interactive image synthesis.
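To make the first idea concrete, here is a minimal sketch of transferring knowledge from intermediate representations. It rests on assumptions not stated above: features are compared with a mean squared error, and a linear map (standing in for a 1×1 convolution at a single pixel) lifts the narrower student features to the teacher's channel count. The names `project`, `distillation_loss`, and the toy weights are hypothetical and chosen only for illustration.

```python
def project(channels, weight):
    """Map a student channel vector to the teacher's width.

    Assumption: this linear map plays the role of a 1x1 convolution
    evaluated at one pixel.
    """
    return [sum(w * c for w, c in zip(row, channels)) for row in weight]


def distillation_loss(student_feats, teacher_feats, projections):
    """Sum, over the chosen layers, of the mean squared error between the
    projected student features and the teacher features."""
    total = 0.0
    for student, teacher, weight in zip(student_feats, teacher_feats, projections):
        mapped = project(student, weight)
        total += sum((m - t) ** 2 for m, t in zip(mapped, teacher)) / len(teacher)
    return total


# Toy example: one layer, the student has 2 channels and the teacher has 3.
student_feats = [[1.0, 2.0]]
teacher_feats = [[1.0, 2.0, 5.0]]
projections = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]  # hypothetical learned weights
loss = distillation_loss(student_feats, teacher_feats, projections)
```

In a real training loop, a term of this kind would be added to the usual cGAN objective, and the projection weights would be learned jointly with the student generator.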
GAN Compression framework: ① Given a pre-trained teacher generator G', we distill a smaller "once-for-all" student generator G that contains all possible channel numbers through weight sharing. We choose different channel numbers for the student generator G at each training step. ② We then extract many sub-generators from the "once-for-all" generator and evaluate their performance. No retraining is needed, which is the advantage of the "once-for-all" generator. ③ Finally, we choose the best sub-generator given the compression ratio target and performance target (FID or mAP), perform fine-tuning, and obtain the final compressed model.
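The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the released implementation: the channel options, the toy MACs estimate, and the stand-in `evaluate` function are all hypothetical, and a real search would score each sub-generator with FID or mAP using the shared weights.

```python
import itertools
import random

CHANNEL_CHOICES = (16, 24, 32)  # hypothetical per-layer channel options
NUM_LAYERS = 3                  # hypothetical number of searchable layers


def sample_config(rng):
    """Step 1: pick a random channel number per layer for one training step."""
    return tuple(rng.choice(CHANNEL_CHOICES) for _ in range(NUM_LAYERS))


def estimate_macs(config, in_channels=3, kernel=3, resolution=64):
    """Toy MACs count for a stack of convolutions at a fixed resolution."""
    macs = 0
    prev = in_channels
    for out in config:
        macs += prev * out * kernel * kernel * resolution * resolution
        prev = out
    return macs


def search(evaluate, macs_budget):
    """Steps 2-3: score every sub-generator within the budget without
    retraining and return the one with the lowest score (lower is better)."""
    best = None
    for config in itertools.product(CHANNEL_CHOICES, repeat=NUM_LAYERS):
        if estimate_macs(config) > macs_budget:
            continue  # violates the compression target
        score = evaluate(config)
        if best is None or score < best[1]:
            best = (config, score)
    return best


def toy_evaluate(config):
    """Stand-in for an FID-like metric: pretend wider models score better."""
    return 100.0 / sum(config)


rng = random.Random(0)
step_config = sample_config(rng)          # used at one "once-for-all" training step
budget = estimate_macs((24, 24, 24))      # hypothetical compute target
best_config, best_score = search(toy_evaluate, budget)
```

The selected sub-generator would then be fine-tuned to obtain the final compressed model. Exhaustive enumeration is used here only because the toy search space has 27 configurations.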
GAN Compression reduces the computation of pix2pix, CycleGAN and GauGAN by 9-21×, and model size by 4.6-33×.
@inproceedings{li2020gan,
title={GAN Compression: Efficient Architectures for Interactive Conditional GANs},
author={Li, Muyang and Lin, Ji and Ding, Yaoyao and Liu, Zhijian and Zhu, Jun-Yan and Han, Song},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2020}
}
We thank NSF CAREER Award #1943349, MIT-IBM Watson AI Lab, Adobe, Intel, Samsung and the AWS Machine Learning Research Award for supporting this research. We thank Ning Xu, Zhuang Liu, Richard Zhang, and Antonio Torralba for helpful comments. We thank NVIDIA for donating the Jetson AGX Xavier used in our demo.