ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

Neural architecture search (NAS) has had a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. $10^4$ GPU hours) makes it difficult to directly search for architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost in GPU hours through a continuous representation of the network architecture, but suffers from high GPU memory consumption (which grows linearly w.r.t. the candidate set size). As a result, these methods need to utilize proxy tasks, such as training on a smaller dataset, learning with only a few blocks, or training for only a few epochs. Architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present ProxylessNAS, which can directly learn architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level as regular training, while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08$\%$ test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6$\times$ fewer parameters. On ImageNet, our model achieves 3.1$\%$ better top-1 accuracy than MobileNetV2, while being 1.2$\times$ faster in measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design.
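The memory reduction described above comes from binarizing the architecture paths: instead of keeping a weighted sum over every candidate operation in memory (as in differentiable NAS), only one sampled path is active per update. A minimal sketch of this contrast, with hypothetical toy operations and probabilities standing in for real network blocks and learned architecture parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate operations for one mixed edge (toy stand-ins for
# real conv/pool blocks; names and values are assumptions for illustration).
candidates = [lambda x: x * 0.5, lambda x: x + 1.0, lambda x: x ** 2]
alpha = np.array([0.2, 0.5, 0.3])  # architecture probabilities (softmax'd)

def mixed_output_dense(x):
    # Differentiable NAS: weighted sum over ALL candidates, so the
    # activations of every candidate are kept -> memory grows linearly
    # with the candidate set size.
    return sum(a * op(x) for a, op in zip(alpha, candidates))

def mixed_output_binarized(x):
    # Path binarization: sample ONE path per step according to alpha, so
    # only a single candidate's activations occupy memory, regardless of
    # how large the candidate set is.
    idx = rng.choice(len(candidates), p=alpha)
    return candidates[idx](x)
```

The binarized edge keeps training cost at the level of a single ordinary network, which is what allows searching directly on the large-scale target task.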
Table 1: ProxylessNAS achieves state-of-the-art accuracy (%) on ImageNet (under a mobile latency constraint of <80 ms) with 200$\times$ less search cost in GPU hours. LL indicates the latency regularization loss.
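The latency regularization loss (LL) mentioned above makes latency differentiable by treating an edge's expected latency as the probability-weighted sum of each candidate op's measured latency, and adding that expectation to the training objective. A hedged numeric sketch (probabilities, per-op latencies, and the penalty weight below are hypothetical values, not measurements from the paper):

```python
# Architecture probabilities for one mixed edge and the measured latency
# of each candidate op from a lookup table (all values assumed).
probs = [0.2, 0.5, 0.3]
lat_ms = [3.0, 5.0, 8.0]

# Expected latency of the edge: differentiable w.r.t. the probabilities.
expected_latency = sum(p * t for p, t in zip(probs, lat_ms))

def total_loss(task_loss, lam=0.1):
    # Latency-regularized objective: task loss plus a weighted latency term,
    # so gradient descent trades accuracy against hardware latency directly.
    return task_loss + lam * expected_latency
```

Because the per-op latencies come from direct measurements on the target hardware, optimizing this loss specializes the architecture for that hardware rather than for an indirect proxy such as FLOPs.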
Table 2: Accuracy (%) and GPU latency (Tesla V100) on ImageNet.
Figure 1: ProxylessNAS consistently outperforms MobileNetV2 under various latency settings.
Table 3: Hardware prefers specialized models. Models optimized for GPU do not run fast on CPU and mobile phones, and vice versa. ProxylessNAS provides an efficient solution for searching a specialized neural network architecture for a target hardware platform, while cutting down the search cost by 200$\times$.
Figure 2: GPU prefers shallow and wide models with early pooling; CPU prefers deep and narrow models with late pooling. Pooling layers prefer large and wide kernels. Early layers prefer small kernels; late layers prefer large kernels. A visualization of the search history is attached below.
Video 1: The evolution of neural architectures during the search.
Please cite our work if you find it helpful for your research.
@inproceedings{cai2018proxylessnas,
  title     = {Proxyless{NAS}: Direct Neural Architecture Search on Target Task and Hardware},
  author    = {Han Cai and Ligeng Zhu and Song Han},
  booktitle = {International Conference on Learning Representations},
  year      = {2019},
  url       = {https://arxiv.org/pdf/1812.00332.pdf},
}