
Auto Hardware-Aware Neural Network Specialization on ImageNet in Minutes

tl;dr

This tutorial introduces how to use the Once-for-All (OFA) Network to get specialized ImageNet models for the target hardware in minutes with only your laptop.


Designing specialized neural network architectures for different target hardware platforms and efficiency constraints is an essential step in deploying deep learning models in production. Compared to general-purpose neural network architectures (such as ResNet-50 or MobileNets), specialized neural network architectures better fit the target hardware and efficiency constraint, thereby providing a better trade-off between accuracy and inference efficiency.

However, specializing neural network architectures used to be very expensive, especially for large-scale real-world datasets (e.g., ImageNet). First, searching for a suitable neural architecture requires evaluating the accuracy of many candidate architectures on the target dataset, as well as their inference efficiency on the target hardware. Second, the resulting architecture needs to be trained from scratch, which can take hundreds of GPU-hours.

This tutorial introduces how to use the OFA network to get specialized ImageNet models for the target hardware in minutes with only your laptop. Once-for-All (OFA) is an efficient AutoML technique that decouples training from search. Different sub-nets can directly grab weights from the OFA network without training, so deriving a new specialized neural network from the OFA network is highly efficient, incurring little computation cost.
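
As a concrete starting point, below is a minimal sketch of loading a pre-trained OFA super-network and grabbing a sub-network from it with the ofa Python package. The ofa_net helper and the network id follow the public once-for-all repository, but the exact identifiers may differ across versions.

```python
# Minimal sketch: load a pre-trained OFA super-network
# (based on the mit-han-lab/once-for-all repository; identifiers may vary by version).
from ofa.model_zoo import ofa_net

# Load the OFA-MobileNetV3 super-network with pre-trained weights.
ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)

# Any sub-network (chosen kernel sizes, depths, expand ratios) can directly
# inherit weights from the super-network -- no retraining is needed.
ofa_network.sample_active_subnet()
subnet = ofa_network.get_active_subnet(preserve_weight=True)
print(subnet)
```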

Besides the OFA network, the other key components for very fast neural network specialization are accuracy predictors and efficiency predictors. The accuracy predictor estimates the top-1 accuracy of a given sub-network on a holdout validation set (distinct from the official 50K validation set), so we do NOT need to run costly inference on ImageNet while searching for specialized models. This accuracy predictor is trained on an accuracy dataset built with the OFA network.

Using the Samsung Note10 as an example, we load the pre-built accuracy predictor and the latency lookup table for the Note10:
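
(A minimal sketch, assuming the AccuracyPredictor and LatencyTable helpers from the OFA tutorial notebook; class and argument names may differ in your version of the repository.)

```python
# Sketch: load the pre-built accuracy predictor and the Note10 latency lookup
# table (class names follow the OFA tutorial notebook; may vary by version).
import torch
from ofa.tutorial import AccuracyPredictor, LatencyTable

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Accuracy predictor: maps a sub-network encoding to its predicted top-1 accuracy.
accuracy_predictor = AccuracyPredictor(pretrained=True, device=device)

# Latency lookup table measured on a Samsung Note10.
target_hardware = 'note10'
latency_table = LatencyTable(device=target_hardware)
```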

Then, we can search for specialized ImageNet models for the Samsung Note10 without needing access to either the ImageNet dataset or a Note10 device:
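
(A sketch of the search step using the evolutionary-search helper from the OFA tutorial; the EvolutionFinder argument names follow that notebook and the latency constraint and hyper-parameter values below are illustrative assumptions.)

```python
# Sketch: evolutionary search under a 25 ms Note10 latency constraint,
# guided entirely by the accuracy predictor and the latency lookup table.
from ofa.tutorial import EvolutionFinder

latency_constraint = 25  # illustrative target latency in ms on the Note10
finder = EvolutionFinder(
    constraint_type=target_hardware,        # optimize for Note10 latency
    efficiency_constraint=latency_constraint,
    efficiency_predictor=latency_table,     # latency lookup table from above
    accuracy_predictor=accuracy_predictor,  # predicted top-1 accuracy
    population_size=100,
    max_time_budget=500,
    parent_ratio=0.25,
    mutate_prob=0.1,
    mutation_ratio=0.5,
)

# Returns the search trajectory and the best (accuracy, net_config, latency) found.
best_valids, best_info = finder.run_evolution_search()
predicted_acc, net_config, predicted_latency = best_info
print('Predicted accuracy:', predicted_acc, '| predicted latency (ms):', predicted_latency)
```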

Finally, we can check the accuracy of the searched model on the real ImageNet validation set without any training:
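
(A minimal evaluation sketch, assuming a local ImageNet validation folder, whose path below is a placeholder, and the sub-network configuration returned by the search above. It uses a plain PyTorch/torchvision evaluation loop rather than the tutorial's own helper, and the net_config keys are assumptions based on the OFA tutorial.)

```python
# Sketch: evaluate the searched sub-network on a local copy of the ImageNet
# validation set without any training (net_config keys 'ks', 'e', 'd', 'r'
# follow the OFA tutorial and may differ in your version of the repository).
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Activate the searched architecture inside the OFA super-network and extract
# it together with the weights it inherits (no retraining).
ofa_network.set_active_subnet(ks=net_config['ks'], e=net_config['e'], d=net_config['d'])
subnet = ofa_network.get_active_subnet(preserve_weight=True).to(device).eval()

# NOTE: the OFA tutorial also recalibrates batch-norm statistics of the
# extracted sub-network before evaluation; that step is omitted here for brevity.

# Standard ImageNet validation pipeline at the searched input resolution.
resolution = net_config['r'][0] if isinstance(net_config['r'], list) else net_config['r']
val_transform = transforms.Compose([
    transforms.Resize(int(resolution * 256 / 224)),
    transforms.CenterCrop(resolution),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=val_transform)  # placeholder path
val_loader = DataLoader(val_set, batch_size=250, num_workers=8)

correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:
        preds = subnet(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print('ImageNet top-1 accuracy: %.2f%%' % (100.0 * correct / total))
```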

As you can see, we get a specialized ImageNet model that clearly outperforms MobileNetV3 within minutes. Furthermore, the accuracy of the searched model can be significantly improved with a few epochs of fine-tuning on ImageNet. Similarly, we can repeat the above process for other efficiency constraints (e.g., FLOPs), which also takes only a few minutes:
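
(A sketch assuming the FLOPsTable helper from the OFA tutorial; the class name, its arguments, and the 400 MFLOPs target below are illustrative assumptions.)

```python
# Sketch: repeat the search with a FLOPs constraint instead of Note10 latency.
from ofa.tutorial import FLOPsTable, EvolutionFinder

flops_constraint = 400  # illustrative target: 400 MFLOPs
flops_table = FLOPsTable(device=device, batch_size=1)

flops_finder = EvolutionFinder(
    constraint_type='flops',
    efficiency_constraint=flops_constraint,
    efficiency_predictor=flops_table,       # FLOPs predictor replaces the latency table
    accuracy_predictor=accuracy_predictor,
    population_size=100,
    max_time_budget=500,
    parent_ratio=0.25,
    mutate_prob=0.1,
    mutation_ratio=0.5,
)
best_valids, best_info = flops_finder.run_evolution_search()
```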