NAAS: Neural Accelerator Architecture Search

Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity. Previous frameworks focus on sizing the numerical architectural hyper-parameters while neglecting to search the PE connectivities and compiler mappings. We push beyond searching only hardware hyper-parameters and propose Neural Accelerator Architecture Search (NAAS), which fully exploits the hardware design space and compiler mapping strategies at the same time. Unlike prior work, which formulates the hardware parameter search as a pure sizing optimization, NAAS models the co-search as a two-level optimization problem, where each level is a combination of indexing, ordering and sizing optimization. To tackle these challenges, we propose an encoding method that expresses non-numerical parameters, such as the loop order and the choice of parallel dimensions, as numerical parameters for optimization. Thanks to its low search cost, NAAS can be easily integrated with hardware-aware NAS algorithms by adding another optimization level, achieving a joint search over neural network architecture, accelerator architecture and compiler mapping. NAAS thus composes highly matched architectures together with efficient mappings. As a data-driven approach, NAAS outperforms the human-designed Eyeriss with a 4.4x EDP reduction and a 2.7% accuracy improvement on ImageNet under the same computation resources, and offers a 1.4x to 3.5x EDP reduction compared with only sizing the architectural hyper-parameters.

(N is NVDLA and E is Eyeriss)

The hardware encoding vector contains two parts: architecture sizing and connectivity parameters. The mapping encoding vector contains multiple parts, including the loop orders for the PE level and the loop tiling for each array dimension level.
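As a rough illustration of the two vectors described above, the sketch below groups the parts into small containers and flattens each into a single numerical vector that an optimizer could operate on. All class names, field names and example values are hypothetical and chosen only for illustration; the actual fields and their layout in NAAS may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HardwareEncoding:
    """Hypothetical container for the two parts of the hardware vector."""
    sizing: List[float]        # assumed: numerical sizes (e.g. array and buffer sizes)
    connectivity: List[float]  # assumed: numerical scores that encode connectivity

    def to_vector(self) -> List[float]:
        # Concatenate both parts into one flat vector.
        return list(self.sizing) + list(self.connectivity)


@dataclass
class MappingEncoding:
    """Hypothetical container for the parts of the mapping vector."""
    pe_loop_order: List[float]       # assumed: numerical encoding of the PE-level loop order
    array_tiling: List[List[float]]  # assumed: one tiling list per array dimension level

    def to_vector(self) -> List[float]:
        # Flatten the loop-order part followed by each level's tiling part.
        vec = list(self.pe_loop_order)
        for level in self.array_tiling:
            vec.extend(level)
        return vec


# Illustrative values only; they do not come from the paper.
hw = HardwareEncoding(sizing=[16, 16, 512], connectivity=[0.9, 0.1, 0.4])
mp = MappingEncoding(pe_loop_order=[0.2, 0.8, 0.5],
                     array_tiling=[[4, 2, 1], [8, 1, 1]])
hw_vec = hw.to_vector()
mp_vec = mp.to_vector()
```

Keeping the parts separate until the final flattening step makes it easy to apply different decoding rules to each part (sizing versus ordering) while still handing the optimizer one plain vector.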

This strategy is interpretable because the importance value represents the data locality of a dimension: the dimension labeled most important has the best data locality, since it is the outermost loop, while the dimension labeled least important has the poorest data locality and is therefore the innermost loop.
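The importance-based ordering described above can be sketched as a simple sort: each dimension gets a numerical importance value, and the loop order is read off from most important (outermost) to least important (innermost). The dimension names, the function names and the tie-breaking rule below are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence

# Hypothetical loop dimensions of a convolution; the names are illustrative.
DIMS = ("K", "C", "Y", "X", "R", "S")


def decode_loop_order(importance: Sequence[float],
                      dims: Sequence[str] = DIMS) -> List[str]:
    """Return dimensions ordered from outermost to innermost loop.

    The dimension with the highest importance becomes the outermost loop.
    Ties keep the original dimension order (an assumption of this sketch).
    """
    if len(importance) != len(dims):
        raise ValueError("need exactly one importance value per dimension")
    # sorted() is stable, so equal scores preserve the order in `dims`.
    ranked = sorted(zip(dims, importance), key=lambda pair: -pair[1])
    return [dim for dim, _ in ranked]


def encode_loop_order(order: Sequence[str],
                      dims: Sequence[str] = DIMS) -> List[float]:
    """Inverse direction: assign the outermost loop the highest importance."""
    n = len(order)
    rank = {dim: float(n - i) for i, dim in enumerate(order)}
    return [rank[dim] for dim in dims]


# Illustrative importance values, one per entry of DIMS.
order = decode_loop_order([0.1, 0.9, 0.5, 0.3, 0.7, 0.2])
round_trip = decode_loop_order(encode_loop_order(order))
```

Because any real-valued vector decodes to a valid permutation, a numerical optimizer can perturb the importance values freely without ever producing an invalid loop order.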

@inproceedings{lin2020naas,
  title={{NAAS: Neural Accelerator Architecture Search}},
  author={Lin, Yujun and Yang, Mengtian and Han, Song},
  booktitle={2021 58th ACM/ESDA/IEEE Design Automation Conference (DAC)},
  year={2021}
}

**Acknowledgments**: This work was supported by the SRC GRC program under task 2944.001. We also thank AWS Machine Learning Research Awards for the computational resources.