Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity. Previous frameworks focus on sizing the numerical architectural hyper-parameters while neglecting the PE connectivity and compiler mappings. We push beyond searching only hardware hyper-parameters and propose the Neural Accelerator Architecture Search (NAAS), which fully exploits the hardware design space and compiler mapping strategies at the same time. Unlike prior work, which formulates the hardware parameter search as a pure sizing optimization, NAAS models the co-search as a two-level optimization problem, where each level is a combination of indexing, ordering, and sizing optimization. To tackle these challenges, we propose an encoding method that converts non-numerical parameters, such as the loop order and the choice of parallel dimensions, into numerical parameters for optimization. Thanks to the low search cost, NAAS can be easily integrated with hardware-aware NAS algorithms by adding another optimization level, achieving joint search over the neural network architecture, accelerator architecture, and compiler mapping. NAAS thus composes highly matched architectures together with efficient mappings. As a data-driven approach, NAAS outperforms the human design Eyeriss by 4.4x EDP reduction with 2.7% accuracy improvement on ImageNet under the same computation resources, and offers 1.4x to 3.5x EDP reduction compared to sizing only the architectural hyper-parameters.
As a data-driven approach, NAAS holistically composes highly matched accelerator and neural architectures together with efficient compiler mapping.
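The key to handling non-numerical search dimensions is to decode a continuous genome into a loop order (by ranking scores), a parallel-dimension choice (by picking the top-scored dimensions), and tile sizes (by rounding), so that a black-box optimizer can mutate one real-valued vector. The sketch below illustrates this encoding idea; the loop dimension names, genome layout, and tile-size range are illustrative assumptions, not the exact NAAS implementation.

```python
# Illustrative sketch (not the authors' code): decoding a real-valued genome
# into non-numerical mapping choices. Dimension names and ranges are assumed.
import numpy as np

LOOP_DIMS = ["N", "C", "K", "H", "W", "R", "S"]   # hypothetical loop dimensions

def decode_mapping(genome: np.ndarray):
    """genome: real vector = [7 loop-order scores | 7 parallelism scores | 2 tile sizes]."""
    order_scores = genome[:7]
    par_scores   = genome[7:14]
    tile_sizes   = genome[14:16]

    # Ordering: higher score -> outer loop (argsort turns scores into a permutation).
    loop_order = [LOOP_DIMS[i] for i in np.argsort(-order_scores)]

    # Indexing: pick the two dimensions to parallelize across the PE array.
    parallel_dims = [LOOP_DIMS[i] for i in np.argsort(-par_scores)[:2]]

    # Sizing: round continuous values to legal tile sizes (powers of two here).
    tiles = [2 ** int(round(float(t))) for t in np.clip(tile_sizes, 0, 6)]

    return {"loop_order": loop_order, "parallel_dims": parallel_dims, "tiles": tiles}

# A black-box optimizer (e.g., an evolutionary strategy) can mutate the
# continuous genome while the decoder always yields a valid mapping.
print(decode_mapping(np.random.randn(16)))
```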
Federated Learning is an emerging direction in distributed machine learning that enables jointly training a model without sharing the data. Since the data is distributed across many edge devices through wireless/long-distance connections, federated learning suffers from inevitably high communication latency. However, the latency issue is underestimated in the current literature [15], and existing approaches such as FedAvg [27] become less efficient when the latency increases. To overcome this problem, we propose Delayed Gradient Averaging (DGA), which delays the averaging step to improve efficiency and allows local computation in parallel to communication. We theoretically prove that DGA attains a similar convergence rate as FedAvg, and empirically show that our algorithm can tolerate high network latency without compromising accuracy. Specifically, we benchmark the training speed on various vision (CIFAR, ImageNet) and language tasks (Shakespeare), with both IID and non-IID partitions, and show that DGA brings a 2.55× to 4.07× speedup. Moreover, we build a 16-node Raspberry Pi cluster and show that DGA consistently speeds up real-world federated learning applications.
We propose Delayed Gradient Averaging (DGA), which delays the averaging step to improve efficiency and allows local computation in parallel to communication.
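The core mechanism is that local SGD steps never stall on the network: gradients are sent out for averaging, and when the global average arrives D iterations later, the model is corrected by swapping the stale local contribution for the stale global average. The toy single-process simulation below sketches this delayed-correction idea; the loss, worker count, and delay are illustrative assumptions rather than the paper's setup.

```python
# Toy simulation of delayed gradient averaging (a sketch under assumptions,
# not the authors' implementation): each worker keeps stepping with its own
# gradients and, D steps later, patches in the difference between the global
# average gradient and the local gradient it had already applied.
import numpy as np
from collections import deque

np.random.seed(0)
W, D, LR, STEPS = 4, 3, 0.1, 20           # workers, delay, learning rate, iterations
dim = 5
weights = [np.zeros(dim) for _ in range(W)]
in_flight = deque()                        # gradients "travelling" through the network

def local_grad(w):                         # stand-in for a real mini-batch gradient
    return w - np.arange(dim) + 0.01 * np.random.randn(dim)

for t in range(STEPS):
    grads = [local_grad(weights[k]) for k in range(W)]
    for k in range(W):                     # local update proceeds without waiting
        weights[k] -= LR * grads[k]
    in_flight.append(grads)                # averaging result arrives D steps later

    if len(in_flight) > D:                 # stale global average is now available
        old = in_flight.popleft()
        avg = np.mean(old, axis=0)
        for k in range(W):                 # correction: swap stale local grad for average
            weights[k] += LR * (old[k] - avg)

print(np.round(weights[0], 3))
```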
Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs: the first several blocks have an order of magnitude larger memory usage than the rest of the network. To alleviate this issue, we propose a generic patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. However, a naive implementation brings overlapping patches and computation overhead. We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead. Since manually redistributing the receptive field is difficult, we automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2. Patch-based inference effectively reduces the peak memory usage of existing networks by 4-8x. Co-designed with neural networks, MCUNetV2 sets a record ImageNet accuracy on MCU (71.8%) and achieves >90% accuracy on the visual wake words dataset under only 32kB SRAM. MCUNetV2 also unblocks object detection on tiny devices, achieving 16.9% higher mAP on Pascal VOC compared to the state-of-the-art result. Our study largely addresses the memory bottleneck in tinyML and paves the way for various vision applications beyond image classification.
In MCUNetV2, we propose a generic patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.
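The sketch below illustrates patch-by-patch inference for a hypothetical two-layer initial stage: only one spatial patch's activations are materialized at a time, at the cost of recomputing an overlapping halo region, which is exactly the overhead that network redistribution later reduces. The layer shapes, patch count, and halo size are assumptions for illustration, not the MCUNetV2 configuration.

```python
# Minimal sketch (assumed shapes and layers, not the MCUNetV2 code) of
# patch-by-patch inference: the memory-hungry initial stage runs on small
# spatial patches so only one patch's activations are alive at a time.
import torch
import torch.nn as nn

initial_stage = nn.Sequential(             # hypothetical first blocks (stride 2 total)
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 24, 3, stride=1, padding=1), nn.ReLU(),
)
rest_of_net = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(24, 10))

def patch_based_forward(x, n_patches=2, halo=4):
    """Run initial_stage per spatial patch; 'halo' pixels of overlap roughly cover
    the boundary receptive field (this recomputation is the overhead that network
    redistribution reduces)."""
    _, _, H, W = x.shape
    ph, pw = H // n_patches, W // n_patches
    rows = []
    for i in range(n_patches):
        cols = []
        for j in range(n_patches):
            h0, w0 = max(i * ph - halo, 0), max(j * pw - halo, 0)
            h1, w1 = min((i + 1) * ph + halo, H), min((j + 1) * pw + halo, W)
            out = initial_stage(x[:, :, h0:h1, w0:w1])
            # Crop away the halo (output is stride-2 downsampled).
            oh0, ow0 = (i * ph - h0) // 2, (j * pw - w0) // 2
            cols.append(out[:, :, oh0:oh0 + ph // 2, ow0:ow0 + pw // 2])
        rows.append(torch.cat(cols, dim=3))
    return torch.cat(rows, dim=2)

x = torch.randn(1, 3, 64, 64)
feat = patch_based_forward(x)              # peak memory ~ one patch, not the full map
logits = rest_of_net(feat)
print(feat.shape, logits.shape)
```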
Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real time on edge devices and thus require low latency and low energy. Compared to projecting the point cloud to 2D space, directly processing the 3D point cloud yields higher accuracy and lower #MACs. However, the extremely sparse nature of point clouds poses challenges to hardware acceleration. For example, we need to explicitly determine the nonzero outputs and search for the nonzero neighbors (the mapping operation), which is unsupported in existing accelerators. Furthermore, explicit gather and scatter of sparse features are required, resulting in large data movement overhead. In this paper, we comprehensively analyze the performance bottlenecks of modern point cloud networks on CPU/GPU/TPU. To address the challenges, we then present PointAcc, a novel point cloud deep learning accelerator. PointAcc maps diverse mapping operations onto one versatile ranking-based kernel, streams the sparse computation with configurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Evaluated on 8 point cloud models across 4 applications, PointAcc achieves 3.7X speedup and 22X energy savings over an RTX 2080Ti GPU. Co-designed with light-weight neural networks, PointAcc outperforms the prior accelerator Mesorasi by 100X speedup with 9.1% higher accuracy when running segmentation on the S3DIS dataset. PointAcc paves the way for efficient point cloud recognition.
PointAcc is a novel point cloud deep learning accelerator. It introduces a configurable sorting-based mapping unit that efficiently supports diverse operations in point cloud networks. PointAcc further exploits simplified caching and layer fusion specialized for point cloud models, effectively reducing DRAM accesses.
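A software analogue of the ranking-based mapping idea is sketched below: finding the nonzero neighbors for one sparse-convolution kernel offset reduces to sorting flattened coordinate keys and looking up the shifted queries against the sorted list. The coordinate flattening, grid size, and example data are illustrative assumptions and do not reflect the actual hardware datapath.

```python
# Software analogue (a sketch, not the hardware design) of mapping the sparse
# convolution "map" step onto a sorting/merging kernel: nonzero-neighbor search
# becomes sort the coordinate keys, then merge-join the shifted queries.
import numpy as np

def coords_to_keys(coords, grid=1024):
    """Flatten integer (x, y, z) coordinates into scalar keys so they can be ranked."""
    return (coords[:, 0] * grid + coords[:, 1]) * grid + coords[:, 2]

def build_kernel_map(in_coords, offset):
    """For one kernel offset, return (query_idx, neighbor_idx) pairs where
    in_coords[query_idx] + offset coincides with an existing input coordinate."""
    keys = coords_to_keys(in_coords)
    order = np.argsort(keys)                      # ranking step
    sorted_keys = keys[order]
    queries = coords_to_keys(in_coords + offset)  # neighbor candidates
    pos = np.searchsorted(sorted_keys, queries)   # merge/lookup step
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    hit = sorted_keys[pos] == queries
    return np.stack([np.nonzero(hit)[0], order[pos[hit]]], axis=1)

pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [5, 5, 5]])
print(build_kernel_map(pts, np.array([1, 0, 0])))   # points whose +x neighbor exists
```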