Efficient AI Computing,
Transforming the Future.

Projects


SpArch: Efficient Architecture for Sparse Matrix Multiplication

HPCA 2020

SpArch is a hardware accelerator for sparse matrix-matrix multiplication (SpGEMM).
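For reference, the computation being accelerated can be illustrated in a few lines of software. This sketch uses scipy with made-up matrix sizes and densities; it only shows what SpGEMM computes and says nothing about the SpArch hardware design itself.

```python
# Software illustration of SpGEMM (sparse matrix x sparse matrix), the
# operation SpArch accelerates. Sizes and densities are made up.
import numpy as np
from scipy.sparse import random as sparse_random

A = sparse_random(1024, 1024, density=0.01, format="csr", dtype=np.float32)
B = sparse_random(1024, 1024, density=0.01, format="csr", dtype=np.float32)

C = A @ B  # sparse output; the work depends on the nonzero structure of
           # A and B, not on the dense 1024 x 1024 size
print(f"nnz(A)={A.nnz}, nnz(B)={B.nnz}, nnz(C)={C.nnz}")
```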

Lite Transformer with Long-Short Range Attention

ICLR 2020

Lite Transformer is an efficient mobile NLP architecture. Its key primitive is Long-Short Range Attention (LSRA): one group of heads specializes in local context modeling (via convolution), while another group specializes in long-distance relationship modeling (via attention).
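A minimal PyTorch sketch of the LSRA idea follows; the dimensions, layer choices, and names are illustrative assumptions, not the paper's implementation.

```python
# LSRA sketch: split the embedding, send one half through self-attention
# (long-range context) and the other through a depthwise convolution
# (local context), then concatenate. Hyperparameters are illustrative.
import torch
import torch.nn as nn

class LSRABlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, kernel_size=7):
        super().__init__()
        self.half = d_model // 2
        self.attn = nn.MultiheadAttention(self.half, n_heads, batch_first=True)
        self.conv = nn.Conv1d(self.half, self.half, kernel_size,
                              padding=kernel_size // 2, groups=self.half)

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        global_x, local_x = x.split(self.half, dim=-1)
        attn_out, _ = self.attn(global_x, global_x, global_x)          # long-range
        conv_out = self.conv(local_x.transpose(1, 2)).transpose(1, 2)  # local
        return torch.cat([attn_out, conv_out], dim=-1)

y = LSRABlock()(torch.randn(2, 32, 256))     # -> (2, 32, 256)
```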

Once-for-All: Train One Network and Specialize it for Efficient Deployment

ICLR 2020

OFA is an efficient AutoML technique that decouples model training from architecture search. The network is trained only once, then specialized for many hardware platforms, from CPUs/GPUs to hardware accelerators, without retraining. OFA achieves a new state-of-the-art 80.0% ImageNet top-1 accuracy under the mobile setting (<600M FLOPs).
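A conceptual sketch of the deployment step is shown below. It assumes a once-for-all network has already been trained; the candidate configurations, latency numbers, and accuracy numbers are made up for illustration and are not the OFA library's API or results.

```python
# OFA deployment sketch: the network is trained once, then subnetwork
# configurations are filtered against a per-platform latency budget and
# ranked by accuracy, with no retraining. All numbers are fabricated.
import random

random.seed(0)
# Pretend each sampled subnet comes with a predicted latency (ms) and
# predicted accuracy (%) from lookup tables / predictors.
candidates = [
    {"depth": d, "width": w,
     "latency_ms": 5 * d + 20 * w + random.random(),
     "accuracy": 70 + 2 * d + 5 * w}
    for d in (2, 3, 4) for w in (0.5, 0.75, 1.0)
]

def specialize(latency_budget_ms):
    """Pick the most accurate subnet that fits the budget -- no retraining."""
    feasible = [c for c in candidates if c["latency_ms"] <= latency_budget_ms]
    return max(feasible, key=lambda c: c["accuracy"]) if feasible else None

print(specialize(latency_budget_ms=35))   # tighter budget -> smaller subnet
print(specialize(latency_budget_ms=60))   # looser budget -> larger subnet
```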

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

ACL 2020

The HAT neural architecture search (NAS) framework incorporates hardware feedback into the search loop, producing the model best suited to the target hardware platform. Results across different hardware platforms and datasets show that HAT-searched models achieve better accuracy-efficiency trade-offs.
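A minimal sketch of the hardware-in-the-loop idea follows: a small regressor is fit on (architecture features, measured latency) pairs collected on the target device, then queried inside the search loop instead of running the hardware for every candidate. The feature encoding and numbers are assumptions for illustration, not HAT's actual predictor or data.

```python
# Hardware-aware search sketch: a learned latency predictor stands in for
# on-device measurement inside the search loop. All data is fabricated.
import numpy as np
from sklearn.linear_model import LinearRegression

# Pretend measurements from the target platform: [layers, hidden_dim] -> ms
arch_features = np.array([[6, 512], [6, 640], [12, 512], [12, 640]], dtype=float)
measured_latency_ms = np.array([38.0, 45.0, 71.0, 84.0])

latency_predictor = LinearRegression().fit(arch_features, measured_latency_ms)

def search_score(features, accuracy_proxy, budget_ms=60.0):
    """Reject candidates predicted to exceed the budget, rank the rest."""
    predicted = latency_predictor.predict([features])[0]
    return accuracy_proxy if predicted <= budget_ms else float("-inf")

print(search_score([6, 640], accuracy_proxy=0.82))   # fits the budget
print(search_score([12, 640], accuracy_proxy=0.85))  # rejected: too slow
```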