Projects

Transformers, as an alternative to CNNs, have proven effective in many modalities (e.g., text and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind sparse convolution-based models (3x slower), hindering their use in resource-constrained, latency-sensitive applications (such as autonomous driving). This inefficiency comes from the sparse and irregular nature of point clouds, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer, which closes this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition points into groups of equal size rather than windows of equal shape. This effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate the sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on the Waymo Open Dataset with a 4.6x speedup over (transformer-based) SST and a 1.4x speedup over (sparse convolutional) CenterPoint. It is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks.
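The flattening and grouping step described above can be illustrated with a minimal NumPy sketch. It is not the FlatFormer implementation: the function name `flatten_and_group`, its parameters, and the choice of 2D integer bird's-eye-view coordinates are assumptions made for illustration. Points are sorted first by window index and then by position inside the window, and the sorted sequence is cut into groups of equal size.

```python
import numpy as np

def flatten_and_group(coords, window_size, group_size, axis_major="x", shift=0):
    """Hypothetical sketch of window-based sorting with equal-size groups.

    coords: (N, 2) array of non-negative integer (x, y) coordinates.
    Returns a list of index arrays, one per group.
    """
    coords = np.asarray(coords)
    # Window index of each point; a non-zero shift moves the window boundaries.
    win = (coords + shift) // window_size
    x, y = coords[:, 0], coords[:, 1]
    wx, wy = win[:, 0], win[:, 1]
    # np.lexsort treats the LAST key as the primary sort key.
    if axis_major == "x":
        keys = (y, x, wy, wx)   # window x, window y, then x, then y
    else:
        keys = (x, y, wx, wy)   # window y, window x, then y, then x
    order = np.lexsort(keys)
    # Cut the flattened sequence into groups of equal size.
    # In this sketch the last group may be shorter when N is not a multiple.
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]
```

Calling the function with `axis_major="x"` and then `axis_major="y"`, or with a non-zero `shift`, yields different groupings of the same points, which is the role the alternating sorting axis and shifted windows play in the description above. Self-attention would then run independently inside each returned group.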

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

CVPR 2023

We present FlatFormer, an efficient point cloud transformer architecture for large-scale point cloud analysis.

Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network

ISCA 2023

EIE proposed to accelerate pruned and compressed neural networks, exploiting weight sparsity, activation sparsity, and 4-bit weight-sharing in neural network accelerators.
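As a loose software analogy for the three ideas named above (not the EIE hardware design), the sketch below stores only non-zero weights, replaces each stored weight with a 4-bit index into a shared 16-entry codebook, and skips columns whose input activation is zero. The function names, the per-column storage layout, and the codebook values are all assumptions made for illustration.

```python
import numpy as np

def compress_columns(W, codebook):
    """Keep only non-zero weights, each stored as an index into a shared codebook."""
    cols = []
    for j in range(W.shape[1]):
        rows = np.nonzero(W[:, j])[0]           # weight sparsity: drop pruned weights
        vals = W[rows, j]
        # Nearest codebook entry per weight; 16 entries fit in a 4-bit index.
        codes = np.abs(vals[:, None] - codebook[None, :]).argmin(axis=1)
        cols.append((rows, codes.astype(np.uint8)))
    return cols

def sparse_matvec(cols, codebook, a, n_rows):
    """Compute y = W @ a from the compressed columns."""
    y = np.zeros(n_rows)
    for j in np.nonzero(a)[0]:                  # activation sparsity: skip zero inputs
        rows, codes = cols[j]
        y[rows] += codebook[codes] * a[j]       # decode shared weights on the fly
    return y
```

When every non-zero weight already equals a codebook entry, `sparse_matvec` reproduces the dense product `W @ a`; otherwise the result carries the quantization error of the weight sharing.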

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

ICRA 2023

BEVFusion unifies multi-modal features in the shared bird’s-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. It establishes the new state of the art on nuScenes, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation with 1.9x lower computation cost.

Quantum computing has attracted much research attention because of its potential to achieve fundamental speed and efficiency improvements in various domains. Among quantum algorithms, Parameterized Quantum Circuits (PQC) for Quantum Machine Learning (QML) show promise for realizing quantum advantage on current Noisy Intermediate-Scale Quantum (NISQ) machines. To facilitate QML and PQC research, a Python library called TorchQuantum has recently been released. It can construct, simulate, and train PQCs for machine learning tasks with high speed and convenient debugging support. Besides quantum for ML, we want to draw the community's attention to the reverse direction: ML for quantum. Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving quantum circuit compilation efficiency. This paper presents a case study of the ML-for-quantum part of TorchQuantum. Since estimating the impact of noise on circuit reliability is an essential step toward understanding and mitigating noise, we propose to leverage classical ML to predict it. Inspired by the natural graph representation of quantum circuits, we use a graph transformer model to predict noisy circuit fidelity. We first collect a large dataset with a variety of quantum circuits and obtain their fidelity on noisy simulators and real machines. Then we embed each circuit into a graph with gate and noise properties as node features, and adopt a graph transformer to predict the fidelity. This avoids the exponential cost of classical simulation and estimates fidelity with polynomial complexity. Evaluated on 5 thousand random and algorithm circuits, the graph transformer predictor provides accurate fidelity estimation with an RMSE of 0.04 and outperforms a simple neural network-based model by 0.02 on average. It achieves R² scores of 0.99 and 0.95 for random and algorithm circuits, respectively. Compared with circuit simulators, the predictor is over 200× faster at estimating fidelity. The datasets and predictors are available in the TorchQuantum library.
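The circuit-to-graph embedding described above can be sketched as follows. This is an illustrative assumption, not the QuEst or TorchQuantum code: the gate vocabulary `GATE_TYPES`, the function `circuit_to_graph`, and the single per-gate error feature are hypothetical. Each gate becomes a node, and an edge links consecutive gates that act on the same qubit.

```python
import numpy as np

# Hypothetical gate vocabulary used for one-hot encoding.
GATE_TYPES = ["h", "cx", "rz"]

def circuit_to_graph(gates, gate_error):
    """Turn a gate list into node features and directed edges.

    gates: list of (gate_name, qubit_tuple) in execution order.
    gate_error: dict mapping gate_name to an assumed error rate.
    """
    feats, edges = [], []
    last_on_qubit = {}                      # most recent gate index per qubit
    for i, (name, qubits) in enumerate(gates):
        one_hot = [1.0 if name == g else 0.0 for g in GATE_TYPES]
        feats.append(one_hot + [gate_error[name]])   # gate type plus noise property
        for q in qubits:
            if q in last_on_qubit:
                edges.append((last_on_qubit[q], i))  # dependency along qubit q
            last_on_qubit[q] = i
    return np.array(feats), edges
```

The resulting feature matrix and edge list are the kind of input a graph transformer could consume to regress fidelity; a real predictor would likely use richer node features than the single error rate assumed here.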

QuEst: Graph Transformer for Quantum Circuit Reliability Estimation

ICCAD 2022 (oral)

We develop graph transformer models to predict the fidelity of quantum circuits on real quantum devices.

Efficient AI Computing, Transforming the Future.
