TorchSparse++ is a high-performance computing library for efficient 3D sparse convolution. It offers significant performance improvement over TorchSparse by overlapping computation with memory access. It also uses auto-tuning to search a large design space for the best execution strategy for each sparse workload.
EfficientViT is a new family of vision models for high-resolution dense prediction. It achieves a global receptive field and multi-scale learning with only hardware-efficient operations. EfficientViT delivers remarkable performance gains over previous models, with speedups on diverse hardware platforms, including mobile CPU, edge GPU, and cloud GPU.
We present FlatFormer, an efficient ViT architecture for large-scale point cloud analysis.
BEVFusion unifies multi-modal features in the shared bird’s-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. It establishes the new state of the art on nuScenes, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation with 1.9x lower computation cost.
TorchSparse is a high-performance computing library for efficient 3D sparse convolution. The library aims to accelerate sparse computation in 3D, in particular the sparse convolution operation.
PointAcc is a novel point cloud deep learning accelerator. It introduces a configurable sorting-based mapping unit that efficiently supports diverse operations in point cloud networks. PointAcc further exploits simplified caching and layer fusion specialized for point cloud models, effectively reducing DRAM access.
Thanks to NVIDIA’s amazing deep learning ecosystem, we were able to deploy three applications on Jetson Xavier NX soon after we received the kit: efficient video understanding with Temporal Shift Module (TSM, ICCV’19), efficient 3D deep learning with Point-Voxel CNN (PVCNN, NeurIPS’19), and efficient machine translation with Hardware-Aware Transformer (HAT, ACL’20).