Projects

Sparse Refinement for Efficient High-Resolution Semantic Segmentation

ECCV 2024

Zhijian Liu*, Zhuoyang Zhang*, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han

Semantic segmentation empowers numerous real-world applications, such as autonomous driving and augmented/mixed reality. These applications often operate on high-resolution images (e.g., 8 megapixels) to capture fine details. However, this comes at the cost of considerable computational complexity, hindering deployment in latency-sensitive scenarios. In this paper, we introduce SparseRefine, a novel approach that enhances dense low-resolution predictions with sparse high-resolution refinements. Based on coarse low-resolution outputs, SparseRefine first uses an entropy selector to identify a sparse set of pixels with high entropy. It then employs a sparse feature extractor to efficiently generate the refinements for those pixels of interest. Finally, it leverages a gated ensembler to apply these sparse refinements to the initial coarse predictions. SparseRefine can be seamlessly integrated into any existing semantic segmentation model, whether CNN- or ViT-based. It achieves a significant speedup of 1.5 to 3.7 times when applied to HRNet-W48, SegFormer-B5, Mask2Former-T/L and SegNeXt-L on Cityscapes, with negligible to no loss of accuracy. Our "dense+sparse" paradigm paves the way for efficient high-resolution visual computing.
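The entropy selection step described above can be illustrated with a minimal sketch. It assumes the coarse model outputs per-pixel class logits and that "high entropy" means the Shannon entropy of the softmax distribution exceeds a fixed threshold; the function names, the toy logits, and the threshold of 0.5 are hypothetical and not taken from the SparseRefine implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_entropy(logits):
    """Shannon entropy (in nats) of the class distribution at one pixel."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)

def select_uncertain_pixels(logit_map, threshold):
    """Return (row, col) coordinates whose prediction entropy exceeds threshold."""
    selected = []
    for r, row in enumerate(logit_map):
        for c, logits in enumerate(row):
            if pixel_entropy(logits) > threshold:
                selected.append((r, c))
    return selected

# Toy 2x2 "image" with 3 classes per pixel (hypothetical values).
logit_map = [
    [[9.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[0.0, 8.0, 0.0], [2.0, 2.1, 0.0]],
]
# Assumed threshold; only ambiguous pixels are kept for refinement.
sparse_pixels = select_uncertain_pixels(logit_map, threshold=0.5)
```

In this toy input only the two pixels with ambiguous class scores are selected, so a refinement network would need to process half of the pixels rather than all of them.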

SparseRefine is a novel approach that enhances dense low-resolution predictions with sparse high-resolution refinements. It achieves significant speedup: 1.5 to 3.7 times when applied to HRNet-W48, SegFormer-B5, Mask2Former-T/L and SegNeXt-L on Cityscapes, with negligible to no loss of accuracy.

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

International Journal of Computer Vision 2024

Guangxuan Xiao*¹, Tianwei Yin*¹, William T. Freeman¹, Frédo Durand¹, Song Han¹

Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend features among subjects. We present FastComposer, which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual instructions with only forward passes. To address the identity blending problem in multi-subject generation, FastComposer proposes cross-attention localization supervision during training, enforcing that the attention of reference subjects is localized to the correct regions in the target images. Naively conditioning on subject embeddings results in subject overfitting. FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation. FastComposer generates images of multiple unseen individuals with different styles, actions, and contexts. It achieves 300x-2500x speedup compared to fine-tuning-based methods and requires zero extra storage for new subjects. FastComposer paves the way for efficient, personalized, and high-quality multi-subject image creation.
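One way to picture delayed subject conditioning is as a per-step schedule that decides which conditioning the denoiser sees. The sketch below assumes that "delayed" means the early denoising steps use text-only conditioning and the remaining steps use the subject-augmented conditioning; the function name, the labels, and the 0.3 ratio are hypothetical illustrations, not FastComposer's API or settings.

```python
def conditioning_schedule(num_steps, text_only_ratio):
    """Label each denoising step with the conditioning it would use.

    Assumption: the first `text_only_ratio` fraction of steps sees only the
    text embedding; later steps see the text embedding augmented with
    subject embeddings.
    """
    if not 0.0 <= text_only_ratio <= 1.0:
        raise ValueError("text_only_ratio must be in [0, 1]")
    switch_step = int(round(num_steps * text_only_ratio))
    return [
        "text" if step < switch_step else "text+subject"
        for step in range(num_steps)
    ]

# Hypothetical setting: 10 denoising steps, subject conditioning delayed by 30%.
schedule = conditioning_schedule(num_steps=10, text_only_ratio=0.3)
```

Under this reading, a larger ratio would leave more of the process to the text prompt (favoring editability), while a smaller ratio would introduce the subject earlier (favoring identity).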

We present FastComposer, which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning.

Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays

ISCA 2024

Hanrui Wang, Pengyu Liu, Daniel Bochen Tan, Yilian Liu, Jiaqi Gu, David Z. Pan, Jason Cong, Umut A. Acar, Song Han

The neutral atom array has gained prominence in quantum computing for its scalability and operation fidelity. Previous works focus on fixed atom arrays (FAAs) that require extensive SWAP operations for long-range interactions. This work explores a novel architecture, reconfigurable atom arrays (RAAs), also known as field programmable qubit arrays (FPQAs), which allow coherent atom movements during circuit execution under some constraints. Such atom movements, which are unique to this architecture, could significantly reduce the cost of long-range interactions if they are scheduled strategically. In this work, we introduce Atomique, a compilation framework designed for qubit mapping, atom movement, and gate scheduling for RAAs. Atomique contains a qubit-array mapper to decide the coarse-grained mapping of the qubits to arrays, leveraging MAX k-Cut on a constructed gate frequency graph to minimize SWAP overhead. Subsequently, a qubit-atom mapper determines the fine-grained mapping of qubits to specific atoms in the array and considers load balance to prevent hardware constraint violations. We further propose a router that identifies parallel gates, schedules them simultaneously, and reduces depth. We evaluate Atomique across 20+ diverse benchmarks, including generic circuits (arbitrary, QASMBench, SupermarQ), quantum simulation, and QAOA circuits. Atomique consistently outperforms IBM Superconducting, FAA with long-range gates, and FAA with rectangular and triangular topologies, achieving significant reductions in depth and the number of two-qubit gates.
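The coarse-grained qubit-array mapping can be sketched as a MAX k-Cut problem on a gate frequency graph. The example below assumes the graph weights count how often each qubit pair shares a two-qubit gate, and it uses a simple greedy heuristic as a stand-in; it is not Atomique's actual solver, and all function names and the toy gate list are hypothetical.

```python
from collections import Counter

def gate_frequency_graph(two_qubit_gates):
    """Count how often each unordered qubit pair shares a two-qubit gate."""
    return Counter(tuple(sorted(pair)) for pair in two_qubit_gates)

def greedy_max_k_cut(num_qubits, weights, k):
    """Greedily assign each qubit to one of k arrays to maximize cut weight."""
    assignment = {}
    for q in range(num_qubits):
        # Edge weight that would remain inside each array if q joined it.
        uncut = [0] * k
        for (a, b), w in weights.items():
            other = b if a == q else a if b == q else None
            if other is not None and other in assignment:
                uncut[assignment[other]] += w
        # Join the array that keeps the least weight uncut.
        assignment[q] = uncut.index(min(uncut))
    return assignment

def cut_weight(weights, assignment):
    """Total weight of edges whose endpoints land in different arrays."""
    return sum(w for (a, b), w in weights.items()
               if assignment[a] != assignment[b])

# Hypothetical circuit: a list of two-qubit gates given as qubit pairs.
gates = [(0, 1), (0, 1), (1, 2), (2, 3), (0, 3), (1, 0)]
weights = gate_frequency_graph(gates)
assignment = greedy_max_k_cut(num_qubits=4, weights=weights, k=2)
total_cut = cut_weight(weights, assignment)
```

Maximizing the cut places frequently interacting qubits in different arrays, which matches the stated goal of reducing SWAP overhead, on the assumption that gates between arrays can be served by atom movement.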

ISCA 2024 (oral)

We develop a new compiler for the emerging reconfigurable neutral atom array (FPQA) device.

Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas

DAC 2024

Hanrui Wang, Daniel Bochen Tan, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jason Cong, Song Han

Neutral atom arrays have become a promising platform for quantum computing, especially the field programmable qubit array (FPQA) endowed with the unique capability of atom movement. This feature allows dynamic alterations in qubit connectivity during runtime, which can reduce the cost of executing long-range gates and improve parallelism. However, this added flexibility introduces new challenges in circuit compilation. Inspired by the placement and routing strategies for FPGAs, we propose to map all data qubits to fixed atoms while utilizing movable atoms to route for 2-qubit gates between data qubits. Coined flying ancillas, these mobile atoms function as ancilla qubits, dynamically generated and recycled during execution. We present Q-Pilot, a scalable compiler for FPQA employing flying ancillas to maximize circuit parallelism. For two important quantum applications, quantum simulation and the Quantum Approximate Optimization Algorithm (QAOA), we devise domain-specific routing strategies. In comparison to alternative technologies such as superconducting devices or fixed atom arrays, Q-Pilot effectively harnesses the flexibility of FPQA, achieving reductions of 1.4x, 27.7x, and 6.3x in circuit depth for 100-qubit random, quantum simulation, and QAOA circuits, respectively.

DAC 2024 (oral)

We develop a compiler for emerging reconfigurable neutral atom array quantum hardware that uses flying ancilla qubits for routing.

Efficient AI Computing, Transforming the Future.
