Efficient AI Computing,
Transforming the Future.

Projects


Sparse Refinement for Efficient High-Resolution Semantic Segmentation

ECCV 2024

SparseRefine is a novel approach that enhances dense low-resolution predictions with sparse high-resolution refinements. It achieves a 1.5× to 3.7× speedup when applied to HRNet-W48, SegFormer-B5, Mask2Former-T/L and SegNeXt-L on Cityscapes, with negligible or no loss of accuracy.
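The idea of refining only a sparse set of pixels can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the SparseRefine implementation. It assumes that uncertain pixels are found by thresholding the per-pixel entropy of the upsampled low-resolution prediction. It also overwrites those pixels with the output of a hypothetical `refine_pixel` callback that stands in for the high-resolution refinement model. The paper's actual selection and merging steps may differ, and every function name here is illustrative.

```python
import math
from typing import Callable, List, Tuple

Probs = List[float]  # class probabilities for one pixel


def upsample_nearest(low: List[List[Probs]], scale: int) -> List[List[Probs]]:
    """Nearest-neighbor upsampling of a low-res probability map by an integer factor."""
    height, width = len(low) * scale, len(low[0]) * scale
    return [[low[y // scale][x // scale] for x in range(width)] for y in range(height)]


def entropy(p: Probs) -> float:
    """Shannon entropy (natural log) of one pixel's class distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def select_uncertain(probs: List[List[Probs]], threshold: float) -> List[Tuple[int, int]]:
    """Assumed selection rule: keep (y, x) coordinates whose entropy exceeds the threshold."""
    return [(y, x) for y, row in enumerate(probs)
            for x, p in enumerate(row) if entropy(p) > threshold]


def sparse_refine(low: List[List[Probs]], scale: int, threshold: float,
                  refine_pixel: Callable[[int, int], int]) -> List[List[int]]:
    """Dense coarse labels everywhere, plus a hypothetical refiner on uncertain pixels only."""
    dense = upsample_nearest(low, scale)
    # Argmax over classes gives the coarse label for every high-res pixel.
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in dense]
    # Only the sparse, uncertain subset is passed to the (expensive) refiner.
    for y, x in select_uncertain(dense, threshold):
        labels[y][x] = refine_pixel(y, x)
    return labels


if __name__ == "__main__":
    low_res = [[[0.9, 0.1], [0.5, 0.5]]]  # 1x2 map with 2 classes
    print(sparse_refine(low_res, 2, 0.5, lambda y, x: 1))
```

Only the selected pixels reach `refine_pixel`, so the high-resolution work in this sketch grows with the number of uncertain pixels rather than with the full image size.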

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

International Journal of Computer Vision 2024

We present FastComposer, which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning.

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

arXiv 2024

We introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weights, 8-bit activations, and a 4-bit KV cache, and implement the QServe inference library. QServe improves the maximum achievable serving throughput of Llama-3-8B by 1.2× on A100 and 1.4× on L40S, and of Qwen1.5-72B by 2.4× on A100 and 3.5× on L40S, surpassing TensorRT-LLM, the leading industry solution.
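The W4A8KV4 naming can be made concrete with a small Python sketch. It is only a generic illustration, not QoQ's algorithm. It assumes plain symmetric round-to-nearest quantization with per-group scales for the INT4 weights and a single per-token scale for the INT8 activations. QoQ's own weight scheme, such as its group-quantization details, may differ. The helper names `quantize_symmetric`, `w4a8_dot` and `kv_cache_bytes` are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization to signed `bits`-bit integers with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to nearest, then clamp to the signed integer range.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale


def w4a8_dot(weights: List[float], activations: List[float], group_size: int) -> float:
    """Dot product with INT4 per-group weights and INT8 per-token activations (assumed scheme)."""
    qa, sa = quantize_symmetric(activations, 8)
    total = 0.0
    for start in range(0, len(weights), group_size):
        qw, sw = quantize_symmetric(weights[start:start + group_size], 4)
        # Integer multiply-accumulate within the group, then one floating-point rescale.
        acc = sum(w * a for w, a in zip(qw, qa[start:start + group_size]))
        total += acc * sw * sa
    return total


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bits: int) -> int:
    """KV cache size: factor 2 counts keys and values; ignores scale/zero-point metadata."""
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bits // 8


if __name__ == "__main__":
    w = [0.5, -0.25, 1.0, 0.75]
    x = [1.0, 2.0, -1.0, 0.5]
    exact = sum(a * b for a, b in zip(w, x))
    print(exact, w4a8_dot(w, x, group_size=2))
    print(kv_cache_bytes(32, 8, 128, 1, 16), kv_cache_bytes(32, 8, 128, 1, 4))
```

With 32 layers, 8 KV heads and head dimension 128, taken here as an assumed Llama-3-8B shape, `kv_cache_bytes` gives 131072 bytes per token at 16 bits and 32768 bytes at 4 bits, ignoring quantization metadata. This illustrates why a 4-bit KV cache leaves room for more concurrent sequences.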

Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays

ISCA 2024 (oral)

We develop a new compiler for emerging reconfigurable neutral atom array devices, also known as field-programmable qubit arrays (FPQAs).