Efficient AI Computing,
Transforming the Future.

Projects


DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

ICCV 2025

DC-AR is a high-efficiency masked autoregressive (AR) framework for text-to-image generation built on DC-HT, a deep compression hybrid tokenizer that enables 32x compression. It refines images via residual tokens and achieves 1.5–7.9x higher throughput and 2–3.5x lower latency than other leading models.
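
As a rough illustration of the hybrid-tokenizer idea described above (coarse discrete tokens plus continuous residual tokens used for refinement), here is a toy NumPy sketch. The class name, shapes, and the simple nearest-codebook quantizer are assumptions for illustration, not DC-HT's actual architecture.

import numpy as np

class ToyHybridTokenizer:
    def __init__(self, codebook_size=1024, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.standard_normal((codebook_size, dim))

    def encode(self, latents):
        # latents: (num_tokens, dim) continuous features from a downsampling encoder.
        # Discrete tokens: nearest codebook entry. Residual tokens: what the
        # discrete code misses, kept continuous for a later refinement step.
        dists = ((latents[:, None, :] - self.codebook[None]) ** 2).sum(-1)
        ids = dists.argmin(-1)
        residuals = latents - self.codebook[ids]
        return ids, residuals

    def decode(self, ids, residuals=None):
        # Coarse reconstruction from discrete tokens alone, optionally refined
        # by adding the residual tokens back.
        recon = self.codebook[ids]
        return recon if residuals is None else recon + residuals

# Usage: encode 64 latent tokens, then compare the coarse reconstruction with
# the residual-refined one.
tok = ToyHybridTokenizer()
latents = np.random.default_rng(1).standard_normal((64, 16))
ids, res = tok.encode(latents)
coarse, refined = tok.decode(ids), tok.decode(ids, res)
print(np.abs(latents - coarse).mean(), np.abs(latents - refined).mean())

In this toy setup the residual-refined reconstruction is exact, which is only meant to show why a second, residual-driven refinement stage can recover detail that discrete tokens alone lose at high compression ratios.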

XAttention: Block Sparse Attention with Antidiagonal Scoring

ICML 2025

A plug-and-play block-sparse attention method that uses antidiagonal sums to efficiently identify the important blocks of the attention matrix, achieving up to 13.5x speedup on long-context tasks with accuracy comparable to full attention.
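
Below is a minimal NumPy sketch of the antidiagonal-scoring idea: each tile of the pre-softmax attention map is scored by summing its antidiagonal entries, and only the highest-scoring tiles are kept. The block size, keep ratio, and single-antidiagonal scoring here are simplifying assumptions; the actual method uses strided antidiagonal sums inside an optimized block-sparse attention kernel.

import numpy as np

def antidiagonal_block_scores(scores, block=8):
    # Score each (block x block) tile of the pre-softmax attention map by
    # summing its antidiagonal entries (i + j == block - 1 within the tile).
    n = scores.shape[0]
    assert n % block == 0
    tiles = scores.reshape(n // block, block, n // block, block).transpose(0, 2, 1, 3)
    antidiag = np.fliplr(np.eye(block, dtype=bool))
    return tiles[..., antidiag].sum(-1)                # (n/block, n/block)

def block_sparse_mask(scores, block=8, keep_ratio=0.25):
    # Keep the highest-scoring tiles, then expand back to a token-level mask.
    tile_scores = antidiagonal_block_scores(scores, block)
    k = max(1, int(keep_ratio * tile_scores.size))
    thresh = np.sort(tile_scores, axis=None)[-k]
    keep = tile_scores >= thresh
    return np.repeat(np.repeat(keep, block, axis=0), block, axis=1)

# Usage: keep roughly 25% of the 8x8 tiles of a random 64x64 attention map.
rng = np.random.default_rng(0)
q, key = rng.standard_normal((64, 32)), rng.standard_normal((64, 32))
mask = block_sparse_mask(q @ key.T / np.sqrt(32))
print(mask.mean())                                     # fraction of entries kept

An antidiagonal crosses every row and every column of a tile exactly once, which is why its sum is a cheap yet informative proxy for the tile's overall attention mass.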

Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation

arXiv 2025

An O(n log n) sparse attention mask that exploits the decay of attention energy with spatiotemporal distance, enabling efficient long video generation.
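
The mask construction below is a toy illustration of how O(n log n) sparsity with distance-based decay can be achieved: attention stays dense inside a local window and is subsampled ever more aggressively in exponentially larger distance bands. The function name, window parameter, and banding scheme are assumptions; the paper's actual spatiotemporal mask differs.

import numpy as np

def radial_style_mask(n, window=16):
    # Boolean (n, n) mask. Band 0 (|i - j| < window) is fully dense; band k >= 1
    # covers distances in [window * 2**(k-1), window * 2**k) and keeps only every
    # 2**(k-1)-th key, so each band contributes about `window` keys per query.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    dist = np.abs(i - j)
    band = np.zeros_like(dist)
    far = dist >= window
    band[far] = np.floor(np.log2(dist[far] / window)).astype(int) + 1
    stride = 2 ** np.maximum(band - 1, 0)
    return (dist < window) | (far & (dist % stride == 0))

# Usage: roughly n * log(n) nonzero entries instead of n**2.
m = radial_style_mask(1024)
print(m.sum(), 1024 * 1024)

With O(log n) bands each contributing a fixed number of keys, every query attends to O(window * log n) keys, so the whole mask has O(n log n) nonzeros while still spending most of its budget on nearby tokens.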

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

MLSys 2025

LServe accelerates long-sequence LLM serving with unified sparse attention for both prefilling and decoding, achieving up to 3.3x speedup over state-of-the-art solutions without sacrificing accuracy.
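
A minimal sketch of what unified sparsity can look like on the serving side: one block-sparse attention routine and one kept-block decision serve both the prefill pass (many queries at once) and a decode step (a single new query). The block size, the fixed kept-block list, and all names below are illustrative assumptions rather than LServe's actual kernels or KV-page selection logic, and causal masking is omitted for brevity.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, keep_blocks, block=16):
    # Attend only to the KV blocks listed in `keep_blocks` (indices into the
    # key/value cache split into `block`-sized chunks).
    cols = np.concatenate([np.arange(b * block, min((b + 1) * block, k.shape[0]))
                           for b in keep_blocks])
    w = softmax(q @ k[cols].T / np.sqrt(q.shape[-1]))
    return w @ v[cols]

rng = np.random.default_rng(0)
d, n = 64, 256
k_cache, v_cache = rng.standard_normal((n, d)), rng.standard_normal((n, d))
kept = [0, 1, 7, 15]          # e.g. sink, "important", and recent blocks (assumed)

# Prefill: all prompt queries go through the same sparse path ...
ctx = sparse_attention(rng.standard_normal((n, d)), k_cache, v_cache, kept)
# ... and decoding a single new token reuses it unchanged.
out = sparse_attention(rng.standard_normal((1, d)), k_cache, v_cache, kept)
print(ctx.shape, out.shape)

Because both phases share one sparsity decision, the cost of the prefill pass and of each decode step scales with the number of kept blocks rather than with the full context length.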