Efficient AI Computing,
Transforming the Future.

Projects


SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

ICLR 2025

A new W4A4 quantization paradigm for diffusion models.
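The core idea can be sketched numerically: split the weight into a low-rank branch (computed here via truncated SVD) that absorbs the outlier-heavy components, then quantize only the smoother residual to 4 bits. The function names, the rank, and the simple symmetric per-tensor quantizer below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def svdquant_decompose(W, rank=16, bits=4):
    """Sketch of the SVDQuant idea: W ~= L1 @ L2 + dequant(Q).
    The low-rank branch absorbs dominant components so the residual
    has a smaller dynamic range and quantizes well at 4 bits."""
    # Truncated SVD captures the largest singular directions.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L1 = U[:, :rank] * S[:rank]   # (m, r), singular values folded in
    L2 = Vt[:rank, :]             # (r, n)
    R = W - L1 @ L2               # residual with reduced dynamic range

    # Illustrative symmetric per-tensor quantization of the residual.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(R).max() / qmax
    Q = np.clip(np.round(R / scale), -qmax - 1, qmax)
    return L1, L2, Q, scale

def reconstruct(L1, L2, Q, scale):
    return L1 @ L2 + Q * scale
```

With an outlier planted in the weight, this decomposition reconstructs the matrix with noticeably lower error than quantizing the full weight directly at the same bit width.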

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training

ICLR 2025

We propose COAT, a memory-efficient FP8 training method for large language models.
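The memory saving comes from storing large training-state tensors (optimizer moments, activations) in 8-bit form with per-group scales instead of FP32. The sketch below uses plain int8 with per-group absmax scaling as a stand-in; COAT's actual method uses FP8 formats with dynamic range expansion, and the group size here is an arbitrary assumption.

```python
import numpy as np

def quantize_state(x, group_size=64):
    """Generic sketch of low-precision optimizer-state storage:
    per-group absmax scaling to int8 (COAT itself uses FP8)."""
    flat = x.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.round(flat / scale).astype(np.int8)
    return q, scale

def dequantize_state(q, scale, shape):
    """Recover an approximate FP32 tensor from int8 values and scales."""
    return (q.astype(np.float32) * scale).reshape(shape)
```

Each element is stored in 1 byte plus a shared scale per group, a 4x reduction over FP32 state at a bounded per-group rounding error.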

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

ICLR 2025

SANA is an efficient linear diffusion transformer (DiT) that can generate images at resolutions up to 4096 × 4096. It is 20× smaller and 100× faster than FLUX, is deployable on a laptop GPU, and achieves top-tier GenEval and DPGBench results.

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

ICLR 2025

By selectively applying full attention to critical attention heads and using "Streaming Attention" on others, DuoAttention significantly reduces both pre-filling and decoding memory usage and latency for long-context LLMs, while maintaining their long-context capabilities.
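The two head types described above can be illustrated as attention masks: retrieval heads keep the full causal mask, while streaming heads only see a few initial "attention sink" tokens plus a recent sliding window, so their KV cache stays constant-size. The function name and the `window`/`sink` values below are illustrative assumptions.

```python
import numpy as np

def duo_attention_mask(seq_len, is_retrieval_head, window=4, sink=2):
    """Sketch of DuoAttention's two head types as boolean masks.
    Retrieval heads: full causal attention over the whole context.
    Streaming heads: causal attention restricted to `sink` initial
    tokens plus a sliding window of the `window` most recent tokens."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    causal = j <= i
    if is_retrieval_head:
        return causal
    return causal & ((j < sink) | (i - j < window))
```

Because a streaming head attends to at most `sink + window` keys per query, its KV cache no longer grows with context length; only the retrieval heads pay the full long-context memory cost.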