Efficient AI Computing,
Transforming the Future.

Projects


DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

ICLR 2025

By selectively applying full attention to critical attention heads and using "Streaming Attention" on others, DuoAttention significantly reduces both pre-filling and decoding memory usage and latency for long-context LLMs, while maintaining their long-context capabilities.
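The core idea, restricting some heads to a few initial "sink" tokens plus a recent window while others keep full causal attention, can be sketched as a per-head mask. This is an illustrative sketch only; the parameter names (`sink`, `window`) and values are assumptions, not DuoAttention's actual API.

```python
import numpy as np

def attention_mask(seq_len, head_is_retrieval, sink=4, window=256):
    """Causal attention mask for one head.

    Retrieval heads attend to all past tokens (full causal attention);
    streaming heads attend only to the first `sink` tokens and a recent
    window, which caps their KV-cache size. Illustrative sketch only.
    """
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    if head_is_retrieval:
        return causal
    idx = np.arange(seq_len)
    sink_cols = idx[None, :] < sink                  # always-kept initial tokens
    recent = (idx[:, None] - idx[None, :]) < window  # sliding recent window
    return causal & (sink_cols | recent)
```

Because a streaming head's mask only ever references `sink + window` keys per query, its KV cache stays constant-size regardless of context length.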

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

ICLR 2025

A new family of high spatial-compression autoencoders for accelerating high-resolution diffusion models.
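The payoff of higher spatial compression is a smaller latent grid, and therefore far fewer tokens for the diffusion model to process. A minimal sketch of that arithmetic, where the compression factor `f=32` and channel count `c=32` are illustrative assumptions rather than DC-AE's actual configuration:

```python
def latent_shape(h, w, f=32, c=32):
    """Latent spatial size for an autoencoder with spatial compression
    factor f and c latent channels (f and c are assumed example values)."""
    assert h % f == 0 and w % f == 0, "resolution must be divisible by f"
    return (h // f, w // f, c)

def diffusion_tokens(h, w, f):
    """Number of spatial positions the diffusion model must attend over."""
    return (h // f) * (w // f)
```

For a 1024x1024 image, moving from a conventional f=8 autoencoder to f=32 shrinks the latent from 128x128 to 32x32, a 16x reduction in token count.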

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

ICLR 2025

LongVILA is a full-stack solution for long-context visual language models (VLMs), incorporating novel training strategies, datasets, and the Multi-Modal Sequence Parallelism (MM-SP) system to efficiently handle long video understanding, achieving significant scalability, accuracy, and speed improvements on multi-modal benchmarks.
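The basic idea behind sequence parallelism is to shard one very long token sequence into contiguous chunks, one per device, so no single GPU holds the whole context. A minimal sketch of that sharding step; MM-SP's actual partitioning and load balancing are more involved, and this helper is purely illustrative:

```python
def shard_sequence(tokens, world_size):
    """Split a token sequence into `world_size` contiguous, near-equal
    chunks, one per device. Earlier ranks absorb the remainder so chunk
    sizes differ by at most one. Illustrative sketch of sequence
    parallelism, not the MM-SP implementation."""
    n = len(tokens)
    base, rem = divmod(n, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < rem else 0)
        shards.append(tokens[start:start + size])
        start += size
    return shards
```

Each device then runs attention over its shard, exchanging keys/values (or attention statistics) with peers as needed, which is what lets context length scale with the number of GPUs.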

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

ICLR 2025

VILA-U is a Unified foundation model that integrates Video, Image, and Language understanding and generation.