Efficient AI Computing,
Transforming the Future.

Blog Posts

SVDQuant Meets NVFP4: 4× Smaller and 3× Faster FLUX with 16-bit Quality on NVIDIA Blackwell GPUs

Feb 21, 2025

SVDQuant now supports NVFP4 on NVIDIA Blackwell GPUs, with a 3× speedup over BF16 and better image quality than INT4. Try our interactive demo at https://svdquant.mit.edu/! All of our code is available at https://github.com/mit-han-lab/nunchaku.


RTX 5090 Workstation Configuration Journey

Feb 10, 2025

With the arrival of the RTX 5090, we built a high-performance workstation to maximize its AI computing potential. In this blog post, we share our experience—from overcoming setup challenges to testing its performance.


TinyChat 2.0: Accelerating Edge AI with Efficient LLM and VLM Deployment

Dec 12, 2024

Explore TinyChat 2.0, the latest version of TinyChat, which significantly improves the prefilling speed of edge LLMs and VLMs. In addition to the 3-4x decoding speedups achieved with AWQ quantization, TinyChat 2.0 now delivers state-of-the-art Time-To-First-Token, 1.5-1.7x faster than the legacy version of TinyChat.


SVDQuant: Accurate 4-Bit Quantization Powers 12B FLUX on a 16GB 4090 Laptop with 3x Speedup

Nov 7, 2024

A new post-training quantization paradigm for diffusion models that quantizes both the weights and activations of FLUX.1 to 4 bits, achieving 3.5× memory and 8.7× latency reduction on a 16GB laptop 4090 GPU. Code: https://www.github.com/mit-han-lab/nunchaku

