We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.
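The key mechanism is an offline per-channel rescaling that migrates quantization difficulty from activations (which have large outlier channels) to weights (which are easy to quantize), while leaving the layer's output mathematically unchanged. A minimal sketch for a linear layer, using the paper's default migration strength α = 0.5 (the helper name `smooth_scales` is ours):

```python
import torch

def smooth_scales(x_absmax, weight, alpha=0.5):
    """Per-input-channel factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    x_absmax: activation abs-max per input channel (from calibration), shape [in].
    weight:   linear weight of shape [out, in].
    """
    w_absmax = weight.abs().amax(dim=0)                 # per input channel, shape [in]
    return (x_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)

# Toy calibration batch with one outlier activation channel.
x = torch.randn(16, 4)
x[:, 0] *= 50                                           # channel 0 dominates the range
w = torch.randn(8, 4)

s = smooth_scales(x.abs().amax(dim=0), w)
x_s, w_s = x / s, w * s                                 # X' = X diag(s)^-1, W' = W diag(s)

# The output is unchanged, but X' has a much flatter per-channel range,
# so INT8 activation quantization loses far less accuracy.
assert torch.allclose(x @ w.t(), x_s @ w_s.t(), rtol=1e-4, atol=1e-4)
```

Because X′W′ᵀ = XWᵀ exactly, the division by s can be folded into the preceding operation (e.g., a LayerNorm's affine parameters), so smoothing adds no runtime overhead.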
An engine that selectively performs computation only in the edited regions to accelerate image editing applications.
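The caching idea can be sketched for a single 3×3 convolution: precompute the output on the original image, detect which tiles the edit affects, and recompute only those tiles (gathered with a one-pixel halo), reusing the cache everywhere else. This is an illustrative sketch under those assumptions; `sparse_edit_conv` and the tiling scheme are ours, and the actual engine fuses the gather/scatter into optimized kernels:

```python
import torch
import torch.nn.functional as F

def sparse_edit_conv(conv, x_orig, x_edit, y_cache, tile=16, tol=1e-6):
    """Recompute a stride-1, padding-1 3x3 conv only on tiles the edit affects.

    y_cache = conv(x_orig) is precomputed. The change map is dilated by the
    conv's 1-pixel receptive-field overlap, affected tiles are gathered with
    a 1-pixel halo and recomputed, and everything else reuses the cache.
    """
    _, _, H, W = x_edit.shape
    diff = (x_edit - x_orig).abs().amax(dim=1, keepdim=True)
    diff = F.max_pool2d(diff, 3, stride=1, padding=1)[0, 0]  # dilate change map
    y = y_cache.clone()
    for i in range(0, H, tile):
        for j in range(0, W, tile):
            if diff[i:i + tile, j:j + tile].max() <= tol:
                continue                                     # tile unaffected: keep cache
            th, tw = min(tile, H - i), min(tile, W - j)
            i0, j0 = max(i - 1, 0), max(j - 1, 0)            # gather with 1-pixel halo
            i1, j1 = min(i + th + 1, H), min(j + tw + 1, W)
            out = conv(x_edit[:, :, i0:i1, j0:j1])
            y[:, :, i:i + th, j:j + tw] = out[:, :, i - i0:i - i0 + th,
                                              j - j0:j - j0 + tw]
    return y

conv = torch.nn.Conv2d(3, 8, 3, padding=1)
x0 = torch.randn(1, 3, 64, 64)
x1 = x0.clone()
x1[:, :, 20:30, 20:30] += 1.0                                # a small local edit
assert torch.allclose(sparse_edit_conv(conv, x0, x1, conv(x0)), conv(x1), atol=1e-5)
```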
Anycost GAN generates consistent outputs under a wide range of fine-grained computation budgets.
Differentiable augmentation to improve the data efficiency of GAN training.
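The mechanism: augment both real and generated images with the same random transforms before the discriminator, and keep every transform differentiable so gradients still reach the generator. A minimal sketch with a toy policy (`diff_augment` here is our simplified version; the DiffAugment paper uses color, translation, and cutout transforms):

```python
import torch

def diff_augment(x):
    """Random brightness shift + cutout, both differentiable w.r.t. x."""
    n, _, h, w = x.shape
    x = x + (torch.rand(n, 1, 1, 1, device=x.device) - 0.5)   # brightness
    mask = torch.ones_like(x[:, :1])                           # cutout via mask multiply
    for k, (cy, cx) in enumerate(zip(torch.randint(h, (n,)).tolist(),
                                     torch.randint(w, (n,)).tolist())):
        mask[k, :, max(cy - h // 8, 0):cy + h // 8,
                   max(cx - w // 8, 0):cx + w // 8] = 0
    return x * mask

# Gradients flow through the augmentation, so the generator can be
# trained against a discriminator that only sees augmented images:
fake = torch.randn(4, 3, 32, 32, requires_grad=True)          # stand-in for G(z)
diff_augment(fake).sum().backward()
assert fake.grad is not None
```

During training the same policy is applied to both real and fake batches in both the D and G updates, so the discriminator cannot exploit a distribution shift and the augmentations do not leak into what the generator produces.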
A general-purpose compression framework for reducing the inference time and model size of the generator in conditional GANs.
In this blog post, we introduce DistriFusion, a training-free algorithm that harnesses multiple GPUs to accelerate diffusion model inference without sacrificing image quality. It reduces SDXL latency by up to 6.1× on 8 A100 GPUs. Our work was accepted to CVPR 2024 as a highlight. Code: https://github.com/mit-han-lab/distrifusion
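At a high level, DistriFusion splits the image into patches, one per GPU, and exploits the similarity between adjacent diffusion steps: each device computes fresh activations only for its own patch and reuses slightly stale activations from the previous step for the rest (the paper calls this displaced patch parallelism). Below is a loose, single-process sketch of the idea at the output level; the actual method reuses stale per-layer activations and hides the patch exchange behind asynchronous all-gather, and all names here are ours:

```python
import torch

def displaced_patch_step(denoise, x_t, prev_full, rank, world):
    """One toy denoising step from the perspective of one 'GPU'.

    The rank freshly denoises only its own horizontal stripe (1/world of the
    work) and fills the remaining stripes with the previous step's (stale)
    output, which is nearly identical between adjacent diffusion steps.
    """
    stripes = list(prev_full.chunk(world, dim=2))       # stale context
    h = x_t.size(2) // world
    stripes[rank] = denoise(x_t.narrow(2, rank * h, h)) # fresh compute
    # In DistriFusion, fresh patches are exchanged with an async all-gather
    # that overlaps with the next step's computation.
    return torch.cat(stripes, dim=2)

denoise = lambda p: 0.9 * p                             # stand-in for the UNet
x_t = torch.randn(1, 4, 64, 64)
prev = denoise(x_t)                                     # pretend previous full step
out = displaced_patch_step(denoise, x_t, prev, rank=0, world=4)
print(out.shape)                                        # torch.Size([1, 4, 64, 64])
```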