Generative AI

Projects

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

CVPR 2024

A training-free algorithm to harness multiple GPUs to accelerate diffusion model inference without sacrificing image quality.

VILA: On Pre-training for Visual Language Models

CVPR 2024

VILA is a visual language model (VLM) pre-trained at scale on interleaved image-text data, which enables multi-image VLM capabilities. VILA is deployable on the edge.

Condition-Aware Neural Network for Controlled Image Generation

CVPR 2024

A new conditional control method for diffusion models that dynamically adapts their weights.

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

ICLR 2024

LongLoRA uses shifted sparse attention to greatly reduce the fine-tuning cost of long-context LLMs.
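
As a rough illustration of why a grouped, shifted attention pattern is cheaper than full attention, the sketch below only computes which token indices would attend to each other. It assumes the common description of the pattern: tokens are split into fixed-size groups, and a shifted variant rolls the sequence by half a group so that neighboring groups overlap. The function names and the toy sizes are hypothetical, and this is not the LongLoRA implementation.

```python
def attention_groups(seq_len, group_size, shifted):
    """Return the lists of token indices that attend to each other.

    Illustrative only: real implementations operate on tensors and
    apply the shifted pattern to a subset of attention heads.
    """
    if seq_len % group_size != 0:
        raise ValueError("seq_len must be a multiple of group_size")
    tokens = list(range(seq_len))
    if shifted:
        shift = group_size // 2
        # Roll the sequence left by half a group; the last group wraps around.
        tokens = tokens[shift:] + tokens[:shift]
    return [tokens[i:i + group_size] for i in range(0, seq_len, group_size)]


def attended_pairs(seq_len, group_size):
    """Count query-key pairs when attention is restricted to groups."""
    groups = attention_groups(seq_len, group_size, shifted=False)
    return sum(len(group) ** 2 for group in groups)


print(attention_groups(8, 4, shifted=False))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(attention_groups(8, 4, shifted=True))   # [[2, 3, 4, 5], [6, 7, 0, 1]]
print(attended_pairs(8, 4), "pairs instead of", 8 * 8)
```

In this toy setting, each group of 4 tokens costs 16 query-key pairs, so two groups cost 32 pairs rather than the 64 of full attention; the shifted variant lets information cross the boundary between groups.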

Tiny Machine Learning Projects

NeurIPS 2020/2021/2022, MICRO 2023, ICML 2023, MLSys 2024, IEEE CAS Magazine 2023

(Feature)

This TinyML project aims to enable efficient AI computing on the edge through new model compression techniques and high-performance system design.

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

ICCV 2023

EfficientViT is a new family of vision models for high-resolution dense prediction. It achieves a global receptive field and multi-scale learning using only hardware-efficient operations. EfficientViT delivers remarkable performance gains over previous models, with speedups on diverse hardware platforms, including mobile CPU, edge GPU, and cloud GPU.

Blog Posts

Patch Conv: Patch Convolution to Avoid Large GPU Memory Usage of Conv2D

March 10, 2024

In this blog post, we introduce Patch Conv to reduce the memory footprint when generating high-resolution images. Patch Conv cuts memory usage by over 2.4× compared to the existing PyTorch implementation. Code: https://github.com/mit-han-lab/patch_conv
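
To make the general idea of patch-wise convolution concrete, here is a minimal pure-Python sketch. It assumes the input is split into horizontal strips, each extended by the few extra rows the kernel needs, and that the strips are convolved one at a time and concatenated. The function names are hypothetical and this is not the code in the repository above; it only shows that the patched result can match the full convolution while each call sees a smaller input.

```python
def conv2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def patch_conv2d_valid(image, kernel, num_patches):
    """Same output as conv2d_valid, computed one horizontal strip at a time.

    Assumes the image is at least as large as the kernel.
    """
    kh = len(kernel)
    out_h = len(image) - kh + 1
    rows_per_patch = -(-out_h // num_patches)  # ceiling division
    output = []
    for start in range(0, out_h, rows_per_patch):
        stop = min(start + rows_per_patch, out_h)
        # Output rows start..stop-1 need input rows start..stop+kh-2,
        # i.e. the strip plus kh - 1 extra "halo" rows.
        strip = image[start:stop + kh - 1]
        output.extend(conv2d_valid(strip, kernel))
    return output


image = [[r * 5 + c for c in range(5)] for r in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
print(patch_conv2d_valid(image, kernel, num_patches=2) == conv2d_valid(image, kernel))  # True
```

The trade-off in a scheme like this is a little redundant work on the overlapping rows in exchange for bounding the size of each individual convolution call.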

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

February 29, 2024

In this blog post, we introduce DistriFusion, a training-free algorithm that harnesses multiple GPUs to accelerate diffusion model inference without sacrificing image quality. It can reduce SDXL latency by up to 6.1× on 8 A100s. Our work has been accepted to CVPR 2024 as a highlight.