Welcome to MIT HAN Lab! We focus on making AI faster, smarter, and more efficient. Our research covers a broad spectrum, including generative AI (e.g., LLMs and diffusion models), TinyML, system optimization and hardware design. By integrating algorithm and hardware expertise, we strive to push the frontiers of AI efficiency and performance.
Graduated PhD students: Ji Lin (OpenAI), Hanrui Wang (assistant professor @UCLA), Zhijian Liu (assistant professor @UCSD), Han Cai (NVIDIA Research), Haotian Tang (Google DeepMind), Yujun Lin (NVIDIA Research).
Accelerating LLM and Generative AI [slides]:
SANA is an efficient linear DiT that can generate images up to 4096 × 4096. SANA delivers: 20x smaller & 100x faster than FLUX; Deployable on laptop GPU; Top-notch GenEval & DPGBench results.
By selectively applying full attention to critical attention heads and using "Streaming Attention" on others, DuoAttention significantly reduces both pre-filling and decoding memory usage and latency for long-context LLMs, while maintaining their long-context capabilities.
A new family of high-spatial compression autoencoders for accelerating high-resolution diffusion models.
LongVILA is a full-stack solution for long VLM, which incorporates novel training strategies, dataset, and the Multi-Modal Sequence Parallelism (MM-SP) system to efficiently handle long video understanding, achieving significant scalability, accuracy, and speed improvements on multi-modal benchmarks.
We actively collaborate with industry partners on efficient AI, model compression and acceleration. Our research has influenced and landed in many industrial products: Intel OpenVino, Intel Neural Network Distiller, Intel Neural Compressor, Apple Neural Engine, NVIDIA Sparse Tensor Core, NVIDIA TensorRT LLM, AMD-Xilinx Vitis AI, Qualcomm AI Model Efficiency Toolkit (AIMET), Amazon AutoGluon, Facebook PyTorch, Microsoft NNI, SONY Neural Architecture Search Library, SONY Model Compression Toolkit, ADI MAX78000/MAX78002 Model Training and Synthesis Tool.