Welcome to MIT HAN Lab! We focus on making AI faster, smarter, and more efficient. Our research covers a broad spectrum, including generative AI (e.g., LLMs and diffusion models), TinyML, system optimization and hardware design. By integrating algorithm and hardware expertise, we strive to push the frontiers of AI efficiency and performance.
Graduated PhD students: Ji Lin (OpenAI), Hanrui Wang (assistant professor @UCLA), Zhijian Liu (assistant professor @UCSD), Han Cai (NVIDIA Research), Haotian Tang (Google DeepMind), Yujun Lin (NVIDIA Research).
Accelerating LLM and Generative AI [slides]:
🔥⚡ We release TinyChat 2.0, the latest version with significant advancements in prefilling speed of Edge LLMs and VLMs, 1.5-1.7x faster than the previous version of TinyChat. Please refer to our blog for more details.
DistriFusion is integrated in NVIDIA's TensorRT-LLM for distributed inference on high-resolution image generation.
LServe accelerates long-sequence LLM serving with unified sparse attention for both prefilling and decoding, achieving up to 3.3× speedup over state-of-the-art solution without sacrificing accuracy.
QServe accelerates large-scale LLM serving on GPUs with QoQ (W4A8KV4) quantization, boosting the generation throughputs by up to 3x over the state-of-the-art solution.
A new W4A4 quantization paradigm for diffusion models.
We propose COAT, a memory efficient FP8 training method for large language models.
We actively collaborate with industry partners on efficient AI, model compression and acceleration. Our research has influenced and landed in many industrial products: Intel OpenVino, Intel Neural Network Distiller, Intel Neural Compressor, Apple Neural Engine, NVIDIA Sparse Tensor Core, NVIDIA TensorRT LLM, AMD-Xilinx Vitis AI, Qualcomm AI Model Efficiency Toolkit (AIMET), Amazon AutoGluon, Facebook PyTorch, Microsoft NNI, SONY Neural Architecture Search Library, SONY Model Compression Toolkit, ADI MAX78000/MAX78002 Model Training and Synthesis Tool.