TinyML and Efficient Deep Learning Computing

6.5940 • Fall • 2023 • https://efficientml.ai

This course focuses on efficient machine learning and systems. This is a crucial area because deep neural networks demand extraordinary levels of computation, which hinders their deployment on everyday devices and burdens cloud infrastructure. This course introduces efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models and diffusion models. Students will get hands-on experience implementing model compression techniques and deploying large language models (Llama2-7B) on a laptop.
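To give a rough sense of why compression matters for running Llama2-7B on a laptop, the sketch below estimates the memory needed just to hold the model weights at different bit widths. It is a back-of-the-envelope illustration, not course material: the nominal 7 billion parameter count is an assumption taken from the model name, the function name is hypothetical, and activations and the KV cache are ignored.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory (in GiB) needed to store the weights alone."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: a nominal 7e9 parameters, read off the "7B" in the model name.
NUM_PARAMS = 7e9

if __name__ == "__main__":
    for bits in (32, 16, 8, 4):
        gib = weight_memory_gib(NUM_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {gib:.1f} GiB")
```

Under these assumptions, 16-bit weights take roughly 13 GiB, while 4-bit weights take a little over 3 GiB, which is why quantization is a natural fit for laptop deployment.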

Live Streaming:
https://live.efficientml.ai/
Lecture Videos:
https://live.efficientml.ai/
Time:
Tuesday/Thursday 3:35-5:00 pm Eastern Time
Location:
36-156
Office Hour:
Thursday 5:00-6:00 pm Eastern Time, 38-344 Meeting Room
Discussion:
Piazza
Homework Submission:
Canvas
Contact:
- For external inquiries, personal matters, or emergencies, you can email us at efficientml-staff [at] mit.edu.
- If you would like to receive course updates, please sign up here to join our mailing list.

Instructor

Song Han

Associate Professor

Teaching Assistants

Han Cai

Ph.D

Ji Lin

Ph.D

Announcements

2023-12-15
Final project: reports, slides and demo videos

2023-12-14
Final report and course evaluation due

2023-11-09
Mid-term survey: https://forms.gle/xMgCohDLX73cd4af9

2023-10-31
Lab 5 is out.

2023-10-19
Lab 4 is out.

Schedule

Date | Lecture | Logistics
Sep 7 | Introduction [Slides] [Video] [Video (Live)] |
Sep 12 | Basics of Deep Learning [Slides] [Video] [Video (Live)] | Lab 0 out
Sep 13 | Chapter I: Efficient Inference [Slides] [Video] [Video (Live)] |
Sep 14 | Pruning and Sparsity (Part I) [Slides] [Video] [Video (Live)] |
Sep 19 | Pruning and Sparsity (Part II) [Slides] [Video] [Video (Live)] | Lab 1 out
Sep 21 | Quantization (Part I) [Slides] [Video] [Video (Live)] | Lab 0 due
Sep 26 | Quantization (Part II) [Slides] [Video] [Video (Live)] |
Sep 28 | Neural Architecture Search (Part I) [Slides] [Video] [Video (Live)] | Lab 1 due (extended to Sep 30 at 11:59 pm); Lab 2 out
Oct 3 | Neural Architecture Search (Part II) [Slides] [Video] [Video (Live)] |
Oct 5 | Knowledge Distillation [Slides] [Video] [Video (Live)] | Lab 3 out
Oct 10 | Student Holiday: No Class [Slides] [Video] [Video (Live)] |
Oct 12 | MCUNet: TinyML on Microcontrollers [Slides] [Video] [Video (Live)] | Lab 2 due
Oct 17 | TinyEngine and Parallel Processing [Slides] [Video] [Video (Live)] |
Oct 18 | Chapter II: Domain-Specific Optimization [Slides] [Video] [Video (Live)] |
Oct 19 | Transformer and LLM (Part I) [Slides] [Video] [Video (Live)] | Lab 3 due, Lab 4 out
Oct 24 | Transformer and LLM (Part II) [Slides] [Video] [Video (Live)] |
Oct 26 | Vision Transformer [Slides] [Video] [Video (Live)] | Project ideas out (on Canvas)
Oct 31 | GAN, Video, and Point Cloud [Slides] [Video] [Video (Live)] | Lab 4 due, Lab 5 out
Nov 2 | Diffusion Model [Slides] [Video] [Video (Live)] |
Nov 6 | Chapter III: Efficient Training [Slides] [Video] [Video (Live)] |
Nov 7 | Distributed Training (Part I) [Slides] [Video] [Video (Live)] |
Nov 9 | Distributed Training (Part II) [Slides] [Video] [Video (Live)] |
Nov 14 | On-Device Training and Transfer Learning [Slides] [Video] [Video (Live)] | Lab 5 due
Nov 16 | Efficient Fine-tuning and Prompt Engineering [Slides] [Video] [Video (Live)] |
Nov 21 | Basics of Quantum Computing [Slides] [Video] [Video (Live)] | Project proposal due
Nov 23 | Thanksgiving: No Class [Slides] [Video] [Video (Live)] |
Nov 27 | Chapter IV: Advanced Topics [Slides] [Video] [Video (Live)] |
Nov 28 | Quantum Machine Learning [Slides] [Video] [Video (Live)] |
Nov 30 | Noise Robust Quantum ML [Slides] [Video] [Video (Live)] |
Dec 5 | Final Project Presentation [Slides] [Video] [Video (Live)] |
Dec 7 | Final Project Presentation [Slides] [Video] [Video (Live)] |
Dec 12 | Final Project Presentation + Course Summary [Slides] [Video] [Video (Live)] | Dec 14: Project report and course evaluation due

Logistics

Grading

The class requirements include five labs and one final project. This is a PhD-level course, and by the end of this class you should have a good understanding of efficient deep learning techniques and be able to deploy large language models (LLMs) on your laptop.

The grading breakdown is as follows:

- 5 Labs (15% x 5)
- Final Project (25%)
  - Proposal (5%)
  - Presentation + Final Report (20%)
- Participation Bonus (4%)

Note that this class does not have any tests or exams.
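As a worked illustration of the breakdown above, the following sketch combines component scores into a course percentage. The weights come from the list above; the function name and the convention of expressing each component score as a fraction between 0 and 1 are assumptions made for this example. The page does not say whether the total is capped at 100%, so the sketch leaves it uncapped.

```python
# Weights (in percentage points) taken from the grading breakdown.
LAB_WEIGHT = 15.0
NUM_LABS = 5
PROPOSAL_WEIGHT = 5.0
PRESENTATION_REPORT_WEIGHT = 20.0
BONUS_CAP = 4.0


def course_grade(lab_scores, proposal, presentation_report, bonus=0.0):
    """Return the course percentage.

    lab_scores, proposal, and presentation_report are fractions in [0, 1]
    (an assumed convention); bonus is in percentage points and is capped.
    """
    if len(lab_scores) != NUM_LABS:
        raise ValueError(f"expected {NUM_LABS} lab scores, got {len(lab_scores)}")
    labs = sum(LAB_WEIGHT * score for score in lab_scores)
    project = (PROPOSAL_WEIGHT * proposal
               + PRESENTATION_REPORT_WEIGHT * presentation_report)
    return labs + project + min(bonus, BONUS_CAP)


if __name__ == "__main__":
    # Full marks everywhere except half credit on one lab, plus the survey bonus.
    print(course_grade([1, 1, 1, 1, 0.5], proposal=1, presentation_report=1, bonus=1))
```

Because the labs and the final project already sum to 100%, the participation bonus acts as extra credit on top of the base grade in this sketch.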

Labs

There will be 5 labs over the course of the semester.

- Lab 1: Pruning
- Lab 2: Quantization
- Lab 3: Neural architecture search
- Lab 4: LLM compression
- Lab 5: LLM deployment on laptop

Collaboration Policy

Labs must be done individually: each student must hand in their own answers. However, it is acceptable to collaborate when figuring out answers and to help each other solve the problems. We will be assuming that, as participants in a graduate course, you will be taking the responsibility to make sure you personally understand the solution arising from such collaboration. You also must indicate on each homework with whom you have collaborated.

Late Policy

You will be allowed 6 total homework late days without penalty for the entire semester. You may be late by up to 6 days on any homework assignment. Once those days are used, you will be penalized according to the following policy:

- Homework is worth full credit at the due time on the due date.
- The allowed late days are counted by day (i.e., each new late day starts at 11:59 pm ET).
- Once the allowed late days are exceeded, the penalty is 50% per late day counted by day.
- The homework is worth zero credit 2 days after exceeding the late day limit.

You must turn in at least 4 of the 5 assignments, even if for zero credit, in order to pass the course.
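The late policy above can be sketched as a small calculation. This is one reading of the policy, not an official grading script: it assumes the 6 free late days are consumed in the order assignments are submitted, that any partial day counts as a full late day, and that the function names are hypothetical.

```python
import math

FREE_LATE_DAYS = 6      # total penalty-free late days for the semester
PENALTY_PER_DAY = 0.5   # 50% of the credit lost per late day beyond the budget


def late_days(hours_late: float) -> int:
    """Count late days, rounding any partial day up (an assumption)."""
    return max(0, math.ceil(hours_late / 24))


def credit_multipliers(days_late_per_hw, budget=FREE_LATE_DAYS):
    """Return the fraction of credit kept for each homework, in order.

    Free late days are spent first; each remaining late day costs 50%,
    so a homework reaches zero credit 2 days past the budget.
    """
    multipliers = []
    remaining = budget
    for days in days_late_per_hw:
        free = min(days, remaining)
        remaining -= free
        penalized = days - free
        multipliers.append(max(0.0, 1.0 - PENALTY_PER_DAY * penalized))
    return multipliers


if __name__ == "__main__":
    # Five labs submitted 2, 0, 3, 2, and 3 days late, respectively.
    print(credit_multipliers([2, 0, 3, 2, 3]))
```

In this example the first three labs use 5 of the 6 free days, the fourth lab uses the last free day and loses 50% for its second late day, and the fifth lab is more than 2 days past the budget and receives no credit.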

Regrade Policy

If you feel that we have made a mistake in grading your work, please submit a regrade request to the TAs during office hours and we will consider it. Please note that regrading a homework may cause your grade to go either up or down.

Final Project

The class project will be carried out in groups of 2 or 3 people, and has three main parts:

- Proposal: choose from a list of suggested projects, or propose your own project
- Oral presentation (~10 mins per group)
- Final report (4 pages, using the NeurIPS template)

Participation Bonus

We appreciate everyone being actively involved in the class! There are several ways of earning participation bonus credit, which will be capped at 4%:

- Completing the mid-semester evaluation: Around the middle of the semester, we will send out a survey to help us understand how the course is going and how we can improve. Completing it is worth 1%.
- Karma points: Any other act that improves the class, which a TA or instructor notices and deems worthy, is worth up to 3%.

Efficient AI Computing, Transforming the Future.
