Efficient AI Computing,
Transforming the Future.

TinyML and Efficient Deep Learning Computing

6.5940, Fall 2024

https://efficientml.ai

This course focuses on efficient machine learning and systems. This is a crucial area because deep neural networks demand extraordinary levels of computation, which hinders their deployment on everyday devices and burdens cloud infrastructure. The course introduces efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models and diffusion models. Students will get hands-on experience implementing model compression techniques and deploying large language models (Llama2-7B) on a laptop.

  • Live Streaming:
  • Time:
    Tuesday/Thursday 3:35-5:00 pm Eastern Time
  • Location:
    34-101
  • Office Hour:
    Thursday 5:00-6:00 pm Eastern Time, 38-344 Meeting Room

  • Discussion:
    Piazza
  • Homework Submission:
    Canvas
  • Contact:
    • For external inquiries, personal matters, or emergencies, you can email us at efficientml-staff [at] mit.edu.
    • If you would like to receive course updates, please sign up here to join our mailing list.
  • Prerequisites: 6.191 Computation Structures and 6.390 Intro to Machine Learning. Students who do not fulfill the prerequisites will be de-registered in the second week of class. If you believe you have equivalent prior experience (e.g., a computer architecture course taken during your undergraduate studies at another institution), you may petition for consideration. Please submit your Petition Form by Sept. 6, 2024, 11:59:59 PM ET.

Instructor

Associate Professor

Teaching Assistants

Announcements

  • 2024-08-30

    The TinyML and Efficient Deep Learning Computing course will be returning in Fall, with recorded sessions on YouTube!

Schedule

Each entry lists the date, the lecture, and any logistics (lab and project milestones).

  • Sep 5: Lecture 1: Introduction [Slides] [Video] [Video (Live)]
  • Sep 10: Lecture 2: Basics of Deep Learning [Slides] [Video] [Video (Live)]. Logistics: Lab 0 out

Chapter I: Efficient Inference

  • Sep 12: Lecture 3: Pruning and Sparsity (Part I) [Slides] [Video] [Video (Live)]
  • Sep 17: Lecture 4: Pruning and Sparsity (Part II) [Slides] [Video] [Video (Live)]. Logistics: Lab 1 out
  • Sep 19: Lecture 5: Quantization (Part I) [Slides] [Video] [Video (Live)]. Logistics: Lab 0 due
  • Sep 24: Lecture 6: Quantization (Part II) [Slides] [Video] [Video (Live)]
  • Sep 26: Lecture 7: Neural Architecture Search (Part I) [Slides] [Video] [Video (Live)]. Logistics: Lab 1 due, Lab 2 out
  • Oct 1: Lecture 8: Neural Architecture Search (Part II) [Slides] [Video] [Video (Live)]
  • Oct 3: Lecture 9: Knowledge Distillation [Slides] [Video] [Video (Live)]
  • Oct 8: Lecture 10: MCUNet: TinyML on Microcontrollers [Slides] [Video] [Video (Live)]. Logistics: Lab 2 due, Lab 3 out
  • Oct 10: Lecture 11: TinyEngine and Parallel Processing [Slides] [Video] [Video (Live)]
  • Oct 15: Student Holiday (no class)

Chapter II: Domain-Specific Optimization

  • Oct 17: Lecture 12: Transformer and LLM [Slides] [Video] [Video (Live)]
  • Oct 22: Lecture 13: Efficient LLM Deployment [Slides] [Video] [Video (Live)]. Logistics: Lab 3 due, Lab 4 out
  • Oct 24: Lecture 14: LLM Post Training [Slides] [Video] [Video (Live)]. Logistics: Project ideas out (on Canvas)
  • Oct 29: Lecture 15: Long Context LLM [Slides] [Video] [Video (Live)]
  • Oct 31: Lecture 16: Vision Transformer [Slides] [Video] [Video (Live)]. Logistics: Lab 4 due, Lab 5 out
  • Nov 5: Lecture 17: GAN, Video, and Point Cloud [Slides] [Video] [Video (Live)]
  • Nov 7: Lecture 18: Diffusion Model [Slides] [Video] [Video (Live)]

Chapter III: Efficient Training

  • Nov 12: Lecture 19: Distributed Training (Part I) [Slides] [Video] [Video (Live)]. Logistics: Lab 5 due
  • Nov 14: Lecture 20: Distributed Training (Part II) [Slides] [Video] [Video (Live)]. Logistics: Project proposal due
  • Nov 19: Lecture 21: On-Device Training and Transfer Learning [Slides] [Video] [Video (Live)]

Chapter IV: Advanced Topics

  • Nov 21: Lecture 22: Basics of Quantum Computing [Slides] [Video] [Video (Live)]
  • Nov 26: Lecture 23: Quantum Machine Learning [Slides] [Video] [Video (Live)]
  • Nov 28: Thanksgiving (no class)
  • Dec 3: Lecture 24: Final Project Presentation [Slides] [Video] [Video (Live)]
  • Dec 5: Lecture 25: Final Project Presentation [Slides] [Video] [Video (Live)]
  • Dec 10: Lecture 26: Final Project Presentation + Course Summary [Slides] [Video] [Video (Live)]
  • Dec 14: Project report and course evaluation due

Course Videos

  • Lecture 1: Introduction
  • Lecture 12: Transformer and LLM
  • Lecture 13: Efficient LLM Deployment
  • Lecture 18: Diffusion Model

Logistics

Grading

The class requirements include five labs and one final project. This is a PhD-level course. By the end of the class, you should have a good understanding of efficient deep learning techniques and be able to deploy large language models (LLMs) on your laptop.

The grading breakdown is as follows:

  • 5 Labs (15% x 5)
  • Final Project (25%)
    • Proposal (5%)
    • Presentation + Final Report (20%)
  • Participation Bonus (4%)

Note that this class does not have any tests or exams.
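
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes that the Proposal (5%) and the Presentation + Final Report (20%) are the two parts of the 25% Final Project, and that each component is scored as a fraction between 0 and 1. The function name `final_grade` and its parameters are hypothetical, not part of any course tooling, and the staff's actual computation may differ.

```python
# Weights in percentage points, taken from the grading breakdown above.
LAB_WEIGHT = 15            # per lab, five labs in total
PROPOSAL_WEIGHT = 5        # assumed to be part of the 25% final project
PRESENTATION_REPORT_WEIGHT = 20
PARTICIPATION_BONUS = 4


def final_grade(lab_scores, proposal, presentation_report, participation=0.0):
    """Return the course grade in percent (hypothetical helper).

    Every score is a fraction in [0, 1]. Because participation is a bonus,
    the result can exceed 100.
    """
    if len(lab_scores) != 5:
        raise ValueError("expected exactly five lab scores")
    labs = sum(LAB_WEIGHT * score for score in lab_scores)
    project = (PROPOSAL_WEIGHT * proposal
               + PRESENTATION_REPORT_WEIGHT * presentation_report)
    return labs + project + PARTICIPATION_BONUS * participation


if __name__ == "__main__":
    # Full marks everywhere plus the full participation bonus.
    print(final_grade([1.0] * 5, 1.0, 1.0, 1.0))
```

Under these assumptions the labs contribute up to 75 points and the project up to 25, so the participation bonus is the only way to go above 100.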

Labs

There will be 5 labs over the course of the semester.

  • Lab1: Pruning
  • Lab2: Quantization
  • Lab3: Neural architecture search
  • Lab4: LLM compression
  • Lab5: LLM deployment on laptop

Collaboration Policy

Labs must be done individually: each student must hand in their own answers. However, it is acceptable to collaborate when figuring out answers and to help each other solve the problems. We will be assuming that, as participants in a graduate course, you will be taking the responsibility to make sure you personally understand the solution arising from such collaboration. You also must indicate on each homework with whom you have collaborated.

Late Policy

You will be allowed 6 total homework late days without penalty for the entire semester. You may be late by up to 6 days on any homework assignment. Once those days are used, you will be penalized according to the following policy:

  • Homework is worth full credit at the due time on the due date.
  • The allowed late days are counted by day (i.e., each new late day starts at 11:59 pm ET).
  • Once the allowed late days are exceeded, the penalty is 50% per late day counted by day.
  • The homework is worth zero credit 2 days after exceeding the late day limit.
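
The late-day rules above can be sketched in a few lines of Python. This is only one reading of the policy: it assumes that remaining free late days are applied automatically before any penalty, and that lateness is already expressed in whole late days. The function `late_credit` and its parameters are hypothetical names used for illustration.

```python
def late_credit(days_late, free_days_left):
    """Return (credit_fraction, free_days_remaining) for one homework.

    days_late:      whole late days for this submission (0 if on time)
    free_days_left: unused penalty-free late days (6 at the start of term)
    """
    if days_late < 0 or free_days_left < 0:
        raise ValueError("day counts must be non-negative")
    # Assumption: free late days are consumed first.
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    # 50% off per penalized day, so zero credit from the second such day on.
    credit = max(0.0, 1.0 - 0.5 * penalized_days)
    return credit, free_days_left - free_used


if __name__ == "__main__":
    print(late_credit(3, 6))  # within the free allowance
    print(late_credit(4, 3))  # one day beyond the allowance
    print(late_credit(5, 3))  # two days beyond the allowance
```

For example, a student with 3 free days left who submits 4 days late would use all 3 free days and lose 50% for the remaining day under this reading.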

You must turn in at least 4 of the 5 assignments, even if for zero credit, in order to pass the course.

Regrade Policy

If you feel that we have made a mistake in grading your work, please submit a regrade request to the TAs during office hours and we will consider it. Please note that regrading a homework may cause your grade to go either up or down.

Final Project

The class project will be carried out in groups of 3 or 4 people, and has three main parts:

  • proposal: choose from a list of suggested projects, or propose your own project
  • poster presentation
  • final report (4 pages, using the NeurIPS template)

Participation Bonus

We appreciate everyone being actively involved in the class! Around the middle and end of the semester, we will send out a survey to help us understand how the course is going, and how we can improve. Completing it is worth 4% in total.