TinyML and Efficient Deep Learning Computing
This course focuses on efficient machine learning and systems. This is a crucial area because deep neural networks demand extraordinary levels of computation, hindering their deployment on everyday devices and burdening cloud infrastructure. This course introduces efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models and diffusion models. Students will get hands-on experience implementing model compression techniques and deploying large language models (Llama2-7B) on a laptop.
- Live Streaming: https://live.efficientml.ai/
- Lecture Videos: https://live.efficientml.ai/
- Time: Tuesday/Thursday 3:35-5:00 pm Eastern Time
- Location: 36-156
- Office Hour: Thursday 5:00-6:00 pm Eastern Time, 38-344 Meeting Room
- Discussion: Piazza
- Homework Submission: Canvas
- Contact:
  - For external inquiries, personal matters, or emergencies, email us at efficientml-staff [at] mit.edu.
  - To receive course updates, sign up for our mailing list.
Announcements
- 2023-12-15: Final project reports, slides, and demo videos
- 2023-12-14: Final report and course evaluation due
- 2023-11-09: Mid-term survey at https://forms.gle/xMgCohDLX73cd4af9
- 2023-10-31: Lab 5 is out.
- 2023-10-19: Lab 4 is out.
Schedule
| Date | Lecture | Logistics |
| --- | --- | --- |
| Sep 7 | Introduction | |
| Sep 12 | Basics of Deep Learning | Lab 0 out |
| Sep 13 | Chapter I: Efficient Inference | |
| Sep 14 | Pruning and Sparsity (Part I) | |
| Sep 19 | Pruning and Sparsity (Part II) | Lab 1 out |
| Sep 21 | Quantization (Part I) | Lab 0 due |
| Sep 26 | Quantization (Part II) | |
| Sep 28 | Neural Architecture Search (Part I) | Lab 1 due (extended to Sep 30 at 11:59 p.m.), Lab 2 out |
| Oct 3 | Neural Architecture Search (Part II) | |
| Oct 5 | Knowledge Distillation | Lab 3 out |
| Oct 10 | Student Holiday (No Class) | |
| Oct 12 | MCUNet: TinyML on Microcontrollers | Lab 2 due |
| Oct 17 | TinyEngine and Parallel Processing | |
| Oct 18 | Chapter II: Domain-Specific Optimization | |
| Oct 19 | Transformer and LLM (Part I) | Lab 3 due, Lab 4 out |
| Oct 24 | Transformer and LLM (Part II) | |
| Oct 26 | Vision Transformer | Project ideas out (on Canvas) |
| Oct 31 | GAN, Video, and Point Cloud | Lab 4 due, Lab 5 out |
| Nov 2 | Diffusion Model | |
| Nov 6 | Chapter III: Efficient Training | |
| Nov 7 | Distributed Training (Part I) | |
| Nov 9 | Distributed Training (Part II) | |
| Nov 14 | On-Device Training and Transfer Learning | Lab 5 due |
| Nov 16 | Efficient Fine-tuning and Prompt Engineering | |
| Nov 21 | Basics of Quantum Computing | Project proposal due |
| Nov 23 | Thanksgiving (No Class) | |
| Nov 27 | Chapter IV: Advanced Topics | |
| Nov 28 | Quantum Machine Learning | |
| Nov 30 | Noise Robust Quantum ML | |
| Dec 5 | Final Project Presentation | |
| Dec 7 | Final Project Presentation | |
| Dec 12 | Final Project Presentation + Course Summary | |
| Dec 14 | | Project report and course evaluation due |
Logistics
Grading
The class requirements include five labs and one final project. This is a PhD-level course; by the end of the class you should have a good understanding of efficient deep learning techniques and be able to deploy large language models (LLMs) on your laptop.
The grading breakdown is as follows:
- 5 Labs (15% x 5)
- Final Project (25%)
  - Proposal (5%)
  - Presentation + Final Report (20%)
- Participation Bonus (4%)
Note that this class does not have any tests or exams.
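As a rough illustration of how the breakdown above adds up, here is a minimal sketch. The function name `course_total` and its arguments are hypothetical, and it assumes the participation bonus is added on top of the 100% from labs and the project; the staff's actual computation may differ.

```python
def course_total(lab_scores, proposal, presentation_report, bonus_points=0.0):
    """Return the course total in percentage points.

    lab_scores: five fractions in [0, 1], one per lab (15% each).
    proposal: fraction in [0, 1] earned on the proposal (5%).
    presentation_report: fraction in [0, 1] earned on the
        presentation + final report (20%).
    bonus_points: participation bonus earned, in percentage points.
    Assumption: the bonus is added on top of the 100%, capped at 4.
    """
    if len(lab_scores) != 5:
        raise ValueError("expected exactly five lab scores")
    labs = sum(15.0 * score for score in lab_scores)
    project = 5.0 * proposal + 20.0 * presentation_report
    return labs + project + min(bonus_points, 4.0)


if __name__ == "__main__":
    # Full marks everywhere plus the mid-semester survey bonus (1%).
    print(course_total([1.0] * 5, 1.0, 1.0, bonus_points=1.0))
```

Under these assumptions, full marks on every component give 100 points before the bonus, and the bonus can add at most 4 more.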
Labs
There will be 5 labs over the course of the semester.
- Lab 1: Pruning
- Lab 2: Quantization
- Lab 3: Neural architecture search
- Lab 4: LLM compression
- Lab 5: LLM deployment on laptop
Collaboration Policy
Labs must be done individually: each student must hand in their own answers. However, it is acceptable to collaborate when figuring out answers and to help each other solve the problems. As participants in a graduate course, you are responsible for making sure you personally understand any solution that arises from such collaboration. You must also indicate on each homework with whom you collaborated.
Late Policy
You will be allowed 6 total homework late days without penalty for the entire semester. You may be late by up to 6 days on any homework assignment. Once those days are used, you will be penalized according to the following policy:
- Homework is worth full credit at the due time on the due date.
- The allowed late days are counted by day (i.e., each new late day starts at 11:59 pm ET).
- Once the allowed late days are exceeded, the penalty is 50% per late day counted by day.
- The homework is worth zero credit 2 days after exceeding the late day limit.
You must turn in at least 4 of the 5 assignments, even if for zero credit, in order to pass the course.
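The late-day rules above can be sketched as a small calculation. This is only one reading of the policy, not the staff's grading script: the name `late_credit` is hypothetical, and it assumes that every started 24-hour period after the deadline counts as one late day and that free late days are consumed before any penalty applies.

```python
import math

FREE_LATE_DAYS = 6      # total free late days for the semester
PENALTY_PER_DAY = 0.5   # 50% lost per late day beyond the free days


def late_credit(hours_late, free_days_left=FREE_LATE_DAYS):
    """Return (credit multiplier, free late days remaining).

    Assumptions (not stated verbatim in the policy): each started
    24-hour period after the deadline is one late day, and free late
    days are used up automatically before penalties begin.
    """
    days_late = max(0, math.ceil(hours_late / 24))
    used = min(days_late, free_days_left)
    penalized_days = days_late - used
    # 1 penalized day -> 50% credit; 2 or more -> zero credit.
    credit = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return credit, free_days_left - used


if __name__ == "__main__":
    # Two days late with only one free day left: one penalized day.
    print(late_credit(48, free_days_left=1))
```

With this reading, a submission that is one day past the exhausted late-day budget earns half credit, and one that is two days past earns none, matching the last two bullets of the policy.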
Regrade Policy
If you feel that we have made a mistake in grading your work, please submit a regrade request to the TAs during office hours and we will consider it. Please note that regrading a homework may cause your grade to go either up or down.
Final Project
The class project will be carried out in groups of 2 or 3 people, and has three main parts:
- proposal: choose from a list of suggested projects, or propose your own project
- oral presentation (~10 mins per group)
- final report (4 pages, using the NeurIPS template)
Participation Bonus
We appreciate everyone being actively involved in the class! There are several ways of earning participation bonus credit, which will be capped at 4%:
- Completing mid-semester evaluation: Around the middle of the semester, we will send out a survey to help us understand how the course is going, and how we can improve. Completing it is worth 1%.
- Karma point: Any other act that improves the class, which a TA or instructor notices and deems worthy: 3%.