
Reducing the carbon footprint of AI using the Once-for-All network


Figure 1: The Once-for-All network can produce diverse specialized subnetworks without retraining. It removes the need for repeated architecture design and model training, saving orders of magnitude in GPU training cost, and also produces efficient models for fast inference on mobile devices. Source: Cai et al., Once-for-All Network, ICLR 2020.

Artificial intelligence (AI) has recently become a focus of ethical concerns, but it also has some major sustainability issues — which MIT researchers are already addressing.

An AI application typically consists of two phases: training and deployment. Last June, researchers at the University of Massachusetts Amherst released a startling report estimating that training a certain large neural network emits roughly 626,000 pounds of carbon dioxide. That's nearly five times the lifetime emissions of the average U.S. car, including its manufacturing.

This issue gets even more severe in the deployment phase, where deep neural networks must run on diverse resource-constrained hardware platforms, from the cloud to mobile phones and even to microcontrollers. Different hardware platforms have different properties and computational resources, and thus require specialized neural networks that best fit the hardware for efficient inference. Repeating the neural network design and training for each case leads to rapidly growing carbon emissions.

Figure 2: Deep neural networks are deployed on a wide spectrum of hardware platforms, from cloud GPUs to mobile phones and even to microcontrollers and AIoT devices.

In an upcoming paper, MIT researchers describe a far greener algorithm for training and running those neural networks. Results indicate that, by improving computational efficiency in some key ways, the system can cut the carbon emissions involved, in some cases down to the low triple digits of pounds.

Figure 3: OFA cuts carbon emissions by 1,300 times, down to triple digits of pounds.

The system, described in the Once-for-All Network paper, trains one large neural network comprising many pretrained subnetworks of different sizes that can be tailored to diverse hardware platforms without retraining. This dramatically reduces the energy usually required to train each specialized neural network for new platforms — including billions of "internet of things" (IoT) devices. Using the system to train a computer-vision model, the researchers estimated around a 1,300-fold reduction in emissions compared with today's state-of-the-art neural architecture search approaches, while achieving a 1.5-2.6x inference speedup.

“The aim is smaller, greener neural networks,” says Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science. “Searching efficient neural network architectures has until now had a huge carbon footprint. But we reduced that footprint by orders of magnitude with these new methods.”

The work was carried out on Satori, an efficient computing cluster donated to MIT by IBM that’s capable of performing 2 quadrillion calculations per second. The paper is being presented at the International Conference on Learning Representations in April. Joining Han on the paper are four undergraduate and graduate students from EECS, the MIT-IBM Watson AI Lab, and Shanghai Jiao Tong University.

Creating a “Once-for-All” network

The researchers built the system on a recent AI advance called AutoML (automatic machine learning), which eliminates manual network design. AutoML systems automatically search massive design spaces for network architectures tailored, for instance, to specific hardware platforms. But there's still a training-efficiency issue: each model has to be selected and then trained from scratch for its target platform.

“How do we train all those networks efficiently for such a broad spectrum of devices — from a $10 IoT device to a $600 smartphone? Given the diversity of IoT devices, the computation cost of neural architecture search will explode,” Han says.  

Figure 4: Once-for-All Network achieves high accuracy at low computation cost, being at the top-left corner of the accuracy-computation trade-off curve.

The researchers invented an AutoML system that trains only a single, large “once-for-all” (OFA) network that serves as a “mother” network, nesting an extremely high number of subnetworks that are sparsely activated from the “mother” network. OFA shares all its learned weights with all subnetworks — meaning they come essentially pretrained. Thus each subnetwork can operate independently at inference time without retraining.

In their work, they trained an OFA convolutional neural network (CNN) — commonly used for image-processing tasks — with versatile architectural configurations, including different numbers of layers and neurons, diverse filter sizes, and diverse input image resolutions. Given a specific platform, the system uses the OFA as the search space to find the best subnetwork based on the accuracy and latency tradeoffs that correlate to the platform’s power and speed limits. For an IoT device, for instance, it will find a smaller subnetwork. For smartphones, it will select larger subnetworks, but with different structures depending on individual battery lifetimes and computation resources. OFA decouples model training and architecture search, and amortizes the one-time training cost across many inference hardware platforms and resource constraints.
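The selection step described above can be sketched as a toy search loop. Everything here is illustrative: the dimension values loosely follow the elastic depth, kernel-size, and width choices described in the paper, while the latency and accuracy estimators are stand-in proxies (a real deployment would use per-device latency measurements and a trained accuracy predictor):

```python
import random

# Hypothetical search-space dimensions, loosely following the paper's
# elastic depth, kernel size, and width choices.
DEPTHS = [2, 3, 4]
KERNEL_SIZES = [3, 5, 7]
EXPAND_RATIOS = [3, 4, 6]
NUM_STAGES = 5

def sample_subnetwork():
    """Randomly sample one subnetwork configuration from the OFA space."""
    config = []
    for _ in range(NUM_STAGES):
        depth = random.choice(DEPTHS)
        layers = [(random.choice(KERNEL_SIZES), random.choice(EXPAND_RATIOS))
                  for _ in range(depth)]
        config.append(layers)
    return config

def estimated_latency(config):
    """Stand-in latency proxy: cost grows with depth, kernel area, and width."""
    return sum(k * k * e for stage in config for (k, e) in stage)

def estimated_accuracy(config):
    """Stand-in accuracy proxy: bigger subnetworks score higher.
    The real system trains a neural accuracy predictor instead."""
    return sum(e + k for stage in config for (k, e) in stage)

def search(latency_budget, num_samples=1000, seed=0):
    """Return the best sampled subnetwork that fits the latency budget."""
    random.seed(seed)
    best, best_acc = None, float("-inf")
    for _ in range(num_samples):
        cfg = sample_subnetwork()
        if estimated_latency(cfg) > latency_budget:
            continue  # too slow for this device
        acc = estimated_accuracy(cfg)
        if acc > best_acc:
            best, best_acc = cfg, acc
    return best

# A tighter budget (an IoT device, say) yields a smaller subnetwork
# than a looser budget (a smartphone).
small = search(latency_budget=1500)
large = search(latency_budget=4000)
```

With the same random seed, a looser latency budget admits a superset of candidates, so the best accuracy found can only improve; this mirrors how more capable devices receive larger subnetworks.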

This relies on a "progressive shrinking" algorithm that efficiently trains the OFA network to support all of the subnetworks simultaneously. It starts by training the full network at its maximum size, then progressively shrinks the network to include smaller subnetworks, which are trained jointly with the larger ones so they grow together. In the end, subnetworks of all sizes are supported, allowing fast specialization based on each platform's power and speed limits, with zero additional training cost when a new device is added.
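The training schedule can be sketched as follows. The phase names and the progressively unlocked choices follow the paper's high-level description, but the actual gradient and knowledge-distillation steps are stubbed out, so this only shows how the sampled space widens over time:

```python
import random

# Schematic of the progressive-shrinking schedule: each phase unlocks
# more subnetwork choices while reusing the weights trained so far.
PHASES = [
    # (phase name, choices unlocked so far)
    ("full network",   {"kernel": [7],       "depth": [4],       "width": [6]}),
    ("elastic kernel", {"kernel": [3, 5, 7], "depth": [4],       "width": [6]}),
    ("elastic depth",  {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [6]}),
    ("elastic width",  {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [3, 4, 6]}),
]

def sample_config(space, rng):
    """Sample one subnetwork from the currently unlocked space."""
    return {dim: rng.choice(opts) for dim, opts in space.items()}

def train_ofa(steps_per_phase=100, seed=0):
    """Walk through the phases, sampling subnetworks to 'train'.
    Smaller subnetworks share weights with (and are guided by) the full
    network, so each phase only fine-tunes, never trains from scratch."""
    rng = random.Random(seed)
    seen = set()
    for name, space in PHASES:
        for _ in range(steps_per_phase):
            cfg = sample_config(space, rng)
            # A real implementation would run one SGD step on `cfg` here,
            # distilling knowledge from the full network's predictions.
            seen.add(tuple(sorted(cfg.items())))
    return seen

configs = train_ofa()
```

Note that the first phase can only ever sample the full network, so its weights are well-trained before any smaller variant is introduced; later phases then fine-tune the shared weights across an ever-larger family of subnetworks.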

Figure 5: OFA speeds up the inference on the mobile phone by 1.5x-2.6x compared to state-of-the-art CNN models (EfficientNet and MobileNet-v3 designed by Google).

In total, one OFA, the researchers found, can comprise more than 10 quintillion (a 1 followed by 19 zeros) architectural settings, covering most platforms one might ever need. Training the OFA and searching it ends up being far more efficient than training each neural network per platform from scratch. Moreover, OFA does not sacrifice accuracy or inference efficiency: it provides state-of-the-art ImageNet accuracy on mobile devices (under 600M MACs). Compared with industry-leading CNN models designed by Google (EfficientNet, MobileNetV3), OFA provides a 1.5-2.6x speedup with superior accuracy.

OFA is also the winning solution (first place) in the most recent Low-Power Computer Vision Challenges (LPCVC) at ICCV'19 and NeurIPS'19, in both the classification and detection tracks. The challenge is to achieve the best ImageNet accuracy under a latency constraint on mobile phones. The Once-for-All network consistently outperformed other teams' manually designed models.
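The headline count of more than 10 quintillion settings can be sanity-checked with quick arithmetic, using architecture choices in line with those the paper reports (five stages, two to four layers each, three kernel sizes and three width ratios per layer; varying input resolutions are left out of this count):

```python
# Back-of-the-envelope count of OFA's subnetwork space.
KERNEL_CHOICES = 3            # kernel sizes 3, 5, 7
WIDTH_CHOICES = 3             # width expand ratios 3, 4, 6
DEPTH_CHOICES = [2, 3, 4]     # layers per stage
NUM_STAGES = 5

per_layer = KERNEL_CHOICES * WIDTH_CHOICES              # 9 variants per layer
per_stage = sum(per_layer ** d for d in DEPTH_CHOICES)  # 81 + 729 + 6561 = 7371
total = per_stage ** NUM_STAGES                         # 7371^5

print(f"{total:.2e}")  # on the order of 10**19
```

The result is roughly 2 × 10^19 distinct subnetworks, consistent with the "more than 10 quintillion" figure, and all of them share one set of trained weights.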