Learning Winning Subnetworks Provably via Greedy Forward Optimization

Feature Photo
Different from traditional pruning algorithms, which delete the less important neurons, our greedy forward method reconstructs the empty network by adding neurons back.
Feature Photo
The loss vs size of the network learned by pruning and directly training. Pruning methods provably learns the network that decrease the loss with faster rate.


Recent empirical works show that large deep neural networks are often highly redundant and one can find much smaller subnetworks without a significant drop of accuracy. However, despite the extensive empirical successes, theoretical properties of network pruning has not been well understood, leaving a variety of open theoretical questions, including 1) whether good subnetworks provably exist, 2) how to to find them efficiently, and 3) if network pruning can be provably better than direct training using gradient descent. In recent works, we propose a provably and practical efficient greedy selection strategy for network pr which answers these questions positively. Our method finds good subnetworks starting from an empty network and greedily adds important neurons from the large network that maximize the a loss function of interest. Theoretically, we show that applying this simple greedy selection strategy on pre-trained networks guarantees to find a small subnetwork with lower loss than networks directly trained with gradient descent. Practically, we improve prior arts of network pruning on learning compact neural architectures on ImageNet, including ResNet, MobilenetV2/V3. Our theory and empirical results on MobileNet suggest that we should fine-tune the pruned subnetworks to leverage the information from the large model, instead of re-training from new random initialization as suggested by some other works.


Funding Support

NSF CAREER 1846421

Publications pertaining to the work

Mao Ye, Chengyue Gong , Lizhen Nie, Denny Zhou, Adam Klivans and Qiang Liu. Good Subnetworks Provably Exists: Pruning via Greedy Forward Selection. ICML 2020

Mao Ye , Lemeng Wu and Qiang Liu. Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough. NeurIPS 2020