Foundational Research Seminar
The Role of Explicit Regularization in Overparameterized Neural Networks
Abstract: Recent theoretical work on over-parameterized neural nets has focused on two aspects: optimization and generalization. Many existing works that study optimization and generalization together are based on the neural tangent kernel and require a very large width. In this talk, we are interested in the following two questions for a binary classification problem with a two-layer, mildly over-parameterized ReLU network: (1) does every local minimum memorize and generalize well? and (2) can we find a set of parameters that results in small test error in polynomial time? We first show that the landscape of loss functions with explicit regularization has the following property: all local minima, and certain other points that are stationary only in certain directions, achieve small test error. We then prove that, for convolutional neural nets, there is an algorithm that finds one of these points in polynomial time (in the input dimension and the number of data points). In addition, we prove that for a fully connected neural net, under an additional assumption on the data distribution, there is a polynomial-time algorithm to find one of these points.