Deep neural networks beyond the limit of infinite width
- Yasaman Bahri, Google Inc.
A scientific understanding of modern deep learning is still in its early stages. As a first step towards understanding the learning dynamics of neural networks, one can simplify the problem by studying limits that offer both theoretical tractability and practical relevance. I'll begin with a brief survey of our earlier work investigating the infinite-width limit of deep networks, a topic of much recent activity. Even with these results in hand, a gap remains in theoretically describing neural networks at finite width. I'll argue that the choice of learning rate is one crucial factor in the dynamics away from the infinite-width limit, and that it naturally classifies deep networks into two classes separated by a sharp transition. We elucidate this in a class of simple, solvable models that give quantitative predictions for the two classes. We then test these predictions empirically in practical settings and, remarkably, find excellent agreement.
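The abstract does not spell out the solvable models, so the sketch below is purely illustrative and is my own assumption rather than the speaker's construction. It trains a wide two-layer linear network on a single example with two learning rates, one below and one above 2 divided by the initial curvature of the loss (the standard stability threshold for gradient descent on a quadratic). With the smaller learning rate the loss decreases monotonically; with the larger one it first grows and then settles into a visibly flatter region, giving a concrete feel for two learning-rate regimes separated by a sharp threshold.

```python
import numpy as np

def train(width=1000, lr_scale=0.5, steps=200, seed=0):
    """Gradient descent on one example; lr = lr_scale * (2 / initial curvature)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=width)            # first-layer weights
    v = rng.normal(size=width)            # second-layer weights
    norm = np.sqrt(width)
    x, y = 1.0, 0.0                       # single training example

    lam = (u @ u + v @ v) / width         # curvature of the loss in this toy model
    lr = lr_scale * 2.0 / lam
    losses = []
    for _ in range(steps):
        f = (v @ u) * x / norm - y        # network output minus target
        losses.append(0.5 * f ** 2)
        # Gradients of L = 0.5 * f^2 with respect to u and v.
        gu = f * v * x / norm
        gv = f * u * x / norm
        u -= lr * gu
        v -= lr * gv
        lam = (u @ u + v @ v) / width
    return np.array(losses), lam

small_losses, small_lam = train(lr_scale=0.5)   # below threshold: loss falls monotonically
large_losses, large_lam = train(lr_scale=1.5)   # above threshold: loss spikes, then drops
print("small lr, first losses:", np.round(small_losses[:5], 4), "final curvature:", round(small_lam, 3))
print("large lr, first losses:", np.round(large_losses[:5], 4), "final curvature:", round(large_lam, 3))
print("large lr, final loss:  ", float(large_losses[-1]))
```

In this toy setting the large-learning-rate run ends with a smaller final curvature than the small-learning-rate run, which is the kind of quantitative distinction between the two classes that the abstract alludes to; the actual models and predictions are those presented in the talk.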
Yasaman Bahri is a research scientist on the Google Brain team. Her current research program aims to build a scientific understanding of deep learning through a combination of theoretical analysis and empirical investigation. Prior to Google, she was at the University of California, Berkeley, where she received her Ph.D. in physics in 2017, specializing in theoretical quantum condensed matter.