Place: Chamberlin 5280 (Zoom link for those attending online: )
Speaker: Ro Jefferson, Nordita
Abstract: Recently, exciting progress has been made in the study of deep neural networks (DNNs) by applying ideas and techniques from physics, in particular QFT. In this talk, I will first give a brief overview of some key aspects of the effective-theory approach to DNNs, and highlight the information-theoretic language that unites these two seemingly disparate fields. Then, I will explain how one can go beyond the level of analogy by explicitly constructing a bona fide QFT corresponding to a general class of DNNs encompassing both recurrent and feedforward architectures. The resulting theory closely resembles the well-studied O(N) vector model, in which the variance of the weight initializations plays the role of the 't Hooft coupling. In this framework, the Gaussian process approximation used in machine learning corresponds to a free field theory, and finite-width effects can be computed perturbatively in the ratio of depth to width, T/N. These corrections modify the correlation length that controls the depth to which information can propagate through the network, and which thereby sets the scale at which such networks are trainable by gradient descent. This analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues for the study of criticality in deep neural networks.
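To make the Gaussian-process statement concrete, here is a minimal numerical sketch (my illustration, not code from the talk or paper): over random draws of the weights, the preactivation of a single neuron in a wide network approaches a Gaussian, and its non-Gaussianity, measured here by excess kurtosis, is expected to shrink as width grows, consistent with finite-width corrections controlled by the depth-to-width ratio. The function name final_preactivation and the choices of widths, depth, and sigma_w are hypothetical parameters for this demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def final_preactivation(width, depth, sigma_w=1.0):
    """One draw of a random tanh network applied to a fixed unit-norm input;
    returns the preactivation of a single output neuron."""
    h = np.ones(width) / np.sqrt(width)  # fixed input, unit norm
    z = h
    for _ in range(depth):
        W = sigma_w / np.sqrt(width) * rng.standard_normal((width, width))
        z = W @ h
        h = np.tanh(z)
    return z[0]

depth = 4
for width in (8, 32, 128):
    z = np.array([final_preactivation(width, depth) for _ in range(2000)])
    # Excess kurtosis vanishes for a Gaussian; deviations at finite width
    # are expected to decay roughly like depth/width (up to sampling noise).
    kurt = ((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0
    print(f"width={width:4d}  excess kurtosis={kurt:+.3f}")
```

As the width increases at fixed depth, the printed excess kurtosis should drift toward zero, which is the numerical counterpart of the free-field (Gaussian process) limit described in the abstract.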
Based on arXiv:2109.13247 with Kevin T. Grosvenor.