BEGIN:VCALENDAR
VERSION:2.0
CALSCALE:GREGORIAN
PRODID:UW-Madison-Physics-Events
BEGIN:VEVENT
SEQUENCE:0
UID:UW-Physics-Event-6347
DTSTART:20210224T170000Z
DTEND:20210224T181500Z
DTSTAMP:20260415T023606Z
LAST-MODIFIED:20210217T155713Z
LOCATION:Online Seminar: Please sign up for our mailing list at www.ph
 ysicsmeetsml.org for the Zoom link
SUMMARY:Neural Mechanics: Symmetry and Broken Conservation Laws in Dee
 p Learning Dynamics\, Physics ∩ ML Seminar\, Daniel Kunin and Hideno
 ri Tanaka\, Stanford University
DESCRIPTION:Understanding the dynamics of neural network parameters du
 ring training is one of the key challenges in building a theoretical f
 oundation for deep learning. A central obstacle is that the motion of 
 a network in high-dimensional parameter space undergoes discrete finit
 e steps along complex stochastic gradients derived from real-world dat
 asets. We circumvent this obstacle through a unifying theoretical fram
 ework based on intrinsic symmetries embedded in a network’s architec
 ture that are present for any dataset. We show that any such symmetry 
 imposes stringent geometric constraints on gradients and Hessians\, le
 ading to an associated conservation law in the continuous-time limit o
 f stochastic gradient descent (SGD)\, akin to Noether’s theorem in p
 hysics. We further show that finite learning rates used in practice ca
 n break these symmetry-induced conservation laws. We apply to
 ols from finite difference methods to derive modified gradient flow\, 
 a differential equation that better approximates the numerical traject
 ory taken by SGD at finite learning rates. We combine modified gradien
 t flow with our framework of symmetries to derive exact integral expre
 ssions for the dynamics of certain parameter combinations. We empirica
 lly validate our analytic expressions for learning dynamics on VGG-16 
 trained on Tiny ImageNet. Overall\, by exploiting symmetry\, our work 
 demonstrates that we can analytically describe the learning dynamics o
 f various parameter combinations at finite learning rates and batch si
 zes for state-of-the-art architectures trained on any dataset.
URL:https://www.physics.wisc.edu/events/?id=6347
END:VEVENT
END:VCALENDAR
