BEGIN:VCALENDAR
VERSION:2.0
CALSCALE:GREGORIAN
PRODID:UW-Madison-Physics-Events
BEGIN:VEVENT
SEQUENCE:0
UID:UW-Physics-Event-5427
DTSTART:20200603T160000Z
DTEND:20200603T170000Z
DTSTAMP:20260415T024453Z
LAST-MODIFIED:20200428T223840Z
LOCATION:Please register for this online event: http://physicsmeetsml.
 org
SUMMARY:Why do neural networks generalise in the overparameterised reg
 ime?\, Physics ∩ ML Seminar\, Ard Louis\, University of Oxford
DESCRIPTION:One of the most surprising properties of deep neural netwo
 rks (DNNs) is that they typically perform best in the overparameterise
 d regime. Physicists are taught from a young age that having more para
 meters than datapoints is a terrible idea. This intuition can be forma
 lised in standard learning theory approaches\, based for example on mo
 del capacity\, which also predict that DNNs should heavily over-fit in
  this regime\, and therefore not generalise at all. So why do DNNs wor
 k so well? We use a version of the coding theorem from Algorithmic Inf
 ormation Theory to argue that DNNs are generically biased towards simp
 le solutions. Such an inbuilt Occam’s razor means that they are bias
 ed towards solutions that typically generalise well. We further explor
 e the interplay between this simplicity bias and the error spectrum on
  a dataset to develop a detailed Bayesian theory of training and gener
 alisation that explains why and when SGD-trained DNNs generalise\, and
  when they should not. This picture also allows us to derive tight PAC
 -Bayes bounds that closely track DNN learning curves and can be used t
 o rationalise differences in performance across architectures. Finally
 \, we will discuss some deep analogies between the way DNNs explore fu
 nction space\, and biases in the arrival of variation that explain cer
 tain trends observed in biological evolution.
URL:https://www.physics.wisc.edu/events/?id=5427
END:VEVENT
END:VCALENDAR
