BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:AI+Pizza
SUMMARY:Pizza & AI April 2019 - Microsoft Research/University of Cambridge
DTSTART;TZID=Europe/London:20190426T173000
DTEND;TZID=Europe/London:20190426T190000
UID:TALK123142AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/123142
DESCRIPTION:*Speaker 1* - Andrey Malinin\n\n*Title* - This is the En
 DD: Ensemble Distribution Distillation\n\n*Abstract* - Ensembles of Neu
 ral Network (NN) models are known to yield improvements in accura
 cy as well as robust measures of uncertainty. However\, ensembles co
 me at a high computational and memory cost\, which may be prohibiti
 ve for certain applications. Previously\, the distillation of an ens
 emble into a single model has been investigated. Such approaches dec
 rease computational cost and allow a single model to achieve accura
 cy comparable to that of an ensemble. However\, information about th
 e diversity of the ensemble\, which can yield estimates of epistem
 ic uncertainty\, is lost. Recently\, a new type of model\, called a P
 rior Network\, has been introduced\, which allows a single DNN to ex
 plicitly model a distribution over output distributions conditione
 d on the input by parameterizing a Dirichlet distribution. This wor
 k proposes an approach called Ensemble Distribution Distillation\, w
 hich allows distilling an ensemble into a single Prior Network mode
 l\, retaining both the improved classification performance and the m
 easures of diversity of the ensemble. The properties of Ensemble Di
 stribution Distillation are investigated on a synthetic spiral data
 set.\n\n\n*Speaker 2* - Yingzhen Li\n\n*Title* - Meta-Learning for St
 ochastic Gradient MCMC\n\n*Abstract* - Stochastic gradient Markov ch
 ain Monte Carlo (SG-MCMC) has become increasingly popular for simul
 ating posterior samples in large-scale Bayesian modeling. However\, ex
 isting SG-MCMC schemes are not tailored to any specific probabilist
 ic model\; even a simple modification of the underlying dynamical sy
 stem requires significant physical intuition. This paper presents t
 he first meta-learning algorithm that allows automated design of th
 e underlying continuous dynamics of an SG-MCMC sampler. The learned sa
 mpler generalizes Hamiltonian dynamics with state-dependent drift a
 nd diffusion\, enabling fast traversal and efficient exploration of n
 eural network energy landscapes. Experiments validate the proposed a
 pproach on both Bayesian fully connected neural network and Bayesia
 n recurrent neural network tasks\, showing that the learned sampler o
 utperforms generic\, hand-designed SG-MCMC algorithms and generalize
 s to different datasets and larger architectures.\n\nThis is joint w
 ork with Wenbo Gong and Jose Miguel Hernandez-Lobato from the Unive
 rsity of Cambridge. The paper will be presented at ICLR 2019.\n
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station R
oad\, Cambridge\, CB1 2FB
CONTACT:Microsoft Research Cambridge Talks Admins
END:VEVENT
END:VCALENDAR