News

Invited Talks and Lectures

Research
These days, I spend most of my time thinking about how brains do credit assignment through time.
For RL problems, I'd like to figure out a method that estimates the gradient of the reward with respect
to the action probabilities in a way that mimics some of the fundamental properties of
backprop. Backprop works by *composing* local estimates of effects (Jacobians).
In general, I'm interested in unsupervised exploration and training generative models! :)
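To make "composing local Jacobians" concrete, here is a minimal sketch. The two-layer map, its weights, and every function name are assumptions chosen for illustration. The Jacobian of the whole map is the product of the per-layer Jacobians, which a finite-difference estimate should reproduce.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def layer1(W1, x):
    """First layer: h = tanh(W1 x)."""
    return [math.tanh(z) for z in matvec(W1, x)]


def layer2(W2, h):
    """Second layer: y = W2 h (linear)."""
    return matvec(W2, h)


def jacobian_layer1(W1, x):
    """Local Jacobian dh/dx, entries (1 - h_i^2) * W1[i][j]."""
    h = layer1(W1, x)
    return [[(1.0 - hi * hi) * w for w in row] for hi, row in zip(h, W1)]


def jacobian_layer2(W2):
    """Local Jacobian dy/dh of the linear layer, which is W2 itself."""
    return [list(row) for row in W2]


def composed_jacobian(W1, W2, x):
    """Chain rule: dy/dx = (dy/dh) (dh/dx)."""
    return matmul(jacobian_layer2(W2), jacobian_layer1(W1, x))


def finite_difference_jacobian(W1, W2, x, eps=1e-6):
    """Central-difference estimate of dy/dx, used only as a check."""
    def f(v):
        return layer2(W2, layer1(W1, v))

    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J


# Illustrative values (assumptions).
W1 = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]]
W2 = [[1.0, -1.0, 0.5]]
x = [0.4, -0.6]

J_chain = composed_jacobian(W1, W2, x)
J_check = finite_difference_jacobian(W1, W2, x)
```

Each layer only needs to know its own local derivative; the global effect comes from multiplying them together.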


Generalization of Equilibrium Propagation to Vector Field Dynamics
Benjamin Scellier,
Anirudh Goyal,
Jonathan Binas,
Thomas Mesnard,
Yoshua Bengio,
ICLR'18 Workshop
The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forward and backward phases, and that pairs of neurons would need to communicate through symmetric bidirectional connections.
We present a simple two-phase learning procedure for fixed-point recurrent networks that addresses both these issues.
In our model, neurons perform leaky integration and synaptic weights are updated through a local mechanism.
Our learning method extends the framework of Equilibrium Propagation to general dynamics, relaxing the requirement of an energy function.
As a consequence of this generalization, the algorithm does not compute the true gradient of the objective function,
but rather approximates it at a precision which is proven to be directly related to the degree of symmetry of the feedforward and feedback weights.
We show experimentally that the intrinsic properties of the system lead to alignment of the feedforward and feedback weights, and that our algorithm optimizes the objective function.
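The two-phase idea can be sketched on a toy two-unit network. This is a simplified illustration, not the paper's implementation: the tanh nonlinearity, the network size, the constants, and the specific update form (presynaptic activity times the change in postsynaptic state between phases, divided by the nudging strength) are assumptions made for this sketch.

```python
import math


def rho(s):
    """Neuron nonlinearity (tanh is an assumption made for this sketch)."""
    return math.tanh(s)


def relax(W, U, x, target=None, beta=0.0, dt=0.1, steps=1000):
    """Leaky integration ds/dt = -s + W rho(s) + U x, run to (near) a fixed point.

    With beta > 0, the output unit (last index) is also nudged toward `target`.
    W[i][j] is the connection from unit j to unit i; W need not be symmetric.
    """
    n = len(W)
    s = [0.0] * n
    for _ in range(steps):
        drift = [
            -s[i] + U[i] * x + sum(W[i][j] * rho(s[j]) for j in range(n))
            for i in range(n)
        ]
        if beta > 0.0:
            drift[-1] += beta * (target - s[-1])
        s = [si + dt * di for si, di in zip(s, drift)]
    return s


def local_update(s_free, s_nudged, beta):
    """Assumed local rule: dW[i][j] = rho(s_free[j]) * (s_nudged[i] - s_free[i]) / beta."""
    n = len(s_free)
    return [
        [rho(s_free[j]) * (s_nudged[i] - s_free[i]) / beta for j in range(n)]
        for i in range(n)
    ]


# Toy setup (all values are assumptions): unit 0 is hidden, unit 1 is the output.
W = [[0.0, 0.2], [0.3, 0.0]]  # feedforward 0 -> 1 is 0.3, feedback 1 -> 0 is 0.2
U = [0.5, 0.0]                # input drives the hidden unit only
x, target, beta, lr = 1.0, 1.0, 0.5, 0.1

s_free = relax(W, U, x)                               # first phase: free fixed point
s_nudged = relax(W, U, x, target=target, beta=beta)   # second phase: nudged fixed point
dW = local_update(s_free, s_nudged, beta)
W_new = [[W[i][j] + lr * dW[i][j] for j in range(2)] for i in range(2)]
s_after = relax(W_new, U, x)                          # free phase after one update
```

Each weight change uses only quantities available at the two units it connects, and no separate backward signal is computed.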


Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Rosemary Nan Ke, Anirudh Goyal, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio
Neural Information Processing Systems (NIPS), 2018 (Oral Presentation)
Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, backpropagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years). However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only backpropagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.
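The selection step can be sketched as a top-k attention over stored past states. The dot-product scoring, the value of k, and the function name are assumptions for illustration; the abstract only specifies a learned attention mechanism that picks a few relevant past states.

```python
import math


def sparse_attention(current, past_states, k):
    """Attend from `current` to the k highest-scoring entries of `past_states`.

    Returns the selected time indices (in chronological order), their softmax
    weights, and the weighted summary of the selected states.
    """
    # Score each past state against the current one (dot product is an assumption).
    scores = [sum(c * p for c, p in zip(current, past)) for past in past_states]

    # Keep only the k best-scoring time steps; all others are ignored entirely.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    top.sort()

    # Softmax restricted to the selected steps (shifted by the max for stability).
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    weights = {i: e / z for i, e in exps.items()}

    summary = [
        sum(weights[i] * past_states[i][d] for i in top)
        for d in range(len(current))
    ]
    return top, weights, summary


# Illustrative values (assumptions).
past_states = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.5, 0.5]]
current = [1.0, 0.0]
selected, weights, summary = sparse_attention(current, past_states, k=2)
```

In an autodiff framework, gradients would flow back only through the selected time steps, so the cost of credit assignment scales with k rather than with the sequence length.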


Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations
Alex Lamb,
Jonathan Binas,
Anirudh Goyal,
Dmitriy Serdyuk,
Sandeep Subramanian,
Ioannis Mitliagkas,
Yoshua Bengio,
arXiv
/
Code
Deep networks have achieved impressive results across a variety of important tasks. However, a known weakness is a failure to perform well when evaluated on data which differ from the training distribution, even if these differences are very small, as is the case with adversarial examples. We propose Fortified Networks, a simple transformation of existing networks, which fortifies the hidden layers in a deep network by identifying when the hidden states are off the data manifold and mapping these hidden states back to parts of the data manifold where the network performs well. Our principal contribution is to show that fortifying these hidden states improves the robustness of deep networks, and our experiments (i) demonstrate improved robustness to standard adversarial attacks in both black-box and white-box threat models; (ii) suggest that our improvements are not primarily due to the gradient masking problem; and (iii) show the advantage of doing this fortification in the hidden layers instead of the input space.


Recall Traces: Backtracking Models for Efficient Reinforcement Learning
Anirudh Goyal,
Philemon Brakel,
William Fedus,
Timothy Lillicrap,
Sergey Levine,
Hugo Larochelle,
Yoshua Bengio,
arXiv
/
code (coming soon)
In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high-value state (or one that is estimated to have high value), predicts and samples the (state, action) tuples which may have led to that high-value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high-value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.
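A tabular toy version shows what sampling a recall trace looks like. This is a sketch under strong assumptions: a six-state chain environment, a backward table built by enumerating transitions (standing in for a learned backtracking model), and uniform sampling among predecessors. All names are hypothetical.

```python
import random
from collections import defaultdict

N_STATES = 6
ACTIONS = (-1, +1)       # move left or right along the chain
GOAL = N_STATES - 1      # the single high-reward state in this toy setup


def step(state, action):
    """Deterministic chain dynamics, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def fit_backward_table():
    """Map each state to the (previous_state, action) pairs that lead to it."""
    predecessors = defaultdict(list)
    for s in range(N_STATES):
        for a in ACTIONS:
            predecessors[step(s, a)].append((s, a))
    return predecessors


def sample_recall_trace(predecessors, end_state, length, rng):
    """Sample (state, action) pairs backward from `end_state`, then return
    them in forward (chronological) order."""
    trace = []
    state = end_state
    for _ in range(length):
        prev_state, action = rng.choice(predecessors[state])
        trace.append((prev_state, action))
        state = prev_state
    trace.reverse()
    return trace


rng = random.Random(0)
table = fit_backward_table()
trace = sample_recall_trace(table, GOAL, length=4, rng=rng)
```

Replaying the sampled actions forward from the first state of the trace ends at the goal, so each (state, action) pair could serve as a training target for the policy.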


Z-Forcing: Training Stochastic RNNs
Anirudh Goyal,
Alessandro Sordoni,
Marc-Alexandre Côté,
Rosemary Nan Ke,
Yoshua Bengio,
Neural Information Processing Systems (NIPS), 2017
arXiv
/
code
We proposed a novel approach to incorporate stochastic latent variables in sequential neural networks. The method builds on recent architectures that use latent variables to condition the recurrent dynamics of the network. We augmented the inference network with an RNN that runs backward through the sequence and added a new auxiliary cost that forces the latent variables to reconstruct the state of that backward RNN, i.e. predict a summary of future observations.


Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net
Anirudh Goyal,
Nan Rosemary Ke,
Surya Ganguli,
Yoshua Bengio
Neural Information Processing Systems (NIPS), 2017
arXiv
/
code
We propose a novel method to directly learn a stochastic transition operator whose repeated application provides generated samples. Traditional undirected graphical models approach this problem indirectly by learning a Markov chain model whose stationary distribution obeys detailed balance with respect to a parameterized energy function.


Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
David Krueger, Tegan Maharaj, Janos Kramar, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal,
Yoshua Bengio,
Aaron Courville,
Chris Pal
International Conference on Learning Representations (ICLR), 2017
arXiv
/
code
We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks.
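The per-timestep operation is small enough to sketch directly. The function name, the list-based state, and the expectation-based inference rule (mirroring how dropout is usually handled at test time) are assumptions made for this illustration.

```python
import random


def zoneout(h_prev, h_new, rate, training, rng):
    """Apply zoneout to one hidden-state update.

    h_prev:   hidden state from the previous timestep
    h_new:    candidate hidden state computed by the RNN cell
    rate:     probability that a unit keeps its previous value
    training: sample a random mask if True, use the expected value if False
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if training:
        # Each unit independently keeps its old value with probability `rate`.
        return [hp if rng.random() < rate else hn for hp, hn in zip(h_prev, h_new)]
    # Inference: deterministic mix of old and new values (assumed convention).
    return [rate * hp + (1.0 - rate) * hn for hp, hn in zip(h_prev, h_new)]


rng = random.Random(0)
h_prev = [1.0, 2.0]
h_new = [3.0, 4.0]
h_train = zoneout(h_prev, h_new, rate=0.5, training=True, rng=rng)
h_eval = zoneout(h_prev, h_new, rate=0.5, training=False, rng=rng)
```

Units that keep their previous value act as an identity connection across that timestep, which is the route by which state and gradient information pass through unchanged.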


ACtuAL: Actor-Critic Under Adversarial Learning
Anirudh Goyal, Nan Rosemary Ke, Alex Lamb, R Devon Hjelm, Chris Pal, Joelle Pineau,
Yoshua Bengio
arXiv
/
code
Generative Adversarial Networks (GANs) are a powerful framework for deep generative modeling. Posed as a two-player minimax problem, GANs are typically trained end-to-end on real-valued data and can be used to train a generator of high-dimensional and realistic images. However, a major limitation of GANs is that training relies on passing gradients from the discriminator through the generator via backpropagation. This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function. These difficulties extend to the reinforcement learning setting when the action space is composed of discrete decisions. We address these issues by reframing the GAN framework so that the generator is no longer trained using gradients through the discriminator, but is instead trained using a learned critic in the actor-critic framework with a Temporal Difference (TD) objective. This is a natural fit for sequence modeling and we use it to achieve improvements on language modeling tasks over the standard Teacher-Forcing methods.


An Actor-Critic Algorithm for Sequence Prediction
Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau,
Aaron Courville,
Yoshua Bengio
International Conference on Learning Representations (ICLR), 2017
arXiv
/
code
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network.


Professor Forcing: A New Algorithm for Training Recurrent Networks
Anirudh Goyal, Alex Lamb, Ying Zhang, Saizheng Zhang,
Aaron Courville,
Yoshua Bengio,
Neural Information Processing Systems (NIPS), 2016
arXiv
/
video
/
code
The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network’s own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps.
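The gap between the two modes can be seen with a toy one-step predictor. This sketch only contrasts teacher-forced and free-running prediction; it does not implement the adversarial part of Professor Forcing. The scalar decay model and all names are assumptions.

```python
def teacher_forced_predictions(model, sequence):
    """Predict each next value from the observed previous value."""
    return [model(x) for x in sequence[:-1]]


def free_running_predictions(model, first_value, n_steps):
    """Predict each next value from the model's own previous prediction."""
    preds = []
    x = first_value
    for _ in range(n_steps):
        x = model(x)
        preds.append(x)
    return preds


# Data follow x[t+1] = 0.9 * x[t]; the (slightly wrong) model uses 0.8 instead.
true_decay, model_decay = 0.9, 0.8
sequence = [1.0]
for _ in range(5):
    sequence.append(true_decay * sequence[-1])


def model(x):
    """Hypothetical one-step predictor."""
    return model_decay * x


targets = sequence[1:]
tf_preds = teacher_forced_predictions(model, sequence)
fr_preds = free_running_predictions(model, sequence[0], len(targets))
tf_errors = [abs(p - t) for p, t in zip(tf_preds, targets)]
fr_errors = [abs(p - t) for p, t in zip(fr_preds, targets)]
```

Both modes agree on the first step, but the free-running errors compound because each prediction is fed back as the next input. This is the train/sample mismatch that the adversarial objective is meant to reduce.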

