Playing Atari with Deep Reinforcement Learning
Give the pseudocode for Deep Q-learning with Experience Replay.
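Initialize replay memory $D$ to capacity $N$
Initialize action-value function $Q$ with random weights $\theta$
For episode $= 1, M$ do
    Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$
    For $t = 1, T$ do
        With probability $\epsilon$ select a random action $a_t$
        Otherwise select $a_t = \arg\max_a Q(\phi(s_t), a; \theta)$
        Execute action $a_t$ in emulator and observe reward $r_t$ and image $x_{t+1}$
        Set $s_{t+1} = s_t, a_t, x_{t+1}$ and preprocess $\phi_{t+1} = \phi(s_{t+1})$
        Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $D$
        Sample random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from $D$
        Set $y_j = r_j$ if the episode terminates at step $j+1$, otherwise $y_j = r_j + \gamma \max_{a'} \hat{Q}(\phi_{j+1}, a'; \theta^-)$
        Perform a gradient descent step on $(y_j - Q(\phi_j, a_j; \theta))^2$ with respect to the network parameters $\theta$
        Every $C$ steps reset the target network $\hat{Q} = Q$ (that is, $\theta^- = \theta$)
    End for
End for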
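The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: a lookup table stands in for the neural network (so the "gradient descent step" reduces to moving one table entry toward its target), the emulator is a hypothetical 5-state chain, and every hyperparameter value is made up for the example.

```python
import random
from collections import deque

# All values below are hypothetical choices for this toy example.
N_STATES, N_ACTIONS = 5, 2          # toy chain: action 0 = left, 1 = right
GAMMA, EPSILON, LR = 0.9, 0.5, 0.5  # discount, exploration rate, step size
CAPACITY, BATCH, C = 1000, 16, 20   # replay capacity, minibatch size, target reset period


def step(state, action):
    """Toy emulator (assumption): reward 1 on reaching the last state, else 0."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, max_steps=50, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=CAPACITY)  # replay memory D
    # Q as a table with small random initial values (stand-in for a network)
    q = [[rng.uniform(-0.01, 0.01) for _ in range(N_ACTIONS)]
         for _ in range(N_STATES)]
    q_target = [row[:] for row in q]  # target copy, Q-hat
    total_steps = 0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Store the transition, then learn from a random minibatch
            memory.append((state, action, reward, nxt, done))
            batch = rng.sample(list(memory), min(BATCH, len(memory)))
            for s, a, r, s2, d in batch:
                # Target y_j uses the frozen copy Q-hat
                y = r if d else r + GAMMA * max(q_target[s2])
                # Step on (y - Q(s, a))^2; the factor 2 is absorbed into LR
                q[s][a] += LR * (y - q[s][a])
            total_steps += 1
            if total_steps % C == 0:
                q_target = [row[:] for row in q]  # every C steps reset Q-hat = Q
            state = nxt
            if done:
                break
    return q


q_values = train()
```

With a real network, the inner update would be replaced by one optimizer step on the minibatch loss, and `q_target` by a periodically synchronized copy of the network weights.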
Why do we create a replay memory in Deep Q-learning?
Experience replay in Deep Q-Learning has two functions:
- Make more efficient use of the experiences gathered during training. In online RL, the agent usually interacts with the environment, collects an experience (state, action, reward, and next state), learns from it (updates the neural network), and then discards it. This is inefficient.
Experience replay stores experience samples in a replay buffer so that they can be reused during training. This allows the agent to learn from the same experience multiple times.
- Avoid forgetting previous experiences and reduce the correlation between experiences. A network trained only on sequential samples tends to forget earlier experiences as it receives new ones. Sampling experiences at random from the replay buffer removes the correlation in the observation sequences and prevents the action values from oscillating or diverging catastrophically.
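Both functions can be seen in a minimal replay buffer sketch. The class name `ReplayBuffer`, the `Transition` fields, and the capacity and batch size are assumptions chosen for illustration; the source does not prescribe a particular data structure.

```python
import random
from collections import deque, namedtuple

# Hypothetical record layout for one experience.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are dropped when full."""

    def __init__(self, capacity, seed=None):
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, *fields):
        """Save one experience so it can be reused in later updates."""
        self._storage.append(Transition(*fields))

    def sample(self, batch_size):
        """Draw a uniform random minibatch without replacement.

        Random draws mix old and recent transitions, which breaks the
        temporal ordering of consecutive observations.
        """
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


# Usage: push 150 dummy transitions into a buffer that holds 100.
buffer = ReplayBuffer(capacity=100, seed=0)
for t in range(150):
    buffer.push(t, 0, 0.0, t + 1, False)
batch = buffer.sample(8)
```

Because the buffer has a fixed capacity, only the most recent 100 transitions remain after the loop, and each of them can appear in many different minibatches before it is evicted.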
See also: https://huggingface.co/deep-rl-course/unit3/deep-q-algorithm