Double DQN agent for Atari Pong and RL tooling

Publication: 2025-06-17

By: Marc Anoma

Introduction

This is a reimplementation of an old project from back when I was going through the Stanford CS234 coursework before COVID…

The older implementation was in TensorFlow, and since then I had honestly not touched reinforcement learning much (if at all). With the years that have passed, and looking at all the amazing tools developed for supervised and unsupervised learning tasks, I thought now would be a great time to port that implementation from TensorFlow to PyTorch, naively hoping that a vibrant ecosystem of reinforcement learning libraries would make the ancient DQN implementation for Atari games just a few lines of code, with tons of tooling and great documentation.

As you might have guessed, it seems I was wrong…

Thoughts on the tooling

I did not do an exhaustive or detailed evaluation of all the RL libraries out there, so take this with a grain of salt.

But after looking at a few, my feeling is that RL is still much harder to get working than supervised learning, and the tooling is still a bit of the wild west: there is no clear "winner" library.

As I was going for a PyTorch implementation, I thought it made sense to give torchrl a try. And I want to give a lot of credit to the team that made it: many of the patterns they use make sense, and I can see how hard they have worked on it.

But I cannot call it "mature" yet; as of writing this article, the latest version is still 0.8. I think the documentation and the examples could be improved. For instance, I was very excited to try their replay buffer implementation and the CatFrames transform, since my previous frame-stacking code was messy and did not scale, but it took me quite some time to understand exactly how they work under the hood and what I needed to make them work end-to-end.
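To give a concrete idea, here is a minimal sketch (not my exact setup) of how I understand these pieces fit together: CatFrames used as an environment transform on a pixel-based Pong env, plus a plain replay buffer filled from a random rollout. The env id, buffer size, and batch size are placeholders, and details may differ across torchrl versions.

    from torchrl.envs import (
        GymEnv, TransformedEnv, Compose,
        ToTensorImage, GrayScale, Resize, CatFrames,
    )
    from torchrl.data import ReplayBuffer, LazyTensorStorage

    # Pixel-based Pong env with the classic DQN preprocessing pipeline.
    # "PongNoFrameskip-v4" is a placeholder; availability depends on your gym/ale-py install.
    env = TransformedEnv(
        GymEnv("PongNoFrameskip-v4", from_pixels=True),
        Compose(
            ToTensorImage(),                             # HWC uint8 -> CHW float in [0, 1]
            GrayScale(),                                 # 3 channels -> 1
            Resize(84, 84),                              # classic DQN input size
            CatFrames(N=4, dim=-3, in_keys=["pixels"]),  # stack the last 4 frames on the channel dim
        ),
    )

    buffer = ReplayBuffer(storage=LazyTensorStorage(100_000), batch_size=32)

    rollout = env.rollout(max_steps=1000)   # random policy, just to populate the buffer
    buffer.extend(rollout)
    batch = buffer.sample()                 # a batch of 32 transitions

Note that CatFrames can also be appended to the replay buffer itself rather than to the env; the sketch above only shows the env-transform route.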

In my opinion, the most readable PyTorch RL library is by far pfrl. The code is very easy to follow and the design makes sense (although I wish it were closer to torchrl's). The sad thing is that it looks a bit dead… the last commit on the main branch is from almost a year ago.

On the other hand, I was happy to see that there is now a maintained fork of OpenAI's gym library (Gymnasium, from the Farama Foundation), and that DeepMind has made MuJoCo free! Big thanks for that!

I've also seen some (sometimes critical) threads about RLlib, which seems to be a bit of a default for scalable RL training. Although I'm a big fan of Ray in general, for the simple stuff I'm doing at home it felt like overkill.

There are other libraries that provide SOTA implementations like Stable Baselines, …

Contribution

In the end, I had some success with torchrl, but I ended up writing my own PyTorch implementation for the final Double DQN training.
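For reference, the core of Double DQN is the target computation: the online network picks the greedy next action and the target network evaluates it. Here is a minimal sketch of that loss, assuming online_net and target_net are the usual Q-networks and batch holds standard transition tensors; the names and hyperparameters are placeholders, not my exact code.

    import torch
    import torch.nn.functional as F

    def double_dqn_loss(online_net, target_net, batch, gamma=0.99):
        # batch is assumed to hold tensors shaped [B, ...]
        obs, action, reward, next_obs, done = batch

        # Q(s, a) from the online network for the actions actually taken
        q_values = online_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)

        with torch.no_grad():
            # Double DQN: the online network selects the next action...
            next_action = online_net(next_obs).argmax(dim=1, keepdim=True)
            # ...and the target network evaluates it
            next_q = target_net(next_obs).gather(1, next_action).squeeze(1)
            target = reward + gamma * (1.0 - done.float()) * next_q

        return F.smooth_l1_loss(q_values, target)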

In Pong, an episode ends when one of the players reaches 21 points; in the animation below, my RL agent is the green one on the right. As you can see, this episode is quite a fun one: my agent starts the game pretty poorly before making a beautiful comeback to win it!

It looks like it learned some special techniques on its own: for instance, hitting the ball with enough acceleration that the opponent is too slow to get back to it. It also repeatedly returns the ball in a straight line, which sometimes destabilizes the opponent.

Pong win

Here is the code (with the PyTorch checkpoint) if you want to run this agent for more games. Hope you enjoy it!