AI Learns to play Snake

William Verhaeghe
5 min read · Feb 5, 2021


This is a companion to a YouTube video; you can find it here. You can also find the source code on GitHub.

The premise

For the past few years, I’ve been diving into AI and ML. I had already experimented with ML and followed some courses and specializations on Coursera, but nothing beats the real thing: practice. Reinforcement learning (RL) is an easier place to start, since you can see results directly and don’t need a dataset. In practice, however, it can be much more difficult.

I’ve decided to improve my ML and RL skills by doing a series of tasks, starting with easier objectives like Snake, 2048, etc., then moving up to Super Mario Bros. and possibly Euro Truck Simulator, and if that works, I’ll find even harder challenges.

The game

The first part of RL is either coding a game yourself or developing some way to interact with an existing game. OpenAI developed Gym for this purpose, but I love to do things myself, so I coded my own Snake game. I tried to mirror Gym’s interface, so you can switch between the two easily if you want to.

The game is pretty simple. You start with a 2-long snake in the center of the board, and an apple appears on a random other spot. Every tick the snake moves one tile, which can result in one of three things (or nothing):

  • Pick-up the apple
  • Go off the board
  • Hit yourself

If you pick up the apple, a new apple appears on a random empty spot and your snake grows by one tile. You win the game if your snake is as long as the board.

If you go off the board or hit yourself, you die and start over.

I will skip over the further details as this isn’t a post about developing a Snake game, but playing it. See GitHub for the code.
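Still, to make the mechanics concrete, here is a minimal sketch of the game loop described above. The class name, state encoding, and reward values are my own, not necessarily those of the actual repo:

```python
import random

DIRECTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # up, right, down, left

class SnakeGame:
    def __init__(self, size=8):
        self.size = size
        self.reset()

    def reset(self):
        c = self.size // 2
        self.snake = [(c - 1, c), (c, c)]   # 2-long snake; head is last
        self._place_apple()
        return self.board()

    def _place_apple(self):
        empty = [(x, y) for x in range(self.size) for y in range(self.size)
                 if (x, y) not in self.snake]
        self.apple = random.choice(empty)

    def board(self):
        """Flat grid: 0 = empty, 1 = snake, 2 = apple."""
        grid = [0] * (self.size * self.size)
        for x, y in self.snake:
            grid[y * self.size + x] = 1
        grid[self.apple[1] * self.size + self.apple[0]] = 2
        return grid

    def step(self, action):
        """Move one tile; returns (reward, done)."""
        dx, dy = DIRECTIONS[action]
        head = (self.snake[-1][0] + dx, self.snake[-1][1] + dy)
        if not (0 <= head[0] < self.size and 0 <= head[1] < self.size) \
                or head in self.snake:
            return -1, True                  # off the board or hit yourself
        self.snake.append(head)
        if head == self.apple:               # pick up the apple: grow by one
            if len(self.snake) == self.size * self.size:
                return 1, True               # snake fills the board: you win
            self._place_apple()
            return 1, False
        self.snake.pop(0)                    # otherwise just move: drop tail
        return 0, False
```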

Reinforcement learning

Reinforcement learning is, in my opinion, easiest explained by comparing it to teaching a dog. You say “sit”; if the dog sits, we give it a cookie (reward); if the dog runs away, we shout (punishment); and if it does nothing, it gets nothing, or maybe an angry stare (small punishment). We repeat this process until the dog behaves as we want it to.

If you want more information about RL, there are some great articles about it online.

After creating the game, I made a class for RL. It needs to do three main things (plus one optional extra):

  1. Create a neural network
  2. Train the model
  3. Predict from the model
  4. Optionally save and load (if you want to use it later, this might be useful :-))

Create a NN

I’ve opted to input a list of ints and convert this into a NN with those sizes for its layers. Every layer (except input and output) has a ReLU activation function (positive stays positive, negative values become 0).

We then compile the model, telling it how to evaluate its predictions (the loss function) and which optimizer to use.
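The original code was shown as a screenshot; as a sketch of what building such a network might look like in Keras (the layer sizes, loss, and optimizer here are my assumptions):

```python
from tensorflow import keras

def build_model(layer_sizes, learning_rate=0.001):
    """Build a dense NN from a list of ints, e.g. [64, 128, 128, 4]:
    the first entry is the input size, the last the number of actions."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(layer_sizes[0],)))
    for size in layer_sizes[1:-1]:
        # ReLU: positive stays positive, negative values become 0
        model.add(keras.layers.Dense(size, activation="relu"))
    model.add(keras.layers.Dense(layer_sizes[-1], activation="linear"))
    # Tell the model how to evaluate (loss) and which optimizer to use
    model.compile(loss="mse", optimizer=keras.optimizers.Adam(learning_rate))
    return model
```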


Training is done one game step at a time: after every action, we train the NN on that single transition.
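A sketch of what such a single-transition training step could look like, assuming a Q-learning-style target (the discount factor and the exact update rule are my assumptions, not taken from the post):

```python
import numpy as np

GAMMA = 0.9  # discount factor for future rewards (an assumed value)

def train_step(model, state, action, reward, next_state, done):
    """Train a Keras model on one transition with a Q-learning-style target."""
    target = model.predict(state[np.newaxis], verbose=0)[0]
    if done:
        target[action] = reward
    else:
        future = model.predict(next_state[np.newaxis], verbose=0)[0]
        target[action] = reward + GAMMA * np.max(future)
    model.fit(state[np.newaxis], target[np.newaxis], epochs=1, verbose=0)
```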


The prediction is the easiest part: we simply ask the framework for a prediction.
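With Keras, for example, this might be no more than:

```python
import numpy as np

def predict_action(model, state):
    """Pick the action with the highest predicted value."""
    q_values = model.predict(state[np.newaxis], verbose=0)[0]
    return int(np.argmax(q_values))
```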

Full RL Code

Together this forms the following class we can plug into our game:
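The class itself was shown as a screenshot; here is a hedged sketch of what a class covering those responsibilities (create, train, predict, save/load) might look like in Keras. The layer sizes, loss, optimizer, and update rule are my assumptions:

```python
import numpy as np
from tensorflow import keras

class RLModel:
    def __init__(self, layer_sizes, learning_rate=0.001, gamma=0.9):
        self.gamma = gamma
        # 1. Create a neural network from a list of layer sizes
        self.model = keras.Sequential()
        self.model.add(keras.Input(shape=(layer_sizes[0],)))
        for size in layer_sizes[1:-1]:
            self.model.add(keras.layers.Dense(size, activation="relu"))
        self.model.add(keras.layers.Dense(layer_sizes[-1]))
        self.model.compile(loss="mse",
                           optimizer=keras.optimizers.Adam(learning_rate))

    def predict(self, state):
        # 3. Predict: return the action with the highest predicted value
        q = self.model.predict(state[np.newaxis], verbose=0)[0]
        return int(np.argmax(q))

    def train(self, state, action, reward, next_state, done):
        # 2. Train on a single transition (Q-learning-style target)
        target = self.model.predict(state[np.newaxis], verbose=0)[0]
        if done:
            target[action] = reward
        else:
            nxt = self.model.predict(next_state[np.newaxis], verbose=0)[0]
            target[action] = reward + self.gamma * np.max(nxt)
        self.model.fit(state[np.newaxis], target[np.newaxis],
                       epochs=1, verbose=0)

    def save(self, path):
        # 4. Optionally save and load for later use
        self.model.save(path)

    def load(self, path):
        self.model = keras.models.load_model(path)
```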


Let’s bring it all together. Skipping the imports, we start off with our parameters. Grouping them in one place allows for faster iterations and easier debugging.

We then create instances of the game and our RL model.
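As an illustration of such a parameter block (all names and values here are guesses, not the post’s actual settings):

```python
# Hyperparameters grouped in one place for faster iteration and debugging
BOARD_SIZE = 8
N_GAMES = 10_000
MAX_STEPS = 1024          # a game is interrupted after this many steps
EPSILON = 0.1             # chance of taking a random action instead
LAYER_SIZES = [BOARD_SIZE * BOARD_SIZE, 128, 128, 4]  # input ... output
```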

Then follows the actual training. We have two for loops: one that repeats for every game we will be playing, and one that loops until the game is over (either won or lost). Every 100 games, we save the model and print its current results.

Every game step follows the following process:

  1. Get the current board
  2. Predict the best action
  3. Sometimes pick a random action instead (so we can discover more obscure strategies)
  4. Take the action, and retrieve the reward and whether the game is over
  5. Train on how good the action was

Going back to teaching the dog, this is a very similar process. The dog will try some “random” things until it receives a reward and will try to do better next time.

Putting this into code we get the following:
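The original loop was shown as a screenshot; here is a skeleton of the five steps above, with the game and model interfaces assumed rather than taken from the repo (the real code is on GitHub):

```python
import random

def play_game(game, model, epsilon=0.1, max_steps=1024, n_actions=4):
    """One training game following the five steps; returns the total reward."""
    game.reset()
    total = 0
    for _ in range(max_steps):
        state = game.board()                  # 1. get the current board
        action = model.predict(state)         # 2. predict the best action
        if random.random() < epsilon:         # 3. sometimes pick randomly
            action = random.randrange(n_actions)
        reward, done = game.step(action)      # 4. take the action
        # 5. train on how good the action was
        model.train(state, action, reward, game.board(), done)
        total += reward
        if done:
            break
    return total
```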

In the end, we can plot some graphs to help guide our future decisions. The left graph shows how much reward each game earned (200 means victory); the right graph shows how long each game lasted. The maximum game length is 1,024 steps; after that the game is interrupted.
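With matplotlib, producing those two graphs might look like this (a sketch; the post’s actual plotting code is on GitHub):

```python
import matplotlib
matplotlib.use("Agg")            # render off-screen, no display needed
import matplotlib.pyplot as plt

def plot_results(rewards, lengths, path="training.png"):
    """Reward per game on the left, game length on the right."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(rewards)
    ax1.set(title="Reward per game", xlabel="game", ylabel="reward")
    ax2.plot(lengths)
    ax2.set(title="Game length", xlabel="game", ylabel="steps")
    fig.savefig(path)
    plt.close(fig)
```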

GUI and more

I’ve also added a visual representation of a game so we can see the model in action. You can see the code on GitHub or on YouTube.

Both models below were trained for ~7,500 games. We can clearly see how much faster the model learns on a smaller board. Note that a game on a bigger board lasts much, much longer: I let that model train for a couple of days, while the model on the smaller board trained for only 24 hours.

The next steps will be to repeat this process for more difficult games and to improve both our model and our understanding.

Thank you for reading. I hope you try it yourself and play a little with the model. Let me know how well it did, or if you have any questions. I tried not to go too in-depth, so as to keep the post “lighter”.