I always thought that it would be fun to have some sort of AI pet that I could just watch grow over time as I leave it running on an old TV or laptop. I used to think that this would look like an AI ant colony (and I might be doing that project later), but the idea of an AI fish tank has recently caught my attention. So for the next couple of blog posts, we’re going to be building a virtual aquarium where all of the fish are controlled by neural networks.
What is a neural network anyway? The Wikipedia page is an interesting read, but it’s pretty difficult to understand. The simple definition is that it is a set of nodes connected by edges. The input nodes take in information, this information propagates through some other nodes, and the answer to the question you’re asking is whatever value ends up in the output nodes. That was a lot to take in at once, but it’s much simpler to understand if you look at an example.
A Simple Moth Example
Let’s consider a toy example of a moth which wants to go towards light and avoid darkness. In our simple example, the moth has just two inputs and three outputs: it can sense if there is light to the left or right of it, and it can choose to turn left, turn right, or go straight. If the moth senses light to one side, it should want to go towards that side and if it senses darkness to one side, then it should try to avoid that side. We can construct a neural network to represent this.
The two nodes on the left are our input nodes, which represent the moth’s eyes. Each eye can have a value of +1 (senses light), 0 (neutral amount of light), or -1 (darkness). If we ever wanted to use this neural network, we’d have to hook up a system to change those node values to the observed values of light. For our purposes, we can just call those nodes “magic” and say that they always represent the amount of light observed in each eye.
Our next two nodes are “processing” nodes. We could have done our moth example without these, but our fish is going to need processing nodes, so it’s better that we explain them now. Each of these nodes takes in the value from each input node, multiplies it by the number on the connecting arrow, and totals the results to form a better representation of the overall problem. The “left eye” processing node (the one on top) represents the amount of light coming into the left eye minus the amount of light coming into the right eye. That’s what the +1 and the -1 on the arrows refer to, and we will call these numbers “weights”.
Our last three nodes are the output nodes. These are hooked up to the actions that the moth can take, so these nodes will be controlling which way the moth turns. Whichever node has the highest number will be the action that the moth takes. In different problem spaces, this final step could work differently. For the purposes of our problem, I’ve just set the actions as “turn left”, “turn right”, or “don’t turn”, so it makes sense to just look at which node has the highest value and take that action. If we were in a situation where the moth could choose any angle to turn to between 90° to the left and 90° to the right, then it might make sense to somehow combine the final values of the nodes to calculate which angle the moth should turn to. This isn’t something that our fish is going to need to worry about since I stick to a discrete action space, but it’s at least interesting to think about.
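To make that last step concrete, here is a small Python sketch of both ideas. The first function picks the highest-valued output node, which is the approach described above. The second is only one hypothetical way to blend the node values into a continuous angle (a softmax-weighted average of assumed angles of -90°, 0°, and +90°); it isn’t something our fish will use.

```python
import math

# Output nodes in order: left, middle, right.
ACTIONS = ["turn left", "don't turn", "turn right"]
# Assumed angles for illustration only: negative means left.
ANGLES_DEG = [-90.0, 0.0, 90.0]

def pick_action(outputs):
    """Discrete case: take the action whose output node is highest."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return ACTIONS[best]

def blend_angle(outputs):
    """Continuous case (hypothetical): softmax-weighted average of the angles."""
    highest = max(outputs)
    exps = [math.exp(o - highest) for o in outputs]
    total = sum(exps)
    return sum(angle * e / total for angle, e in zip(ANGLES_DEG, exps))

print(pick_action([2.0, 0.0, -2.0]))
print(blend_angle([2.0, 0.0, -2.0]))
```

With the discrete version, ties simply go to the first node in the list, which is fine for a toy example.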
A Slightly More Complicated Moth Example
Just to make sure we’ve got this all down pat, let’s work through a quick example with the moth. If you feel like you already have a good handle on how this works, feel free to skip ahead to the “Backpropagation” section, or even farther to the “Our Fish’s Environment” section if you already understand how backpropagation works (or if you just don’t want to get too deep into the weeds with how neural networks learn).
In this example, our moth has seen light in its left eye and darkness in its right eye (so the inputs are +1 for the left eye and -1 for the right eye). Based on this, we would hope that the moth would turn to the left. The processing nodes end up with values of +2 and -2 because each one factors in both the light from the left eye and the darkness from the right. This gets passed to the output nodes, and “left” is the action with the highest value, just like we would expect. The action “don’t turn” has value 0, because with light on the left and darkness on the right, continuing straight between those two extremes is completely neutral.
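Here is the same calculation as a short Python sketch. The inputs and the +1/-1 weights into the processing nodes come from the example above. The weights into the output nodes are assumptions, chosen so that the numbers match the walkthrough (in particular, so that “don’t turn” comes out as 0).

```python
# Weights from each eye into each processing node: [left eye, right eye].
HIDDEN_WEIGHTS = [
    [+1.0, -1.0],  # "left" processing node: left light minus right light
    [-1.0, +1.0],  # "right" processing node: right light minus left light
]
# Weights from each processing node into each output node (assumed values).
OUTPUT_WEIGHTS = [
    [1.0, 0.0],  # turn left
    [0.5, 0.5],  # don't turn
    [0.0, 1.0],  # turn right
]
ACTIONS = ["turn left", "don't turn", "turn right"]

def weighted_sums(weights, values):
    """Each node totals its inputs, each multiplied by the weight on its arrow."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def forward(left_eye, right_eye):
    """Run the eye readings through the network and pick the highest output."""
    hidden = weighted_sums(HIDDEN_WEIGHTS, [left_eye, right_eye])
    outputs = weighted_sums(OUTPUT_WEIGHTS, hidden)
    action = ACTIONS[outputs.index(max(outputs))]
    return hidden, outputs, action

# Light on the left (+1), darkness on the right (-1).
hidden, outputs, action = forward(+1, -1)
print(hidden, outputs, action)
```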
As one more quick example, here’s what would happen if both eyes saw light. It’s pretty self-explanatory, so I won’t walk through it, but think about it and make sure you understand what each number refers to (and feel free to leave a comment if you’d like any further explanation).
Backpropagation
Every frame, an agent (whether moth or fish) follows these steps:
1. Get its observations about the environment.
2. Run the observations through its neural network and get an action as output.
3. Perform the action.
4. Get a reward from the environment (or a punishment, which is just a negative reward).
5. Use that reward to update the neural network.
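As a rough Python sketch, that loop looks like the following. The `Environment` and `Agent` classes and their method names are hypothetical placeholders; the toy agent here acts randomly and only keeps a running total of its reward instead of updating a real network.

```python
import random

class Environment:
    """Hypothetical stand-in for the moth's or fish's world."""

    def observe(self):
        # Step 1: for example, the light level seen by each eye.
        return [random.choice([-1, 0, 1]), random.choice([-1, 0, 1])]

    def step(self, action):
        # Steps 3 and 4: apply the action and hand back a reward.
        return 1.0 if action == "turn left" else -1.0

class Agent:
    """Hypothetical agent; a real one would wrap a neural network."""

    ACTIONS = ["turn left", "don't turn", "turn right"]

    def __init__(self):
        self.total_reward = 0.0

    def act(self, observation):
        # Step 2: a real agent would run its network here.
        return random.choice(self.ACTIONS)

    def learn(self, observation, action, reward):
        # Step 5: a real agent would update its weights here.
        self.total_reward += reward

def run(agent, env, frames):
    for _ in range(frames):
        observation = env.observe()               # 1. observe
        action = agent.act(observation)           # 2. choose an action
        reward = env.step(action)                 # 3 and 4. act, get a reward
        agent.learn(observation, action, reward)  # 5. update

agent = Agent()
run(agent, Environment(), frames=100)
print(agent.total_reward)
```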
Backpropagation is step 5, and without it, an agent won’t get better at responding to its environment over time. The basic idea of backpropagation is that if we receive a positive reward, we should strengthen all of the weights that led to that decision (because they appear to have been correct), and if we receive a negative reward, we should weaken all of the weights that led to that decision (because they led to a mistake).
Here is what our moth network would look like after seeing a light through its left eye, turning left, and then receiving a reward. Our action was “good” because of the reward, so we update our weights to strengthen that decision. Had we gotten a negative reward, the weights would have updated in the opposite direction. Note that the two weights that are 0.5 didn’t update because that middle output node was 0, so those weights didn’t “contribute” to the correct answer. Also, this update step depends on some other factors like how your inputs and actions are set up, so this exact update isn’t the only possible solution.
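To make the idea concrete, here is a deliberately simplified Python sketch of that update for the output weights of the moth network. It uses a reward-scaled rule (each weight changes by learning rate × reward × processing node value × output node value), not full gradient-based backpropagation. The learning rate of 0.1 and the starting weights other than the two 0.5s are assumptions. It does reproduce the behavior described above: the weights into the middle node don’t move because that node’s value is 0.

```python
LEARNING_RATE = 0.1  # assumed value, for illustration only

def weighted_sums(weights, values):
    """Each node totals its inputs, each multiplied by the weight on its arrow."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def update_output_weights(weights, hidden, outputs, reward):
    """Nudge each weight by reward * processing node value * output node value."""
    return [
        [w + LEARNING_RATE * reward * h * out for w, h in zip(row, hidden)]
        for row, out in zip(weights, outputs)
    ]

# Rows: turn left, don't turn, turn right.
# Columns: left and right processing nodes. Only the 0.5s come from the post.
output_weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# Light in the left eye only: inputs are +1 (left) and 0 (right).
hidden = weighted_sums([[1.0, -1.0], [-1.0, 1.0]], [1, 0])
outputs = weighted_sums(output_weights, hidden)

rewarded = update_output_weights(output_weights, hidden, outputs, reward=+1)
punished = update_output_weights(output_weights, hidden, outputs, reward=-1)
print(rewarded)
print(punished)
```

After the positive reward, the same input produces an even larger “turn left” value than before; after the negative reward, it produces a smaller one.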
Our Fish’s Environment
Our fish lives in a 600×400 HTML5 canvas. This means that if you’re using IE8 or lower, it won’t work for you. I would argue that if you’re still using IE8 or lower, then you’ve got bigger problems than not being able to watch a virtual fish swim around your screen. There are 30 pieces of food in our environment, and every time a piece of food is eaten, another piece of food spawns in a random location. Our agent will receive a reward of +100 for eating a piece of food, -2 for every frame that it spends stuck against a wall, and -1 for every frame that it spends not stuck against a wall but also not eating food. I also gave it a +1 reward for getting itself unstuck from a wall to help encourage it to not spend most of the first few minutes swimming straight against the edge.
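Written out as a Python sketch, the reward for a single frame might look like this. The function name, its arguments, and the order in which the cases are checked (eating first, then the wall cases) are assumptions for illustration; only the reward values come from the description above.

```python
def frame_reward(ate_food, stuck_on_wall, was_stuck_last_frame):
    """Reward for one frame, using the values described in the post."""
    if ate_food:
        return 100   # ate a piece of food
    if stuck_on_wall:
        return -2    # pressed against a wall this frame
    if was_stuck_last_frame:
        return 1     # just got itself unstuck
    return -1        # swimming freely but not eating

print(frame_reward(ate_food=False, stuck_on_wall=False, was_stuck_last_frame=False))
```

The small penalty on ordinary frames means that wandering aimlessly is never free, so the fish is pushed to find food quickly.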
Our fish has sensors for food and for walls, which will be the inputs to our neural network. This is easier to see than to explain, but the basic idea is that the fish shoots out 11 rays in different directions in front of it, and if a ray hits food or a wall, then the distance to that hit goes into the neural network. We’re not just passing in a binary “true/false” for whether food or a wall was found, but rather the distance to the food or wall.
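Here is a minimal Python sketch of the wall sensors. The 600×400 tank and the 11 rays come from the description above; the 120° field of view, the 150-pixel maximum range, and the function names are assumptions, and food detection (a ray-versus-circle test) is left out for brevity.

```python
import math

WIDTH, HEIGHT = 600, 400           # canvas size from the post
NUM_RAYS = 11                      # rays per sensor type, from the post
FIELD_OF_VIEW = math.radians(120)  # assumed spread of the rays
MAX_RANGE = 150.0                  # assumed sensor range in pixels

def ray_wall_distance(x, y, angle):
    """Distance from (x, y) to the first wall hit along the given angle."""
    dx, dy = math.cos(angle), math.sin(angle)
    distances = []
    if dx > 0:
        distances.append((WIDTH - x) / dx)   # right wall
    elif dx < 0:
        distances.append(-x / dx)            # left wall
    if dy > 0:
        distances.append((HEIGHT - y) / dy)  # bottom wall (canvas y grows downward)
    elif dy < 0:
        distances.append(-y / dy)            # top wall
    return min(distances)

def sense_walls(x, y, heading):
    """One reading per ray: the wall distance, capped at the sensor range."""
    readings = []
    for i in range(NUM_RAYS):
        offset = -FIELD_OF_VIEW / 2 + i * FIELD_OF_VIEW / (NUM_RAYS - 1)
        distance = ray_wall_distance(x, y, heading + offset)
        readings.append(min(distance, MAX_RANGE))
    return readings

# A fish 10 pixels above the bottom wall, facing straight at it.
print(sense_walls(300, 390, math.pi / 2))
```

Capping each reading at the maximum range gives the network a consistent “nothing nearby” value for rays that don’t hit anything close.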
Here’s what the wall detection looks like (the wall is along the bottom, green represents hit):
And here’s the food detection (note that more than one ray can contact a piece of food):
And finally, here’s our neural network controlling the fish. The 22 nodes on the left represent the 11 sensors for wall detection and the 11 sensors for food detection. The two fully connected layers of 50 nodes in the middle (not all shown to save space) are the processing nodes where the fish is able to store learned information, although there’s not an intuitive explanation of what each of those nodes does. For example, if you were to ask what the third node down in the second row of 50 “means”, then I wouldn’t be able to give you an answer. These nodes will be trained over time through backpropagation to lead to actions that maximize rewards, and we don’t really need to understand it in any more detail than that. The five nodes on the right represent the five actions that our fish can take (turn left 0.2 radians, turn left 0.1 radians, don’t turn at all, turn right 0.1 radians, and turn right 0.2 radians). The fish will automatically move forward 2 pixels per time step, so our only actions need to be turning.
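A bare-bones Python sketch of that architecture is below. The layer sizes, turn angles, and forward speed come from the description above; the random starting weights, the ReLU activation, the lack of bias terms, and the choice that a negative angle means “left” are assumptions, and the training step is left out entirely.

```python
import math
import random

LAYER_SIZES = [22, 50, 50, 5]        # inputs, two processing layers, actions
TURNS = [-0.2, -0.1, 0.0, 0.1, 0.2]  # radians; negative = left (assumed)
SPEED = 2.0                          # pixels per time step

def make_network(rng):
    """One weight matrix per pair of adjacent layers, with small random values."""
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])
    ]

def forward(network, inputs):
    """Run the sensor readings through every layer and return 5 action values."""
    values = inputs
    for index, layer in enumerate(network):
        values = [sum(w * v for w, v in zip(row, values)) for row in layer]
        if index < len(network) - 1:
            values = [max(0.0, v) for v in values]  # ReLU on processing layers
    return values

def step(x, y, heading, action_index):
    """Turn by the chosen amount, then move forward 2 pixels."""
    heading += TURNS[action_index]
    return x + SPEED * math.cos(heading), y + SPEED * math.sin(heading), heading

rng = random.Random(0)
network = make_network(rng)
sensors = [rng.random() for _ in range(22)]  # stand-in for real sensor readings
action_values = forward(network, sensors)
best = action_values.index(max(action_values))
print(best, step(300.0, 200.0, 0.0, best))
```

Even at this size the network has 22×50 + 50×50 + 50×5 = 3,850 weights, which is part of why individual nodes are so hard to interpret.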
Let’s see it work!
If you’ve been reading this post from the top, then this will already have been running for a while. If you want to see it from the beginning, refresh the page. The “learning phase” happens during the first 50,000 steps (although the fish will actually continue learning after that time). During this time, it will sometimes take random actions in order to help it explore its environment. The likelihood of it taking a random action instead of what it thinks will be best gradually decreases during the learning phase (although it never quite hits 0, to ensure that the agent hasn’t completely missed an action that it would need to learn from). This learning phase should take around 10 minutes.
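This “sometimes act randomly, less often over time” strategy is commonly called epsilon-greedy exploration. A minimal Python sketch follows; the 50,000-step learning phase comes from the description above, while the linear decay, the starting value of 1.0, and the floor of 0.05 are assumptions.

```python
import random

LEARNING_STEPS = 50_000  # length of the learning phase, from the post
EPSILON_START = 1.0      # assumed: fully random at the start
EPSILON_MIN = 0.05       # assumed floor, so exploration never reaches 0

def epsilon(step):
    """Chance of taking a random action at the given step."""
    if step >= LEARNING_STEPS:
        return EPSILON_MIN
    fraction = step / LEARNING_STEPS
    return EPSILON_START + fraction * (EPSILON_MIN - EPSILON_START)

def choose_action(action_values, step, rng=random):
    """Explore with probability epsilon, otherwise take the best-looking action."""
    if rng.random() < epsilon(step):
        return rng.randrange(len(action_values))
    return action_values.index(max(action_values))

print(epsilon(0), epsilon(25_000), epsilon(100_000))
print(choose_action([0.1, 0.9, 0.3, 0.2, 0.0], step=60_000))
```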
I’ve got some ideas for future work on this project, and I’d like to make another couple of blog posts to explore some of these ideas:
- Add more fish, each with slightly different parameters to see which one ends up learning better.
- Maybe the fish could have a hunger level and die off if they aren’t fit enough? I could actually use genetic algorithms to simulate fish evolution, which would be a really fun direction to take this project in.
- Add in predators like sharks to eat the fish.
- Change the fish’s reward function to encourage more “fish-like” behavior like swimming back and forth.
- Make plants that slowly grow instead of just having food randomly appear.
- Add better graphics.
- Add controls like fish speed, amount of food in environment, learning rate, etc.
Thanks for reading!
A few minor blog-related things while I’ve still (hopefully) got your attention:
• Follow me on Twitter!
• I’ve got some more ideas for projects that I can do for blog posts, but I’d love ideas if you’ve got any.
• I’m rather new to this whole blogging and WordPress thing, so if you’ve got some constructive criticism or suggestions, please reach out to me on Twitter or email me at email@example.com.