Having a solid foundation in mathematics will always serve you well in the field of AI. This page describes how neural networks are trained. We'll first try to develop an intuition for what is going on in the training process and then we'll look at a detailed mathematical description of how the weights of a neural network are actually updated.
The steps to train a neural network using supervised learning.
Step 1 is often referred to as the forward pass and is used both when training a neural network and when running inference with it. It is the step where the network makes a prediction for some input data. Steps 3 and 4 are referred to as the backward pass and are only performed during training. This process of training is known as supervised learning because we have predictions that we can check (or "supervise") against a ground truth. Unsupervised learning approaches exist as well, but they are beyond the scope of this explanation.
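The four steps above can be sketched as a minimal training loop. Everything here is an illustrative assumption rather than a real network: the "network" is a single weight w in the model y_hat = w * x, and the data and learning rate are toy values.

```python
# Minimal sketch of the supervised training loop for a single-weight
# linear model y_hat = w * x. All names and values are illustrative.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, ground-truth) pairs
w = 0.0      # the single trainable weight
lr = 0.05    # learning rate (step size for updates)

for epoch in range(200):
    for x, y in data:
        y_hat = w * x                 # Step 1: forward pass (prediction)
        loss = (y_hat - y) ** 2       # Step 2: compute the loss
        grad = 2 * (y_hat - y) * x    # Step 3: backward pass (gradient of loss w.r.t. w)
        w -= lr * grad                # Step 4: update the weight

print(round(w, 3))  # w converges toward 2.0, the slope of the toy data
```

Real networks have millions of weights and use automatic differentiation for step 3, but the four-step structure of the loop is the same.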
The key to understanding how neural networks are trained is understanding what it means to minimize the loss function. Recall from calculus that to find a local minimum or maximum of a function f(x), you take the first derivative and find an x value where f'(x) = 0, in other words, a point where the slope is zero. Now imagine the function you are trying to optimize is incredibly large and complex. This is what loss functions are like: they are functions of all of a network's weights, a space with potentially millions of dimensions, and solving for the points where the derivative vanishes is not feasible analytically. So how do you find local minima in such a space?
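As a concrete one-dimensional instance of that calculus recipe (the function f(x) = (x - 3)^2 + 1 here is purely illustrative):

```python
# Illustrative 1-D example of minimizing a function via f'(x) = 0.
# f(x) = (x - 3)**2 + 1, so f'(x) = 2*(x - 3), which is zero at x = 3.

def f(x):
    return (x - 3) ** 2 + 1

def f_prime(x):
    return 2 * (x - 3)

x_min = 3.0            # the point where the slope vanishes
print(f_prime(x_min))  # 0.0: the slope is flat at the minimum
print(f(x_min))        # 1.0: the minimum value of f
```

In one dimension this is a closed-form calculation; for a loss surface over millions of weights, no such closed form exists, which is what motivates the iterative approach below.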
There's a famous analogy to traversing a mountain which can give you some insight. Imagine you were blindfolded at the top of a mountain and were asked to find your way down. How would you do it?
Gradient descent and the analogy of a blindfolded person finding their way down a mountain.
One way would be for the blindfolded person to feel around them for the direction in which the slope is locally steepest. The equivalent concept in mathematics is the gradient of the loss function: the vector of partial derivatives with respect to each of the weights in the network. The gradient points in the direction of steepest ascent, so just as the blindfolded person takes a step downhill before repeating the process, the weights of a neural network are updated by taking a small step in the direction opposite the gradient, as shown in the image above. There is no guarantee that you will reach a global minimum with this process (i.e. reach the bottom of the mountain), but if things are working correctly you should find yourself in a local minimum. If the network's performance at that local minimum is sufficient, then you have successfully trained a neural network.
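This update rule can be sketched in two dimensions, with the mountain replaced by an illustrative bowl-shaped surface L(w1, w2); the surface, starting point, and learning rate are all assumptions made for the sketch:

```python
# Sketch of gradient descent in two dimensions, mirroring the
# blindfolded-hiker analogy. The "mountain" is the illustrative
# surface L(w1, w2) = (w1 - 1)**2 + (w2 + 2)**2, lowest at (1, -2).

def grad(w1, w2):
    # Partial derivatives of L with respect to each weight.
    return 2 * (w1 - 1), 2 * (w2 + 2)

w1, w2 = 5.0, 5.0   # start somewhere up on the mountain
lr = 0.1            # step size (learning rate)

for _ in range(200):
    g1, g2 = grad(w1, w2)
    # Step in the direction of steepest descent: the negative gradient.
    w1 -= lr * g1
    w2 -= lr * g2

print(round(w1, 3), round(w2, 3))  # settles at the minimum (1.0, -2.0)
```

With a real network the gradient is not written by hand like this; it is computed by backpropagation, but the weight update itself is exactly this subtraction.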
The following writeup covers the nitty-gritty mathematics behind this process, in particular the algorithm used to compute these gradients, known as backpropagation.