MaDL – Vector Calculus
Let's now move from differential calculus to vector calculus, that is, the differentiation and also integration of vector fields. In this unit we'll only cover differentiation. Of course that's very relevant for lectures like computer vision and deep learning, because we're dealing a lot with matrices there, so we need to know how to do calculus with those and differentiate them as well.

So let's consider a function f, and we write this function in boldface notation because it has not only a multivariate input but also a multi-dimensional output. So we have a function that maps from R^n to R^m. In this case the partial derivative of f with respect to x is given by a matrix, because both the input and the output are multi-dimensional. So the derivative of f with respect to x, evaluated at x, is a matrix, in particular an m times n matrix. In each element we have the individual partial derivatives. In the first row we have the derivative of f_1 with respect to x_1 and so forth until f_1 with respect to x_n, and then we increment the index of the output, the index that is attached to f, until in the last row we get the partial derivative of f_m with respect to x_1 through f_m with respect to x_n. So you can see why this is an m times n matrix: the function takes n-dimensional arguments and outputs an m-dimensional value. In other words, the partial derivative of f_i with respect to x_j that appears inside here is the partial derivative of the i-th output of f with respect to the j-th component x_j of the input vector x. Okay, so no magic here: for the derivative of a vector-to-vector function, we just arrange the partial derivatives as a matrix.

There are two frequent special cases of this. The first one is a scalar-to-scalar function, where both n and m equal 1. In this case we have a function that maps from R^1 to R^1.
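Before going through the special cases in detail, it may help to see the general m times n matrix described above written out. The block below is a reconstruction from the spoken description, not a copy of the slide; this matrix is commonly called the Jacobian of f. The small function with n = 2 and m = 3 underneath it is an illustrative assumption and does not come from the lecture.

```latex
\[
\frac{\partial \mathbf{f}}{\partial \mathbf{x}}(\mathbf{x})
=
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(\mathbf{x}) & \cdots & \dfrac{\partial f_1}{\partial x_n}(\mathbf{x}) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(\mathbf{x}) & \cdots & \dfrac{\partial f_m}{\partial x_n}(\mathbf{x})
\end{pmatrix}
\in \mathbb{R}^{m \times n}
\]

% Illustrative example (an assumption, not from the lecture): n = 2, m = 3
\[
\mathbf{f}(x_1, x_2) =
\begin{pmatrix} x_1 x_2 \\ \sin x_1 \\ x_2^2 \end{pmatrix}
\quad\Longrightarrow\quad
\frac{\partial \mathbf{f}}{\partial \mathbf{x}}(\mathbf{x}) =
\begin{pmatrix} x_2 & x_1 \\ \cos x_1 & 0 \\ 0 & 2 x_2 \end{pmatrix}
\in \mathbb{R}^{3 \times 2}
\]
```

Row i collects the partial derivatives of the output f_i, and column j collects the derivatives with respect to the input component x_j, which is why the example has three rows and two columns.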
So in this case we have just the one-dimensional derivative, which can be written in either of the two notations we've seen in the previous unit. It is simply an element of R^(1 x 1), or equivalently an element of R.

The other special case is the vector-to-scalar case, where n is bigger than 1 and m equals 1. In words, the function maps from R^n to R^1. This is a very common case, for example in deep learning, because we have many different parameters. So this n might be the number of parameters, but the function, which is the loss function, maps only to a single value that we want to optimize. If we are in this situation, then we're dealing with the so-called gradient of f, and this gradient is denoted with the nabla symbol, the inverted triangle. So we can write this as the gradient of f with respect to the vector x, or alternatively as the derivative of f with respect to x, evaluated at x. But the nabla notation is the most common one, to indicate that this really is the gradient. This is a vector, an element of R^(1 x n) or equivalently of R^n, where each element is the function differentiated with respect to one individual argument of that function, for example one individual parameter of a neural network. Because f doesn't have multiple output dimensions, we just have a single row here, a single vector. That's called the gradient of f. And this gradient notation is used exclusively for vector-to-scalar functions.

Let's now look at how the chain rule for vector-to-vector functions works. This is quite similar to the chain rule that we've seen before. Let f be a function that maps from R^n to R^m. We call the argument of f x; we need to name this argument in order to specify what we differentiate with respect to later on. And let g be a function that maps from R^m to R^p. You can see that we've chosen the same m here, because this should be a composition in the end.
So we need to choose the same m here in order for this composition to be valid. And the argument of g we call y. So we have a vector-valued function f and a vector-valued function g with the arguments x and y, respectively. And we consider the following composition: h of x equals g of f of x, as before, but now with vector-to-vector functions. In this case h is a mapping from R^n to R^p; we go directly from n to p with this nested composition. And the following holds: the derivative of h with respect to x is equal to the derivative of g with respect to y, evaluated at f of x, times the derivative of f with respect to x, evaluated at x. This is exactly the same expression as before, the chain rule, just for vector-valued functions now. We have the outer derivative here and the inner derivative here. The only difference is that now we don't have scalars but matrices. The derivative of g is a p times m matrix, because g maps from R^m to R^p. It gets multiplied with the derivative of f with respect to x, which is an m times n matrix because f is a mapping from R^n to R^m, to form the resulting p times n matrix, the derivative of h.

And there's an important special case for the chain rule of vector-to-vector functions as well. Let f be a function that maps from a one-dimensional input to an n-dimensional output, where we call the argument x again, and let g be a function that maps from an n-dimensional input to a one-dimensional output, where again the argument of g is y. So the only thing that we've changed is that we've assumed that the intermediate quantity in this chain is a vector, but both the input and the output of h are one-dimensional quantities. We go from 1 to n to 1. If we make this restriction, then we can write the composition of functions as h of x equals g of f of x again.
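Before continuing with the special case, the general matrix form of the chain rule stated above can be checked numerically. The following is a minimal sketch using only the Python standard library. The functions `f` and `g`, the evaluation point, and the helpers `jacobian` and `matmul` are hypothetical choices made for illustration (n = 3, m = 2, p = 4), and the derivatives are approximated with central differences rather than computed exactly.

```python
import math

def jacobian(func, x, eps=1e-6):
    """Central-difference estimate of the m x n matrix of partial derivatives."""
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        out_plus, out_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (out_plus[i] - out_minus[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical example functions (not from the lecture).
def f(x):  # maps R^3 -> R^2
    return [x[0] * x[1], math.sin(x[2])]

def g(y):  # maps R^2 -> R^4
    return [y[0] + y[1], y[0] * y[1], math.exp(y[0]), y[1] ** 2]

def h(x):  # composition, maps R^3 -> R^4
    return g(f(x))

x = [0.7, -1.3, 0.4]
J_f = jacobian(f, x)           # m x n = 2 x 3
J_g = jacobian(g, f(x))        # p x m = 4 x 2, evaluated at f(x)
J_h_chain = matmul(J_g, J_f)   # p x n = 4 x 3, via the chain rule
J_h_direct = jacobian(h, x)    # p x n = 4 x 3, differentiating h directly

max_abs_diff = max(abs(a - b)
                   for row_a, row_b in zip(J_h_chain, J_h_direct)
                   for a, b in zip(row_a, row_b))
print("shape of dh/dx:", len(J_h_chain), "x", len(J_h_chain[0]))
print("largest deviation between the two results:", max_abs_diff)
```

The two results agree only up to finite-difference error, so the comparison is a sanity check on the shapes and the formula, not a proof.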
Then in this case, the function h maps from a one-dimensional quantity to a one-dimensional quantity. It's a scalar-to-scalar function: one-dimensional input, one-dimensional output. And the derivative of h with respect to x is again, as before, the derivative of g with respect to y (the scalar-valued function with respect to the vector y), evaluated at f of x, times the derivative of the vector-valued function f with respect to the scalar x. You can already see that both of those are vectors. The first is a row vector, a 1 times n matrix, and the second is a column vector, an n times 1 matrix. So we have an inner product here, which we can also write as the sum over i from 1 to n of the partial derivative of g with respect to y_i, evaluated at f of x, times the derivative of f_i with respect to x, evaluated at x. So here we have scalar quantities that we multiply, because we have written this explicitly as the sum over the elements i equals 1 to n, to form the resulting 1 times 1 matrix. So we can see that this is a simple sum, a simple inner product.

And if we inspect this closely, what do we observe? Well, we observe that, through the rules for derivatives of vector-to-vector functions, we have derived the total derivative that we've seen before. This is exactly the expression for the total derivative, where we again have a vector-valued quantity in between. We have a function that depends on multiple variables, and each of these variables may depend on the quantity that we seek to differentiate with respect to. That's why we have this sum here. So this is the connection between vector calculus and the total derivative that we've seen before.
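The 1 to n to 1 special case above can be sketched in a few lines of plain Python. The functions `f_vec` and `g`, their hand-written derivatives `df_dx` and `grad_g`, and the evaluation point are hypothetical choices for illustration (n = 3), not taken from the lecture. The sum over i is compared against a central-difference estimate of the derivative of h.

```python
import math

# Hypothetical example functions (not from the lecture), with n = 3.
def f_vec(x):
    """f: R -> R^3"""
    return [x ** 2, math.sin(x), math.exp(x)]

def df_dx(x):
    """Derivative of each component f_i with respect to the scalar x (column vector)."""
    return [2 * x, math.cos(x), math.exp(x)]

def g(y):
    """g: R^3 -> R"""
    return y[0] * y[1] + y[2] ** 2

def grad_g(y):
    """Partial derivatives of g with respect to y_1, y_2, y_3 (row vector)."""
    return [y[1], y[0], 2 * y[2]]

def h(x):
    """Composition h(x) = g(f(x)), a scalar-to-scalar function."""
    return g(f_vec(x))

x = 0.5

# Total derivative: sum over i of dg/dy_i at f(x) times df_i/dx at x.
total_derivative = sum(dg_i * df_i
                       for dg_i, df_i in zip(grad_g(f_vec(x)), df_dx(x)))

# Central-difference estimate of dh/dx for comparison.
eps = 1e-6
numeric_derivative = (h(x + eps) - h(x - eps)) / (2 * eps)

print("sum over i (inner product):", total_derivative)
print("central difference of h:   ", numeric_derivative)
```

Each term of the sum is one path from x through an intermediate variable y_i to the output, which is the reading of the total derivative given in the text.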