# MaDL – Differential Calculus

Let’s now talk a little bit about differential calculus. Because this is pretty much high school calculus, we’ll go a little faster here. On this slide we only consider univariate functions, which means functions of a single, one-dimensional argument $x$.

The derivative of a function $f(x)$ measures the sensitivity of the function value $f$ to a change in its argument $x$. At a point $a$ it is given by the limit of $f(a + h)$ minus $f(a)$ over $h$ when we let $h$ go toward zero:

$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$

If this limit exists, then the derivative exists, and we write it as $f'(a)$ or $\frac{df}{dx}(a)$. These are just two different notations for the same thing. A function is differentiable at $a$ if the limit exists at $a$, and it is differentiable if the limit exists at every point. The slope of the tangent line, here in an example from Wikipedia, is equal to the derivative at the tangent point, the point where the tangent touches the function $f$, shown here in red.

Similarly, we can also write down higher-order derivatives. We can take the derivative of a function, then the derivative of that derivative, and so on. For example, the second derivative is written as $f''(a)$ or $\frac{d^2 f}{dx^2}(a)$. These are just two different expressions for the same thing, the second derivative of the function $f$ evaluated at $a$.

Here are some examples of non-differentiable functions. The first example is a non-differentiable function that’s also not continuous. This is a function that has one value and then suddenly jumps to another value, and at the jump we define the function to take the higher of the two values. Now this is clearly not continuous because we have this jump here.
It’s also not differentiable, because if we take the limit from the left and the limit from the right, we get different limits. If we take the limit from the left, we get an infinitely steep slope. If we take the limit from the right, we get zero.

Here’s another example. This is the L1 norm, the absolute value, $y = |x|$. In this case the function is actually continuous, but it’s not differentiable at zero. It’s differentiable everywhere else, just not at zero: if we take the limit from the left, we have a derivative of $-1$, and if we take the limit from the right, we have a derivative of $+1$, a positive slope.

All of you have heard of the chain rule. So what does the chain rule tell us? The chain rule expresses the derivative of the composition of two differentiable functions $f$ and $g$ in terms of the derivatives of $f$ and $g$. More precisely, we denote the composition of $f$ and $g$, which is typically written with a little circle, as another function $h = f \circ g$, such that $h(x) = f(g(x))$. We take $x$ as the input to $g$, and then the output of $g$ as the input to $f$. That’s what we call the function $h(x)$, this composed or nested function.

Then the chain rule tells us the following, again in both notations. The first is called Lagrange notation and the second is called Leibniz notation:

$$h'(x) = f'(g(x)) \, g'(x) \qquad \text{and} \qquad \frac{dh}{dx} = \frac{df}{dg} \, \frac{dg}{dx}$$

The chain rule tells us that the derivative of the composed function with respect to $x$ is the outer derivative, which means the derivative of $f$ with respect to its argument, times the inner derivative, which is the derivative of the argument with respect to $x$. Similarly in Leibniz notation, we have $\frac{df}{dg}$ times $\frac{dg}{dx}$.

Here’s a simple example. Let’s suppose the function $h(x)$ is equal to $(2x^2 + x)^3$.
Then, to take the derivative of $h(x)$, we have to use the chain rule, because it’s a composition of two functions, namely the inner function $2x^2 + x$ and the outer function that raises its argument to the power of 3. We first take the outer derivative, which is $3(2x^2 + x)^2$. Then we multiply by the inner derivative, which is the derivative of $2x^2 + x$ with respect to $x$, namely $4x + 1$. So $h'(x) = 3(2x^2 + x)^2 (4x + 1)$.

Now let’s move to derivatives of multivariate functions. So far we’ve only considered functions of a single argument $x$. But now let’s consider multivariate functions, which are functions that depend on multiple arguments, or on an argument of higher dimensionality. Let’s consider the simplest case, where we have a function that depends on two arguments: let $f(x, y)$ be a function where, in addition, the argument $y$ also depends on $x$. So $y$ is itself a function of $x$. We have created this situation with a special purpose in mind, namely distinguishing the terms partial derivative and total derivative. For the case of a single variable we didn’t need this distinction, but now, because we have multiple variables, we must make it, as you will see.

So let’s first consider the partial derivative. The partial derivative of the function $f$ with arguments $x$ and $y$, with respect to $x$, is denoted with the partial symbol $\partial$:

$$\frac{\partial f(x, y)}{\partial x} = \frac{\partial f}{\partial x} \, \frac{dx}{dx} = \frac{\partial f}{\partial x}$$

That’s simply the partial derivative of $f$ with respect to $x$ times the derivative of $x$ with respect to $x$, which is 1. So this is just the chain rule written out explicitly; we could also have dropped the factor $\frac{dx}{dx}$ directly. In other words, in this case we don’t consider that $y$ is a function of $x$. We consider $y$ to be a constant. And that’s why we call this a partial derivative.
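To make "treating $y$ as a constant" concrete, here is a minimal numerical sketch. It is not from the lecture: the function `f` below is a hypothetical example chosen for illustration, and `partial_x` is an assumed helper name. The helper perturbs only $x$ and leaves $y$ untouched, which is what the partial derivative does.

```python
def f(x, y):
    # Hypothetical example function (an assumption, not from the lecture).
    return x**2 * y


def partial_x(func, x, y, step=1e-6):
    """Approximate the partial derivative of func with respect to x.

    Only x is perturbed; y is held fixed, exactly like a constant.
    Uses a central difference quotient.
    """
    return (func(x + step, y) - func(x - step, y)) / (2.0 * step)


x0, y0 = 3.0, 2.0
numeric = partial_x(f, x0, y0)
analytic = 2.0 * x0 * y0  # d/dx of x^2 * y with y constant is 2 * x * y

print(numeric, analytic)  # the two values should agree closely
```

Whether $y$ depends on $x$ somewhere else in a larger computation does not matter to this helper, because it never changes the value of $y$ it is given.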
We just differentiate this term as if $y$ were a constant. We ignore that $y$ has a dependency on $x$ in this case; it’s like a constant $c$. So as an example here, we have the function $f(x, y) = xy$. If we take the partial derivative of that function with respect to $x$, we simply get $y$.

Now in contrast, the total derivative considers this dependency. It considers that $y$ is a function of $x$. The total derivative is defined as follows, and now we have $d$ instead of the partial symbol:

$$\frac{df}{dx} = \frac{\partial f}{\partial x} \, \frac{dx}{dx} + \frac{\partial f}{\partial y} \, \frac{dy}{dx}$$

So $\frac{df}{dx}$ equals the partial derivative of $f$ with respect to $x$ times $\frac{dx}{dx}$, which is 1 again, plus an additional term that wasn’t there before, which is the partial derivative of $f$ with respect to $y$ times $\frac{dy}{dx}$. We need this additional term because $y$ depends on $x$ now. If $y$ did not depend on $x$, if $y$ were just a constant, then $\frac{dy}{dx}$ would be 0, this term would go away, and we would be left with the first term, which is the same as the partial derivative. But because we have chosen the function such that $y$ depends on $x$, this makes a difference. This expression is also called the multivariate or multivariable chain rule.

So this is the difference between the partial and the total derivative. The partial derivative considers $y$ constant. The total derivative considers the dependencies of all the arguments on the variable that we differentiate with respect to.

Here’s the same example again, now for the total derivative. We have again $f = xy$, but we also assume that $y$ depends on $x$. In particular, we make a very simple choice here, $y = x$. So $y$ is not a constant; it depends on $x$, it is equal to $x$. In this case, if we compute the total derivative, we see that the right-hand term does not vanish, because $y$ depends on $x$.
So first of all, we have the left-hand term, which is the partial derivative of $f$ with respect to $x$ times 1. So we have $y \cdot 1 = y$. Then, in the right-hand term, we have the partial derivative of $f$ with respect to $y$, which is $x$, times $\frac{dy}{dx}$, which in this case is 1: $y$ equals $x$, and if we differentiate that with respect to $x$, we get 1. So we have $x \cdot 1$, or $x$. Adding both terms gives $y + x$, and because $y = x$, we can write this as $2x$. You can see that the two results, $y$ and $2x$, are different, because in the first case we’ve computed the partial derivative and in the second case the total derivative. That makes a difference for multivariate functions.

As a little remark, we sometimes write the partial symbol instead of $d$ in the lecture for simplicity. But in this case we still refer to the total derivative, because otherwise it wouldn’t make sense. So what is actually meant sometimes has to be inferred from context.

Let’s now move to implicit functions and implicit differentiation. An implicit equation is a relation of the form, for example, $f(x, y) = 0$, where the function $y(x)$ is defined only implicitly. It is not given explicitly, and maybe it’s not even a function: maybe it’s a curve that cannot be described as a function because it doesn’t have a unique $y$ value for every $x$. Implicit differentiation computes the total derivative of both sides with respect to $x$:

$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \, \frac{dy}{dx} = 0$$

Obviously, the derivative of the right-hand side is 0. On the left-hand side we have the expression from before. And then we can solve this for $\frac{dy}{dx}$.

To understand this better, let’s consider a simple example. Let’s assume $x^2 + y^2 = 1$. As you’ll recognize, this is the equation of a circle. Now, the circle is not a function. There’s not a unique $y$ value for every $x$. In fact, for almost every $x$ between $-1$ and $1$ there are two $y$ values.
So we can’t describe it as a function. But we can still differentiate this curve. We can still find the derivative at every point, in other words, the slope of the tangent at every point along this circle, even though we can’t represent the shape explicitly as a function but only through the implicit equation given here.

So let’s see how this works for this example. If we apply the total derivative to both sides, we get 0 on the right-hand side, of course, and on the left-hand side we get $2x + 2y \frac{dy}{dx}$. Then we can solve for $\frac{dy}{dx}$ and get

$$\frac{dy}{dx} = -\frac{x}{y}$$

Note that this expression actually contains the variable $y$, which is the variable that we have differentiated with respect to $x$. This is something that would not occur if we used regular differentiation on an explicit equation. But here it can happen, and exactly this fact allows us to describe derivatives also for implicit equations like this circle.
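The circle example can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the helper names (`implicit_slope`, `upper_branch`, `central_difference`) and the sample point $x = 0.6$ are assumptions chosen for illustration. It compares the implicit result $\frac{dy}{dx} = -\frac{x}{y}$ with a finite-difference slope of the explicit upper half of the circle, $y = \sqrt{1 - x^2}$.

```python
import math


def implicit_slope(x, y):
    """Slope dy/dx = -x / y, from implicit differentiation of x^2 + y^2 = 1."""
    return -x / y


def upper_branch(x):
    """Explicit upper half of the unit circle, y = sqrt(1 - x^2)."""
    return math.sqrt(1.0 - x * x)


def central_difference(func, x, step=1e-6):
    """Approximate func'(x) with a central difference quotient."""
    return (func(x + step) - func(x - step)) / (2.0 * step)


x0 = 0.6                      # illustrative sample point (an assumption)
y0 = upper_branch(x0)         # approximately 0.8

slope_implicit = implicit_slope(x0, y0)               # approximately -0.75
slope_numeric = central_difference(upper_branch, x0)  # should agree closely

# Same x on the lower half of the circle: y changes sign, so does the slope.
slope_lower = implicit_slope(x0, -y0)

print(slope_implicit, slope_numeric, slope_lower)
```

The lower-branch value shows why the formula has to contain $y$: for the same $x$ there are two points on the circle, and their tangent slopes differ in sign.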