MaDL – Adding and Multiplying Matrices and Vectors
In this unit, we will see how to add and multiply matrices and vectors. We add or subtract vectors or matrices by adding or subtracting them element-wise. Here on the left we can see an example for vectors, where the input vectors a and b are summed element-wise to yield each element of c, that is, c_i = a_i + b_i, where i is the index of the element within each vector. So, for example, we add the first element of vector a to the first element of vector b and get the first element of vector c. It works similarly for matrices; we just have two indices here. We take, for example, the upper left element of matrix A, which is A_11, and add the upper left element of matrix B, which is B_11, and this yields the upper left element of matrix C, namely C_11. This works analogously for subtraction, but I will show you an example with numbers for the addition operator.
So let’s suppose we have the vector (1, 3), which we call a, and the vector (4, 2), which we call b. Both of these are column vectors, but it would also work if both of them were row vectors; the output would then just be a row vector instead of a column vector. If we add these two column vectors a and b, we get (5, 5), because 1 plus 4 is 5 and 3 plus 2 is 5. Similarly for matrices, we just take each of the elements individually and add them together, and in this particular case we get 5, 5, 5 and 5, because 1 plus 4 is 5, 2 plus 3 is 5, and so forth. Note that the vectors or matrices must have the same shape for this to be possible. I can’t add a 2 by 3 matrix to a 3 by 2 matrix, for example.
We can also add a scalar to a matrix or multiply a matrix by a scalar. This is mathematically well defined. So, for example, if we have the matrix B, we can multiply it by a scalar a and then add a scalar c, and we get a matrix D = a B + c as the outcome. D has the same shape as B, because we multiply each element of B by the scalar a and then add the scalar c to each of the results.
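As a small illustration of element-wise addition, here is a sketch in NumPy. The vectors are the ones from the example; the lecture only names the first rows of the two matrices, so the second rows used here are an assumption chosen so that every sum is 5.

```python
import numpy as np

# Vectors from the example: a = (1, 3), b = (4, 2).
a = np.array([1, 3])
b = np.array([4, 2])
c = a + b  # element-wise: c_i = a_i + b_i

# First rows (1, 2) and (4, 3) are from the example;
# the second rows are assumed for illustration.
A = np.array([[1, 2], [3, 4]])
B = np.array([[4, 3], [2, 1]])
C = A + B  # element-wise: C_ij = A_ij + B_ij

# Shapes must match: a 2x3 matrix cannot be added to a 3x2 matrix.
try:
    np.zeros((2, 3)) + np.zeros((3, 2))
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

print(c)
print(C)
print(shapes_compatible)
```

Subtraction works the same way with `-` in place of `+`.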
So in other words, this corresponds to performing the following operation on each element: for element ij of matrix D, we have the scalar a times the corresponding element of matrix B, plus the scalar c, so d_ij = a b_ij + c. We do everything element-wise.
Now, in deep learning we sometimes also allow the addition of a matrix and a vector. So how does that work? It works only if the vector has as many entries as the matrix has rows, or as many entries as the matrix has columns. So let’s suppose we have a column vector b and we want to add it to matrix A to yield a matrix C that has the same shape as A. What we do is replicate the vector b across all of the columns of A, so the identical vector b is added to every column of A. Element-wise we can write c_ij = a_ij + b_i. Note that b_i doesn’t have a j index because b is just a column vector. This means that for a particular i, we always add the same b_i, no matter what the j index is. In this example the vector b is added to each column of matrix A, and this implicit copying of b to many locations is called broadcasting.
So here is an example. The first example is for scalar addition and multiplication. If we have the matrix with rows (1, 2) and (2, 0), multiply it by 2 and add 1, then we get the matrix with rows (3, 5) and (5, 1). We multiply 2 by 1 and add 1, so we get 3. We multiply 2 by 2, which is 4, and add 1, so we get 5, and so on. And here’s the example for broadcasting, where the shape of the vector determines which type of broadcasting to apply. We have two examples here. On the left we have broadcasting with a column vector, where the vector gets replicated into all columns of the matrix. So we add this vector to the first column and to the second column, and this gives us this matrix here. Similarly, we can do it row-wise; in this case the row vector is replicated for all rows.
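The scalar example and both kinds of broadcasting can be sketched in NumPy as follows. The matrix and the scalars 2 and 1 are from the example; the vector (1, 3) and its pairing with this particular matrix are assumptions made for illustration.

```python
import numpy as np

A = np.array([[1, 2], [2, 0]])

# Scalar multiplication and addition: d_ij = 2 * a_ij + 1.
D = 2 * A + 1

# Assumed vector (1, 3), once as a column and once as a row.
b_col = np.array([[1], [3]])  # shape (2, 1)
b_row = np.array([[1, 3]])    # shape (1, 2)

# Column broadcasting: c_ij = a_ij + b_i (same b_i for every column j).
C_col = A + b_col

# Row broadcasting: c_ij = a_ij + b_j (same b_j for every row i).
C_row = A + b_row

print(D)
print(C_col)
print(C_row)
```

The shape of the vector, (2, 1) versus (1, 2), is what decides whether it is replicated across columns or across rows.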
In this case the matrix has two rows. In other words, we add each element of the row vector to the first and to the second row of the matrix. Then, for example, for the second row we get 2 plus 1 equals 3, and 0 plus 3 equals 3 as well. If you use Python, in particular NumPy, you will see that NumPy supports this type of broadcasting natively, so you can use it out of the box, and you will also see that the educational deep learning framework uses it. If you want to know more about how NumPy interprets broadcasting, have a look at this link.
Let’s now move on to multiplication. Two vectors or matrices A and B can be multiplied if A has the same number of columns as B has rows. In other words, A is an element of the space of real M by N matrices and B is an element of the space of real N by L matrices. Note that the number of columns of matrix A is the same as the number of rows of matrix B. Only then can we multiply them together, which is denoted A B = C, where C is an M by L matrix. Element-wise, this matrix product is defined as follows: the ij-th element of matrix C is equal to the sum over the common axis of size N, with the index k running from 1 to N, so c_ij is the sum over k of a_ik multiplied by b_kj. We sum over this entire axis and multiply the corresponding elements from the two matrices, and the sum runs to the same N for both. This is why A has to have the same number of columns as B has rows. So, for example, we take the elements of the i-th row of matrix A and the elements of the j-th column of matrix B, and these are vectors. We multiply the corresponding elements of these vectors with each other and then sum them up. Note that the matrix product is not a matrix containing the product of the individual elements, which is called the Hadamard product.
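To make the definition concrete, here is a minimal sketch that computes the matrix product directly from the sum over the common axis and compares it with NumPy’s built-in `@` operator. The function name `matmul_by_definition` and the example matrices are hypothetical, chosen only for illustration.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Compute C = A B via c_ij = sum_k a_ik * b_kj (illustrative, not fast)."""
    M, N = A.shape
    N_b, L = B.shape
    if N != N_b:
        raise ValueError("A must have as many columns as B has rows")
    C = np.zeros((M, L), dtype=np.result_type(A, B))
    for i in range(M):          # row of A
        for j in range(L):      # column of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(N))
    return C

# Hypothetical example: a 2x3 matrix times a 3x2 matrix gives a 2x2 matrix.
A = np.arange(6).reshape(2, 3)
B = np.arange(6).reshape(3, 2)
C = matmul_by_definition(A, B)

print(C)
print(np.array_equal(C, A @ B))
```

In practice one would use `A @ B` (or `np.matmul`) directly; the explicit loops are only there to mirror the formula.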
So this is different from addition, where we just added matrices element-wise. This is not the case for multiplication. The product of the individual elements, which is also sometimes useful, is called the Hadamard product and is commonly denoted A ⊙ B, but of course in that case the matrices A and B have to have exactly the same shape.
Let’s look at some examples. Here’s an example of a matrix product. We have the matrix A with rows (3, 1) and (2, 1) and the matrix B with rows (1, 2) and (3, 1), and the product of these two is the matrix with rows (6, 7) and (5, 5). Let’s look at one of these entries. So why 6? Well, for this element we consider the first row of A and the first column of B. We multiply 3 by 1 and we multiply 1 by 3, so we have 3 plus 3, which is 6. Or, for example, for the second element in the first row of the resulting matrix we have 3 times 2, which is 6, plus 1 times 1, which is 1, so we get 7.
Here’s an example of an inner product between two vectors. Because vectors are basically just matrices where one dimension is collapsed to 1, we can do the same thing for vectors, and we can now do it in two different ways. One is called the inner product, sometimes also called the dot product, where we take two vectors and get a scalar out. So assume we have the column vector a = (3, 2) and the column vector b = (1, 3), and we take the transpose of the first vector, so we make it into a row vector. We multiply that with the column vector (1, 3), which means we multiply 3 by 1 and add 2 multiplied by 3, which is 3 plus 6 equals 9. So we get the scalar output 9. The inner product between two vectors always yields a scalar. The opposite is the so-called outer product between two vectors, which yields a matrix of rank 1; we’ll define the concept of rank later on. In this case we take the same vectors again, but instead of transposing vector a we transpose vector b. So we make b into a row vector.
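The numbers in these examples can be checked with a short NumPy sketch. The matrices and vectors are taken from the examples; the Hadamard product of A and B is not computed in the lecture and is added here only for contrast.

```python
import numpy as np

A = np.array([[3, 1], [2, 1]])
B = np.array([[1, 2], [3, 1]])

# Matrix product: rows of A combined with columns of B.
matrix_product = A @ B

# Hadamard (element-wise) product, shown only for contrast.
hadamard_product = A * B

# Inner (dot) product of a = (3, 2) and b = (1, 3): 3*1 + 2*3.
a = np.array([3, 2])
b = np.array([1, 3])
inner = a @ b

print(matrix_product)
print(hadamard_product)
print(inner)
```

Note that in NumPy `*` is the element-wise product, while `@` is the matrix product.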
And in this case we get a matrix out, where the shape of that matrix corresponds to the dimensions of the individual vectors. The first element here would be 3, because we multiply 3 by 1. The second would be 9, because now we multiply 3 by 3, and the last one would be 6, because in this case we multiply 2 by 3. This is called the outer product. In contrast to the inner product, which produces a scalar, the outer product produces a matrix.
Now we can also look at an example of broadcasting with the help of the outer product. Before, we simply declared broadcasting to be a valid operation; we can also define it explicitly using the outer product. We have the same example as before: we want to broadcast the vector (1, 3) to a 2 by 2 matrix and then add it to this matrix, and this is the matrix that should result. We can do that by first computing the outer product between the vector (1, 3) and the vector (1, 1). What this does is basically replicate the column vector twice. So now we have this matrix, we add it to the original matrix, and we get the broadcast result.
Let’s now look at some useful properties of matrices. The first is that matrix multiplication is distributive: A times (B plus C) can be written as A times B plus A times C. The second is that matrix multiplication is also associative: for A times (B times C), where we compute B times C first, we can also compute A times B first and then multiply the result by C. However, in general matrix multiplication is not commutative, so we can’t in general write A times B equals B times A. That is already clear, of course, from the requirement on the dimensions that we’ve seen before. The transpose of a matrix product has a simple form: if we take the transpose of a matrix product, this is the same as transposing each of the factors of the product and swapping their order.
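Here is a hedged NumPy sketch of the outer product and of broadcasting written explicitly as an outer product. The vectors are from the examples; the matrix `A` that the vector is added to is an assumption (the matrix from the earlier scalar example), since the transcript does not spell it out at this point.

```python
import numpy as np

# Outer product of a = (3, 2) and b = (1, 3): entry (i, j) is a_i * b_j.
a = np.array([3, 2])
b = np.array([1, 3])
outer = np.outer(a, b)

# Broadcasting the column vector (1, 3) via an outer product with (1, 1):
# this replicates the vector into both columns.
v = np.array([1, 3])
ones = np.array([1, 1])
replicated = np.outer(v, ones)

# Assumed 2x2 matrix for illustration.
A = np.array([[1, 2], [2, 0]])
explicit = A + replicated      # broadcasting via the outer product
implicit = A + v[:, None]      # NumPy's built-in column broadcasting

print(outer)
print(replicated)
print(np.array_equal(explicit, implicit))
```

Both routes give the same matrix; the built-in form simply avoids materializing the replicated copy.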
So (A B) transpose is the same as B transpose times A transpose. From this it follows that the vector dot product is indeed commutative. While in general matrix multiplication is not commutative, except in very special cases such as certain square matrices, diagonal matrices, and so on, the vector dot product actually is commutative, and we can see this from the law here. If we take x transpose y and transpose it, we know that we have to swap the factors and transpose each of them individually, which is shown here. So we’ve swapped them and we have y transpose x. And because x transpose y is a scalar, and the transpose of a scalar is the scalar itself, the transposed expression is equal to x transpose y. So x transpose y equals y transpose x, which is what we wanted to have. We have shown that the dot product is commutative.
Matrix-vector products also allow us to compactly write systems of linear equations, which we often need to do, for example in linear least squares or other problems. So let’s suppose we have a linear equation a transpose x equals b. Here a transpose is a row vector, multiplied with a column vector x. The vector x is what we are looking for; a and b are known. This is an inner product, so we get a scalar. Now suppose we have such a linear equation many, many times, let’s say M times. So we have the first equation, the second equation, and many more equations up to the M-th equation. We can write this compactly as a matrix-vector product A x = b, where each row of the matrix A can be written as a_i transpose, as shown here, or just by extracting the row of the matrix, that is, row i with all columns, which denotes the i-th row of matrix A. This is just a very compact way of writing all of these equations together in one equation.
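The properties above and the stacking of linear equations into A x = b can be sketched numerically. All concrete values below (`A`, `B`, `x`, `y`, `A_sys`, `x_true`) are hypothetical and chosen only for illustration; a numerical check on one example is of course not a proof.

```python
import numpy as np

A = np.array([[3, 1], [2, 1]])
B = np.array([[1, 2], [3, 1]])

# Transpose of a product: (A B)^T = B^T A^T.
transpose_rule_holds = np.array_equal((A @ B).T, B.T @ A.T)

# Matrix multiplication is not commutative in general.
commutes = np.array_equal(A @ B, B @ A)

# The vector dot product is commutative: x^T y = y^T x.
x = np.array([3, 2])
y = np.array([1, 3])
dot_commutes = (x @ y) == (y @ x)

# A system of M linear equations a_i^T x = b_i, stacked as A_sys x = b.
A_sys = np.array([[2.0, 1.0], [1.0, 3.0]])  # row i is a_i^T
x_true = np.array([1.0, 2.0])
b = A_sys @ x_true

# Each row of the system is one scalar equation a_i^T x = b_i.
rows_match = all(np.isclose(A_sys[i, :] @ x_true, b[i]) for i in range(2))

# For a square, invertible A_sys the unknown x can be recovered.
x_solved = np.linalg.solve(A_sys, b)

print(transpose_rule_holds, commutes, dot_commutes, rows_match)
print(x_solved)
```

`A_sys[i, :]` is the NumPy counterpart of "row i, all columns" used to denote the i-th row of the matrix.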