In the first story [1] of this series, we have:
- Addressed multiplication of a matrix by a vector,
- Introduced the concept of the X-diagram for a given matrix,
- Observed the behavior of several special matrices when multiplied by a vector.
In the current 2nd story, we will grasp the physical meaning of matrix-matrix multiplication, understand why multiplication is not commutative (i.e., why “A*B ≠ B*A”), and finally, see how several special matrices behave when multiplied by each other.
So let’s start, and we’ll do it by recalling the definitions that I use throughout this series:
- Matrices are denoted with uppercase (like ‘A‘ and ‘B‘), while vectors and scalars are denoted with lowercase (like ‘x‘, ‘y‘ or ‘m‘, ‘n‘).
- |x| – the length of vector ‘x’,
- rows(A) – number of rows of matrix ‘A‘,
- columns(A) – number of columns of matrix ‘A‘.
The concept of multiplying matrices
Multiplication of 2 matrices “A” and “B” is probably the most common operation in matrix analysis. A known fact is that “A” and “B” can be multiplied only if “columns(A) = rows(B)”. At the same time, “A” can have any number of rows, and “B” can have any number of columns. Cells of the product matrix “C = A*B” are calculated by the following formula:
\[
\begin{equation*}
c_{i,j} = \sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}
\]
where “p = columns(A) = rows(B)”. The result matrix “C” will have the dimensions:
rows(C) = rows(A),
columns(C) = columns(B).
Following the multiplication formula, when calculating “A*B” we scan the i-th row of “A” in parallel with the j-th column of “B”, and after summing up all the products “a_{i,k}*b_{k,j}” we obtain the value of “c_{i,j}”.
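To make the formula concrete, here is a minimal Python sketch of a naive, by-definition multiplication routine (a quick illustration of mine, using NumPy arrays just for storage):

```python
import numpy as np

def matmul_naive(A, B):
    """Multiply by the definition: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    assert A.shape[1] == B.shape[0], "columns(A) must equal rows(B)"
    n, p = A.shape               # n = rows(A), p = columns(A) = rows(B)
    m = B.shape[1]               # m = columns(B)
    C = np.zeros((n, m))         # rows(C) = rows(A), columns(C) = columns(B)
    for i in range(n):
        for j in range(m):
            # scan the i-th row of A in parallel with the j-th column of B
            for k in range(p):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])     # 2x3
B = np.array([[7., 8.],
              [9., 10.],
              [11., 12.]])       # 3x2
print(matmul_naive(A, B))        # same result as A @ B
```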
Another well-known fact is that matrix multiplication is not commutative, i.e., “A*B ≠ B*A”. Without going into details, we can already see one reason when multiplying 2 rectangular matrices: if “A*B” is defined, “B*A” might not even be defined, or might have different dimensions:
For newbies, the fact that matrix multiplication is not commutative often seems strange, since multiplication of almost any other kind of object is commutative. Another fact that is often unclear is why matrix multiplication is performed by such a strange formula.
In this story, I am going to give my answers to both of these questions, and not only to them…
Derivation of the matrix multiplication formula
Multiplying “A*B” should produce a matrix ‘C’ such that:
y = C*x = (A*B)*x = A*(B*x).
In other words, multiplying any vector ‘x’ by the product matrix “C=A*B” should result in the same vector ‘y’ that we would receive by first multiplying ‘B’ by ‘x’, and then multiplying ‘A’ by that intermediate result.
This already explains why in “C=A*B” the condition “columns(A) = rows(B)” must hold. The reason lies in the length of the intermediate vector. Let’s denote it as ‘t’:
t = B*x,
y = C*x = (A*B)*x = A*(B*x) = A*t.
Obviously, as “t = B*x“, we will receive a vector ‘t‘ of length “|t| = rows(B)”. But later, matrix ‘A‘ is going to be multiplied by ‘t‘, which requires ‘t‘ to have the length “|t| = columns(A)”. From those 2 facts, we can already figure out that:
rows(B) = |t| = columns(A), or
rows(B) = columns(A).
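As a quick numerical check (a sketch of mine, assuming NumPy), we can confirm that multiplying by the product matrix gives the same vector as applying ‘B’ first and then ‘A’:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 3))   # columns(A) = 3
B = rng.random((3, 5))   # rows(B) = 3, so A*B is defined
x = rng.random(5)        # |x| = columns(B)

t = B @ x                # intermediate vector: |t| = rows(B) = columns(A)
y_two_steps = A @ t      # A*(B*x)
y_product = (A @ B) @ x  # (A*B)*x
print(np.allclose(y_two_steps, y_product))   # True
```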
In the first story [1] of this series, we have learned the “X-way interpretation” of matrix-vector multiplication “A*x“. Considering that for “y = (A*B)x“, vector ‘x‘ goes at first through the transformation of matrix ‘B‘, and then it continues through the transformation of matrix ‘A‘, we can broaden the concept of “X-way interpretation” and present matrix-matrix multiplication “A*B” as 2 adjacent X-diagrams:
Now, what should a certain cell “c_{i,j}” of matrix ‘C’ be equal to? From part 1 – “matrix-vector multiplication” [1], we remember that the physical meaning of “c_{i,j}” is how much the input value ‘x_j’ affects the output value ‘y_i’. Considering the picture above, let’s see how some input value ‘x_j’ can affect some output value ‘y_i’. The influence can pass through the intermediate value ‘t_1’, i.e., through arrows “a_{i,1}” and “b_{1,j}”. It can also pass through the intermediate value ‘t_2’, i.e., through arrows “a_{i,2}” and “b_{2,j}”. Generally, the influence of ‘x_j’ on ‘y_i’ can pass through any value ‘t_k’ of the intermediate vector ‘t’, i.e., through arrows “a_{i,k}” and “b_{k,j}”.
So there are ‘p’ possible ways in which the value ‘x_j’ influences ‘y_i’, where ‘p’ is the length of the intermediate vector: “p = |t| = |B*x|”. The influences are:
\[\begin{equation*}
\begin{matrix}
a_{i,1}*b_{1,j}, \\
a_{i,2}*b_{2,j}, \\
a_{i,3}*b_{3,j}, \\
\dots \\
a_{i,p}*b_{p,j}
\end{matrix}
\end{equation*}\]
All those ‘p’ influences are independent of each other, which is why they are summed in the matrix multiplication formula:
\[\begin{equation*}
c_{i,j} =
a_{i,1}*b_{1,j} + a_{i,2}*b_{2,j} + \dots + a_{i,p}*b_{p,j} =
\sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}\]
This is my visual explanation of the matrix-matrix multiplication formula. By the way, interpreting “A*B” as a concatenation of the X-diagrams of “A” and “B” explicitly shows why the condition “columns(A) = rows(B)” must hold. The reason is simple: otherwise it would not be possible to concatenate the two X-diagrams:
Why “A*B ≠ B*A”
Interpreting matrix multiplication “A*B” as a concatenation of the X-diagrams of “A” and “B” also explains why multiplication of matrices is not commutative, i.e., why “A*B ≠ B*A”. Let me show this with two specific matrices:
\[\begin{equation*}
A =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{bmatrix}
,\ \ B =
\begin{bmatrix}
b_{1,1} & b_{1,2} & 0 & 0 \\
b_{2,1} & b_{2,2} & 0 & 0 \\
b_{3,1} & b_{3,2} & 0 & 0 \\
b_{4,1} & b_{4,2} & 0 & 0
\end{bmatrix}
\end{equation*}\]
Here, matrix ‘A‘ has its upper half filled with zeroes, while ‘B‘ has zeroes on its right half. Corresponding X-diagrams are:
The fact that ‘A’ has zeroes on its upper rows results in the upper items of its left stack being disconnected.
The fact that ‘B’ has zeroes on its right columns results in the lower items of its right stack being disconnected.
What happens if we try to multiply “A*B”? Then A’s X-diagram should be placed to the left of B’s X-diagram.
With such a placement, we see that input values ‘x_1’ and ‘x_2’ can affect both output values ‘y_3’ and ‘y_4’. In particular, this means that the product matrix “A*B” is non-zero:
\[
\begin{equation*}
A*B =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
c_{3,1} & c_{3,2} & 0 & 0 \\
c_{4,1} & c_{4,2} & 0 & 0
\end{bmatrix}
\end{equation*}
\]
Now, what will happen if we try to multiply these two matrices in the opposite order? For presenting the product “B*A“, B’s X-diagram should be drawn to the left of A’s diagram:
We see that now there is no connected path by which any input value “x_j” can affect any output value “y_i”. In other words, in the product matrix “B*A” there is no influence at all, and it is actually the zero matrix:
\[\begin{equation*}
B*A =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\end{equation*}\]
This example clearly illustrates why order matters in matrix-matrix multiplication. Of course, many other examples can be constructed as well.
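We can also reproduce this asymmetry numerically (a small sketch of mine, with arbitrary values in the non-zero cells, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.zeros((4, 4))
A[2:, :] = rng.random((2, 4))   # only the two bottom rows of A are non-zero
B = np.zeros((4, 4))
B[:, :2] = rng.random((4, 2))   # only the two left columns of B are non-zero

print(A @ B)   # non-zero only in the bottom-left 2x2 block
print(B @ A)   # exactly the zero matrix
```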
Multiplying a chain of matrices
X-diagrams can also be concatenated when we multiply 3 or more matrices. As an example, for the case of:
G = A*B*C,
we can draw the concatenation in the following way:
Here we now have 2 intermediate vectors:
t = C*x, and
s = (B*C)*x = B*(C*x) = B*t
while the result vector is:
y = (A*B*C)*x = A*(B*(C*x)) = A*(B*t) = A*s.
The number of possible ways in which some input value “x_j” can affect some output value “y_i” grows considerably here.
More precisely, the influence of a certain “x_j” over “y_i” can come through any item of the first intermediate stack ‘t’ and any item of the second intermediate stack ‘s’. So the number of ways of influence becomes “|t|*|s|”, and the formula for “g_{i,j}” becomes:
\[\begin{equation*}
g_{i,j} = \sum_{v=1}^{|s|} \sum_{u=1}^{|t|} a_{i,v}*b_{v,u}*c_{u,j}
\end{equation*}\]
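As a quick check of the double-sum formula (my own sketch, assuming NumPy), we can verify one cell of “G = A*B*C” against the formula directly:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((2, 3))   # |s| = rows(B) = 3
B = rng.random((3, 4))   # |t| = rows(C) = 4
C = rng.random((4, 5))

G = A @ B @ C
i, j = 1, 2
g = sum(A[i, v] * B[v, u] * C[u, j]
        for v in range(B.shape[0])    # over items of the stack 's'
        for u in range(C.shape[0]))   # over items of the stack 't'
print(np.isclose(G[i, j], g))         # True
```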
Multiplying matrices of special types
We can already visually interpret matrix-matrix multiplication. In the first story of this series [1], we also learned about several special types of matrices – the scale matrix, shift matrix, permutation matrix, and others. So let’s take a look at how multiplication works for those types of matrices.
Multiplication of scale matrices
A scale matrix has non-zero values only on its diagonal:
From theory, we know that multiplying two scale matrices results in another scale matrix. Why is it that way? Let’s concatenate X-diagrams of two scale matrices:
The concatenated X-diagram clearly shows that any input item “x_i” can still affect only the corresponding output item “y_i”. It has no way of influencing any other output item. Therefore, the resulting structure behaves as another scale matrix.
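A two-line numerical illustration (mine, assuming NumPy):

```python
import numpy as np

D1 = np.diag([2., 3., 4.])
D2 = np.diag([5., 6., 7.])
print(D1 @ D2)   # equals np.diag([10., 18., 28.]), again a scale matrix
```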
Multiplication of shift matrices
A shift matrix is one that, when multiplied by some input vector ‘x’, shifts the values of ‘x’ upwards or downwards by some ‘k’ positions, filling the emptied slots with zeroes. To achieve that, a shift matrix ‘V’ must have 1s on a line parallel to its main diagonal, and 0s in all other cells.
The theory says that multiplying 2 shift matrices ‘V_1’ and ‘V_2’ results in another shift matrix. The interpretation with X-diagrams gives a clear explanation of that. Multiplying the shift matrices ‘V_1’ and ‘V_2’ corresponds to concatenating their X-diagrams:
We see that if shift matrix ‘V_1’ shifts the values of its input vector by ‘k_1’ positions upwards, and shift matrix ‘V_2’ shifts the values of its input vector by ‘k_2’ positions upwards, then the resulting matrix “V_3 = V_1*V_2” will shift the values of the input vector by ‘k_1+k_2’ positions upwards, which means that “V_3” is also a shift matrix.
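A short numerical sketch of that fact (mine, assuming NumPy, whose np.eye with the ‘k’ offset builds exactly such matrices):

```python
import numpy as np

V1 = np.eye(4, k=1)   # shifts the input vector upwards by 1 position
V2 = np.eye(4, k=2)   # shifts the input vector upwards by 2 positions
V3 = V1 @ V2          # a shift by 1 + 2 = 3 positions

x = np.array([1., 2., 3., 4.])
print(V3 @ x)                              # [4. 0. 0. 0.]
print(np.array_equal(V3, np.eye(4, k=3)))  # True
```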
Multiplication of permutation matrices
A permutation matrix is one that, when multiplied by an input vector ‘x’, rearranges the order of the values in ‘x’. To act this way, the NxN-sized permutation matrix ‘P’ must satisfy the following criteria:
- it must contain exactly N ones,
- no two of those ones may share a row or a column,
- all remaining cells must be zeroes.
According to theory, multiplying 2 permutation matrices ‘P_1’ and ‘P_2’ results in another permutation matrix ‘P_3’. While the reason for this might not be clear when looking at matrix multiplication in the ordinary way (as scanning rows of ‘P_1’ and columns of ‘P_2’), it becomes much clearer when looking at it through the interpretation of X-diagrams. Multiplying “P_1*P_2” is the same as concatenating the X-diagrams of ‘P_1’ and ‘P_2’.
We see that every input value ‘x_j’ of the right stack still has exactly one path for reaching some position ‘y_i’ on the left stack. So “P_1*P_2” still acts as a rearrangement of all the values of the input vector ‘x’; in other words, “P_1*P_2” is also a permutation matrix.
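Here is a brief sketch of that fact (my own, assuming NumPy), building permutation matrices by permuting the rows of the identity matrix:

```python
import numpy as np

P1 = np.eye(4)[[2, 0, 3, 1]]   # identity rows in a permuted order
P2 = np.eye(4)[[1, 3, 0, 2]]
P3 = P1 @ P2

x = np.array([10., 20., 30., 40.])
print(P3 @ x)   # still just a rearrangement of x's values
# P3 has exactly one 1 in every row and in every column:
print(P3.sum(axis=0), P3.sum(axis=1))   # [1. 1. 1. 1.] [1. 1. 1. 1.]
```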
Multiplication of triangular matrices
A triangular matrix has all zeroes either above or below its main diagonal. Here, let’s concentrate on upper-triangular matrices, where zeroes are below the main diagonal. The case of lower-triangular matrices is similar.
The fact that the non-zero values of ‘B’ are either on its main diagonal or above it makes all the arrows of its X-diagram either horizontal or directed upwards. This, in turn, means that any input value ‘x_j’ of the right stack can affect only those output values ‘y_i’ of the left stack which have a lesser or equal index (i.e., “i ≤ j”). That is one of the properties of an upper-triangular matrix.
According to theory, multiplying two upper-triangular matrices results in another upper-triangular matrix. And here too, interpretation with X-diagrams provides a clear explanation of that fact. Multiplying two upper-triangular matrices ‘A‘ and ‘B‘ is the same as concatenating their X-diagrams:
We see that placing the two X-diagrams of the triangular matrices ‘A’ and ‘B’ next to each other results in a diagram where every input value ‘x_j’ of the right stack can still affect only those output values ‘y_i’ of the left stack which are either on its level or above it (in other words, “i ≤ j”). This means that the product “A*B” also behaves like an upper-triangular matrix; thus, it must have zeroes below its main diagonal.
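And a final numerical sketch (mine, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.triu(rng.random((4, 4)))    # upper-triangular: zeroes below the diagonal
B = np.triu(rng.random((4, 4)))
C = A @ B
print(np.allclose(C, np.triu(C)))  # True: the product is upper-triangular too
```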
Conclusion
In the current 2nd story of this series, we observed how matrix-matrix multiplication can be presented visually, with the help of so-called “X-diagrams”. We have learned that performing the multiplication “C = A*B” is the same as concatenating the X-diagrams of those two matrices. This method clearly illustrates various properties of matrix multiplication, like why it is not commutative (“A*B ≠ B*A”), and it also explains the formula:
\[\begin{equation*}
c_{i,j} = \sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}\]
We have also observed why multiplication behaves in certain ways when operands are matrices of special types (scale, shift, permutation, and triangular matrices).
I hope you enjoyed reading this story!
In the coming story, we will address how matrix transposition “Aᵀ” can be interpreted with X-diagrams, and what we can gain from such an interpretation, so subscribe to my page so as not to miss the updates!
My gratitude to:
– Roza Galstyan, for careful review of the draft ( https://www.linkedin.com/in/roza-galstyan-a54a8b352 )
– Asya Papyan, for the precise design of all the used illustrations ( https://www.behance.net/asyapapyan ).
If you enjoyed reading this story, feel free to follow me on LinkedIn, where, among other things, I will also post updates ( https://www.linkedin.com/in/tigran-hayrapetyan-cs/ ).
All used images, unless otherwise noted, are designed by request of the author.
References
[1] – Understanding matrices | Part 1: matrix-vector multiplication : https://towardsdatascience.com/understanding-matrices-part-1-matrix-vector-multiplication/