
Understanding Matrices | Part 2: Matrix-Matrix Multiplication


In the first story [1] of this series, we have:

  • Addressed multiplication of a matrix by a vector,
  • Introduced the concept of X-diagram for a given matrix,
  • Observed the behavior of several special matrices when multiplied by a vector.

In the current 2nd story, we will grasp the physical meaning of matrix-matrix multiplication, understand why multiplication is not a symmetrical operation (i.e., why “A*B ≠ B*A“), and finally, we will see how several special matrices behave when multiplied by each other.

So let’s start, and we’ll do it by recalling the definitions that I use throughout this series:

  • Matrices are denoted with uppercase (like ‘A‘ and ‘B‘), while vectors and scalars are denoted with lowercase (like ‘x‘, ‘y‘ or ‘m‘, ‘n‘).
  • |x| – the length of vector ‘x‘,
  • rows(A) – number of rows of matrix ‘A‘,
  • columns(A) – number of columns of matrix ‘A‘.

The concept of multiplying matrices

Multiplication of 2 matrices “A” and “B” is probably the most common operation in matrix analysis. A known fact is that “A” and “B” can be multiplied only if “columns(A) = rows(B)”. At the same time, “A” can have any number of rows, and “B” can have any number of columns. Cells of the product matrix “C = A*B” are calculated by the following formula:

\[
\begin{equation*}
c_{i,j} = \sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}
\]

where “p = columns(A) = rows(B)”. The result matrix “C” will have the dimensions:

rows(C) = rows(A),
columns(C) = columns(B).

Following the multiplication formula, when calculating “A*B” we should scan the i-th row of “A” in parallel with the j-th column of “B“; after summing up all the products “a_{i,k}*b_{k,j}”, we obtain the value of “c_{i,j}“.

The row and the column that should be scanned to calculate the cell “c_{i,j}” of the product matrix “C = A*B”. Here we scan the 3rd row of “A” and the 2nd column of “B”, by which we obtain the value “c_{3,2}“.
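To make the scanning procedure concrete, here is a minimal Python/NumPy sketch (the values are made up for illustration and are not taken from the figure) that computes the single cell “c_{3,2}” exactly this way and checks it against the full product:

```python
import numpy as np

A = np.array([[1, 2, 0, 1],
              [0, 3, 1, 2],
              [2, 1, 4, 0]])   # 3x4
B = np.array([[1, 0],
              [2, 1],
              [0, 3],
              [1, 1]])         # 4x2

# c_{3,2} in the article's 1-based notation is index [2, 1] in 0-based NumPy:
c_3_2 = sum(A[2, k] * B[k, 1] for k in range(A.shape[1]))
print(c_3_2)          # 2*0 + 1*1 + 4*3 + 0*1 = 13
print((A @ B)[2, 1])  # 13 again, via the full multiplication
```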

Another well-known fact is that matrix multiplication is not a symmetrical operation, i.e., “A*B ≠ B*A“. Without going into details, we can already see that when multiplying 2 rectangular matrices:

Two matrices “A” and “B”, with sizes 2×4 and 4×2, respectively. Multiplying “A*B” will result in a 2×2-sized matrix, while multiplying “B*A” will result in a 4×4-sized matrix. The highlighted areas show directions of scans – red areas for calculating one cell of “A*B”, and green areas for calculating a cell of “B*A”.
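A quick NumPy check of the sizes (all-ones placeholder matrices, since only the shapes matter here):

```python
import numpy as np

A = np.ones((2, 4))   # 2x4
B = np.ones((4, 2))   # 4x2

print((A @ B).shape)  # (2, 2)
print((B @ A).shape)  # (4, 4)
```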

For newbies, the fact that matrix multiplication is not a symmetrical operation often seems strange, as multiplication for almost any other kind of mathematical object is a symmetrical (commutative) operation. Another fact that is often unclear is why matrix multiplication is performed by such a seemingly strange formula.

In this story, I am going to give my answers to both of these questions, and not only to them…


Derivation of the matrix multiplication formula

Multiplying “A*B” should produce a matrix ‘C‘ such that:

y = C*x = (A*B)*x = A*(B*x).

In other words, multiplying any vector ‘x‘ by the product matrix “C=A*B” should result in the same vector ‘y‘ that we would receive by first multiplying ‘B‘ by ‘x‘, and then multiplying ‘A‘ by that intermediate result.

This already explains why in “C=A*B“, the condition “columns(A) = rows(B)” must hold. It is because of the length of the intermediate vector. Let’s denote it as ‘t‘:

t = B*x,
y = C*x = (A*B)*x = A*(B*x) = A*t.

Obviously, as “t = B*x“, we will receive a vector ‘t‘ of length “|t| = rows(B)”. But later, matrix ‘A‘ is going to be multiplied by ‘t‘, which requires ‘t‘ to have the length “|t| = columns(A)”. From those 2 facts, we can already figure out that:

rows(B) = |t| = columns(A), or
rows(B) = columns(A).
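Here is a small NumPy sketch (random matrices, chosen only for their sizes) verifying that both routes – “(A*B)*x” and “A*(B*x)” – produce the same vector ‘y‘, and that the intermediate vector ‘t‘ has length rows(B):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # columns(A) = 5
B = rng.standard_normal((5, 4))   # rows(B) = 5, so A*B is defined
x = rng.standard_normal(4)

t = B @ x                  # intermediate vector, |t| = rows(B) = 5
print(t.shape)             # (5,)
print(np.allclose(A @ t, (A @ B) @ x))  # True: A*(B*x) == (A*B)*x
```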

In the first story [1] of this series, we have learned the “X-way interpretation” of matrix-vector multiplication “A*x“. Considering that for “y = (A*B)x“, vector ‘x‘ goes at first through the transformation of matrix ‘B‘, and then it continues through the transformation of matrix ‘A‘, we can broaden the concept of “X-way interpretation” and present matrix-matrix multiplication “A*B” as 2 adjacent X-diagrams:

The transformation of vector ‘x’ (the right stack), passing through the product matrix “C=A*B”, from right to left. At first, it passes through matrix ‘B’, and an intermediate vector ‘t’ is produced (the middle stack). Then ‘t’ passes through the transformation of ‘A’ and the final vector ‘y’ is produced (the left stack).

Now, what should a certain cell “c_{i,j}” of matrix ‘C‘ be equal to? From part 1 – “matrix-vector multiplication” [1], we remember that the physical meaning of “c_{i,j}” is how much the input value ‘xj‘ affects the output value ‘yi‘. Considering the picture above, let’s see how some input value ‘xj‘ can affect some output value ‘yi‘. It can act through the intermediate value ‘t1‘, i.e., through the arrows “a_{i,1}” and “b_{1,j}“. The influence can also pass through the intermediate value ‘t2‘, i.e., through the arrows “a_{i,2}” and “b_{2,j}“. Generally, the influence of ‘xj‘ on ‘yi‘ can pass through any value ‘tk‘ of the intermediate vector ‘t‘, i.e., through the arrows “a_{i,k}” and “b_{k,j}“.

Illustration of all possible ways in which the input value ‘x2‘ can influence the output value ‘y3‘. The influence can go through the intermediate value ‘t1‘ (as “a_{3,1}*b_{1,2}“), as well as through the intermediate value ‘t2‘ (as “a_{3,2}*b_{2,2}“), or any other k-th value of the intermediate vector ‘t’ (as “a_{3,k}*b_{k,2}“). All 4 possible ways are highlighted here in red.

So there are ‘p‘ possible ways in which the value ‘xj‘ influences ‘yi‘, where ‘p‘ is the length of the intermediate vector: “p = |t| = |B*x|”. The influences are:

\[\begin{equation*}
\begin{matrix}
a_{i,1}*b_{1,j}, \\
a_{i,2}*b_{2,j}, \\
a_{i,3}*b_{3,j}, \\
\dots \\
a_{i,p}*b_{p,j}
\end{matrix}
\end{equation*}\]

All those ‘p‘ influences are independent of each other, which is why they participate as a sum in the matrix multiplication formula:

\[\begin{equation*}
c_{i,j} =
a_{i,1}*b_{1,j} + a_{i,2}*b_{2,j} + \dots + a_{i,p}*b_{p,j} =
\sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}\]
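If you prefer code to formulas, here is a deliberately naive Python implementation of this sum-of-influences formula (assuming NumPy only for array storage and for the final check):

```python
import numpy as np

def matmul_by_influences(A, B):
    """For every cell, sum the p independent influences a[i,k]*b[k,j]."""
    n, p = A.shape
    p2, m = B.shape
    assert p == p2, "columns(A) must equal rows(B)"
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(p))
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
print(np.allclose(matmul_by_influences(A, B), A @ B))  # True
```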

This is my visual explanation of the matrix-matrix multiplication formula. By the way, interpreting “A*B” as a concatenation of the X-diagrams of “A” and “B” explicitly shows why the condition “columns(A) = rows(B)” must hold. That is simply because otherwise it would not be possible to concatenate the two X-diagrams:

Trying to multiply two matrices “C” and “D” where “columns(C) ≠ rows(D)”: their X-diagrams just don’t match each other and can’t be concatenated.

Why is it that “A*B ≠ B*A”

Interpreting matrix multiplication “A*B” as a concatenation of the X-diagrams of “A” and “B” also explains why multiplication is not symmetrical for matrices, i.e., why “A*B ≠ B*A“. Let me show that on two specific matrices:

\[\begin{equation*}
A =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{bmatrix}
,\ \ B =
\begin{bmatrix}
b_{1,1} & b_{1,2} & 0 & 0 \\
b_{2,1} & b_{2,2} & 0 & 0 \\
b_{3,1} & b_{3,2} & 0 & 0 \\
b_{4,1} & b_{4,2} & 0 & 0
\end{bmatrix}
\end{equation*}\]

Here, matrix ‘A‘ has its upper half filled with zeroes, while ‘B‘ has zeroes on its right half. Corresponding X-diagrams are:

The X-diagrams which correspond to the matrices “A” and “B” mentioned above. Note that for the zero cells we simply don’t draw the corresponding arrows.
The fact that ‘A’ has zeroes on its upper rows results in the upper items of its left stack being disconnected.
The fact that ‘B’ has zeroes on its right columns results in the lower items of its right stack being disconnected.

What will happen if we try to multiply “A*B“? Then A’s X-diagram should be placed to the left of B’s X-diagram.

Concatenation of X-diagrams of “A” and “B”, corresponding to “A*B”. There are 4 pairs of left and right items, which actually can influence each other. An example pair (y3, x1) is highlighted.

Having such a placement, we see that the input values ‘x1‘ and ‘x2‘ can affect both output values ‘y3‘ and ‘y4‘. In particular, this means that the product matrix “A*B” is non-zero.

\[
\begin{equation*}
A*B =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
c_{3,1} & c_{3,2} & 0 & 0 \\
c_{4,1} & c_{4,2} & 0 & 0
\end{bmatrix}
\end{equation*}
\]

Now, what will happen if we try to multiply these two matrices in the opposite order? For presenting the product “B*A“, B’s X-diagram should be drawn to the left of A’s diagram:

Concatenation of X-diagrams of “B” and “A”, which corresponds to the product “B*A”. This results in two disjoint parts, so there is no way in which any item ‘xj‘ of the right stack can influence any item ‘yi‘ of the left stack.

We see that now there is no connected path by which any input value “xj” can affect any output value “yi“. In other words, in the product matrix “B*A” there is no influence at all, and it is actually a zero matrix.

\[\begin{equation*}
B*A =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\end{equation*}\]
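We can reproduce this whole example numerically. A small NumPy sketch (random values in the non-zero cells) confirms both products:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.zeros((4, 4))
A[2:, :] = rng.standard_normal((2, 4))   # only the two bottom rows are non-zero
B = np.zeros((4, 4))
B[:, :2] = rng.standard_normal((4, 2))   # only the two left columns are non-zero

print(A @ B)   # non-zero only in the bottom-left 2x2 block
print(B @ A)   # the zero matrix: the two X-diagrams are disjoint
```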

This example clearly illustrates why order is important in matrix-matrix multiplication. Of course, many other examples can also be constructed.


Multiplying a chain of matrices

X-diagrams can also be concatenated when we multiply 3 or more matrices. As an example, for the case of:

G = A*B*C,

we can draw the concatenation in the following way:

Concatenation of 3 X-diagrams, corresponding to matrices “A”, “B”, and “C”. Sizes of the matrices are 4×3, 3×2, and 2×4, respectively. The 2 intermediate vectors ‘t’ and ‘s’ are presented with light green and teal items.

Here we now have 2 intermediate vectors:

t = C*x, and
s = (B*C)*x = B*(C*x) = B*t

while the result vector is:

y = (A*B*C)*x = A*(B*(C*x)) = A*(B*t) = A*s.

The number of possible ways in which some input value “xj” can affect some output value “yi” grows here, since the influence now passes through two intermediate stacks.

Two of six possible ways, highlighted with red and light blue, by which input value “x1” can influence output value “y3“.

More precisely, the influence of a certain “xj” on “yi” can come through any item of the first intermediate stack ‘t‘ and any item of the second intermediate stack ‘s‘. So the number of ways of influence becomes “|t|*|s|”, and the formula for “g_{i,j}” becomes:

\[\begin{equation*}
g_{i,j} = \sum_{v=1}^{|s|} \sum_{u=1}^{|t|} a_{i,v}*b_{v,u}*c_{u,j}
\end{equation*}\]
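As a sanity check, here is a NumPy sketch (random matrices with the same sizes as in the illustration: 4×3, 3×2, and 2×4) that sums all “|t|*|s| = 6” paths for one cell and compares the result with “A*B*C”:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))   # |s| = 3
B = rng.standard_normal((3, 2))   # |t| = 2
C = rng.standard_normal((2, 4))

i, j = 2, 0   # one arbitrary cell (0-based indices)
g_ij = sum(A[i, v] * B[v, u] * C[u, j]
           for v in range(3) for u in range(2))   # all 6 paths of influence
print(np.isclose(g_ij, (A @ B @ C)[i, j]))        # True
```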


Multiplying matrices of special types

We can already visually interpret matrix-matrix multiplication. In the first story of this series [1], we also learned about several special types of matrices – the scale matrix, shift matrix, permutation matrix, and others. So let’s take a look at how multiplication works for those types of matrices.

Multiplication of scale matrices

A scale matrix has non-zero values only on its diagonal:

The X-diagram of a 4×4 scale matrix. Every input item “xi” can affect only the corresponding output item “yi“.

From theory, we know that multiplying two scale matrices results in another scale matrix. Why is it that way? Let’s concatenate X-diagrams of two scale matrices:

Multiplication of two scale matrices “Q” and “S”, as a concatenation of their X-diagrams.

The concatenated X-diagram clearly shows that any input item “xi” can still affect only the corresponding output item “yi“. It has no way of influencing any other output item. Therefore, the resulting structure behaves like another scale matrix.
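A tiny NumPy illustration (the diagonal values are arbitrary): the product of two scale matrices is again diagonal, and its diagonal entries are simply the element-wise products:

```python
import numpy as np

Q = np.diag([2.0, 3.0, 5.0, 7.0])
S = np.diag([1.0, 4.0, 0.5, 2.0])

print(Q @ S)            # diagonal again
print(np.diag(Q @ S))   # [ 2.  12.  2.5 14.] = element-wise products
```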

Multiplication of shift matrices

A shift matrix is one that, when multiplied by some input vector ‘x‘, shifts the values of ‘x‘ upwards or downwards by some ‘k‘ positions, filling the emptied slots with zeroes. To achieve that, a shift matrix ‘V‘ must have 1(s) on a line parallel to its main diagonal, and 0(s) in all other cells.

Example of a shift matrix ‘V’ and its X-diagram. The matrix shifts upwards all values of the input vector ‘x’ by 2 positions.

The theory says that multiplying 2 shift matrices ‘V1‘ and ‘V2‘ results in another shift matrix. Interpretation with X-diagrams gives a clear explanation of that. Multiplying the shift matrices ‘V1‘ and ‘V2‘ corresponds to concatenating their X-diagrams:

The concatenation of X-diagrams of 2 shift matrices ‘V1’ and ‘V2’ behaves like another shift matrix, as every value of the input vector ‘x’ is still being shifted by a certain number of positions upwards.

We see that if shift matrix ‘V1‘ shifts the values of its input vector by ‘k1‘ positions upwards, and shift matrix ‘V2‘ shifts them by ‘k2‘ positions upwards, then the resulting matrix “V3 = V1*V2” will shift the values of the input vector by ‘k1+k2‘ positions upwards, which means that “V3” is also a shift matrix.
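In NumPy, such shift matrices can be built with np.eye and a diagonal offset; a short sketch (my own construction, matching the upward-shift convention above) shows that a shift-by-1 matrix times a shift-by-2 matrix is exactly the shift-by-3 matrix:

```python
import numpy as np

n = 6
V1 = np.eye(n, k=1)   # shifts the values of x upwards by 1 position
V2 = np.eye(n, k=2)   # shifts the values of x upwards by 2 positions

x = np.arange(1.0, n + 1)
print(V1 @ V2 @ x)                           # [4. 5. 6. 0. 0. 0.]
print(np.allclose(V1 @ V2, np.eye(n, k=3)))  # True: shift by 1+2 = 3
```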

Multiplication of permutation matrices

A permutation matrix is one that, when multiplied by an input vector ‘x‘, rearranges the order of values in ‘x‘. To act like that, the NxN-sized permutation matrix ‘P‘ must satisfy the following criteria:

  • it should have N 1(s),
  • no two 1(s) should be on the same row or the same column,
  • all remaining cells should be 0(s).

An example of a 5×5-sized permutation matrix ‘P’, and the corresponding X-diagram. We see that the values of the input vector “(x1, x2, x3, x4, x5)” are rearranged as “(x4, x1, x5, x3, x2)”.

According to theory, multiplying 2 permutation matrices ‘P1‘ and ‘P2‘ results in another permutation matrix ‘P3‘. While the reason for this might not be clear when looking at matrix multiplication in the ordinary way (as scanning rows of ‘P1‘ and columns of ‘P2‘), it becomes much clearer through the interpretation with X-diagrams. Multiplying “P1*P2” is the same as concatenating the X-diagrams of ‘P1‘ and ‘P2‘.

The concatenation of X-diagrams of permutation matrices ‘P1’ and ‘P2’ behaves as another rearrangement of values.

We see that every input value ‘xj‘ of the right stack still has only one path for reaching some position ‘yi‘ on the left stack. So “P1*P2” still acts as a rearrangement of the values of the input vector ‘x‘; in other words, “P1*P2” is also a permutation matrix.
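A small NumPy sketch confirms this (the helper perm_matrix and the two permutations are arbitrary choices of mine):

```python
import numpy as np

def perm_matrix(p):
    """Permutation matrix P such that P @ x = x[p], for a 0-based permutation p."""
    P = np.zeros((len(p), len(p)))
    P[np.arange(len(p)), p] = 1.0
    return P

P1 = perm_matrix([2, 0, 1, 3])
P2 = perm_matrix([1, 3, 2, 0])
P3 = P1 @ P2

x = np.array([10.0, 20.0, 30.0, 40.0])
print(P3 @ x)                          # [30. 20. 40. 10.] - a rearrangement
print(P3.sum(axis=0), P3.sum(axis=1))  # all ones: P3 is a permutation matrix
```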

Multiplication of triangular matrices

A triangular matrix has all zeroes either above or below its main diagonal. Here, let’s concentrate on upper-triangular matrices, where zeroes are below the main diagonal. The case of lower-triangular matrices is similar.

Example of an upper-triangular matrix ‘B’ and its X-diagram.

The fact that the non-zero values of ‘B‘ are either on its main diagonal or above it makes all the arrows of its X-diagram either horizontal or directed upwards. This, in turn, means that any input value ‘xj‘ of the right stack can affect only those output values ‘yi‘ of the left stack which have a lesser or equal index (i.e., “i ≤ j“). That is one of the defining properties of an upper-triangular matrix.

According to theory, multiplying two upper-triangular matrices results in another upper-triangular matrix. And here too, interpretation with X-diagrams provides a clear explanation of that fact. Multiplying two upper-triangular matrices ‘A‘ and ‘B‘ is the same as concatenating their X-diagrams:

Concatenation of X-diagrams of two upper-triangular matrices ‘A’ and ‘B’.

We see that putting the two X-diagrams of triangular matrices ‘A‘ and ‘B‘ next to each other results in a diagram where every input value ‘xj‘ of the right stack can still affect only those output values ‘yi‘ of the left stack which are either on its level or above it (in other words, “i ≤ j“). This means that the product “A*B” also behaves like an upper-triangular matrix; thus, it must have zeroes below its main diagonal.
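And the corresponding one-line check in NumPy (random upper-triangular matrices built with np.triu):

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.triu(rng.standard_normal((4, 4)))   # upper-triangular
B = np.triu(rng.standard_normal((4, 4)))   # upper-triangular

C = A @ B
print(np.allclose(C, np.triu(C)))          # True: zeroes stay below the diagonal
```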


Conclusion

In the current 2nd story of this series, we observed how matrix-matrix multiplication can be presented visually, with the help of so-called “X-diagrams”. We have learned that performing the multiplication “C = A*B” is the same as concatenating the X-diagrams of those two matrices. This method clearly illustrates various properties of matrix multiplication, like why it is not a symmetrical operation (“A*B ≠ B*A“), and it also explains the formula:

\[\begin{equation*}
c_{i,j} = \sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation*}\]

We have also observed why multiplication behaves in certain ways when operands are matrices of special types (scale, shift, permutation, and triangular matrices).

I hope you enjoyed reading this story!

In the coming story, we will address how matrix transposition “Aᵀ” can be interpreted with X-diagrams, and what we can gain from such an interpretation, so subscribe to my page to not miss the updates!


My gratitude to:
– Roza Galstyan, for careful review of the draft ( https://www.linkedin.com/in/roza-galstyan-a54a8b352 )
– Asya Papyan, for the precise design of all the used illustrations ( https://www.behance.net/asyapapyan ).

If you enjoyed reading this story, feel free to follow me on LinkedIn, where, among other things, I will also post updates ( https://www.linkedin.com/in/tigran-hayrapetyan-cs/ ).

All used images, unless otherwise noted, are designed by request of the author.


References

[1] – Understanding matrices | Part 1: matrix-vector multiplication : https://towardsdatascience.com/understanding-matrices-part-1-matrix-vector-multiplication/

