How to Interpret Matrix Expressions — Transformations

Let’s return to the matrix

and apply the transformation to a few sample points.

The effects of transformation B on various input vectors

Notice the following:

  • point x₁​ has been rotated counterclockwise and brought closer to the origin,
  • point x₂​, on the other hand, has been rotated clockwise and pushed away from the origin,
  • point x₃​ has only been scaled down, meaning it’s moved closer to the origin while keeping its direction,
  • point x₄ has undergone a similar transformation, but has been scaled up.

The transformation compresses in the x⁽¹⁾-direction and stretches in the x⁽²⁾-direction. You can think of the grid lines as behaving like an accordion.

Directions such as those represented by the vectors x₃ and x₄ play an important role in machine learning, but that’s a story for another time.

For now, we can call them eigen-directions, because vectors along these directions are only scaled by the transformation, without being rotated. Every transformation, with the notable exception of rotations, has its own set of eigen-directions.
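If you want to see such directions numerically, NumPy can compute them with np.linalg.eig. The matrix below is only an illustrative example, not the matrix B from the figures:

import numpy as np

M = np.array([
    [2.0, 1.0],
    [1.0, 2.0]
])  # an illustrative matrix, not the article's B

# Each column of eigvecs is an eigen-direction; eigvals holds the scaling factors
eigvals, eigvecs = np.linalg.eig(M)

v = eigvecs[:, 0]          # a vector lying along the first eigen-direction
print(M @ v)               # same direction as v...
print(eigvals[0] * v)      # ...just rescaled by the corresponding eigenvalue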

Recall that the transformation matrix is constructed by stacking the transformed basis vectors as its columns. Perhaps you’d like to see what happens if we then swap the rows and columns (that is, take the transpose).

Let us take, for example, the matrix

A = [[1, -1], [1, 1]],

whose transpose is

Aᵀ = [[1, 1], [-1, 1]],

where Aᵀ stands for the transposed matrix.

From a geometric perspective, the coordinates of the first new basis vector come from the first coordinates of all the old basis vectors, the second from the second coordinates, and so on.

In NumPy, it’s as simple as that:

import numpy as np

A = np.array([
    [1, -1],
    [1,  1]
])

print(f'A transposed:\n{A.T}')

A transposed:
[[ 1  1]
 [-1  1]]

I must disappoint you now, as I cannot provide a simple rule that expresses the relationship between the transformations A and Aᵀ in just a few words.

Instead, let me show you a property shared by both the original and transposed transformations, which will come in handy later.

Here is the geometric interpretation of the transformation represented by the matrix A. The area shaded in gray is the parallelogram spanned by the transformed basis vectors.

Parallelogram spanned by the basis vectors transformed by matrix A

Compare this with the transformation obtained by applying the matrix Aᵀ:

Parallelogram spanned by the basis vectors transformed by matrix Aᵀ

Now, let us consider another transformation that applies entirely different scales to the unit vectors:

The parallelogram associated with the matrix B is much narrower now:

Parallelogram spanned by the basis vectors transformed by matrix B

but it turns out that it is the same size as that for the matrix Bᵀ:

Parallelogram spanned by the basis vectors transformed by matrix Bᵀ

Let me put it this way: you have a set of numbers to assign to the components of your vectors. If you assign a larger number to one component, you’ll need to use smaller numbers for the others. In other words, the total length of the vectors that make up the parallelogram stays the same. I know this reasoning is a bit vague, so if you’re looking for more rigorous proofs, check the literature in the references section.

And here’s the kicker at the end of this section: the area of the parallelogram is the absolute value of the determinant of the matrix. What’s more, the determinants of a matrix and its transpose are identical: det(A) = det(Aᵀ).
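You can check both claims numerically. Below, A is the matrix used throughout this article, while the entries of B are only an illustrative stand-in, since the article’s exact B isn’t reproduced here:

import numpy as np

A = np.array([
    [1, -1],
    [1,  1]
])

B = np.array([
    [0.5, 0.0],    # illustrative entries, not the article's exact B
    [0.0, 2.0]
])

for name, M in [('A', A), ('B', B)]:
    area = abs(np.linalg.det(M))       # area of the parallelogram
    area_T = abs(np.linalg.det(M.T))   # area for the transposed matrix
    print(name, area, area_T)          # the two areas coincide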

More on the determinant in the upcoming sections.

You can apply a sequence of transformations — for example, start by applying A to the vector x, and then pass the result through B. This can be done by first multiplying the vector x by the matrix A, and then multiplying the result by the matrix B, giving y = B(Ax).

You can multiply the matrices B and A to obtain the matrix C = BA for further use, so that y = Cx.

This is the effect of the transformation represented by the matrix C:

Transformation described by the composite matrix BA

You can perform the transformations in reverse order: first apply B, then apply A:

Let D = AB represent the sequence of multiplications performed in this order, so that y = A(Bx) = Dx.

And this is how it affects the grid lines:

Transformation described by the composite matrix AB

So, you can see for yourself that the order of matrix multiplication matters.
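Here is a quick check in code. A is as before; the entries of B are illustrative, chosen only to make the point (any second matrix would do):

import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])
B = np.array([[0.5, 0.0],     # illustrative entries, not the article's exact B
              [0.0, 2.0]])

x = np.array([1.0, 2.0])      # a sample point

C = B @ A                     # first A, then B
D = A @ B                     # first B, then A

print(C @ x, B @ (A @ x))     # the same vector, computed two ways
print(D @ x)                  # a different vector: the order matters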

There’s a cool property with the transpose of a composite transformation. Check out what happens when we multiply A by B:

and then transpose the result, which means we’ll apply (AB)ᵀ:

You can easily extend this observation to the following rule: the transpose of a product is the product of the transposes taken in reverse order, (AB)ᵀ = BᵀAᵀ, and more generally (A₁A₂⋯Aₖ)ᵀ = Aₖᵀ⋯A₂ᵀA₁ᵀ.
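Numerically, with the same A and the same illustrative B as above:

import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])
B = np.array([[0.5, 0.0],     # illustrative entries, as before
              [0.0, 2.0]])

print(np.allclose((A @ B).T, B.T @ A.T))   # True: (AB)ᵀ equals BᵀAᵀ
print(np.allclose((A @ B).T, A.T @ B.T))   # False in general: the order must be reversed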

To finish off this section, consider the inverse problem: is it possible to recover matrices A and B given only C = AB?

This is matrix factorization, which, as you might expect, doesn’t have a unique solution. Matrix factorization is a powerful technique that can provide insight into transformations, as they may be expressed as a composition of simpler, elementary transformations. But that’s a topic for another time.
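A tiny sketch of why the factorization isn’t unique: slip any invertible matrix M and its inverse between the factors and you obtain a different pair of matrices with the same product. The matrices below are illustrative.

import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])
B = np.array([[0.5, 0.0],     # illustrative entries, as before
              [0.0, 2.0]])
M = np.array([[2.0, 1.0],     # any invertible matrix will do
              [0.0, 1.0]])

C = A @ B
A2 = A @ M
B2 = np.linalg.inv(M) @ B

print(np.allclose(C, A2 @ B2))   # True: different factors, same product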

You can easily construct a matrix representing a do-nothing transformation that leaves the standard basis vectors unchanged: in two dimensions it is

I = [[1, 0], [0, 1]].

It is commonly referred to as the identity matrix.
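In NumPy, np.eye builds it, and multiplying by it indeed changes nothing:

import numpy as np

A = np.array([[1, -1],
              [1,  1]])
I = np.eye(2)                                          # the 2×2 identity matrix

print(np.allclose(I @ A, A), np.allclose(A @ I, A))    # True True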

Take a matrix A and consider the transformation that undoes its effects. The matrix representing this transformation is A⁻¹. Specifically, when applied after or before A, it yields the identity matrix I:

A⁻¹A = AA⁻¹ = I.   (4)

There are many resources that explain how to calculate the inverse by hand. I recommend learning the Gauss-Jordan method, because it involves simple row manipulations on the augmented matrix [A | I]. At each step, you can swap two rows, rescale any row, or add to a selected row a weighted sum of the remaining rows.
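Here is a minimal sketch of the Gauss-Jordan idea in NumPy, assuming the input matrix is invertible; it is meant to mirror the hand calculation, not to replace np.linalg.inv:

import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    n = A.shape[0]
    aug = np.hstack([A.astype(float), np.eye(n)])
    for i in range(n):
        # Bring a row with a usable (largest) pivot into position i
        pivot = i + np.argmax(np.abs(aug[i:, i]))
        aug[[i, pivot]] = aug[[pivot, i]]
        aug[i] /= aug[i, i]                   # rescale the pivot row
        for j in range(n):
            if j != i:
                aug[j] -= aug[j, i] * aug[i]  # clear the column in the other rows
    return aug[:, n:]                         # the right half now holds A⁻¹

A = np.array([[1, -1],
              [1,  1]])
print(gauss_jordan_inverse(A))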

Take the following matrix as an example for hand calculations:

A = [[1, -1], [1, 1]].

You should get the inverse matrix

A⁻¹ = [[0.5, 0.5], [-0.5, 0.5]].

Verify by hand that equation (4) holds. You can also do this in NumPy.

import numpy as np

A = np.array([
    [1, -1],
    [1,  1]
])

print(f'Inverse of A:\n{np.linalg.inv(A)}')

Inverse of A:
[[ 0.5  0.5]
 [-0.5  0.5]]
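And to verify equation (4) in code as well:

import numpy as np

A = np.array([[1, -1],
              [1,  1]])
A_inv = np.linalg.inv(A)

# Both products give the identity matrix, up to floating-point rounding
print(A_inv @ A)
print(A @ A_inv)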

Take a look at how the two transformations differ in the illustrations below.

Transformation A
Transformation A⁻¹

At first glance, it’s not obvious that one transformation reverses the effects of the other.

However, in these plots, you might notice a fascinating and far-reaching connection between the transformation and its inverse.

Take a close look at the first illustration, which shows the effect of transformation A on the basis vectors. The original unit vectors are depicted semi-transparently, while their transformed counterparts, resulting from multiplication by matrix A, are drawn solidly. Now, imagine that these newly drawn vectors are the basis vectors you use to describe the space, and that you perceive the original space from their perspective. From that viewpoint, the original basis vectors appear shorter and rotated towards the east. And this is exactly what the second illustration shows: the effect of the transformation A⁻¹.
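You can make this concrete with a small computation: the coordinates of the original basis vectors, expressed in the new basis (the columns of A), are obtained by multiplying them by A⁻¹.

import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])
A_inv = np.linalg.inv(A)

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# The original basis vectors, as seen from the new basis: shorter and tilted east
print(A_inv @ e1)    # [ 0.5 -0.5]
print(A_inv @ e2)    # [ 0.5  0.5]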

This is a preview of an upcoming topic I’ll cover in the next article about using matrices to represent different perspectives on data.

All of this sounds great, but there’s a catch: some transformations can’t be reversed.

The workhorse of the next experiment will be the matrix with 1s on the diagonal and b on the antidiagonal:

A = [[1, b], [b, 1]],

where b is a fraction in the interval (0, 1). This matrix is, by definition, symmetric, as it happens to be identical to its own transpose, A = Aᵀ, but I’m mentioning this only in passing; it’s not particularly relevant here.

Invert this matrix using the Gauss-Jordan method, and you will get the following:

A⁻¹ = 1/(1 - b²) · [[1, -b], [-b, 1]].

You can easily find online the rules for calculating the determinant of 2×2 matrices, which will give

det(A) = 1 - b²   and   det(A⁻¹) = 1/(1 - b²).

This is no coincidence. In general, it holds that

det(A⁻¹) = 1/det(A).

Notice that when b = 0, the two matrices are identical. This is no surprise, as A reduces to the identity matrix I.

Things get tricky when b = 1, as det(A) = 0 and det(A⁻¹) would have to be infinite. As a result, A⁻¹ does not exist for a matrix A consisting entirely of 1s. In algebra classes, teachers often warn you about a zero determinant. However, when you consider where the matrix comes from, it becomes apparent that an infinite determinant can also occur, resulting in a fatal error. Either way,

a zero determinant means the transformation is non-invertible.
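A quick NumPy experiment makes this concrete: as b approaches 1, det(A) goes to zero, det(A⁻¹) blows up, and at b = 1 the inversion fails outright.

import numpy as np

def A(b):
    return np.array([[1.0, b],
                     [b, 1.0]])

for b in [0.9, 0.99, 0.999]:
    print(b, np.linalg.det(A(b)), np.linalg.det(np.linalg.inv(A(b))))

try:
    np.linalg.inv(A(1.0))               # the matrix of all 1s
except np.linalg.LinAlgError as err:
    print('Inversion failed:', err)     # Singular matrix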
