In linear algebra, a linear transformation such as a stretching, rotation, or reflection can be represented by a matrix [1]. For example, if $v = \mattwo{x}{y}$ is a vector in the XY plane represented as a column vector, then a stretching by a factor $k$ along the X dimension can be represented as

$$ Av = \mattwo{kx}{y} $$

where

$$ A = \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} $$

is called the transformation matrix. Note that $A$ need not be a square matrix, in which case the transformation can increase or decrease the dimensionality of $v$.
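To make this concrete, here is a minimal sketch in NumPy (the stretch factor $k = 2$ and the vector are illustrative choices, not from the original text):

```python
import numpy as np

# Stretching by a factor of 2 along the X dimension.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])  # the transformation matrix
v = np.array([3.0, 4.0])    # a vector in the XY plane

print(A @ v)  # [6. 4.] -- the x component doubles, y is unchanged
```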
There are many cases in which a square transformation matrix, $A$, is applied repeatedly to a vector, $v$, e.g. during back-propagation in recurrent neural networks. This results in equations of the form

$$ Y = A^n v $$

where $A^n$ is the matrix $A$ multiplied by itself $n$ times, i.e. a matrix power. One way to solve for $Y$ is to carry out the above matrix multiplications directly. However, the value of $n$ can be large, making this computation expensive, and so we are interested in reducing the amount of computation.
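As a sketch of the direct approach (NumPy; the matrix, vector and power are illustrative choices):

```python
import numpy as np

# Naive computation of Y = A^n v: apply A to v, n times in a row.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 0.0])
n = 10

Y = v.copy()
for _ in range(n):
    Y = A @ Y  # one matrix-vector product per step

print(Y)
```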
Now, imagine the matrix $A$ can be factorized as $A = PDP^{-1}$, where $P$ is a square matrix, $P^{-1}$ is its inverse and $D$ is a diagonal matrix. Then,

$$ A^n = (PDP^{-1})^n = PD(P^{-1}P)D(P^{-1}P) \cdots DP^{-1} = PD^nP^{-1} $$

Note that $P^{-1}P = I$, so all the inner factors cancel. Also, since $D$ is a diagonal matrix, $D^n$ is simply the diagonal elements raised to the power $n$. Overall, computing $A^n$ has been reduced to raising the diagonal elements of $D$ to the power $n$ and two matrix multiplications. But of course, we still need to find the matrices $P$ and $D$.
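Here is a sketch of this trick in NumPy; it uses `np.linalg.eig` to obtain the factorization $A = PDP^{-1}$, which is derived by hand in the rest of this section (the matrix and the power are illustrative choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
n = 10

# np.linalg.eig returns the eigenvalues of A and a matrix P whose
# columns are the corresponding eigenvectors.
eigvals, P = np.linalg.eig(A)

Dn = np.diag(eigvals ** n)      # D^n: diagonal elements raised to the power n
An = P @ Dn @ np.linalg.inv(P)  # two matrix multiplications

# Sanity check against the direct matrix power.
assert np.allclose(An, np.linalg.matrix_power(A, n))
print(An)
```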
Consider a square ($2 \times 2$) transformation matrix $A$. Let $v_1 = \mattwo{x_1}{y_1}$ and $v_2 = \mattwo{x_2}{y_2}$ be two vectors and $\lambda_1$ and $\lambda_2$ be two scalars such that

\begin{align} Av_1 &= \lambda_1 v_1 \\ Av_2 &= \lambda_2 v_2 \end{align}

Writing $v_1$ and $v_2$ as the columns of a matrix $P$, the two equations can be combined as

\begin{align} A \begin{bmatrix} x_1 & x_2 \\ y_1 & y_2 \end{bmatrix} &= \begin{bmatrix} x_1 & x_2 \\ y_1 & y_2 \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \\ AP &= PD \\ A &= PDP^{-1} \end{align}

Note that the last equation is in the form $A = PDP^{-1}$, where $P$ is a square matrix and $D$ is a diagonal matrix. In the above example, the vectors $v_1, v_2$ and scalars $\lambda_1, \lambda_2$ are called the eigenvectors and eigenvalues of the square matrix $A$. In the next section, we will determine the values of $v_1, v_2, \lambda_1, \lambda_2$.
Given a square transformation matrix $A$, we are interested in finding vectors $v$ and scalars $\lambda$ such that

$$ Av = \lambda v $$

i.e. the linear transformation defined by $A$ produces a scaled version of the vector $v$. Then $\lambda$ is an eigenvalue and $v$ is an eigenvector of the square transformation matrix $A$. The above equation can be rewritten as
\begin{align} (A-\lambda I)v &= 0 \end{align}
where $I$ is the identity matrix. The above equation has a non-zero solution $v$ if and only if the determinant of the matrix $(A-\lambda I)$ is zero, i.e.
$$ |A - \lambda I| = 0 $$
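For a $2 \times 2$ matrix, $|A - \lambda I|$ expands to the characteristic polynomial $\lambda^2 - \mathrm{tr}(A)\lambda + |A|$, so the eigenvalues can be found as its roots. A NumPy sketch (the matrix is the one used in the worked example below):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of lambda^2 - trace(A)*lambda + det(A).
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
print(np.roots(coeffs))  # [3. 1.] -- the eigenvalues of A
```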
Let

$$ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} $$

Solving for

$$ |A - \lambda I| = \begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = (2-\lambda)^2 - 1 = 0 $$

we get $\lambda = 1$ and $\lambda = 3$, which are the two eigenvalues of $A$. Using $\lambda = 1$, we get $v = \mattwo{1}{-1}$, and using $\lambda = 3$, we get $v = \mattwo{1}{1}$. Therefore,

$$ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}^{-1} $$
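The hand-computed decomposition can be verified numerically, and used to compute matrix powers as in the earlier sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
P = np.array([[1.0, 1.0],
              [-1.0, 1.0]])  # columns are the eigenvectors (1, -1) and (1, 1)
D = np.diag([1.0, 3.0])      # eigenvalues on the diagonal

assert np.allclose(P @ D @ np.linalg.inv(P), A)

# A^10 via the decomposition: only the diagonal of D is raised to the power.
print(P @ np.diag(np.diag(D) ** 10) @ np.linalg.inv(P))
```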
This is called an eigendecomposition or eigenfactorization of a square matrix and, as shown earlier, can be used to calculate matrix powers efficiently. Note that eigendecomposition is one of many matrix decomposition techniques [2].