Basics of Linear Algebra

Weikai Chen, 2021/03/11

This is a lecture note for Marxian Economic Theory, a course at Renmin University of China. The note is mainly for senior or graduate students majoring in economics, so I assume that students have taken a course in linear algebra before.

The purpose of this note is to review the basic concepts and methods of linear algebra and to prepare students for the Perron-Frobenius theorems about positive and nonnegative matrices. Nonnegative matrices arise in many areas, such as economics, population models, graph theory, and Markov chains. The Perron-Frobenius theory is one of the most powerful tools for analyzing nonnegative matrices and is the workhorse of mathematical Marxian economics. Given its importance, and the fact that it is new to most students, I will discuss the Perron-Frobenius theorems in a separate note.

This note is written in Pluto, a reactive notebook environment for Julia.

Linear algebra studies linear transformations on vector spaces, which can be represented by matrices. We will focus on the $n$-dimensional Euclidean space $\mathbb{R}^n$, though most results discussed in this note can be easily generalized.


Vectors in $\mathbb{R}^n$

A vector in $\mathbb{R}^n$ is a list of $n$ real numbers. For example, $x = (1, 2, 3)$ is a vector in $\mathbb{R}^3$. Let $x = (x_1, \dots, x_n), y = (y_1, \dots, y_n) \in \mathbb{R}^n$ and $k \in \mathbb{R}$. Define vector addition and scalar multiplication as follows:

$$x + y = (x_1 + y_1, \dots, x_n + y_n)$$

$$kx = (kx_1, \dots, kx_n)$$

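The code cells that defined these vectors did not survive extraction; below is a minimal sketch of what they may have looked like. The particular values of `x`, `y`, `z`, and `w` are illustrative assumptions.

```julia
# Hypothetical values; the original cells were lost in extraction.
x = [1.0, 2.0]
y = [2.0, 0.5]
z = x + y        # vector addition: [3.0, 2.5]
w = 2.0 * x      # scalar multiplication: [2.0, 4.0]
```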

Now let's plot those vectors.

[Figure: plot of the vectors x, y, z, and w]

Linear Combinations

Given a set of vectors $\{a_1, \dots, a_k\}$ in $\mathbb{R}^n$, the new vectors we can create by performing linear operations are called linear combinations of $\{a_1, \dots, a_k\}$.

That is, $b \in \mathbb{R}^n$ is a linear combination of $\{a_1, \dots, a_k\}$ if

$$b = x_1 a_1 + \cdots + x_k a_k \quad \text{for some scalars } x_1, \dots, x_k$$

In this context, the values $x_1, \dots, x_k$ are called the coefficients of the linear combination.

The set of all such linear combinations $V$ is called the span of $\{a_1, \dots, a_k\}$, denoted by $V = \operatorname{span}\{a_1, \dots, a_k\}$.

Note that the coefficients of a linear combination $b \in \operatorname{span}\{a_1, \dots, a_k\}$ may not be unique. If for every $b \in \operatorname{span}\{a_1, \dots, a_k\}$ the coefficients are unique, then the set of vectors $\{a_1, \dots, a_k\}$ is said to be independent.

A set of vectors $\{a_1, \dots, a_k\}$ is dependent if it is not independent. In that case, one of the vectors can be expressed as a linear combination of the rest.

A set of vectors $\{a_1, \dots, a_k\}$ is a basis for its span $V$ if it is independent.

It can be shown that

  1. If $\operatorname{span}\{a_1, \dots, a_k\} = \mathbb{R}^n$, then $k \geq n$.

  2. If $\{a_1, \dots, a_k\}$ is independent, then $k \leq n$.

Therefore, $\{a_1, \dots, a_k\}$ is a basis for $\mathbb{R}^n$ if and only if it is linearly independent and $k = n$.

Below is an example of a basis for $\mathbb{R}^3$, called the standard basis:

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

For any $x = (x_1, x_2, x_3) \in \mathbb{R}^3$, we can write

$$x = x_1 e_1 + x_2 e_2 + x_3 e_3$$
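We can verify this decomposition numerically. A minimal Julia sketch (the vector `x` here is an arbitrary example of my own):

```julia
e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]  # the standard basis of R³
x = [4, -2, 7]                                 # an arbitrary example vector
x == 4*e1 - 2*e2 + 7*e3                        # true: the coefficients are the coordinates
```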


Inner Product and Norm

The inner product of two vectors $x, y \in \mathbb{R}^n$ is defined as

$$x \cdot y = \sum_{i=1}^n x_i y_i$$

The inner product is also denoted by $x^\top y$.

Two vectors are called orthogonal if their inner product is zero.

The norm of a vector $x$ represents its "length" (i.e., its distance from the zero vector) and is defined as

$$\|x\| = \sqrt{x \cdot x} = \left( \sum_{i=1}^n x_i^2 \right)^{1/2}$$

The expression $\|x - y\|$ is thought of as the distance between $x$ and $y$.
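In Julia these operations live in the standard library `LinearAlgebra`. A minimal sketch; the vectors below are my own choices picked to reproduce the surviving cell outputs (4.75, 2.5, and 1.0), since the original inputs were lost:

```julia
using LinearAlgebra

x = [1.5, 2.0]   # illustrative values; the original inputs may have differed
y = [0.5, 2.0]
dot(x, y)        # 4.75; x'y gives the same number
norm(x)          # 2.5, i.e., sqrt(dot(x, x))
norm(x - y)      # 1.0, the distance between x and y
```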


Matrices

A matrix is a rectangular array of numbers.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

is called an $m \times n$ matrix. If $m = n$, then $A$ is called square.

The matrix formed by replacing $a_{ij}$ by $a_{ji}$ for every $i$ and $j$ is called the transpose of $A$, denoted $A^\top$ or $A'$. If $A = A^\top$, then $A$ is called symmetric.

For a square matrix $A$, the $n$ elements of the form $a_{ii}$ for $i = 1, \dots, n$ are called the principal diagonal.

$A$ is called diagonal if the only nonzero entries are on the principal diagonal, i.e., $a_{ij} = 0$ for all $i \neq j$.

A diagonal matrix $A$ is called the identity matrix, denoted by $I$, if $a_{ii} = 1$ for all $i$.

Denote each column of the matrix $A$ as a vector $a_k$ for $k = 1, \dots, n$; then $A$ can be rewritten using these column vectors as

$$A = [a_1, \dots, a_n]$$

Similarly, we can write the matrix $A$ using row vectors:

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}$$


Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices.

Scalar multiplication and addition are immediate generalizations of the vector case:

$$\gamma A = \gamma \begin{bmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{bmatrix} = \begin{bmatrix} \gamma a_{11} & \cdots & \gamma a_{1k} \\ \vdots & & \vdots \\ \gamma a_{n1} & \cdots & \gamma a_{nk} \end{bmatrix}$$

and

$$\begin{bmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1k} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nk} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\ \vdots & & \vdots \\ a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk} \end{bmatrix}$$

In the latter case, the matrices must have the same shape in order for the definition to make sense.
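The surviving cell outputs suggest the original example defined a matrix `A`, its transpose `B`, and their sum `C`. A sketch reconstructing those cells:

```julia
A = [3.0 1.0;
     2.0 5.0]
B = A'          # transpose (adjoint): [3.0 2.0; 1.0 5.0]
C = A + B       # entrywise sum: [6.0 3.0; 3.0 10.0]
```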


We also have a convention for multiplying a matrix by a vector.

For an $m \times n$ matrix $A$ with rows $a_1, \dots, a_m$ and a column vector $x$, we have

$$Ax = \begin{bmatrix} a_1 \cdot x \\ a_2 \cdot x \\ \vdots \\ a_m \cdot x \end{bmatrix}$$

Another useful form of $Ax$, in terms of the columns $a_1, \dots, a_n$, is

$$Ax = [a_1, a_2, \dots, a_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n \tag{1}$$

which is a linear combination of the set of column vectors $\{a_1, \dots, a_n\}$.
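A sketch checking identity (1) in Julia; the columns reuse the matrix `A` from above, while the coefficient vector `x` is an illustrative choice (the original cell that produced `b` was lost):

```julia
a1 = [3.0, 2.0]         # first column of A
a2 = [1.0, 5.0]         # second column of A
A  = [a1 a2]            # assemble A from its columns
x  = [2.0, -1.0]        # illustrative coefficients
b  = A * x              # matrix-vector product
b ≈ x[1]*a1 + x[2]*a2   # true: Ax is a linear combination of the columns
```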


Matrix and Linear Transformation

Linear transformation and matrix representation

A function $f: \mathbb{R}^n \to \mathbb{R}^n$ is linear if for all $x, y \in \mathbb{R}^n$ and all scalars $\alpha$ and $\beta$,

$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$

For any $n \times n$ matrix $A$, it is easy to check that the function $f(x) = Ax$ is linear.

In fact, a function $f$ is linear if and only if there exists a matrix $A$ such that $f(x) = Ax$ for all $x$.


Proof

First, letting $\alpha = \beta = 0$ shows $f(0) = 0$.

Second, construct a matrix as follows: take the standard basis $e_1, \dots, e_n$, let $a_1 = f(e_1), \dots, a_n = f(e_n)$, and set $A = [a_1, \dots, a_n]$.

Finally, show that $f(x) = Ax$. Since $x = \sum_i x_i e_i$ and $f$ is linear, we have

$$f(x) = x_1 f(e_1) + \cdots + x_n f(e_n) = x_1 a_1 + \cdots + x_n a_n = Ax$$
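The proof is constructive, and we can mirror it in Julia. A sketch, using a hypothetical linear function `f` of my own choosing:

```julia
f(x) = [3x[1] + x[2], 2x[1] + 5x[2]]    # a hypothetical linear function on R²

n = 2
E = [1.0 0.0; 0.0 1.0]                  # columns are the standard basis e1, e2
A = hcat([f(E[:, i]) for i in 1:n]...)  # A = [f(e1), ..., f(en)]

x = [0.7, -1.3]
f(x) ≈ A * x                            # true for every x
```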


Inverse of linear transformation and inverse matrix

What is the range of the function $f(x) = Ax$?

Since $f(x) = Ax = x_1 a_1 + \cdots + x_n a_n$, the range is just the span of the columns, i.e.,

$$\operatorname{Range}(f) = \operatorname{span}\{a_1, \dots, a_n\}$$

Moreover, if the columns are linearly independent, then the range is $\mathbb{R}^n$. That is, for any $b \in \mathbb{R}^n$, there exists a unique $x \in \mathbb{R}^n$ such that $f(x) = Ax = b$. In that case we say the function $f$ is invertible, and denote its inverse function by $f^{-1}$.

It can be verified that $f^{-1}$ is also linear, and thus there exists a matrix $A^{-1}$ such that

$$f^{-1}(y) = A^{-1} y \quad \text{for all } y \in \mathbb{R}^n$$

We call the matrix $A^{-1}$ the inverse matrix of $A$, and by definition we have

$$A^{-1} A = A A^{-1} = I$$

and then

$$b = Ax \implies A^{-1} b = A^{-1} A x = I x = x$$
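The surviving output below matches the inverse of the matrix `A = [3 1; 2 5]` used earlier. A sketch:

```julia
using LinearAlgebra

A = [3.0 1.0;
     2.0 5.0]
Ainv = inv(A)                   # ≈ [0.3846 -0.0769; -0.1538 0.2308]
Ainv * A ≈ [1.0 0.0; 0.0 1.0]   # true: A⁻¹A = I
```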


Composition of linear transformations and matrix multiplication

If $A_{m \times n}$ and $B_{n \times p}$ are two matrices, the linear transformation $f(v) = Av$ maps from $\mathbb{R}^n$ to $\mathbb{R}^m$, while $g(u) = Bu$ maps from $\mathbb{R}^p$ to $\mathbb{R}^n$. The composition $f \circ g: \mathbb{R}^p \to \mathbb{R}^m$ defined by

$$(f \circ g)(u) = f(g(u)), \quad u \in \mathbb{R}^p$$

is also linear, and thus can be represented by an $m \times p$ matrix $D$. We say $D$ is the product of $A$ and $B$, i.e., $D = AB$.

How can we calculate $AB$? If we write the matrices as

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}, \quad B = [b_1, \dots, b_p]$$

(rows of $A$, columns of $B$), then their product $D = AB$ is formed by taking as its $(i,j)$-th element the inner product of the $i$-th row of $A$ and the $j$-th column of $B$.

That is, $D = AB = (d_{ij})_{m \times p}$ where

$$d_{ij} = a_i \cdot b_j = \sum_{k=1}^n a_{ik} b_{kj}$$
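With the matrices `A` and `B` from above, the surviving output matches `D = A * B`. A sketch, including a manual check of one entry:

```julia
A = [3.0 1.0; 2.0 5.0]
B = A'                                            # [3.0 2.0; 1.0 5.0]
D = A * B                                         # [10.0 11.0; 11.0 29.0]
D[1, 2] == sum(A[1, k] * B[k, 2] for k in 1:2)    # true: d₁₂ = row 1 ⋅ column 2
```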


Matrix and System of Linear Equations

Often, the numbers in the matrix represent coefficients in a system of linear equations:

$$\begin{aligned} b_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ &\;\;\vdots \\ b_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n \end{aligned}$$

The objective here is to solve for the "unknowns" $x_1, \dots, x_n$ given $a_{11}, \dots, a_{nn}$ and $b_1, \dots, b_n$.

This system of equations can be written as

$$Ax = b$$

or

$$x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b$$

Therefore, to solve $Ax = b$ is to find the coefficients of the linear combination.


Note

(1) If the columns of $A$ are linearly independent, then their span is $\mathbb{R}^n$, so for any $b \in \mathbb{R}^n$ there is a unique solution.

(2) If the columns of $A$ are linearly dependent, then the span is a proper subset of $\mathbb{R}^n$. If $b$ is in the span, then there are multiple solutions; otherwise, there is no solution.
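In Julia, the backslash operator solves such a system. A minimal sketch (the right-hand side `b` is an illustrative choice of my own):

```julia
A = [3.0 1.0; 2.0 5.0]   # independent columns, so a unique solution exists
b = [1.0, 2.0]           # illustrative right-hand side
x = A \ b                # solves Ax = b
A * x ≈ b                # true
```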


Determinant

Given a square matrix $A$, how can we tell whether its columns are linearly independent?

There is a function $\det(A)$, or $|A|$, called the determinant, assigning a real number to any square matrix $A$, which helps us answer this question.

In fact, $\det(A) \neq 0$ if and only if the columns of $A$ are linearly independent.

When $n = 2$, for

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

we have $\det(A) = a_{11} a_{22} - a_{12} a_{21}$.


The determinant of a matrix determines whether the column vectors are linearly independent or not.
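The surviving output (13.0) matches the determinant of the matrix `A = [3 1; 2 5]` used throughout. A sketch:

```julia
using LinearAlgebra

A = [3.0 1.0; 2.0 5.0]
det(A)                          # 13.0
det(A) == 3.0*5.0 - 1.0*2.0     # true: the 2×2 formula a₁₁a₂₂ − a₁₂a₂₁
```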


I won't dig into the details of calculating determinants in general. Instead, let's look at the geometric intuition.

Take the example of $n = 2$. Note that $Ae_1 = a_1$ and $Ae_2 = a_2$. The linear transformation $f(x) = Ax$ transforms the square spanned by $e_1$ and $e_2$ into the parallelogram spanned by $a_1$ and $a_2$.

The determinant $\det(A)$ is the area scale factor of the transformation $f(x) = Ax$. That is,

$$\det(A) = \frac{\text{Area of the parallelogram}}{\text{Area of the square}}$$

Since the area of the square is 1,

$$\det(A) = \text{Area of the parallelogram}$$

[Figure: the unit square spanned by $e_1, e_2$ and its image, the parallelogram spanned by $a_1, a_2$, under $f(x) = Ax$]

In this case, the area of the parallelogram is $\det(A) = 13.0$. Therefore, the linear transformation $f(x) = Ax$ stretches the space by a scale factor of 13.0.

If $a_1$ and $a_2$ are linearly dependent, the area of the "parallelogram" becomes zero, i.e., $\det(A) = 0$. In other words, when the columns of $A$ are linearly dependent, the linear transformation $f(x) = Ax$ compresses the space into a lower-dimensional one, a line in this case.

Note that the determinant can be negative, when the linear transformation flips the orientation of the space. For example, consider the matrix `A_flip` below.
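A sketch reconstructing the surviving cells; the entries of `A_flip` and both determinant values come from the outputs that survived:

```julia
using LinearAlgebra

A_flip = [1 3;
          5 2]
det(A_flip)         # -13.0: the transformation flips orientation
abs(det(A_flip))    # 13.0: the area scale factor is |det(A_flip)|
```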


Eigenvalue and Eigenvector

Let $A$ be an $n \times n$ square matrix.

If $\lambda$ is a scalar and $v$ is a non-zero vector in $\mathbb{R}^n$ such that

$$Av = \lambda v$$

then we say that $\lambda$ is an eigenvalue of $A$, and $v$ is an eigenvector.

Thus, an eigenvector of $A$ is a vector that is merely scaled when the map $f(x) = Ax$ is applied to it.
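The surviving output matches `eigen(A)` for the matrix `A = [3 1; 2 5]`: the eigenvalues are $4 \mp \sqrt{3}$. A sketch reconstructing the cell, with `v` and `u` the two eigenvectors used in the figure below:

```julia
using LinearAlgebra

A = [3.0 1.0; 2.0 5.0]
F = eigen(A)              # eigenvalues 4 ∓ √3 ≈ 2.2679 and 5.7321
v = F.vectors[:, 1]       # eigenvector for the smaller eigenvalue
u = F.vectors[:, 2]       # eigenvector for the larger eigenvalue
A * v ≈ F.values[1] * v   # true: Av = λv
```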


The next figure shows the two eigenvectors $v, u$ and their images under the linear transformation, $Av$ and $Au$.

[Figure: the eigenvectors $v$ and $u$ and their images $Av$ and $Au$]

Suppose that $Av = \lambda v$ for some non-zero $v$. Then the system of equations

$$(\lambda I - A) v = 0$$

has a non-zero solution, which means the columns of the matrix $(\lambda I - A)$ are linearly dependent. Therefore,

$$\det(\lambda I - A) = 0$$

The next figure shows the plot of the characteristic polynomial $\det(\lambda I - A)$ as a function of $\lambda$.

[Figure: the characteristic polynomial $\det(\lambda I - A)$ plotted against $\lambda$, with roots at the eigenvalues]
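A sketch evaluating the characteristic polynomial directly (the grid of $\lambda$ values is an arbitrary choice for plotting):

```julia
using LinearAlgebra

A = [3.0 1.0; 2.0 5.0]
p(λ) = det(λ*I - A)                 # the characteristic polynomial of A
abs(p(4 + sqrt(3))) < 1e-10         # true: 4 + √3 is an eigenvalue

λs = range(1.0, 7.0; length = 200)  # arbitrary grid covering both roots
ys = p.(λs)                         # the values one would plot against λs
```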