Basics of Linear Algebra

Weikai Chen, 2021/03/11

This is a lecture note for Marxian Economic Theory, a course at Renmin University of China. The note is mainly for senior or graduate students majoring in economics, so I assume that students have taken a course in linear algebra before.

The purpose of this note is to review the basic concepts and methods of linear algebra and to prepare students for the Perron-Frobenius theorems about positive and nonnegative matrices. Nonnegative matrices arise in many areas, such as economics, population models, graph theory, and Markov chains. The Perron-Frobenius theory is one of the most powerful tools for analyzing nonnegative matrices and is the workhorse of mathematical Marxian economics. Given its importance, and the fact that it is new to most students, I will discuss the Perron-Frobenius theorems in a separate note.

This note is written in Pluto, a reactive notebook environment for Julia.

Linear algebra studies linear transformations on vector spaces, which can be represented by matrices. We will focus on the $n$-dimensional Euclidean space $\mathbb{R}^n$, though most results discussed in this note can be easily generalized.


Vectors in $\mathbb{R}^n$

A vector in $\mathbb{R}^n$ is a list of $n$ real numbers. For example, $x = (1, 2, 3)$ is a vector in $\mathbb{R}^3$. Let $x = (x_1, \dots, x_n), y = (y_1, \dots, y_n) \in \mathbb{R}^n$ and $k \in \mathbb{R}$. Define vector addition and scalar multiplication as follows:

$$x + y = (x_1 + y_1, \dots, x_n + y_n)$$

$$kx = (kx_1, \dots, kx_n)$$

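The code cells that defined these vectors did not survive extraction; below is a minimal sketch of what they may have looked like. The particular values of `x`, `y`, `z`, and `w` are illustrative assumptions.

```julia
# Hypothetical values; the original cells were lost in extraction.
x = [1.0, 2.0]
y = [2.0, 0.5]
z = x + y        # vector addition: [3.0, 2.5]
w = 2.0 * x      # scalar multiplication: [2.0, 4.0]
```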

Now let's plot those vectors.

[Figure: plot of the vectors x, y, z, and w]

Linear Combinations

Given a set of vectors $\{a_1, \dots, a_k\}$ in $\mathbb{R}^n$, the new vectors we can create by performing linear operations are called linear combinations of $\{a_1, \dots, a_k\}$.

That is, $b \in \mathbb{R}^n$ is a linear combination of $\{a_1, \dots, a_k\}$ if

$$b = x_1 a_1 + \cdots + x_k a_k \quad \text{for some scalars } x_1, \dots, x_k$$

In this context, the values $x_1, \dots, x_k$ are called the coefficients of the linear combination.

The set of all such linear combinations $V$ is called the span of $\{a_1, \dots, a_k\}$, denoted by $V = \operatorname{span}\{a_1, \dots, a_k\}$.

Note that the coefficients of a linear combination $b \in \operatorname{span}\{a_1, \dots, a_k\}$ may not be unique. If for every $b \in \operatorname{span}\{a_1, \dots, a_k\}$ the coefficients are unique, then the set of vectors $\{a_1, \dots, a_k\}$ is said to be independent.

A set of vectors $\{a_1, \dots, a_k\}$ is dependent if it is not independent. In that case, one of the vectors can be expressed as a linear combination of the rest.

A set of vectors $\{a_1, \dots, a_k\}$ is a basis for its span $V$ if it is independent.

It can be shown that

  1. If $\operatorname{span}\{a_1, \dots, a_k\} = \mathbb{R}^n$, then $k \geq n$.

  2. If $\{a_1, \dots, a_k\}$ is independent, then $k \leq n$.

Therefore, $\{a_1, \dots, a_k\}$ is a basis for $\mathbb{R}^n$ if and only if it is linearly independent and $k = n$.

Below is an example of a basis for $\mathbb{R}^3$, called the standard basis:

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

For any $x = (x_1, x_2, x_3) \in \mathbb{R}^3$, we can write

$$x = x_1 e_1 + x_2 e_2 + x_3 e_3$$
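We can verify this decomposition numerically. A minimal Julia sketch (the vector `x` here is an arbitrary example of my own):

```julia
e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]  # the standard basis of R³
x = [4, -2, 7]                                 # an arbitrary example vector
x == 4*e1 - 2*e2 + 7*e3                        # true: the coefficients are the coordinates
```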


Inner Product and Norm

The inner product of two vectors $x, y \in \mathbb{R}^n$ is defined as

$$x \cdot y = \sum_{i=1}^n x_i y_i$$

The inner product is also denoted by $x^\top y$.

Two vectors are called orthogonal if their inner product is zero.

The norm of a vector $x$ represents its "length" (i.e., its distance from the zero vector) and is defined as

$$\|x\| = \sqrt{x \cdot x} = \left( \sum_{i=1}^n x_i^2 \right)^{1/2}$$

The expression $\|x - y\|$ is thought of as the distance between $x$ and $y$.
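In Julia these operations live in the standard library `LinearAlgebra`. A minimal sketch; the vectors below are my own choices picked to reproduce the surviving cell outputs (4.75, 2.5, and 1.0), since the original inputs were lost:

```julia
using LinearAlgebra

x = [1.5, 2.0]   # illustrative values; the original inputs may have differed
y = [0.5, 2.0]
dot(x, y)        # 4.75; x'y gives the same number
norm(x)          # 2.5, i.e., sqrt(dot(x, x))
norm(x - y)      # 1.0, the distance between x and y
```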


Matrices

A matrix is a rectangular array of numbers.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

is called an $m \times n$ matrix. If $m = n$, then $A$ is called square.

The matrix formed by replacing $a_{ij}$ by $a_{ji}$ for every $i$ and $j$ is called the transpose of $A$, denoted $A^\top$ or $A'$. If $A = A^\top$, then $A$ is called symmetric.

For a square matrix $A$, the $n$ elements of the form $a_{ii}$ for $i = 1, \dots, n$ are called the principal diagonal.

$A$ is called diagonal if the only nonzero entries are on the principal diagonal, i.e., $a_{ij} = 0$ for all $i \neq j$.

A diagonal matrix $A$ is called the identity matrix, denoted by $I$, if $a_{ii} = 1$ for all $i$.

Denote each column of the matrix $A$ as a vector $a_k$ for $k = 1, \dots, n$; then $A$ can be rewritten using these column vectors as

$$A = [a_1, \dots, a_n]$$

Similarly, we can write the matrix $A$ using row vectors:

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}$$


Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices.

Scalar multiplication and addition are immediate generalizations of the vector case:

$$\gamma A = \gamma \begin{bmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{bmatrix} = \begin{bmatrix} \gamma a_{11} & \cdots & \gamma a_{1k} \\ \vdots & & \vdots \\ \gamma a_{n1} & \cdots & \gamma a_{nk} \end{bmatrix}$$

and

$$\begin{bmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nk} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1k} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nk} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\ \vdots & & \vdots \\ a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk} \end{bmatrix}$$

In the latter case, the matrices must have the same shape in order for the definition to make sense.
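The surviving cell outputs suggest the original example defined a matrix `A`, its transpose `B`, and their sum `C`. A sketch reconstructing those cells:

```julia
A = [3.0 1.0;
     2.0 5.0]
B = A'          # transpose (adjoint): [3.0 2.0; 1.0 5.0]
C = A + B       # entrywise sum: [6.0 3.0; 3.0 10.0]
```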


We also have a convention for multiplying a matrix by a vector.

For an $m \times n$ matrix $A$ with rows $a_1, \dots, a_m$ and a column vector $x$, we have

$$Ax = \begin{bmatrix} a_1 \cdot x \\ a_2 \cdot x \\ \vdots \\ a_m \cdot x \end{bmatrix}$$

Another useful form of $Ax$, in terms of the columns $a_1, \dots, a_n$, is

$$Ax = [a_1, a_2, \dots, a_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n \tag{1}$$

which is a linear combination of the set of column vectors $\{a_1, \dots, a_n\}$.
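A sketch checking identity (1) in Julia; the columns reuse the matrix `A` from above, while the coefficient vector `x` is an illustrative choice (the original cell that produced `b` was lost):

```julia
a1 = [3.0, 2.0]         # first column of A
a2 = [1.0, 5.0]         # second column of A
A  = [a1 a2]            # assemble A from its columns
x  = [2.0, -1.0]        # illustrative coefficients
b  = A * x              # matrix-vector product
b ≈ x[1]*a1 + x[2]*a2   # true: Ax is a linear combination of the columns
```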


Matrix and Linear Transformation

Linear transformation and matrix representation

A function $f: \mathbb{R}^n \to \mathbb{R}^n$ is linear if for all $x, y \in \mathbb{R}^n$ and all scalars $\alpha$ and $\beta$,

$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$

For any $n \times n$ matrix $A$, it is easy to check that the function $f(x) = Ax$ is linear.

In fact, a function $f$ is linear if and only if there exists a matrix $A$ such that $f(x) = Ax$ for all $x$.


Proof

First, letting $\alpha = \beta = 0$ shows $f(0) = 0$.

Second, construct a matrix as follows: take the standard basis $e_1, \dots, e_n$, let $a_1 = f(e_1), \dots, a_n = f(e_n)$, and set $A = [a_1, \dots, a_n]$.

Finally, show that $f(x) = Ax$. Since $x = \sum_i x_i e_i$ and $f$ is linear, we have

$$f(x) = x_1 f(e_1) + \cdots + x_n f(e_n) = x_1 a_1 + \cdots + x_n a_n = Ax$$
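The proof is constructive, and we can mirror it in Julia. A sketch, using a hypothetical linear function `f` of my own choosing:

```julia
f(x) = [3x[1] + x[2], 2x[1] + 5x[2]]    # a hypothetical linear function on R²

n = 2
E = [1.0 0.0; 0.0 1.0]                  # columns are the standard basis e1, e2
A = hcat([f(E[:, i]) for i in 1:n]...)  # A = [f(e1), ..., f(en)]

x = [0.7, -1.3]
f(x) ≈ A * x                            # true for every x
```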


Inverse of linear transformation and inverse matrix

What is the range of the function $f(x) = Ax$?

Since $f(x) = Ax = x_1 a_1 + \cdots + x_n a_n$, the range is just the span of the columns, i.e.,

$$\operatorname{Range}(f) = \operatorname{span}\{a_1, \dots, a_n\}$$

Moreover, if the columns are linearly independent, then the range is $\mathbb{R}^n$. That is, for any $b \in \mathbb{R}^n$, there exists a unique $x \in \mathbb{R}^n$ such that $f(x) = Ax = b$. In that case we say the function $f$ is invertible, and denote its inverse function by $f^{-1}$.

It can be verified that $f^{-1}$ is also linear, and thus there exists a matrix $A^{-1}$ such that

$$f^{-1}(y) = A^{-1} y \quad \text{for all } y \in \mathbb{R}^n$$

We call the matrix $A^{-1}$ the inverse matrix of $A$, and by definition we have

$$A^{-1} A = A A^{-1} = I$$

and then

$$b = Ax \implies A^{-1} b = A^{-1} A x = I x = x$$
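The surviving output below matches the inverse of the matrix `A = [3 1; 2 5]` used earlier. A sketch:

```julia
using LinearAlgebra

A = [3.0 1.0;
     2.0 5.0]
Ainv = inv(A)                   # ≈ [0.3846 -0.0769; -0.1538 0.2308]
Ainv * A ≈ [1.0 0.0; 0.0 1.0]   # true: A⁻¹A = I
```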


Composition of linear transformations and matrix multiplication

If $A_{m \times n}$ and $B_{n \times p}$ are two matrices, the linear transformation $f(v) = Av$ maps from $\mathbb{R}^n$ to $\mathbb{R}^m$, while $g(u) = Bu$ maps from $\mathbb{R}^p$ to $\mathbb{R}^n$. The composition $f \circ g: \mathbb{R}^p \to \mathbb{R}^m$ defined by

$$(f \circ g)(u) = f(g(u)), \quad u \in \mathbb{R}^p$$

is also linear, and thus can be represented by an $m \times p$ matrix $D$. We say $D$ is the product of $A$ and $B$, i.e., $D = AB$.

How can we calculate $AB$? If we write the matrices as

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}, \quad B = [b_1, \dots, b_p]$$

(rows of $A$, columns of $B$), then their product $D = AB$ is formed by taking as its $(i,j)$-th element the inner product of the $i$-th row of $A$ and the $j$-th column of $B$.

That is, $D = AB = (d_{ij})_{m \times p}$ where

$$d_{ij} = a_i \cdot b_j = \sum_{k=1}^n a_{ik} b_{kj}$$
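With the matrices `A` and `B` from above, the surviving output matches `D = A * B`. A sketch, including a manual check of one entry:

```julia
A = [3.0 1.0; 2.0 5.0]
B = A'                                            # [3.0 2.0; 1.0 5.0]
D = A * B                                         # [10.0 11.0; 11.0 29.0]
D[1, 2] == sum(A[1, k] * B[k, 2] for k in 1:2)    # true: d₁₂ = row 1 ⋅ column 2
```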


Matrix and System of Linear Equations

Often, the numbers in the matrix represent coefficients in a system of linear equations:

$$\begin{aligned} b_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ &\;\;\vdots \\ b_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n \end{aligned}$$

The objective here is to solve for the "unknowns" $x_1, \dots, x_n$ given $a_{11}, \dots, a_{nn}$ and $b_1, \dots, b_n$.

This system of equations can be written as

$$Ax = b$$

or

$$x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b$$

Therefore, to solve $Ax = b$ is to find the coefficients of the linear combination.


Note

(1) If the columns of $A$ are linearly independent, then their span is $\mathbb{R}^n$, so for any $b \in \mathbb{R}^n$ there is a unique solution.

(2) If the columns of $A$ are linearly dependent, then the span is a proper subset of $\mathbb{R}^n$. If $b$ is in the span, then there are multiple solutions; otherwise, there is no solution.
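In Julia, the backslash operator solves such a system. A minimal sketch (the right-hand side `b` is an illustrative choice of my own):

```julia
A = [3.0 1.0; 2.0 5.0]   # independent columns, so a unique solution exists
b = [1.0, 2.0]           # illustrative right-hand side
x = A \ b                # solves Ax = b
A * x ≈ b                # true
```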


Determinant

Given a square matrix $A$, how can we tell whether its columns are linearly independent?

There is a function $\det(A)$, or $|A|$, called the determinant, assigning a real number to any square matrix $A$, which helps us answer this question.

In fact, $\det(A) \neq 0$ if and only if the columns of $A$ are linearly independent.

When $n = 2$, for

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

we have $\det(A) = a_{11} a_{22} - a_{12} a_{21}$.


The determinant of a matrix determines whether the column vectors are linearly independent or not.
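The surviving output (13.0) matches the determinant of the matrix `A = [3 1; 2 5]` used throughout. A sketch:

```julia
using LinearAlgebra

A = [3.0 1.0; 2.0 5.0]
det(A)                          # 13.0
det(A) == 3.0*5.0 - 1.0*2.0     # true: the 2×2 formula a₁₁a₂₂ − a₁₂a₂₁
```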


I won't dig into the details of calculating determinants in general. Instead, let's look at the geometric intuition.

Take the example of $n = 2$. Note that $Ae_1 = a_1$ and $Ae_2 = a_2$. The linear transformation $f(x) = Ax$ transforms the square spanned by $e_1$ and $e_2$ into the parallelogram spanned by $a_1$ and $a_2$.

The determinant $\det(A)$ is the area scale factor of the transformation $f(x) = Ax$. That is,

$$\det(A) = \frac{\text{Area of the parallelogram}}{\text{Area of the square}}$$

Since the area of the square is 1,

$$\det(A) = \text{Area of the parallelogram}$$

[Figure: the unit square spanned by $e_1, e_2$ and its image, the parallelogram spanned by $a_1, a_2$, under $f(x) = Ax$]

In this case, the area of the parallelogram is $\det(A) = 13.0$. Therefore, the linear transformation $f(x) = Ax$ stretches the space by a scale factor of 13.0.

If $a_1$ and $a_2$ are linearly dependent, the area of the "parallelogram" becomes zero, i.e., $\det(A) = 0$. In other words, when the columns of $A$ are linearly dependent, the linear transformation $f(x) = Ax$ compresses the space into a lower-dimensional one, a line in this case.

Note that the determinant can be negative, when the linear transformation flips the orientation of the space. For example, consider the matrix `A_flip` below.
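A sketch reconstructing the surviving cells; the entries of `A_flip` and both determinant values come from the outputs that survived:

```julia
using LinearAlgebra

A_flip = [1 3;
          5 2]
det(A_flip)         # -13.0: the transformation flips orientation
abs(det(A_flip))    # 13.0: the area scale factor is |det(A_flip)|
```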


Eigenvalue and Eigenvector

Let $A$ be an $n \times n$ square matrix.

If $\lambda$ is a scalar and $v$ is a non-zero vector in $\mathbb{R}^n$ such that

$$Av = \lambda v$$

then we say that $\lambda$ is an eigenvalue of $A$, and $v$ is an eigenvector.

Thus, an eigenvector of $A$ is a vector that is merely scaled when the map $f(x) = Ax$ is applied to it.
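The surviving output matches `eigen(A)` for the matrix `A = [3 1; 2 5]`: the eigenvalues are $4 \mp \sqrt{3}$. A sketch reconstructing the cell, with `v` and `u` the two eigenvectors used in the figure below:

```julia
using LinearAlgebra

A = [3.0 1.0; 2.0 5.0]
F = eigen(A)              # eigenvalues 4 ∓ √3 ≈ 2.2679 and 5.7321
v = F.vectors[:, 1]       # eigenvector for the smaller eigenvalue
u = F.vectors[:, 2]       # eigenvector for the larger eigenvalue
A * v ≈ F.values[1] * v   # true: Av = λv
```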


The next figure shows the two eigenvectors $v, u$ and their images under the linear transformation, $Av$ and $Au$.

[Figure: the eigenvectors $v$ and $u$ and their images $Av$ and $Au$]

Suppose that $Av = \lambda v$ for some non-zero $v$. Then the system of equations

$$(\lambda I - A) v = 0$$

has a non-zero solution, which means the columns of the matrix $(\lambda I - A)$ are linearly dependent. Therefore,

$$\det(\lambda I - A) = 0$$

The next figure shows the plot of the characteristic polynomial $\det(\lambda I - A)$ as a function of $\lambda$.

[Figure: the characteristic polynomial $\det(\lambda I - A)$ plotted against $\lambda$, with roots at the eigenvalues]
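A sketch evaluating the characteristic polynomial directly (the grid of $\lambda$ values is an arbitrary choice for plotting):

```julia
using LinearAlgebra

A = [3.0 1.0; 2.0 5.0]
p(λ) = det(λ*I - A)                 # the characteristic polynomial of A
abs(p(4 + sqrt(3))) < 1e-10         # true: 4 + √3 is an eigenvalue

λs = range(1.0, 7.0; length = 200)  # arbitrary grid covering both roots
ys = p.(λs)                         # the values one would plot against λs
```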