Lecture 1

Systems of Linear Equations

In this lecture, we will introduce linear systems and the method of row reduction to solve them. We will introduce matrices as a convenient structure to represent and solve linear systems. Lastly, we will discuss geometric interpretations of the solution set of a linear system in 2- and 3-dimensions.

1.1 What is a system of linear equations?

Definition 1.1: A system of m linear equations in n unknown variables x₁, x₂, . . . , x_n is a collection of m equations of the form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_3 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{1.1}
\]

The numbers a_ij are called the coefficients of the linear system; because there are m equations and n unknown variables there are therefore m × n coefficients. The main problem with a linear system is of course to solve it:

Problem: Find a list of n numbers (s₁, s₂, . . . , s_n) that satisfy the system of linear equations (1.1).

In other words, if we substitute the list of numbers (s₁, s₂, . . . , s_n) for the unknown variables (x₁, x₂, . . . , x_n) in equation (1.1) then the left-hand side of the ith equation will equal b_i. We call such a list (s₁, s₂, . . . , s_n) a solution to the system of equations. Notice that we say “a solution” because there may be more than one. The set of all solutions to a linear system is called its solution set. As an example of a linear system, below is a linear system consisting of m = 2 equations and n = 3 unknowns:

\[
\begin{aligned}
x_1 - 5x_2 - 7x_3 &= 0 \\
5x_2 + 11x_3 &= 1
\end{aligned}
\]

Here is a linear system consisting of m = 3 equations and n = 2 unknowns:

\[
\begin{aligned}
-5x_1 + x_2 &= -1 \\
\pi x_1 - \sqrt{5}\,x_2 &= 0 \\
63x_1 - 2x_2 &= -7
\end{aligned}
\]

And finally, below is a linear system consisting of m = 4 equations and n = 6 unknowns:

\[
\begin{aligned}
-5x_1 + x_3 - 44x_4 - 55x_6 &= -1 \\
\pi x_1 - 5x_2 - x_3 + 4x_4 - 5x_5 + \sqrt{5}\,x_6 &= 0
\end{aligned}
\]

√

1 2

1 1

63x − 2x − x + ln(3)x + 4x − x = 0

√

63x − 2x

1 2

— x

4 6

— x − 5x = 5

Example 1.2. Verify that (1, 2, −4) is a solution to the system of equations

2x₁+ 2x₂+ x₃= 2

x₁+ 3x₂− x₃= 11.

Is (1, −1, 2) a solution to the system?

Solution. The number of equations is m = 2 and the number of unknowns is n = 3. There are m × n = 6 coefficients: a₁₁ = 2, a₁₂ = 2, a₁₃ = 1, a₂₁ = 1, a₂₂ = 3, and a₂₃ = −1. The right-hand sides are b₁ = 2 and b₂ = 11. The list of numbers (1, 2, −4) is a solution because

2(1) + 2(2) + (−4) = 2

(1) + 3(2) − (−4) = 11.

On the other hand, for (1, −1, 2) we have that

2(1) + 2(−1) + (2) = 2

but

1 + 3(−1) − 2 = −4 ≠ 11.

Thus, (1, −1, 2) is not a solution to the system.
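The check carried out in Example 1.2 can also be done mechanically. The following is a minimal Python sketch, not part of the original notes; the function name `is_solution` and the nested-list representation of the coefficients are our own assumptions.

```python
def is_solution(A, b, s):
    """Return True if the list s satisfies every equation of the system.

    A holds the coefficients row by row, b holds the right-hand sides.
    (Both names are illustrative choices, not notation from the notes.)
    """
    for row, rhs in zip(A, b):
        # Left-hand side of one equation: a_i1*s_1 + ... + a_in*s_n
        lhs = sum(a * x for a, x in zip(row, s))
        if lhs != rhs:
            return False
    return True

# Coefficients and right-hand sides of the system in Example 1.2
A = [[2, 2, 1],
     [1, 3, -1]]
b = [2, 11]

print(is_solution(A, b, [1, 2, -4]))   # True
print(is_solution(A, b, [1, -1, 2]))   # False
```

Exact comparison with `!=` is safe here because all numbers are integers; with floating-point data a tolerance would be needed.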

A linear system may not have a solution at all. If this is the case, we say that the linear system is inconsistent:

INCONSISTENT ⇔ NO SOLUTION

A linear system is called consistent if it has at least one solution:

CONSISTENT ⇔ AT LEAST ONE SOLUTION

We will see shortly that a consistent linear system will have either just one solution or infinitely many solutions. For example, a linear system cannot have just 4 or 5 solutions. If it has multiple solutions, then it will have infinitely many solutions.

Example 1.3. Show that the linear system does not have a solution.

−x₁+ x₂= 3

x₁− x₂= 1.

Solution. If we add the two equations we get

0 = 4

which is a contradiction. Therefore, there does not exist a list (s₁, s₂) that satisfies the system because this would lead to the contradiction 0 = 4.

Example 1.4. Let t be an arbitrary real number and let

\[
s_1 = -\tfrac{3}{2} - 2t, \qquad s_2 = \tfrac{3}{2} + t, \qquad s_3 = t.
\]

Show that for any choice of the parameter t, the list (s₁, s₂, s₃) is a solution to the linear system

x₁ + x₂ + x₃ = 0

x₁ + 3x₂ − x₃ = 3.

Solution. Substitute the list (s₁, s₂, s₃) into the left-hand side of the first equation

\[
\left(-\tfrac{3}{2} - 2t\right) + \left(\tfrac{3}{2} + t\right) + t = 0
\]

and in the second equation

\[
\left(-\tfrac{3}{2} - 2t\right) + 3\left(\tfrac{3}{2} + t\right) - t = -\tfrac{3}{2} + \tfrac{9}{2} = 3.
\]

Both equations are satisfied for any value of t. Because we can vary t arbitrarily, we get an infinite number of solutions parameterized by t. For example, compute the list (s₁, s₂, s₃) for t = 3 and confirm that the resulting list is a solution to the linear system.
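The suggested check for t = 3, and for other values of t, can be sketched in a few lines of Python. This is an illustration under our own naming (`solution`, `satisfies_system`); it uses the formulas s₁ = −3/2 − 2t, s₂ = 3/2 + t, s₃ = t and exact fractions so that no rounding occurs.

```python
from fractions import Fraction

def solution(t):
    """The list (s1, s2, s3) of Example 1.4 for a given parameter t."""
    t = Fraction(t)
    return (Fraction(-3, 2) - 2 * t, Fraction(3, 2) + t, t)

def satisfies_system(s):
    """Check both equations x1 + x2 + x3 = 0 and x1 + 3*x2 - x3 = 3."""
    s1, s2, s3 = s
    return s1 + s2 + s3 == 0 and s1 + 3 * s2 - s3 == 3

for t in [-2, 0, Fraction(1, 3), 3]:
    print(t, solution(t), satisfies_system(solution(t)))
```

Testing finitely many values of t does not prove the claim for all t; the algebraic substitution above is what establishes it.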


1.2 Matrices

We will use matrices to develop systematic methods to solve linear systems and to study the properties of the solution set of a linear system. Informally speaking, a matrix is an array or table consisting of rows and columns. For example,

\[
A = \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ -4 & 7 & 11 & -5 \end{bmatrix}
\]

is a matrix having m = 3 rows and n = 4 columns. In general, a matrix with m rows and n columns is an m × n matrix and the set of all such matrices will be denoted by M_{m×n}. Hence, A above is a 3 × 4 matrix. The entry of A in the ith row and jth column will be denoted by a_ij. A matrix containing only one column is called a column vector and a matrix containing only one row is called a row vector. For example, here is a row vector

u =1 −3

v =

₄

and here is a column vector

−1

We can associate to a linear system three matrices: (1) the coefficient matrix, (2) the output column vector, and (3) the augmented matrix. For example, for the linear system

\[
\begin{aligned}
5x_1 - 3x_2 + 8x_3 &= -1 \\
x_1 + 4x_2 - 6x_3 &= 0 \\
2x_2 + 4x_3 &= 3
\end{aligned}
\]

the coefficient matrix A, the output vector b, and the augmented matrix [A b] are:

\[
A = \begin{bmatrix} 5 & -3 & 8 \\ 1 & 4 & -6 \\ 0 & 2 & 4 \end{bmatrix}, \qquad
b = \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix}, \qquad
[A \;\, b] = \begin{bmatrix} 5 & -3 & 8 & -1 \\ 1 & 4 & -6 & 0 \\ 0 & 2 & 4 & 3 \end{bmatrix}.
\]

If a linear system has m equations and n unknowns then the coefficient matrix A must be an m × n matrix, that is, A has m rows and n columns. Using our previously defined notation, we can write this as A ∈ M_{m×n}.

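If we are given an augmented matrix, we can write down the associated linear system in an obvious way. For example, the linear system associated to the augmented matrix

\[
\begin{bmatrix} 1 & 4 & -2 & 8 & 12 \\ 0 & 1 & -7 & 2 & -4 \\ 0 & 0 & 5 & -1 & 7 \end{bmatrix}
\]

is

\[
\begin{aligned}
x_1 + 4x_2 - 2x_3 + 8x_4 &= 12 \\
x_2 - 7x_3 + 2x_4 &= -4 \\
5x_3 - x_4 &= 7.
\end{aligned}
\]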
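To make the relationship between the coefficient matrix, the output vector, and the augmented matrix concrete, here is a small Python sketch. Storing a matrix as a list of rows and the helper name `augment` are assumptions made for this illustration only; the data is the system 5x₁ − 3x₂ + 8x₃ = −1, x₁ + 4x₂ − 6x₃ = 0, 2x₂ + 4x₃ = 3 considered in this section.

```python
def augment(A, b):
    """Append the output vector b as an extra column of the coefficient matrix A."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    return [row + [rhs] for row, rhs in zip(A, b)]

# Coefficient matrix and output vector, one row per equation
A = [[5, -3, 8],
     [1, 4, -6],
     [0, 2, 4]]
b = [-1, 0, 3]

M = augment(A, b)
m, n = len(A), len(A[0])   # m equations, n unknowns: A is an m x n matrix

for row in M:
    print(row)
```

A missing variable in an equation (such as x₁ in the third equation) shows up as a 0 entry in the corresponding row.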


We can study matrices without interpreting them as coefficient matrices or augmented matrices associated to a linear system. Matrix algebra is a fascinating subject with numerous applications in every branch of engineering, medicine, statistics, mathematics, finance, biology, chemistry, etc.

1.3 Solving linear systems

In algebra, you learned to solve equations by first “simplifying” them using operations that do not alter the solution set. For example, to solve 2x = 8 − 2x we can add 2x to both sides and obtain 4x = 8 and then multiply both sides by 1/4 yielding x = 2. We can do similar operations on a linear system. There are three basic operations, called elementary operations, that can be performed:

1. Interchange two equations.
2. Multiply an equation by a nonzero constant.
3. Add a multiple of one equation to another.

These operations do not alter the solution set. The idea is to apply these operations iteratively to simplify the linear system to a point where one can easily write down the solution set. It is convenient to apply elementary operations on the augmented matrix [A b] representing the linear system. In this case, we call the operations elementary row operations, and the process of simplifying the linear system using these operations is called row reduction. The goal with row reducing is to transform the original linear system into one having a triangular structure and then perform back substitution to solve the system. This is best explained via an example.

Example 1.5. Use back substitution on the augmented matrix

\[
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}
\]

to solve the associated linear system.

Solution. Notice that the augmented matrix has a triangular structure. The third row corresponds to the equation x₃= 1. The second row corresponds to the equation

x₂− x₃= 0

and therefore x₂= x₃= 1. The first row corresponds to the equation

x₁− 2x₃= −4

and therefore

x₁= −4 + 2x₃= −4 + 2 = −2.

Therefore, the solution is (−2, 1, 1).
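Back substitution follows the same pattern for any triangular system, so it can be written as a short loop. The sketch below is our own illustration (the name `back_substitute` is hypothetical); it assumes a square system whose augmented matrix is upper triangular with nonzero diagonal entries, as in Example 1.5.

```python
from fractions import Fraction

def back_substitute(M):
    """Solve an n x (n+1) augmented matrix that is upper triangular
    with nonzero diagonal entries, working from the last row upward."""
    n = len(M)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        # Contribution of the unknowns that have already been computed
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(M[i][n]) - known) / M[i][i]
    return x

M = [[1, 0, -2, -4],
     [0, 1, -1, 0],
     [0, 0, 1, 1]]
print(back_substitute(M))   # entries equal -2, 1, 1
```

Systems with free parameters or inconsistent rows are outside the scope of this sketch.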


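Example 1.6. Solve the linear system using elementary row operations.

\[
\begin{aligned}
-3x_1 + 2x_2 + 4x_3 &= 12 \\
x_1 - 2x_3 &= -4 \\
2x_1 - 3x_2 + 4x_3 &= -3
\end{aligned}
\]

Solution. Our goal is to perform elementary row operations to obtain a triangular structure and then use back substitution to solve. The augmented matrix is

\[
\begin{bmatrix} -3 & 2 & 4 & 12 \\ 1 & 0 & -2 & -4 \\ 2 & -3 & 4 & -3 \end{bmatrix}.
\]

Interchange Row 1 (R₁) and Row 2 (R₂):

\[
\begin{bmatrix} -3 & 2 & 4 & 12 \\ 1 & 0 & -2 & -4 \\ 2 & -3 & 4 & -3 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 1 & 0 & -2 & -4 \\ -3 & 2 & 4 & 12 \\ 2 & -3 & 4 & -3 \end{bmatrix}
\]

As you will see, this first operation will simplify the next step. Add 3R₁ to R₂:

\[
\begin{bmatrix} 1 & 0 & -2 & -4 \\ -3 & 2 & 4 & 12 \\ 2 & -3 & 4 & -3 \end{bmatrix}
\xrightarrow{3R_1 + R_2}
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 2 & -2 & 0 \\ 2 & -3 & 4 & -3 \end{bmatrix}
\]

Add −2R₁ to R₃:

\[
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 2 & -2 & 0 \\ 2 & -3 & 4 & -3 \end{bmatrix}
\xrightarrow{-2R_1 + R_3}
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 2 & -2 & 0 \\ 0 & -3 & 8 & 5 \end{bmatrix}
\]

Multiply R₂ by 1/2:

\[
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 2 & -2 & 0 \\ 0 & -3 & 8 & 5 \end{bmatrix}
\xrightarrow{\frac{1}{2}R_2}
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 1 & -1 & 0 \\ 0 & -3 & 8 & 5 \end{bmatrix}
\]

Add 3R₂ to R₃:

\[
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 1 & -1 & 0 \\ 0 & -3 & 8 & 5 \end{bmatrix}
\xrightarrow{3R_2 + R_3}
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 5 & 5 \end{bmatrix}
\]

Multiply R₃ by 1/5:

\[
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 5 & 5 \end{bmatrix}
\xrightarrow{\frac{1}{5}R_3}
\begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}
\]

We can continue row reducing but the row reduced augmented matrix is in triangular form. So now use back substitution to solve. The linear system associated to the row reduced augmented matrix is

x₁ − 2x₃ = −4

x₂ − x₃ = 0

x₃ = 1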

The last equation gives that x₃= 1. From the second equation we obtain that x₂− x₃= 0, and thus x₂= 1. The first equation then gives that x₁= −4 + 2(1) = −2. Thus, the solution to the original system is (−2, 1, 1). You should verify that (−2, 1, 1) is a solution to the original system.
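The six row operations of Example 1.6 can be replayed step by step in code. The sketch below is an illustration only: the helper names `swap`, `scale`, and `add_multiple` are our own, rows are indexed from 0 (so R₁ is row 0), and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def swap(M, i, j):
    """Interchange rows i and j."""
    M[i], M[j] = M[j], M[i]

def scale(M, i, c):
    """Multiply row i by the nonzero constant c."""
    if c == 0:
        raise ValueError("the constant must be nonzero")
    M[i] = [c * entry for entry in M[i]]

def add_multiple(M, c, i, j):
    """Add c times row i to row j (the operation c*R_i + R_j)."""
    M[j] = [c * a + b for a, b in zip(M[i], M[j])]

# Augmented matrix of Example 1.6
M = [[Fraction(x) for x in row] for row in
     [[-3, 2, 4, 12],
      [1, 0, -2, -4],
      [2, -3, 4, -3]]]

swap(M, 0, 1)                  # R1 <-> R2
add_multiple(M, 3, 0, 1)       # 3R1 + R2
add_multiple(M, -2, 0, 2)      # -2R1 + R3
scale(M, 1, Fraction(1, 2))    # (1/2)R2
add_multiple(M, 3, 1, 2)       # 3R2 + R3
scale(M, 2, Fraction(1, 5))    # (1/5)R3

for row in M:
    print([str(entry) for entry in row])
```

After the last operation the rows are (1, 0, −2, −4), (0, 1, −1, 0), and (0, 0, 1, 1), the triangular form used for back substitution.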

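The original augmented matrix of the previous example is

\[
M = \begin{bmatrix} -3 & 2 & 4 & 12 \\ 1 & 0 & -2 & -4 \\ 2 & -3 & 4 & -3 \end{bmatrix}
\quad\longrightarrow\quad
\begin{aligned}
-3x_1 + 2x_2 + 4x_3 &= 12 \\
x_1 - 2x_3 &= -4 \\
2x_1 - 3x_2 + 4x_3 &= -3.
\end{aligned}
\]

After row reducing we obtained the row reduced matrix

\[
N = \begin{bmatrix} 1 & 0 & -2 & -4 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}
\quad\longrightarrow\quad
\begin{aligned}
x_1 - 2x_3 &= -4 \\
x_2 - x_3 &= 0 \\
x_3 &= 1.
\end{aligned}
\]

Although the two augmented matrices M and N are clearly distinct, it is a fact that the linear systems they represent have the same solution set.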

Example 1.7. Using elementary row operations, show that the linear system is inconsistent.

\[
\begin{aligned}
x_1 + 2x_3 &= 1 \\
x_2 + x_3 &= 0 \\
2x_1 + 4x_3 &= 1
\end{aligned}
\]

Solution. The augmented matrix is

\[
\begin{bmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 2 & 0 & 4 & 1 \end{bmatrix}.
\]

Perform the operation −2R₁ + R₃:

\[
\begin{bmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 2 & 0 & 4 & 1 \end{bmatrix}
\xrightarrow{-2R_1 + R_3}
\begin{bmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}
\]

The last row of the simplified augmented matrix

\[
\begin{bmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}
\]

corresponds to the equation

0x₁ + 0x₂ + 0x₃ = −1.

Obviously, there are no numbers x₁, x₂, x₃ that satisfy this equation, and therefore the linear system is inconsistent, i.e., it has no solution. In general, if we obtain a row in an augmented matrix of the form

\[
\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & c \end{bmatrix}
\]

where c is a nonzero number, then the linear system is inconsistent. We will call this type of row an inconsistent row. However, a row of the form

\[
\begin{bmatrix} 0 & 1 & 0 & 0 & 0 \end{bmatrix}
\]

corresponds to the equation x₂ = 0, which is perfectly valid.
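Spotting an inconsistent row is a purely mechanical test: all coefficient entries are zero and the last entry is not. A minimal Python sketch of that test follows; the function names are our own and the last column of each row is assumed to be the right-hand side.

```python
def is_inconsistent_row(row):
    """True for a row of the form [0 0 ... 0 c] with c nonzero."""
    *coefficients, rhs = row
    return all(a == 0 for a in coefficients) and rhs != 0

def has_inconsistent_row(M):
    """True if some row of the augmented matrix M is an inconsistent row."""
    return any(is_inconsistent_row(row) for row in M)

# Row reduced augmented matrix from Example 1.7
reduced = [[1, 0, 2, 1],
           [0, 1, 1, 0],
           [0, 0, 0, -1]]

print(has_inconsistent_row(reduced))          # True
print(is_inconsistent_row([0, 1, 0, 0, 0]))   # False: this row says x2 = 0
```

The test only inspects the rows it is given. An inconsistent system need not show such a row before row reduction, as the original augmented matrix of Example 1.7 illustrates.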

1.4 Geometric interpretation of the solution set

The set of points (x₁, x₂) that satisfy the linear system

x₁− 2x₂= −1

−x₁+ 3x₂= 3

(1.2)

is the intersection of the two lines determined by the equations of the system. The solution for this system is (3, 2). The two lines intersect at the point (x₁, x₂) = (3, 2), see Figure 1.1.

Figure 1.1: The intersection point of the two lines is the solution of the linear system (1.2)
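The intersection point in Figure 1.1 can be computed directly. The sketch below uses Cramer's rule for a 2 × 2 system, a technique not covered in this lecture and swapped in here only because it is compact; the function name `intersect_lines` is an assumption of this example.

```python
from fractions import Fraction

def intersect_lines(a1, b1, c1, a2, b2, c2):
    """Intersection of the lines a1*x1 + b1*x2 = c1 and a2*x1 + b2*x2 = c2.

    Returns the point as a tuple, or None when the determinant is zero,
    i.e. when the lines are parallel or coincide and there is no single
    intersection point.
    """
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x1 = Fraction(c1 * b2 - c2 * b1, det)
    x2 = Fraction(a1 * c2 - a2 * c1, det)
    return (x1, x2)

print(intersect_lines(1, -2, -1, -1, 3, 3))   # the two lines of system (1.2)
print(intersect_lines(-1, 1, 3, 1, -1, 1))    # the two lines of Example 1.3
```

The first call returns the point (3, 2). The second returns None because the lines of Example 1.3 are parallel, which is the geometric picture of an inconsistent system.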

Similarly, the solution of the linear system

\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 \\
2x_2 - 8x_3 &= 8 \\
-4x_1 + 5x_2 + 9x_3 &= -9
\end{aligned}
\tag{1.3}
\]

is the intersection of the three planes determined by the equations of the system. In this case, there is only one solution: (29, 16, 3). In the case of a consistent system of two equations, the solution set is the line of intersection of the two planes determined by the equations of the system, see Figure 1.2.

Figure 1.2: The two planes x₁ − 2x₂ + x₃ = 0 and −4x₁ + 5x₂ + 9x₃ = −9 taken from the linear system (1.3); their line of intersection is the solution set of the system formed by these two equations.

After this lecture you should know the following:

what a linear system is
what it means for a linear system to be consistent and inconsistent
what matrices are
what are the matrices associated to a linear system
what the elementary row operations are and how to apply them to simplify a linear system
what it means for two matrices to be row equivalent
how to use the method of back substitution to solve a linear system
what an inconsistent row is
how to identify using elementary row operations when a linear system is inconsistent
the geometric interpretation of the solution set of a linear system


Lecture 2

Row Reduction and Echelon Forms

In this lecture, we will get more practice with row reduction and in the process introduce two important types of matrix forms. We will also discuss when a linear system has a unique solution, infinitely many solutions, or no solution. Lastly, we will introduce a convenient parameter called the rank of a matrix.

2.1 Row echelon form (REF)

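Consider the linear system

\[
\begin{aligned}
x_1 + 5x_2 - 2x_4 - x_5 + 7x_6 &= -4 \\
2x_2 - 2x_3 + 3x_6 &= 0 \\
-9x_4 - x_5 + x_6 &= -1 \\
5x_5 + x_6 &= 5 \\
0 &= 0
\end{aligned}
\]

having augmented matrix

\[
\begin{bmatrix}
1 & 5 & 0 & -2 & -1 & 7 & -4 \\
0 & 2 & -2 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & -9 & -1 & 1 & -1 \\
0 & 0 & 0 & 0 & 5 & 1 & 5 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]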

The above augmented matrix has the following properties:

P1. All nonzero rows are above any rows of all zeros.

P2. The leftmost nonzero entry of a row is to the right of the leftmost nonzero entry of the row above it.


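Any matrix satisfying properties P1 and P2 is said to be in row echelon form (REF). In REF, the leftmost nonzero entry in a row is called a leading entry (boxed below):

\[
\begin{bmatrix}
\boxed{1} & 5 & 0 & -2 & -1 & 7 & -4 \\
0 & \boxed{2} & -2 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & \boxed{-9} & -1 & 1 & -1 \\
0 & 0 & 0 & 0 & \boxed{5} & 1 & 5 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

A consequence of property P2 is that every entry below a leading entry is zero, as can be read off column by column from the same matrix:

\[
\begin{bmatrix}
\boxed{1} & 5 & 0 & -2 & -1 & 7 & -4 \\
0 & \boxed{2} & -2 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & \boxed{-9} & -1 & 1 & -1 \\
0 & 0 & 0 & 0 & \boxed{5} & 1 & 5 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]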

We can perform elementary row operations, or row reduction, to transform a matrix into REF.
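Properties P1 and P2 translate directly into a check that can be run on any matrix. The following Python sketch is one possible way to write that check; the names `leading_index` and `is_ref` are ours, and the two small failing matrices are hypothetical examples made up for illustration.

```python
def leading_index(row):
    """Column index of the leftmost nonzero entry, or None for a zero row."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None

def is_ref(M):
    """Check properties P1 and P2 for the matrix M (a list of rows)."""
    previous = -1            # column of the leading entry in the row above
    seen_zero_row = False
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:    # a nonzero row below a zero row violates P1
            return False
        if lead <= previous: # leading entry not strictly to the right violates P2
            return False
        previous = lead
    return True

ref_example = [[1, 5, 0, -2, -1, 7, -4],
               [0, 2, -2, 0, 0, 3, 0],
               [0, 0, 0, -9, -1, 1, -1],
               [0, 0, 0, 0, 5, 1, 5],
               [0, 0, 0, 0, 0, 0, 0]]

print(is_ref(ref_example))                 # True
print(is_ref([[0, 0, 0], [1, 2, 3]]))      # False (fails P1)
print(is_ref([[1, 2, 3], [4, 5, 6]]))      # False (fails P2)
```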

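Example 2.1. Explain why the following matrices are not in REF. Use elementary row operations to put them in REF.

\[
M = \begin{bmatrix} 3 & -1 & 0 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 3 & 0 \end{bmatrix}
\qquad
N = \begin{bmatrix} 7 & 5 & 0 & -3 \\ 0 & 3 & -1 & 1 \\ 0 & 6 & -5 & 2 \end{bmatrix}
\]

Solution. Matrix M fails property P1. To put M in REF we interchange R₂ with R₃:

\[
M = \begin{bmatrix} 3 & -1 & 0 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 3 & 0 \end{bmatrix}
\xrightarrow{R_2 \leftrightarrow R_3}
\begin{bmatrix} 3 & -1 & 0 & 3 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The matrix N fails property P2. To put N in REF we perform the operation −2R₂ + R₃ → R₃:

\[
\begin{bmatrix} 7 & 5 & 0 & -3 \\ 0 & 3 & -1 & 1 \\ 0 & 6 & -5 & 2 \end{bmatrix}
\xrightarrow{-2R_2 + R_3}
\begin{bmatrix} 7 & 5 & 0 & -3 \\ 0 & 3 & -1 & 1 \\ 0 & 0 & -3 & 0 \end{bmatrix}
\]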

Why is REF useful? Certain properties of a matrix can be easily deduced if it is in REF. For now, REF is useful to us for solving a linear system of equations. If an augmented matrix is in REF, we can use back substitution to solve the system, just as we did in Lecture 1. For example, consider the system

8x₁− 2x₂+ x₃= 4

3x₂− x₃= 7

2x₃= 4

whose augmented matrix is already in REF:

\[
\begin{bmatrix} 8 & -2 & 1 & 4 \\ 0 & 3 & -1 & 7 \\ 0 & 0 & 2 & 4 \end{bmatrix}
\]

From the last equation we obtain that 2x₃= 4, and thus x₃= 2. Substituting x₃= 2 into the second equation we obtain that x₂= 3. Substituting x₃= 2 and x₂= 3 into the first equation we obtain that x₁= 1.

2.2 Reduced row echelon form (RREF)

Although REF simplifies the problem of solving a linear system, later on in the course we will need to completely row reduce matrices into what is called reduced row echelon form (RREF). A matrix is in RREF if it is in REF (so it satisfies properties P1 and P2) and in addition satisfies the following properties:

P3. The leading entry in each nonzero row is a 1.

P4. All the entries above (and below) a leading 1 are all zero.

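A leading 1 in the RREF of a matrix is called a pivot. For example, the following matrix in RREF:

\[
\begin{bmatrix} 1 & 6 & 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & -4 & 0 & 5 \\ 0 & 0 & 0 & 0 & 1 & 7 \end{bmatrix}
\]

has three pivots (boxed):

\[
\begin{bmatrix} \boxed{1} & 6 & 0 & 3 & 0 & 0 \\ 0 & 0 & \boxed{1} & -4 & 0 & 5 \\ 0 & 0 & 0 & 0 & \boxed{1} & 7 \end{bmatrix}
\]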

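Example 2.2. Use row reduction to transform the matrix into RREF.

\[
\begin{bmatrix} 0 & 3 & -6 & 6 & 4 & -5 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 3 & -9 & 12 & -9 & 6 & 15 \end{bmatrix}
\]

Solution. The first step is to make the top leftmost entry nonzero:

\[
\begin{bmatrix} 0 & 3 & -6 & 6 & 4 & -5 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 3 & -9 & 12 & -9 & 6 & 15 \end{bmatrix}
\xrightarrow{R_3 \leftrightarrow R_1}
\begin{bmatrix} 3 & -9 & 12 & -9 & 6 & 15 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}
\]

Now create a leading 1 in the first row:

\[
\begin{bmatrix} 3 & -9 & 12 & -9 & 6 & 15 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}
\xrightarrow{\frac{1}{3}R_1}
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}
\]

Create zeros under the newly created leading 1:

\[
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 3 & -7 & 8 & -5 & 8 & 9 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}
\xrightarrow{-3R_1 + R_2}
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 0 & 2 & -4 & 4 & 2 & -6 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}
\]

Create a leading 1 in the second row:

\[
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 0 & 2 & -4 & 4 & 2 & -6 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}
\xrightarrow{\frac{1}{2}R_2}
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 0 & 1 & -2 & 2 & 1 & -3 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}
\]

Create zeros under the newly created leading 1:

\[
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 0 & 1 & -2 & 2 & 1 & -3 \\ 0 & 3 & -6 & 6 & 4 & -5 \end{bmatrix}
\xrightarrow{-3R_2 + R_3}
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 0 & 1 & -2 & 2 & 1 & -3 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}
\]

We have now completed the top-to-bottom phase of the row reduction algorithm. In the next phase, we work bottom-to-top and create zeros above the leading 1’s. Create zeros above the leading 1 in the third row:

\[
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 0 & 1 & -2 & 2 & 1 & -3 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}
\xrightarrow{-R_3 + R_2}
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 0 & 1 & -2 & 2 & 0 & -7 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}
\]

\[
\begin{bmatrix} 1 & -3 & 4 & -3 & 2 & 5 \\ 0 & 1 & -2 & 2 & 0 & -7 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}
\xrightarrow{-2R_3 + R_1}
\begin{bmatrix} 1 & -3 & 4 & -3 & 0 & -3 \\ 0 & 1 & -2 & 2 & 0 & -7 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}
\]

Create zeros above the leading 1 in the second row:

\[
\begin{bmatrix} 1 & -3 & 4 & -3 & 0 & -3 \\ 0 & 1 & -2 & 2 & 0 & -7 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}
\xrightarrow{3R_2 + R_1}
\begin{bmatrix} 1 & 0 & -2 & 3 & 0 & -24 \\ 0 & 1 & -2 & 2 & 0 & -7 \\ 0 & 0 & 0 & 0 & 1 & 4 \end{bmatrix}
\]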

This completes the row reduction algorithm and the matrix is in RREF.
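For comparison, here is a compact Python sketch of row reduction to RREF. It is not a transcription of the two-phase procedure above: for brevity it clears the entries above and below each leading 1 in a single pass (often called Gauss-Jordan elimination), so its intermediate matrices differ from those in Example 2.2 even though the final matrix is the same. The function name `rref` and the use of exact fractions are our own choices.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M as a new matrix of Fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        source = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if source is None:
            continue                                        # no pivot in this column
        A[pivot_row], A[source] = A[source], A[pivot_row]   # interchange rows
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]     # create a leading 1
        for r in range(rows):                               # clear the rest of the column
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

M = [[0, 3, -6, 6, 4, -5],
     [3, -7, 8, -5, 8, 9],
     [3, -9, 12, -9, 6, 15]]

for row in rref(M):
    print([str(x) for x in row])
```

Applied to the matrix of Example 2.2, the sketch produces the rows (1, 0, −2, 3, 0, −24), (0, 1, −2, 2, 0, −7), and (0, 0, 0, 0, 1, 4).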

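Example 2.3. Use row reduction to solve the linear system.

\[
\begin{aligned}
2x_1 + 4x_2 + 6x_3 &= 8 \\
x_1 + 2x_2 + 4x_3 &= 8 \\
3x_1 + 6x_2 + 9x_3 &= 12
\end{aligned}
\]

Solution. The augmented matrix is

\[
\begin{bmatrix} 2 & 4 & 6 & 8 \\ 1 & 2 & 4 & 8 \\ 3 & 6 & 9 & 12 \end{bmatrix}.
\]

Create a leading 1 in the first row:

\[
\begin{bmatrix} 2 & 4 & 6 & 8 \\ 1 & 2 & 4 & 8 \\ 3 & 6 & 9 & 12 \end{bmatrix}
\xrightarrow{\frac{1}{2}R_1}
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 8 \\ 3 & 6 & 9 & 12 \end{bmatrix}
\]

Create zeros under the first leading 1:

\[
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 8 \\ 3 & 6 & 9 & 12 \end{bmatrix}
\xrightarrow{-R_1 + R_2}
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 4 \\ 3 & 6 & 9 & 12 \end{bmatrix}
\]

\[
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 4 \\ 3 & 6 & 9 & 12 \end{bmatrix}
\xrightarrow{-3R_1 + R_3}
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]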

The system is consistent; however, there are only 2 nonzero rows but 3 unknown variables. This means that the solution set will contain 3 − 2 = 1 free parameter. The second row in the augmented matrix is equivalent to the equation:

x₃ = 4.

The first row is equivalent to the equation:

x₁ + 2x₂ + 3x₃ = 4

and after substituting x₃ = 4 we obtain

x₁ + 2x₂ = −8.

We now must choose one of the variables x₁ or x₂ to be a parameter, say t, and solve for the remaining variable. If we set x₂ = t then from x₁ + 2x₂ = −8 we obtain that

x₁ = −8 − 2t.

We can therefore write the solution set for the linear system as

\[
\begin{aligned}
x_1 &= -8 - 2t \\
x_2 &= t \\
x_3 &= 4
\end{aligned}
\tag{2.1}
\]

where t can be any real number. If we had chosen x₁ to be the parameter, say x₁ = t, then the solution set can be written as

\[
\begin{aligned}
x_1 &= t \\
x_2 &= -4 - \tfrac{1}{2}t \\
x_3 &= 4
\end{aligned}
\tag{2.2}
\]

Although (2.1) and (2.2) are two different parameterizations, they both give the same solution set.
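The claim that the two parameterizations describe the same set can be probed numerically. The sketch below is an illustration with our own function names; it takes (2.1) to be x₁ = −8 − 2t, x₂ = t, x₃ = 4 and (2.2) to be x₁ = t, x₂ = −4 − t/2, x₃ = 4, the latter obtained by solving x₁ + 2x₂ = −8 for x₂.

```python
from fractions import Fraction

def param_x2(t):
    """Parameterization (2.1): x2 = t is the free parameter."""
    t = Fraction(t)
    return (-8 - 2 * t, t, Fraction(4))

def param_x1(t):
    """Parameterization (2.2): x1 = t is the free parameter."""
    t = Fraction(t)
    return (t, -4 - t / 2, Fraction(4))

def solves_system(x):
    """Check the three equations of Example 2.3."""
    x1, x2, x3 = x
    return (2 * x1 + 4 * x2 + 6 * x3 == 8 and
            x1 + 2 * x2 + 4 * x3 == 8 and
            3 * x1 + 6 * x2 + 9 * x3 == 12)

# A point produced by (2.1) reappears in (2.2) when the parameter
# of (2.2) is set to the x1 value of that point.
p = param_x2(5)
print(p, solves_system(p), param_x1(p[0]) == p)
```

A finite number of sample points is only a sanity check; the equality of the two sets follows from the substitution t ↦ −8 − 2t relating the two parameters.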


In general, if a linear system has n unknown variables and the row reduced augmented matrix has r leading entries, then the number of free parameters d in the solution set is

d = n − r.

Thus, when performing back substitution, we will have to set d of the unknown variables to arbitrary parameters. In the previous example, there are n = 3 unknown variables and the row reduced augmented matrix contained r = 2 leading entries. The number of free parameters was therefore

d = n − r = 3 − 2 = 1.

Because the number of leading entries r in the row reduced coefficient matrix determines the number of free parameters, we will refer to r as the rank of the coefficient matrix:

r = rank(A).

Later in the course, we will give a more geometric interpretation to rank(A).

Example 2.4. Solve the linear system represented by the augmented matrix

\[
\begin{bmatrix} 1 & -7 & 2 & -5 & 8 & 10 \\ 0 & 1 & -3 & 3 & 1 & -5 \\ 0 & 0 & 0 & 1 & -1 & 4 \end{bmatrix}
\]

Solution. The number of unknowns is n = 5 and the augmented matrix has rank r = 3 (leading entries). Thus, the solution set is parameterized by d = 5 − 3 = 2 free variables, call them t and s. The last equation of the augmented matrix is x₄ − x₅ = 4. We choose x₅ to be the first parameter so we set x₅ = t. Therefore, x₄ = 4 + t. The second equation of the augmented matrix is

x₂ − 3x₃ + 3x₄ + x₅ = −5

and the unassigned variables are x₂ and x₃. We choose x₃ to be the second parameter, say x₃ = s. Then

\[
\begin{aligned}
x_2 &= -5 + 3x_3 - 3x_4 - x_5 \\
&= -5 + 3s - 3(4 + t) - t \\
&= -17 - 4t + 3s.
\end{aligned}
\]

We now use the first equation of the augmented matrix to write x₁ in terms of the other variables:

\[
\begin{aligned}
x_1 &= 10 + 7x_2 - 2x_3 + 5x_4 - 8x_5 \\
&= 10 + 7(-17 - 4t + 3s) - 2s + 5(4 + t) - 8t \\
&= -89 - 31t + 19s.
\end{aligned}
\]

Thus, the solution set is

\[
\begin{aligned}
x_1 &= -89 - 31t + 19s \\
x_2 &= -17 - 4t + 3s \\
x_3 &= s \\
x_4 &= 4 + t \\
x_5 &= t
\end{aligned}
\]

where t and s are arbitrary real numbers. Choose arbitrary numbers for t and s and substitute the corresponding list (x₁, x₂, . . . , x₅) into the system of equations to verify that it is a solution.

2.3 Existence and uniqueness of solutions

The REF or RREF of an augmented matrix leads to three distinct possibilities for the solution set of a linear system.

Theorem 2.5: Let [A b] be the augmented matrix of a linear system. One of the following distinct possibilities will occur:

1. The augmented matrix will contain an inconsistent row.
2. All the rows of the augmented matrix are consistent and there are no free parameters.
3. All the rows of the augmented matrix are consistent and there are d ≥ 1 variables that must be set to arbitrary parameters.

In Case 1., the linear system is inconsistent and thus has no solution. In Case 2., the linear system is consistent and has only one (and thus unique) solution. This case occurs when r = rank(A) = n since then the number of free parameters is d = n − r = 0. In Case 3., the linear system is consistent and has infinitely many solutions. This case occurs when r < n and thus d = n − r > 0 is the number of free parameters.
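The three cases of Theorem 2.5 can be read off from a row reduced augmented matrix by counting. The following Python sketch shows one way to do so; the function name `classify` is hypothetical, and the input is assumed to be already in REF (or RREF), so that each nonzero row of the coefficient part contributes exactly one leading entry.

```python
def classify(aug_ref):
    """Classify a linear system from its augmented matrix in REF.

    Returns 'no solution', 'unique solution', or 'infinitely many solutions'.
    """
    n = len(aug_ref[0]) - 1                  # number of unknowns
    rank = 0
    for row in aug_ref:
        *coefficients, rhs = row
        if any(a != 0 for a in coefficients):
            rank += 1                        # one leading entry per nonzero row
        elif rhs != 0:
            return "no solution"             # Case 1: inconsistent row
    d = n - rank                             # number of free parameters
    return "unique solution" if d == 0 else "infinitely many solutions"

print(classify([[1, 0, 2, 1], [0, 1, 1, 0], [0, 0, 0, -1]]))   # Example 1.7
print(classify([[1, 0, -2, -4], [0, 1, -1, 0], [0, 0, 1, 1]])) # Example 1.6
print(classify([[1, 2, 3, 4], [0, 0, 1, 4], [0, 0, 0, 0]]))    # Example 2.3
```

The three calls print "no solution", "unique solution", and "infinitely many solutions" respectively. If the input is not in REF, the count of nonzero rows need not equal the rank and the answer may be wrong.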

After this lecture you should know the following:

what the REF is and how to compute it
what the RREF is and how to compute it
how to solve linear systems using row reduction (Practice!!!)
how to identify when a linear system is inconsistent
how to identify when a linear system is consistent
what is the rank of a matrix
how to compute the number of free parameters in a solution set
what are the three possible cases for the solution set of a linear system (Theorem 2.5)


Lecture 3

Vector Equations

In this lecture, we introduce vectors and vector equations. Specifically, we introduce the linear combination problem which simply asks whether it is possible to express one vector in terms of other vectors; we will be more precise in what follows. As we will see, solving the linear combination problem reduces to solving a linear system of equations.

3.1 Vectors in Rⁿ

Recall that a column vector in Rⁿ is an n × 1 matrix. From now on, we will drop the “column” descriptor and simply use the word vectors. It is important to emphasize that a vector in Rⁿ is simply a list of n numbers; you are safe (and highly encouraged!) to forget the idea that a vector is an object with an arrow. Here is a vector in R²:

v =

−1

Here is a vector in R³:

₋₃

 

v = 0 .

Here is a vector in R⁶:

 ₉

 

 

v = ^⁻³^.

To indicate that v is a vector in Rⁿ, we will use the notation v ∈ Rⁿ. The mathematical symbol ∈ means “is an element of”. When we write vectors within a paragraph, we will write them using list notation instead of column notation, e.g., v = (−1, 4) instead of \( v = \begin{bmatrix} -1 \\ 4 \end{bmatrix} \).

We can add/subtract vectors, and multiply vectors by numbers or scalars. For example, here is the addition of two vectors:

 ₀  ₄  ₄

2 1

+ =

−5 −3 −8

     

     

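And the multiplication of a scalar with a vector:

\[
3 \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 \\ -9 \\ 15 \end{bmatrix}.
\]

And here are both operations combined:

\[
-2 \begin{bmatrix} 4 \\ -8 \\ 3 \end{bmatrix} + 3 \begin{bmatrix} -2 \\ 9 \\ 4 \end{bmatrix}
= \begin{bmatrix} -8 \\ 16 \\ -6 \end{bmatrix} + \begin{bmatrix} -6 \\ 27 \\ 12 \end{bmatrix}
= \begin{bmatrix} -14 \\ 43 \\ 6 \end{bmatrix}.
\]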

These operations constitute “the algebra” of vectors. As the following example illustrates, vectors can be used in a natural way to represent the solution of a linear system.
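Since a vector in Rⁿ is just a list of n numbers, the two operations are one-liners in code. The following Python sketch (function names `add` and `scale` are our own) reproduces the scalar multiple 3(1, −3, 5) and the combination −2(4, −8, 3) + 3(−2, 9, 4).

```python
def add(u, v):
    """Entry-by-entry sum of two vectors with the same number of entries."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of entries")
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    """Multiply every entry of the vector v by the scalar c."""
    return [c * a for a in v]

print(scale(3, [1, -3, 5]))                               # [3, -9, 15]
print(add(scale(-2, [4, -8, 3]), scale(3, [-2, 9, 4])))   # [-14, 43, 6]
```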

Example 3.1. Write the general solution in vector form of the linear system represented by the augmented matrix

\[
[A \;\, b] = \begin{bmatrix} 1 & -7 & 2 & -5 & 8 & 10 \\ 0 & 1 & -3 & 3 & 1 & -5 \\ 0 & 0 & 0 & 1 & -1 & 4 \end{bmatrix}
\]

Solution. The number of unknowns is n = 5 and the associated coefficient matrix A has rank r = 3. Thus, the solution set is parametrized by d = n − r = 2 parameters. This system was considered in Example 2.4 and the general solution was found to be

\[
\begin{aligned}
x_1 &= -89 - 31t_1 + 19t_2 \\
x_2 &= -17 - 4t_1 + 3t_2 \\
x_3 &= t_2 \\
x_4 &= 4 + t_1 \\
x_5 &= t_1
\end{aligned}
\]

where t₁ and t₂ are arbitrary real numbers. The solution in vector form therefore takes the form

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} -89 - 31t_1 + 19t_2 \\ -17 - 4t_1 + 3t_2 \\ t_2 \\ 4 + t_1 \\ t_1 \end{bmatrix}
= \begin{bmatrix} -89 \\ -17 \\ 0 \\ 4 \\ 0 \end{bmatrix}
+ t_1 \begin{bmatrix} -31 \\ -4 \\ 0 \\ 1 \\ 1 \end{bmatrix}
+ t_2 \begin{bmatrix} 19 \\ 3 \\ 1 \\ 0 \\ 0 \end{bmatrix}.
\]

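A fundamental problem in linear algebra is solving vector equations for an unknown vector. As an example, suppose that you are given the vectors

\[
v_1 = \begin{bmatrix} 4 \\ -8 \\ 3 \end{bmatrix}, \qquad
v_2 = \begin{bmatrix} -2 \\ 9 \\ 4 \end{bmatrix}, \qquad
b = \begin{bmatrix} -14 \\ 43 \\ 6 \end{bmatrix},
\]

and asked to find numbers x₁ and x₂ such that x₁v₁ + x₂v₂ = b, that is,

\[
x_1 \begin{bmatrix} 4 \\ -8 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} -2 \\ 9 \\ 4 \end{bmatrix} = \begin{bmatrix} -14 \\ 43 \\ 6 \end{bmatrix}.
\]

Here the unknowns are the scalars x₁ and x₂. After some guess and check, we find that x₁ = −2 and x₂ = 3 is a solution to the problem since

\[
-2 \begin{bmatrix} 4 \\ -8 \\ 3 \end{bmatrix} + 3 \begin{bmatrix} -2 \\ 9 \\ 4 \end{bmatrix} = \begin{bmatrix} -14 \\ 43 \\ 6 \end{bmatrix}.
\]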

In some sense, the vector b is a combination of the vectors v₁and v₂. This motivates the following definition.

Definition 3.2: Let v₁, v₂, . . . , v_p be vectors in Rⁿ. A vector b is said to be a linear combination of the vectors v₁, v₂, . . . , v_p if there exist scalars x₁, x₂, . . . , x_p such that x₁v₁ + x₂v₂ + · · · + x_pv_p = b.

The scalars in a linear combination are called the coefficients of the linear combination. As an example, given the vectors

\[
v_1 = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}, \qquad
v_2 = \begin{bmatrix} -2 \\ 4 \\ -6 \end{bmatrix}, \qquad
v_3 = \begin{bmatrix} -1 \\ 5 \\ 6 \end{bmatrix}, \qquad
b = \begin{bmatrix} -3 \\ 0 \\ -27 \end{bmatrix}
\]

you can verify (and you should!) that

3v₁ + 4v₂ − 2v₃ = b.

Therefore, we can say that b is a linear combination of v₁, v₂, v₃ with coefficients x₁ = 3, x₂ = 4, and x₃ = −2.

3.2 The linear combination problem

The linear combination problem is the following:


Problem: Given vectors v₁, . . . , v_p and b, is b a linear combination of v₁, v₂, . . . , v_p?

For example, say you are given the vectors

\[
v_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \qquad
v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad
v_3 = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}
\]

and also

\[
b = \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}.
\]

Do there exist scalars x₁, x₂, x₃ such that

x₁v₁ + x₂v₂ + x₃v₃ = b? (3.1)

For obvious reasons, equation (3.1) is called a vector equation and the unknowns are x₁, x₂, and x₃. To gain some intuition with the linear combination problem, let’s do an example by inspection.

Example 3.3. Let v₁ = (1, 0, 0), let v₂ = (0, 0, 1), let b₁ = (0, 2, 0), and let b₂ = (−3, 0, 7). Are b₁ and b₂ linear combinations of v₁, v₂?

Solution. For any scalars x₁ and x₂

\[
x_1 v_1 + x_2 v_2
= \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ x_2 \end{bmatrix}
= \begin{bmatrix} x_1 \\ 0 \\ x_2 \end{bmatrix}
\neq \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}
\]

and thus no, b₁ is not a linear combination of v₁, v₂. On the other hand, by inspection we have that

\[
-3v_1 + 7v_2
= \begin{bmatrix} -3 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 7 \end{bmatrix}
= \begin{bmatrix} -3 \\ 0 \\ 7 \end{bmatrix} = b_2
\]

and thus yes, b₂ is a linear combination of v₁, v₂. These examples, of low dimension, were more-or-less obvious. Going forward, we are going to need a systematic way to solve the linear combination problem that does not rely on pure inspection.

We now describe how the linear combination problem is connected to the problem of solving a system of linear equations. Consider again the vectors

\[
v_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \qquad
v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad
v_3 = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}.
\]

Do there exist scalars x₁, x₂, x₃ such that

x₁v₁ + x₂v₂ + x₃v₃ = b? (3.2)

First, let’s expand the left-hand side of equation (3.2):

\[
x_1 v_1 + x_2 v_2 + x_3 v_3
= \begin{bmatrix} x_1 \\ 2x_1 \\ x_1 \end{bmatrix}
+ \begin{bmatrix} x_2 \\ x_2 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 2x_3 \\ x_3 \\ 2x_3 \end{bmatrix}
= \begin{bmatrix} x_1 + x_2 + 2x_3 \\ 2x_1 + x_2 + x_3 \\ x_1 + 2x_3 \end{bmatrix}.
\]

We want equation (3.2) to hold so let’s equate the expansion x₁v₁ + x₂v₂ + x₃v₃ with b. In other words, set

\[
\begin{bmatrix} x_1 + x_2 + 2x_3 \\ 2x_1 + x_2 + x_3 \\ x_1 + 2x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}.
\]

Comparing component-by-component in the above relationship, we seek scalars x₁, x₂, x₃ satisfying the equations

\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 0 \\
2x_1 + x_2 + x_3 &= 1 \\
x_1 + 2x_3 &= -2.
\end{aligned}
\tag{3.3}
\]

This is just a linear system consisting of m = 3 equations and n = 3 unknowns! Thus, the linear combination problem can be solved by solving a system of linear equations for the unknown scalars x₁, x₂, x₃. We know how to do this. In this case, the augmented matrix of the linear system (3.3) is

\[
[A \;\, b] = \begin{bmatrix} 1 & 1 & 2 & 0 \\ 2 & 1 & 1 & 1 \\ 1 & 0 & 2 & -2 \end{bmatrix}.
\]

Notice that the first column of A is just v₁, the second column is v₂, and the third column is v₃; in other words, the augmented matrix is

\[
[A \;\, b] = \begin{bmatrix} v_1 & v_2 & v_3 & b \end{bmatrix}.
\]

Applying the row reduction algorithm, the solution is

x₁ = 0, x₂ = 2, x₃ = −1

and thus these coefficients solve the linear combination problem. In other words,

0v₁ + 2v₂ − v₃ = b.

In this case, there is only one solution to the linear system, so b can be written as a linear combination of v₁, v₂, v₃ in only one (or unique) way. You should verify these computations.
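The passage from a vector equation to an augmented matrix, and the final verification, can be sketched in Python as follows. The helper names `augmented_from_vectors` and `linear_combination` are our own; the data is v₁ = (1, 2, 1), v₂ = (1, 1, 0), v₃ = (2, 1, 2), b = (0, 1, −2) with the coefficients x₁ = 0, x₂ = 2, x₃ = −1.

```python
def augmented_from_vectors(vectors, b):
    """Build [v1 v2 ... vp b]: the given vectors become the columns of A."""
    return [[v[i] for v in vectors] + [b[i]] for i in range(len(b))]

def linear_combination(coefficients, vectors):
    """Compute x1*v1 + x2*v2 + ... + xp*vp entry by entry."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coefficients, vectors)) for i in range(n)]

v1, v2, v3 = [1, 2, 1], [1, 1, 0], [2, 1, 2]
b = [0, 1, -2]

for row in augmented_from_vectors([v1, v2, v3], b):
    print(row)

# Verify that the coefficients found by row reduction reproduce b
print(linear_combination([0, 2, -1], [v1, v2, v3]) == b)   # True
```

The printed rows are (1, 1, 2, 0), (2, 1, 1, 1), and (1, 0, 2, −2), the augmented matrix of system (3.3).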

We summarize the previous discussion with the following:

The problem of determining if a given vector b is a linear combination of the vectors v₁, v₂, . . . , v_p is equivalent to solving the linear system of equations with augmented matrix

\[
[A \;\, b] = \begin{bmatrix} v_1 & v_2 & \cdots & v_p & b \end{bmatrix}.
\]

Applying the existence and uniqueness Theorem 2.5, the only three possibilities to the linear combination problem are:

1. If the linear system is inconsistent then b is not a linear combination of v₁, v₂, . . . , v_p, i.e., there do not exist scalars x₁, x₂, . . . , x_p such that x₁v₁ + x₂v₂ + · · · + x_pv_p = b.
2. If the linear system is consistent and the solution is unique then b can be written as a linear combination of v₁, v₂, . . . , v_p in only one way.
3. If the linear system is consistent and the solution set has free parameters, then b can be written as a linear combination of v₁, v₂, . . . , v_p in infinitely many ways.

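Example 3.4. Is the vector b = (7, 4, −3) a linear combination of the vectors

\[
v_1 = \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix}, \qquad
v_2 = \begin{bmatrix} 2 \\ 5 \\ 6 \end{bmatrix}?
\]

Solution. Form the augmented matrix:

\[
\begin{bmatrix} v_1 & v_2 & b \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 7 \\ -2 & 5 & 4 \\ -5 & 6 & -3 \end{bmatrix}
\]

The RREF of the augmented matrix is

\[
\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}
\]

and therefore the solution is x₁ = 3 and x₂ = 2. Therefore, yes, b is a linear combination of v₁, v₂:

\[
3v_1 + 2v_2 = 3 \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix} + 2 \begin{bmatrix} 2 \\ 5 \\ 6 \end{bmatrix}
= \begin{bmatrix} 7 \\ 4 \\ -3 \end{bmatrix} = b
\]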

Notice that the solution set does not contain any free parameters because n = 2 (unknowns) and r = 2 (rank) and so d = 0. Therefore, the above linear combination is the only way to write b as a linear combination of v₁and v₂.

Example 3.5. Is the vector b = (1, 0, 1) a linear combination of the vectors

\[
v_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \qquad
v_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad
v_3 = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}?
\]

Solution. The augmented matrix of the corresponding linear system is

\[
\begin{bmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 2 & 0 & 4 & 1 \end{bmatrix}.
\]

After row reducing we obtain that

\[
\begin{bmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}.
\]

The last row is inconsistent, and therefore the linear system does not have a solution. Therefore, no, b is not a linear combination of v₁, v₂, v₃.

Example 3.6. Is the vector b = (8, 8, 12) a linear combination of the vectors

\[
\mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 4 \\ 2 \\ 6 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 6 \\ 4 \\ 9 \end{bmatrix} ?
\]

Solution. The augmented matrix is

\[
\begin{bmatrix} 2 & 4 & 6 & 8 \\ 1 & 2 & 4 & 8 \\ 3 & 6 & 9 & 12 \end{bmatrix}
\xrightarrow{\text{REF}}
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

The system is consistent and therefore b is a linear combination of v₁, v₂, v₃. In this case, the solution set contains d = 1 free parameter and therefore it is possible to write b as a linear combination of v₁, v₂, v₃ in infinitely many ways. In terms of the parameter t, the solution set is

x₁ = −8 − 2t
x₂ = t
x₃ = 4

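Choosing any t gives scalars that can be used to write b as a linear combination of v₁, v₂, v₃. For example, choosing t = 1 we obtain x₁ = −10, x₂ = 1, and x₃ = 4, and you can verify that

\[
-10\mathbf{v}_1 + \mathbf{v}_2 + 4\mathbf{v}_3
= -10\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} + \begin{bmatrix} 4 \\ 2 \\ 6 \end{bmatrix} + 4\begin{bmatrix} 6 \\ 4 \\ 9 \end{bmatrix}
= \begin{bmatrix} 8 \\ 8 \\ 12 \end{bmatrix} = \mathbf{b}
\]

Or, choosing t = −2 we obtain x₁ = −4, x₂ = −2, and x₃ = 4, and you can verify that

\[
-4\mathbf{v}_1 - 2\mathbf{v}_2 + 4\mathbf{v}_3
= -4\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} - 2\begin{bmatrix} 4 \\ 2 \\ 6 \end{bmatrix} + 4\begin{bmatrix} 6 \\ 4 \\ 9 \end{bmatrix}
= \begin{bmatrix} 8 \\ 8 \\ 12 \end{bmatrix} = \mathbf{b}
\]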
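The claim that every value of t works in Example 3.6 can be spot-checked numerically. This is only an illustrative sketch; the helper `combine` is a name chosen here, not something from the lecture.

```python
v1, v2, v3 = (2, 1, 3), (4, 2, 6), (6, 4, 9)
b = (8, 8, 12)

def combine(coeffs, vectors):
    """Return the linear combination sum_i coeffs[i] * vectors[i] as a tuple."""
    return tuple(sum(c * v[k] for c, v in zip(coeffs, vectors))
                 for k in range(len(vectors[0])))

# Solution set of Example 3.6: x1 = -8 - 2t, x2 = t, x3 = 4.
for t in (1, -2, 0, 5):
    coeffs = (-8 - 2 * t, t, 4)
    assert combine(coeffs, (v1, v2, v3)) == b
```

A finite number of test values is of course not a proof; the algebraic reason is that the terms containing t cancel in every component.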


We make a few important observations on linear combinations of vectors. Given vectors v₁, v₂, . . . , v_p, there are certain vectors b that can be written as a linear combination of v₁, v₂, . . . , v_p in an obvious way. The zero vector b = 0 can always be written as a linear combination of v₁, v₂, . . . , v_p:

0 = 0v₁ + 0v₂ + · · · + 0v_p.

Each v_i itself can be written as a linear combination of v₁, v₂, . . . , v_p, for example,

v₂ = 0v₁ + (1)v₂ + 0v₃ + · · · + 0v_p.

More generally, any scalar multiple of v_i can be written as a linear combination of v₁, v₂, . . . , v_p, for example,

xv₂ = 0v₁ + xv₂ + 0v₃ + · · · + 0v_p.

By varying the coefficients x₁, x₂, . . . , x_p, we see that there are infinitely many vectors b that can be written as a linear combination of v₁, v₂, . . . , v_p (provided at least one v_i is nonzero). The “space” of all the possible linear combinations of v₁, v₂, . . . , v_p has a name, which we introduce next.

3.3 The span of a set of vectors

Given a set of vectors {v₁, v₂, . . . , v_p}, we have been considering the problem of whether or not a given vector b is a linear combination of {v₁, v₂, . . . , v_p}. We now take another point of view and instead consider the idea of generating all vectors that are a linear combination of {v₁, v₂, . . . , v_p}. So how do we generate a vector that is guaranteed to be a linear combination of {v₁, v₂, . . . , v_p}? We simply choose scalars and form the corresponding linear combination. For example, if v₁ = (2, 1, 3), v₂ = (4, 2, 6) and v₃ = (6, 4, 9) then

\[
-10\mathbf{v}_1 + \mathbf{v}_2 + 4\mathbf{v}_3
= -10\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} + \begin{bmatrix} 4 \\ 2 \\ 6 \end{bmatrix} + 4\begin{bmatrix} 6 \\ 4 \\ 9 \end{bmatrix}
= \begin{bmatrix} 8 \\ 8 \\ 12 \end{bmatrix}.
\]

Thus, by construction, the vector b = (8, 8, 12) is a linear combination of {v₁, v₂, v₃}. This discussion leads us to the following definition.

Definition 3.7: Let v₁, v₂, . . . , v_p be vectors. The set of all vectors that are a linear combination of v₁, v₂, . . . , v_p is called the span of v₁, v₂, . . . , v_p, and we denote it by

S = span{v₁, v₂, . . . , v_p}.

By definition, the span of a set of vectors is a collection of vectors, or a set of vectors. If b is a linear combination of v₁, v₂, . . . , v_p then b is an element of the set span{v₁, v₂, . . . , v_p}, and we write this as

b ∈ span{v₁, v₂, . . . , v_p}.


By definition, writing that b ∈ span{v₁, v₂, . . . , v_p} implies that there exist scalars x₁, x₂, . . . , x_p such that

x₁v₁ + x₂v₂ + · · · + x_pv_p = b.

Even though span{v₁, v₂, . . . , v_p} is an infinite set of vectors, it is not necessarily true that it is the whole space Rⁿ.

The set span{v₁, v₂, . . . , v_p} is just a collection of infinitely many vectors but it has some geometric structure. In R² and R³ we can visualize span{v₁, v₂, . . . , v_p}. In R², the span of a single nonzero vector, say v ∈ R², is a line through the origin in the direction of v, see Figure 3.1.

Figure 3.1: The span of a single non-zero vector in R².

In R², the span of two vectors v₁, v₂ ∈ R² that are not multiples of each other is all of R². That is, span{v₁, v₂} = R². For example, with v₁ = (1, 0) and v₂ = (0, 1), it is true that span{v₁, v₂} = R². In R³, the span of two vectors v₁, v₂ ∈ R³ that are not multiples of each other is a plane through the origin containing v₁ and v₂, see Figure 3.2. In R³, the span of a single nonzero vector is a line through the origin, and the span of three vectors that do not depend on each other (we will make this precise soon) is all of R³.

Figure 3.2: The span of two vectors, not multiples of each other, in R³.

Example 3.8. Is the vector b = (7, 4, −3) in the span of the vectors v₁= (1, −2, −5), v₂= (2, 5, 6)? In other words, is b ∈ span{v₁, v₂}?


Solution. By definition, b is in the span of v₁ and v₂ if there exist scalars x₁ and x₂ such that

x₁v₁ + x₂v₂ = b,

that is, if b can be written as a linear combination of v₁ and v₂. From our previous discussion on the linear combination problem, we must consider the augmented matrix [v₁ v₂ b]. Using row reduction, we find that the corresponding linear system is consistent and has only one solution (see Example 3.4). Therefore, yes, b ∈ span{v₁, v₂} and the linear combination is unique.

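Example 3.9. Is the vector b = (1, 0, 1) in the span of the vectors v₁ = (1, 0, 2), v₂ = (0, 1, 0), v₃ = (2, 1, 4)?

Solution. From Example 3.5, we have that

\[
\begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \mathbf{b} \end{bmatrix}
\xrightarrow{\text{REF}}
\begin{bmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}
\]

The last row is inconsistent and therefore b is not in span{v₁, v₂, v₃}.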

Example 3.10. Is the vector b = (8, 8, 12) in the span of the vectors v₁ = (2, 1, 3), v₂ = (4, 2, 6), v₃ = (6, 4, 9)?

Solution. From Example 3.6, we have that

\[
\begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \mathbf{b} \end{bmatrix}
\xrightarrow{\text{REF}}
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

The system is consistent and therefore b ∈ span{v₁, v₂, v₃}. In this case, the solution set contains d = 1 free parameter and therefore it is possible to write b as a linear combination of v₁, v₂, v₃ in infinitely many ways.

Example 3.11. Answer the following with True or False, and explain your answer.

(a) The vector b = (1, 2, 3) is in the span of the set of vectors

\[
\left\{ \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ -7 \end{bmatrix}, \begin{bmatrix} 4 \\ 0 \\ -5 \end{bmatrix} \right\}.
\]

(b) The solution set of the linear system whose augmented matrix is [v₁ v₂ v₃ b] is the same as the solution set of the vector equation x₁v₁ + x₂v₂ + x₃v₃ = b.

(c) . . . either b can be written as a linear combination of v₁, v₂, v₃ or b ∈ span{v₁, v₂, v₃}.

(d) The span of the vectors {v₁, v₂, v₃} (at least one of which is nonzero) contains only the vectors v₁, v₂, v₃ and the zero vector 0.


After this lecture you should know the following:

what a vector is
what a linear combination of vectors is
what the linear combination problem is
the relationship between the linear combination problem and the problem of solving linear systems of equations
how to solve the linear combination problem
what the span of a set of vectors is
the relationship between what it means for a vector b to be in the span of v₁, v₂, . . . , v_p and the problem of writing b as a linear combination of v₁, v₂, . . . , v_p

the geometric interpretation of the span of a set of vectors


Lecture 4

The Matrix Equation Ax = b

In this lecture, we introduce the operation of matrix-vector multiplication and how it relates to the linear combination problem.

4.1 Matrix-vector multiplication

We begin with the definition of matrix-vector multiplication.

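Definition 4.1: Given a matrix A ∈ M_{m×n} and a vector x ∈ Rⁿ,

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},
\]

we define the product of A and x as the vector Ax in R^m given by

\[
A\mathbf{x} =
\underbrace{\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{\mathbf{x}}
=
\begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
\]

For the product Ax to be well-defined, the number of columns of A must equal the number of components of x. Another way of saying this is that the inner dimensions of the product must agree, and the outer dimensions give the size of the result:

(m × n) · (n × 1) → m × 1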

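Example 4.2. Compute Ax.

(a)

\[
A = \begin{bmatrix} 1 & -1 & 3 & 0 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} 2 \\ -4 \\ -3 \\ 8 \end{bmatrix}
\]

(b)

\[
A = \begin{bmatrix} 3 & 3 & -2 \\ 4 & -4 & -1 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}
\]

(c)

\[
A = \begin{bmatrix} -1 & 1 & 0 \\ 4 & 1 & -2 \\ 3 & -3 & 3 \\ 0 & -2 & -3 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} -1 \\ 2 \\ -2 \end{bmatrix}
\]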

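Solution. We compute:

(a)

\[
A\mathbf{x} = \begin{bmatrix} 1 & -1 & 3 & 0 \end{bmatrix}
\begin{bmatrix} 2 \\ -4 \\ -3 \\ 8 \end{bmatrix}
= \begin{bmatrix} (1)(2) + (-1)(-4) + (3)(-3) + (0)(8) \end{bmatrix}
= \begin{bmatrix} -3 \end{bmatrix}
\]

(b)

\[
A\mathbf{x} = \begin{bmatrix} 3 & 3 & -2 \\ 4 & -4 & -1 \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}
= \begin{bmatrix} (3)(1) + (3)(0) + (-2)(-1) \\ (4)(1) + (-4)(0) + (-1)(-1) \end{bmatrix}
= \begin{bmatrix} 5 \\ 5 \end{bmatrix}
\]

(c)

\[
A\mathbf{x} = \begin{bmatrix} -1 & 1 & 0 \\ 4 & 1 & -2 \\ 3 & -3 & 3 \\ 0 & -2 & -3 \end{bmatrix}
\begin{bmatrix} -1 \\ 2 \\ -2 \end{bmatrix}
= \begin{bmatrix}
(-1)(-1) + (1)(2) + (0)(-2) \\
(4)(-1) + (1)(2) + (-2)(-2) \\
(3)(-1) + (-3)(2) + (3)(-2) \\
(0)(-1) + (-2)(2) + (-3)(-2)
\end{bmatrix}
= \begin{bmatrix} 3 \\ 2 \\ -15 \\ 2 \end{bmatrix}
\]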
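The row-by-row rule of Definition 4.1 translates directly into a few lines of code. The sketch below is illustrative only; the function name `mat_vec` and the list-of-rows representation of a matrix are assumptions made here, not notation from the lecture.

```python
def mat_vec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x (a list of numbers)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("number of columns of A must equal number of components of x")
    # Entry i of Ax is a_i1*x_1 + a_i2*x_2 + ... + a_in*x_n.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

print(mat_vec([[1, -1, 3, 0]], [2, -4, -3, 8]))                      # Example 4.2 (a)
print(mat_vec([[3, 3, -2], [4, -4, -1]], [1, 0, -1]))                # Example 4.2 (b)
print(mat_vec([[-1, 1, 0], [4, 1, -2], [3, -3, 3], [0, -2, -3]],
              [-1, 2, -2]))                                          # Example 4.2 (c)
```

The explicit size check mirrors the requirement that the number of columns of A equal the number of components of x.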

 

 

 

We now list two important properties of matrix-vector multiplication.

Theorem 4.3: Let A be an m × n matrix.

(a) For any vectors u, v in Rⁿ it holds that

A(u + v) = Au + Av.

(b) For any vector u in Rⁿ and scalar c it holds that

A(cu) = c(Au).
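One way to see why both properties hold is to compare the i-th components of each side using Definition 4.1. The following is a sketch of that argument, not necessarily the proof the author has in mind; here u_j and v_j denote the components of u and v.

```latex
\begin{aligned}
\bigl(A(\mathbf{u}+\mathbf{v})\bigr)_i
  &= \sum_{j=1}^{n} a_{ij}(u_j + v_j)
   = \sum_{j=1}^{n} a_{ij}u_j + \sum_{j=1}^{n} a_{ij}v_j
   = (A\mathbf{u})_i + (A\mathbf{v})_i, \\
\bigl(A(c\mathbf{u})\bigr)_i
  &= \sum_{j=1}^{n} a_{ij}(c\,u_j)
   = c\sum_{j=1}^{n} a_{ij}u_j
   = c\,(A\mathbf{u})_i .
\end{aligned}
```

Since this holds for every row index i = 1, . . . , m, the vectors on each side are equal.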

Example 4.4. For the given data, verify that the properties of Theorem 4.3 hold:

A =

−1 2

3 −3

2 1 3 −1

, u = , v = , c = −2.

4.2 Matrix-vector multiplication and linear combinations

Recall that the general definition of matrix-vector multiplication Ax is

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
\tag{4.1}
\]

There is an important way to decompose matrix-vector multiplication involving a linear combination. To see how, let v₁, v₂, . . . , v_n denote the columns of A and consider the following linear combination:

\[
x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_n\mathbf{v}_n
=
\begin{bmatrix} x_1a_{11} \\ x_1a_{21} \\ \vdots \\ x_1a_{m1} \end{bmatrix}
+
\begin{bmatrix} x_2a_{12} \\ x_2a_{22} \\ \vdots \\ x_2a_{m2} \end{bmatrix}
+ \cdots +
\begin{bmatrix} x_na_{1n} \\ x_na_{2n} \\ \vdots \\ x_na_{mn} \end{bmatrix}
=
\begin{bmatrix}
x_1a_{11} + x_2a_{12} + \cdots + x_na_{1n} \\
x_1a_{21} + x_2a_{22} + \cdots + x_na_{2n} \\
\vdots \\
x_1a_{m1} + x_2a_{m2} + \cdots + x_na_{mn}
\end{bmatrix}
\tag{4.2}
\]

We observe that expressions (4.1) and (4.2) are equal! Therefore, if A = [v₁ v₂ · · · v_n] and x = (x₁, x₂, . . . , x_n) then

Ax = x₁v₁ + x₂v₂ + · · · + x_nv_n.

In summary, the vector Ax is a linear combination of the columns of A where the scalars in the linear combination are the components of x! This (important) observation gives an alternative way to compute Ax.

Example 4.5. Given

\[
A = \begin{bmatrix} -1 & 1 & 0 \\ 4 & 1 & -2 \\ 3 & -3 & 3 \\ 0 & -2 & -3 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} -1 \\ 2 \\ -2 \end{bmatrix},
\]

compute Ax in two ways: (1) using the original Definition 4.1, and (2) as a linear combination of the columns of A.
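Both ways of computing Ax can be compared side by side in code. This is a small sketch with variable names of our own choosing (`by_rows`, `by_columns`); it is meant as a way to check a hand computation, not as the intended solution method.

```python
A = [[-1, 1, 0],
     [4, 1, -2],
     [3, -3, 3],
     [0, -2, -3]]
x = [-1, 2, -2]

# (1) Row by row, as in Definition 4.1.
by_rows = [sum(a * xj for a, xj in zip(row, x)) for row in A]

# (2) As the linear combination x1*v1 + x2*v2 + x3*v3 of the columns of A.
columns = list(zip(*A))          # columns[j] is the j-th column of A
by_columns = [0] * len(A)
for xj, col in zip(x, columns):
    by_columns = [acc + xj * entry for acc, entry in zip(by_columns, col)]

print(by_rows)
print(by_columns)
```

The two lists agree entry by entry, which is the content of the identity Ax = x₁v₁ + x₂v₂ + · · · + x_nv_n for this particular A and x.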

4.3 The matrix equation problem

As we have seen, with a matrix A and any vector x, we can produce a new output vector via the multiplication Ax. If A is an m × n matrix then we must have x ∈ Rⁿ and the output vector Ax is in R^m. We now introduce the following problem:

Problem: Given a matrix A ∈ M_{m×n} and a vector b ∈ R^m, find, if possible, a vector x ∈ Rⁿ such that

Ax = b. (⋆)

Equation (⋆) is a matrix equation where the unknown variable is x. If u is a vector such that Au = b, then we say that u is a solution to the equation Ax = b. For example,

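suppose that

\[
A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} -3 \\ 7 \end{bmatrix}.
\]

Does the equation Ax = b have a solution? Well, for any x = (x₁, x₂) we have that

\[
A\mathbf{x} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} x_1 \\ x_1 \end{bmatrix}
\]

and thus any output vector Ax has equal entries. Since b does not have equal entries, the equation Ax = b has no solution.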

We now describe a systematic way to solve matrix equations. As we have seen, the vector Ax is a linear combination of the columns of A with the coefficients given by the components of x. Therefore, the matrix equation problem is equivalent to the linear combination problem. In Lecture 3, we showed that the linear combination problem can be solved by solving a system of linear equations. Putting all this together then, if A = [v₁ v₂ · · · v_n] and b ∈ R^m then:

To find a vector x ∈ Rⁿ that solves the matrix equation

Ax = b

we solve the linear system whose augmented matrix is

[A b] = [v₁ v₂ · · · v_n b].

From now on, a system of linear equations such as

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_3 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

will be written in the compact form

Ax = b

where A is the coefficient matrix of the linear system, b is the output vector, and x is the unknown vector to be solved for. We summarize our findings with the following theorem.

Theorem 4.6: Let A ∈ M_{m×n} and b ∈ R^m. The following statements are equivalent:

(a) The equation Ax = b has a solution.
(b) The vector b is a linear combination of the columns of A.
(c) The linear system represented by the augmented matrix [A b] is consistent.


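Example 4.7. Solve, if possible, the matrix equation Ax = b if

\[
A = \begin{bmatrix} 1 & 3 & -4 \\ 1 & 5 & 2 \\ -3 & -7 & -6 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} -2 \\ 4 \\ 12 \end{bmatrix}.
\]

Solution. First form the augmented matrix:

\[
\begin{bmatrix} A & \mathbf{b} \end{bmatrix} =
\begin{bmatrix} 1 & 3 & -4 & -2 \\ 1 & 5 & 2 & 4 \\ -3 & -7 & -6 & 12 \end{bmatrix}
\]

Performing the row reduction algorithm we obtain that

\[
\begin{bmatrix} 1 & 3 & -4 & -2 \\ 1 & 5 & 2 & 4 \\ -3 & -7 & -6 & 12 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 3 & -4 & -2 \\ 0 & 1 & 3 & 3 \\ 0 & 0 & -12 & 0 \end{bmatrix}.
\]

Here r = rank(A) = 3 and therefore d = 0, i.e., no free parameters. Performing back substitution we obtain that x₁ = −11, x₂ = 3, and x₃ = 0. Thus, the solution to the matrix equation is unique (no free parameters) and is given by

\[
\mathbf{x} = \begin{bmatrix} -11 \\ 3 \\ 0 \end{bmatrix}
\]

Let’s verify that Ax = b:

\[
A\mathbf{x} = \begin{bmatrix} 1 & 3 & -4 \\ 1 & 5 & 2 \\ -3 & -7 & -6 \end{bmatrix}
\begin{bmatrix} -11 \\ 3 \\ 0 \end{bmatrix}
= \begin{bmatrix} -11 + 9 + 0 \\ -11 + 15 + 0 \\ 33 - 21 + 0 \end{bmatrix}
= \begin{bmatrix} -2 \\ 4 \\ 12 \end{bmatrix} = \mathbf{b}
\]

In other words, b is a linear combination of the columns of A:

\[
-11\begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}
+ 3\begin{bmatrix} 3 \\ 5 \\ -7 \end{bmatrix}
+ 0\begin{bmatrix} -4 \\ 2 \\ -6 \end{bmatrix}
= \begin{bmatrix} -2 \\ 4 \\ 12 \end{bmatrix}
\]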


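Example 4.8. Solve, if possible, the matrix equation Ax = b if

\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 3 \\ -4 \end{bmatrix}.
\]

Solution. Row reducing the augmented matrix [A b] we get

\[
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & -4 \end{bmatrix}
\xrightarrow{-2R_1 + R_2}
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & -10 \end{bmatrix}.
\]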

The last row is inconsistent and therefore there is no solution to the matrix equation Ax = b. In other words, b is not a linear combination of the columns of A.
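The consistency test used in Examples 4.7 and 4.8 (row reduce [A b] and look for a row of the form [0 · · · 0 | c] with c ≠ 0) can be sketched in code as follows. The function name `is_consistent` and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction

def is_consistent(A, b):
    """Return True if Ax = b has at least one solution, by row reducing [A b]."""
    rows = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    n_rows, n_cols = len(rows), len(A[0])
    pivot_row = 0
    for col in range(n_cols):  # pivots are only sought in the coefficient columns
        pivot = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Eliminate the entries below the pivot.
        for r in range(pivot_row + 1, n_rows):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [a - factor * p for a, p in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    # Inconsistent exactly when some row reads [0 ... 0 | nonzero].
    return not any(all(a == 0 for a in row[:-1]) and row[-1] != 0 for row in rows)

print(is_consistent([[1, 2], [2, 4]], [3, -4]))                            # Example 4.8
print(is_consistent([[1, 3, -4], [1, 5, 2], [-3, -7, -6]], [-2, 4, 12]))   # Example 4.7
```

By Theorem 4.6, the returned value also answers whether b is a linear combination of the columns of A.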

Example 4.9. Solve, if possible, the matrix equation Ax = b if

\[
A = \begin{bmatrix} 1 & -1 & 2 \\ 0 & 3 & 6 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}.
\]

Solution. First note that the unknown vector x is in R³ because A has n = 3 columns. The linear system Ax = b has m = 2 equations and n = 3 unknowns. The coefficient matrix A has rank r = 2, and therefore the solution set will contain d = n − r = 1 parameter. The augmented matrix [A b] is

\[
\begin{bmatrix} A & \mathbf{b} \end{bmatrix} =
\begin{bmatrix} 1 & -1 & 2 & 2 \\ 0 & 3 & 6 & -1 \end{bmatrix}
\]

Let x₃ = t be the parameter and use the last row to solve for x₂:

x₂ = −1/3 − 2t

Now use the first row to solve for x₁:

x₁ = 2 + x₂ − 2x₃ = 2 + (−1/3 − 2t) − 2t = 5/3 − 4t.

Thus, the solution set to the linear system is

x₁ = 5/3 − 4t
x₂ = −1/3 − 2t
x₃ = t

where t is an arbitrary number. Therefore, the matrix equation Ax = b has an infinite number of solutions and they can all be written as

\[
\mathbf{x} = \begin{bmatrix} \tfrac{5}{3} - 4t \\ -\tfrac{1}{3} - 2t \\ t \end{bmatrix}
\]

where t is an arbitrary number. Equivalently, b can be written as a linear combination of the columns of A in infinitely many ways. For example, choosing t = −1 gives the particular solution

\[
\mathbf{x} = \begin{bmatrix} 17/3 \\ 5/3 \\ -1 \end{bmatrix}
\]

and you can verify that

\[
A\begin{bmatrix} 17/3 \\ 5/3 \\ -1 \end{bmatrix} = \mathbf{b}.
\]
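A quick way to guard against arithmetic slips in a parametric solution is to substitute it back into Ax = b for a few parameter values. The sketch below does this for Example 4.9 with exact fractions; the helper names `solution` and `mat_vec` are hypothetical.

```python
from fractions import Fraction

A = [[1, -1, 2],
     [0, 3, 6]]
b = [2, -1]

def solution(t):
    """General solution of Example 4.9: x = (5/3 - 4t, -1/3 - 2t, t)."""
    t = Fraction(t)
    return [Fraction(5, 3) - 4 * t, Fraction(-1, 3) - 2 * t, t]

def mat_vec(A, x):
    """Multiply the matrix A (list of rows) by the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

for t in (-1, 0, 1, Fraction(1, 2)):
    assert mat_vec(A, solution(t)) == b

print(solution(-1))
```

For t = −1 the components are 17/3, 5/3 and −1, and substituting them into both equations reproduces b = (2, −1).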

Recall from Definition 3.7 that the span of a set of vectors v₁, v₂, . . . , v_p, which we denoted by span{v₁, v₂, . . . , v_p}, is the space of vectors that can be written as a linear combination of the vectors v₁, v₂, . . . , v_p.

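Example 4.10. Is the vector b in the span of the vectors v₁, v₂?

\[
\mathbf{b} = \begin{bmatrix} 0 \\ 4 \\ 4 \end{bmatrix}, \quad
\mathbf{v}_1 = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -5 \\ 6 \\ 1 \end{bmatrix}
\]

Solution. The vector b is in span{v₁, v₂} if we can find scalars x₁, x₂ such that

x₁v₁ + x₂v₂ = b.

If we let A ∈ M_{3×2} be the matrix

\[
A = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 \end{bmatrix} =
\begin{bmatrix} 3 & -5 \\ -2 & 6 \\ 1 & 1 \end{bmatrix}
\]

then we need to solve the matrix equation Ax = b. Note that here x = (x₁, x₂) ∈ R². Performing row reduction on the augmented matrix [A b] we get that

\[
\begin{bmatrix} 3 & -5 & 0 \\ -2 & 6 & 4 \\ 1 & 1 & 4 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 2.5 \\ 0 & 1 & 1.5 \\ 0 & 0 & 0 \end{bmatrix}
\]

Therefore, the linear system is consistent and has solution

\[
\mathbf{x} = \begin{bmatrix} 2.5 \\ 1.5 \end{bmatrix}
\]

Therefore, b is in span{v₁, v₂}, and b can be written in terms of v₁ and v₂ as

2.5v₁ + 1.5v₂ = b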


If v₁, v₂, . . . , v_p are vectors in Rⁿ and it happens to be true that span{v₁, v₂, . . . , v_p} = Rⁿ then we would say that the set of vectors {v₁, v₂, . . . , v_p} spans all of Rⁿ. From Theorem 4.6, we have the following.

Theorem 4.11: Let A ∈ M_{m×n} be a matrix with columns v₁, v₂, . . . , v_n, that is, A = [v₁ v₂ · · · v_n]. The following are equivalent:

(a) span{v₁, v₂, . . . , v_n} = R^m
(b) Every b ∈ R^m can be written as a linear combination of v₁, v₂, . . . , v_n.
(c) The matrix equation Ax = b has a solution for any b ∈ R^m.
(d) The rank of A is m.

Example 4.12. Do the vectors v₁, v₂, v₃ span R³?

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 2 \\ -4 \\ 2 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -1 \\ 2 \\ 3 \end{bmatrix}
\]

Solution. From Theorem 4.11, the vectors v₁, v₂, v₃ span R³ if the matrix A = [v₁ v₂ v₃] has rank r = 3 (that is, three leading entries in its REF/RREF). The RREF of A is

\[
\begin{bmatrix} 1 & 2 & -1 \\ -3 & -4 & 2 \\ 5 & 2 & 3 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

which does indeed have r = 3 leading entries. Therefore, regardless of the choice of b ∈ R³, the linear system with augmented matrix [A b] will be consistent. Therefore, the vectors v₁, v₂, v₃ span R³:

span{v₁, v₂, v₃} = R³.

In other words, every vector b ∈ R³ can be written as a linear combination of v₁, v₂, v₃.
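Criterion (d) of Theorem 4.11 suggests a mechanical test: count the leading entries in a row echelon form of A and compare with the number of rows m. The sketch below is one possible way to do this; the names `rank` and `columns_span_Rm` are our own.

```python
from fractions import Fraction

def rank(A):
    """Number of leading entries in a row echelon form of A (A is a list of rows)."""
    rows = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

def columns_span_Rm(A):
    """Theorem 4.11: the columns of an m x n matrix span R^m iff rank(A) = m."""
    return rank(A) == len(A)

A = [[1, 2, -1],
     [-3, -4, 2],
     [5, 2, 3]]          # columns are v1, v2, v3 of Example 4.12
print(rank(A), columns_span_Rm(A))
```

Applied to the matrix whose columns are the vectors of Example 3.6, the same function reports rank 2, so those three vectors do not span R³.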

After this lecture you should know the following:

how to multiply a matrix A with a vector x
that the product Ax is a linear combination of the columns of A
how to solve the matrix equation Ax = b if A and b are known
how to determine if a set of vectors {v₁, v₂, . . . , v_p} in R^m spans all of R^m
the relationship between the equation Ax = b, when b can be written as a linear combination of the columns of A, and when the augmented matrix [A b] is consistent (Theorem 4.6)
when the columns of a matrix A ∈ M_{m×n} span all of R^m (Theorem 4.11)
the basic properties of matrix-vector multiplication (Theorem 4.3)


Lecture 5

Homogeneous and Nonhomogeneous Systems

5.1 Homogeneous linear systems

We begin with a definition.

Definition 5.1: A linear system of the form Ax = 0 is called a homogeneous linear system.

A homogeneous system Ax = 0 always has at least one solution, namely, the zero solution because A0 = 0. A homogeneous system is therefore always consistent. The zero solution x = 0 is called the trivial solution and any non-zero solution is called a nontrivial solution. From the existence and uniqueness theorem (Theorem 2.5), we know that a consistent linear system will have either one solution or infinitely many solutions. Therefore, a homogeneous linear system has nontrivial solutions if and only if its solution set has at least one parameter.

Recall that the number of parameters in the solution set is d = n − r, where r is the rank of the coefficient matrix A and n is the number of unknowns.

Example 5.2. Does the linear homogeneous system have any nontrivial solutions?

\[
\begin{aligned}
3x_1 + x_2 - 9x_3 &= 0 \\
x_1 + x_2 - 5x_3 &= 0 \\
2x_1 + x_2 - 7x_3 &= 0
\end{aligned}
\]

Solution. The linear system will have a nontrivial solution if the solution set has at least one free parameter. Form the augmented matrix:

\[
\begin{bmatrix} 3 & 1 & -9 & 0 \\ 1 & 1 & -5 & 0 \\ 2 & 1 & -7 & 0 \end{bmatrix}
\]

The RREF is:

\[
\begin{bmatrix} 3 & 1 & -9 & 0 \\ 1 & 1 & -5 & 0 \\ 2 & 1 & -7 & 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & -2 & 0 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The system is consistent. The rank of the coefficient matrix is r = 2 and thus there will be d = 3 − 2 = 1 free parameter in the solution set. If we let x₃ be the free parameter, say x₃ = t, then from the row equivalent augmented matrix

\[
\begin{bmatrix} 1 & 0 & -2 & 0 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

we obtain that x₂ = 3x₃ = 3t and x₁ = 2x₃ = 2t. Therefore, the general solution of the linear system is

x₁ = 2t
x₂ = 3t
x₃ = t

The general solution can be written in vector notation as

\[
\mathbf{x} = t\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}
\]

Or more compactly, if we let v = (2, 3, 1) then x = tv. Hence, any solution x to the linear system can be written as a linear combination of the vector v = (2, 3, 1). In other words, the solution set of the linear system is the span of the vector v:

span{v}.

Notice that in the previous example, when solving a homogeneous system Ax = 0 using row reduction, the last column of the augmented matrix [A 0] remains unchanged (always 0) after every elementary row operation. Hence, to solve a homogeneous system, we can row reduce the coefficient matrix A only and then set all rows equal to zero when performing back substitution.

Example 5.3. Find the general solution of the homogeneous system Ax = 0 where

\[
A = \begin{bmatrix} 1 & 2 & 2 & 1 & 4 \\ 3 & 7 & 7 & 3 & 13 \\ 2 & 5 & 5 & 2 & 9 \end{bmatrix}.
\]

Solution. After row reducing we obtain

\[
A = \begin{bmatrix} 1 & 2 & 2 & 1 & 4 \\ 3 & 7 & 7 & 3 & 13 \\ 2 & 5 & 5 & 2 & 9 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 & 1 & 2 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

Here n = 5, and r = 2, and therefore the number of parameters in the solution set is d = n − r = 3. The second row of rref(A) gives the equation

x₂ + x₃ + x₅ = 0.

Setting x₅ = t₁ and x₃ = t₂ as free parameters we obtain that

x₂ = −x₃ − x₅ = −t₂ − t₁.

From the first row we obtain the equation

x₁ + x₄ + 2x₅ = 0

The unknown x₅ has already been assigned, so we must now choose either x₁ or x₄ to be a parameter. Choosing x₄ = t₃ we obtain that

x₁ = −x₄ − 2x₅ = −t₃ − 2t₁

In summary, the general solution can be written as

\[
\mathbf{x} =
\begin{bmatrix} -t_3 - 2t_1 \\ -t_2 - t_1 \\ t_2 \\ t_3 \\ t_1 \end{bmatrix}
= t_1\underbrace{\begin{bmatrix} -2 \\ -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}_1}
+ t_2\underbrace{\begin{bmatrix} 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}}_{\mathbf{v}_2}
+ t_3\underbrace{\begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}_3}
= t_1\mathbf{v}_1 + t_2\mathbf{v}_2 + t_3\mathbf{v}_3
\]

where t₁, t₂, t₃ are arbitrary parameters. In other words, any solution x is in the span of v₁, v₂, v₃:

x ∈ span{v₁, v₂, v₃}.
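The parametric vector form found in Example 5.3 can be checked by multiplying A against each spanning vector and against an arbitrary combination. The snippet below is a sketch for that purpose; the helper names are hypothetical and the parameter values are arbitrary test choices.

```python
A = [[1, 2, 2, 1, 4],
     [3, 7, 7, 3, 13],
     [2, 5, 5, 2, 9]]
v1 = [-2, -1, 0, 0, 1]
v2 = [0, -1, 1, 0, 0]
v3 = [-1, 0, 0, 1, 0]

def mat_vec(A, x):
    """Multiply the matrix A (list of rows) by the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def general_solution(t1, t2, t3):
    """Return t1*v1 + t2*v2 + t3*v3."""
    return [t1 * a + t2 * b + t3 * c for a, b, c in zip(v1, v2, v3)]

# Each spanning vector solves Ax = 0 ...
for v in (v1, v2, v3):
    assert mat_vec(A, v) == [0, 0, 0]

# ... and so does any linear combination of them.
assert mat_vec(A, general_solution(3, -1, 7)) == [0, 0, 0]
```

The second assertion is an instance of Theorem 4.3: A(t₁v₁ + t₂v₂ + t₃v₃) = t₁Av₁ + t₂Av₂ + t₃Av₃ = 0.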

The form of the general solution in Example 5.3 holds in general and is summarized in the following theorem.

Theorem 5.4: Consider the homogeneous linear system Ax = 0, where A ∈ M_{m×n} and 0 ∈ R^m. Let r be the rank of A.

If r = n then the only solution to the system is the trivial solution x = 0.
Otherwise, if r < n and we set d = n − r, then there exist vectors v₁, v₂, . . . , v_d such that any solution x of the linear system can be written as

x = t₁v₁ + t₂v₂ + · · · + t_dv_d.

In other words, any solution x is in the span of v₁, v₂, . . . , v_d:

x ∈ span{v₁, v₂, . . . , v_d}.

A solution x to a homogeneous system written in the form

x = t₁v₁ + t₂v₂ + · · · + t_dv_d

is said to be in parametric vector form.

5.2 Nonhomogeneous systems

As we have seen, a homogeneous system Ax = 0 is always consistent. However, if b is nonzero, then the nonhomogeneous linear system Ax = b may or may not have a solution. A natural question arises: What is the relationship between the solution set of the homogeneous system Ax = 0 and that of the nonhomogeneous system Ax = b when it is consistent? To answer this question, suppose that p is a solution to the nonhomogeneous system Ax = b, that is, Ap = b. And suppose that v is a solution to the homogeneous system Ax = 0, that is, Av = 0. Now let q = p + v. Then

Aq = A(p + v)

= Ap + Av

= b + 0

= b.

Therefore, Aq = b. In other words, q = p + v is also a solution of Ax = b. Conversely, every solution of Ax = b arises in this way, which is the content of the following theorem.

Theorem 5.5: Suppose that the linear system Ax = b is consistent and let p be a solution. Then any other solution q of the system Ax = b can be written in the form q = p + v, for some vector v that is a solution to the homogeneous system Ax = 0.

Another way of stating Theorem 5.5 is the following: If the linear system Ax = b is consistent and has solutions p and q, then the vector v = q−p is a solution to the homogeneous system Ax = 0. The proof is a simple computation:

Av = A(q − p) = Aq − Ap = b − b = 0.

More generally, any solution of Ax = b can be written in the form

q = p + t₁v₁ + t₂v₂ + · · · + t_dv_d

where p is one particular solution of Ax = b and the vectors v₁, v₂, . . . , v_d span the solution set of the homogeneous system Ax = 0.


There is a useful geometric interpretation of the solution set of a general linear system. We saw in Lecture 3 that we can interpret the span of a set of vectors as a line or plane containing the zero vector 0. Now, the general solution of Ax = b can be written as

x = p + t₁v₁ + t₂v₂ + · · · + t_dv_d.

Therefore, the solution set of Ax = b is a shift of span{v₁, v₂, . . . , v_d} by the vector p. This is illustrated in Figure 5.1.

Figure 5.1: The solution sets of a homogeneous and nonhomogeneous system. (The figure shows the line span{v} through the origin, containing the points tv, and the parallel line p + span{v}, containing the points p + tv.)

Example 5.6. Write the general solution, in parametric vector form, of the linear system

\[
\begin{aligned}
3x_1 + x_2 - 9x_3 &= 2 \\
x_1 + x_2 - 5x_3 &= 0 \\
2x_1 + x_2 - 7x_3 &= 1.
\end{aligned}
\]

Solution. The RREF of the augmented matrix is:

\[
\begin{bmatrix} 3 & 1 & -9 & 2 \\ 1 & 1 & -5 & 0 \\ 2 & 1 & -7 & 1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & -2 & 1 \\ 0 & 1 & -3 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The system is consistent and the rank of the coefficient matrix is r = 2. Therefore, there is d = 3 − 2 = 1 parameter in the solution set. Letting x₃ = t be the parameter, from the second row of the RREF we have

x₂ = 3t − 1

And from the first row of the RREF we have

x₁ = 2t + 1

Therefore, the general solution of the system in parametric vector form is

\[
\mathbf{x} = \begin{bmatrix} 2t + 1 \\ 3t - 1 \\ t \end{bmatrix}
= \underbrace{\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}}_{\mathbf{p}}
+ t\underbrace{\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}}_{\mathbf{v}}
\]


You should check that p = (1, −1, 0) solves the linear system Ax = b, and that v = (2, 3, 1) solves the homogeneous system Ax = 0.
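The suggested check, together with the statement of Theorem 5.5, can be carried out in a few lines. This is an illustrative sketch only; `mat_vec`, `q1` and `q2` are names introduced here, and the parameter values 4 and −3 are arbitrary.

```python
A = [[3, 1, -9],
     [1, 1, -5],
     [2, 1, -7]]
b = [2, 0, 1]
p = [1, -1, 0]   # particular solution from Example 5.6
v = [2, 3, 1]    # spans the solution set of Ax = 0

def mat_vec(A, x):
    """Multiply the matrix A (list of rows) by the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

assert mat_vec(A, p) == b
assert mat_vec(A, v) == [0, 0, 0]

# Two solutions of Ax = b, obtained as p + t*v for t = 4 and t = -3.
q1 = [pi + 4 * vi for pi, vi in zip(p, v)]
q2 = [pi - 3 * vi for pi, vi in zip(p, v)]
assert mat_vec(A, q1) == b and mat_vec(A, q2) == b

# Their difference solves the homogeneous system, as Theorem 5.5 predicts.
difference = [a - c for a, c in zip(q1, q2)]
assert mat_vec(A, difference) == [0, 0, 0]
```

Here `difference` equals 7v, which illustrates that any two solutions of Ax = b differ by an element of span{v}.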

Example 5.7. Write the general solution, in parametric vector form, of the linear system represented by the augmented matrix

\[
\begin{bmatrix} 3 & -3 & 6 & 3 \\ -1 & 1 & -2 & -1 \\ 2 & -2 & 4 & 2 \end{bmatrix}.
\]

Solution. The RREF of the augmented matrix is

\[
\begin{bmatrix} 3 & -3 & 6 & 3 \\ -1 & 1 & -2 & -1 \\ 2 & -2 & 4 & 2 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

Here n = 3, r = 1 and therefore the solution set will have d = 2 parameters. Let x₃ = t₁ and x₂ = t₂. Then from the first row we obtain

x₁ = 1 + x₂ − 2x₃ = 1 + t₂ − 2t₁

The general solution in parametric vector form is therefore

\[
\mathbf{x} = \underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}}_{\mathbf{p}}
+ t_1\underbrace{\begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}_1}
+ t_2\underbrace{\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}_2}
\]

You should verify that p is a solution to the linear system Ax = b:

Ap = b

And that v₁ and v₂ are solutions to the homogeneous linear system Ax = 0:

Av₁ = Av₂ = 0


Summary

The material in this lecture is so important that we will summarize the main results. The solution set of a linear system Ax = b can be written in the form

x = p + t₁v₁+ t₂v₂+ · · · + t_dv_d

where Ap = b and where each of the vectors v₁, v₂, . . . , v_d satisfies Av_i = 0. Loosely speaking,

{Solution set of Ax = b} = p + {Solution set of Ax = 0}

or, equivalently,

{Solution set of Ax = b} = p + span{v₁, v₂, . . . , v_d}

where p satisfies Ap = b and each v_i satisfies Av_i = 0.

After this lecture you should know the following:

what a homogeneous/nonhomogeneous linear system is
when a homogeneous linear system has nontrivial solutions
how to write the general solution set of a homogeneous system in parametric vector form (Theorem 5.4)
how to write the solution set of a nonhomogeneous system in parametric vector form (Theorem 5.5)
the relationship between the solution sets of the nonhomogeneous equation Ax = b and the homogeneous equation Ax = 0


Lecture 6

Linear Independence

6.1 Linear independence

In Lecture 3, we defined the span of a set of vectors {v₁, v₂, . . . , v_n} as the collection of all possible linear combinations

t₁v₁+ t₂v₂+ · · · + t_nv_n

and we denoted this set as span{v₁, v₂, . . . , v_n}. Thus, if x ∈ span{v₁, v₂, . . . , v_n} then by definition there exist scalars t₁, t₂, . . . , t_n such that

x = t₁v₁ + t₂v₂ + · · · + t_nv_n.

A natural question that arises is whether or not there are multiple ways to express x as a linear combination of the vectors v₁, v₂, . . . , v_n. For example, if v₁ = (1, 2), v₂ = (0, 1), v₃ = (−1, −1), and x = (3, −1) then you can verify that x ∈ span{v₁, v₂, v₃} and x can be written in infinitely many ways using v₁, v₂, v₃. Here are three ways:

x = 3v₁ − 7v₂ + 0v₃

x = −4v₁ + 0v₂ − 7v₃

x = 0v₁ − 4v₂ − 3v₃.
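The three representations above are easy to confirm numerically, and the same check shows where the other representations come from. In the sketch below, the helper `combine` and the parameter s are our own additions; the family (3 + s, −7 − s, s) follows from the relation v₁ − v₂ + v₃ = 0, which holds for these particular vectors.

```python
v1, v2, v3 = (1, 2), (0, 1), (-1, -1)
x = (3, -1)

def combine(c1, c2, c3):
    """Return c1*v1 + c2*v2 + c3*v3 as a tuple."""
    return tuple(c1 * a + c2 * b + c3 * c for a, b, c in zip(v1, v2, v3))

# The three ways listed in the text.
for coeffs in [(3, -7, 0), (-4, 0, -7), (0, -4, -3)]:
    assert combine(*coeffs) == x

# v1 - v2 + v3 = 0, so adding s*(1, -1, 1) to the coefficients does not change x.
assert combine(1, -1, 1) == (0, 0)
for s in range(-10, 11):
    assert combine(3 + s, -7 - s, s) == x
```

The values s = 0, s = −7 and s = −3 recover the three combinations listed above.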

The fact that x can be written in more than one way in terms of v₁, v₂, v₃suggests that there might be a redundancy in the set {v₁, v₂, v₃}. In fact, it is not hard to see that v₃= −v₁+v₂, and thus v₃∈ span{v₁, v₂}. The preceding discussion motivates the following definition.

Definition 6.1: A set of vectors {v₁, v₂, . . . , v_n} is said to be linearly dependent if some v_j can be written as a linear combination of the other vectors, that is, if

v_j ∈ span{v₁, . . . , v_{j−1}, v_{j+1}, . . . , v_n}.

If {v₁, v₂, . . . , v_n} is not linearly dependent then we say that {v₁, v₂, . . . , v_n} is linearly independent.


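Example 6.2. Consider the vectors

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}.
\]

Show that they are linearly dependent.

Solution. By inspection, we have

\[
2\mathbf{v}_1 + \mathbf{v}_3
= \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \mathbf{v}_2
\]

Thus, v₂ ∈ span{v₁, v₃} and therefore {v₁, v₂, v₃} is linearly dependent.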

Notice that in the previous example, the equation 2v₁+ v₃= v₂is equivalent to

2v₁− v₂+ v₃= 0.

Hence, because {v₁, v₂, v₃} is a linearly dependent set, it is possible to write the zero vector 0 as a linear combination of {v₁, v₂, v₃} where not all the coefficients in the linear combination are zero. This leads to the following characterization of linear independence.

Theorem 6.3: The set of vectors {v₁, v₂, . . . , v_n} is linearly independent if and only if 0 can be written in only one way as a linear combination of {v₁, v₂, . . . , v_n}. In other words, if

t₁v₁ + t₂v₂ + · · · + t_nv_n = 0

then necessarily the coefficients t₁, t₂, . . . , t_n are all zero.

Proof. If {v₁, v₂, . . . , v_n} is linearly independent then every vector x ∈ span{v₁, v₂, . . . , v_n} can be written uniquely as a linear combination of {v₁, v₂, . . . , v_n}, and this applies to the particular case of the zero vector x = 0.

Now assume that 0 can be written uniquely as a linear combination of {v₁, v₂, . . . , v_n}. In other words, assume that if

t₁v₁ + t₂v₂ + · · · + t_nv_n = 0

then t₁ = t₂ = · · · = t_n = 0. Now take any x ∈ span{v₁, v₂, . . . , v_n} and suppose that there are two ways to write x in terms of {v₁, v₂, . . . , v_n}:

r₁v₁ + r₂v₂ + · · · + r_nv_n = x

s₁v₁ + s₂v₂ + · · · + s_nv_n = x.

Subtracting the second equation from the first we obtain that

(r₁ − s₁)v₁ + (r₂ − s₂)v₂ + · · · + (r_n − s_n)v_n = x − x = 0.

The above equation is a linear combination of v₁, v₂, . . . , v_n resulting in the zero vector 0. But we are assuming that the only way to write 0 in terms of {v₁, v₂, . . . , v_n} is if all the coefficients are zero. Therefore, we must have r₁ − s₁ = 0, r₂ − s₂ = 0, . . . , r_n − s_n = 0, or equivalently that r₁ = s₁, r₂ = s₂, . . . , r_n = s_n. Therefore, the linear combinations

r₁v₁ + r₂v₂ + · · · + r_nv_n = x

s₁v₁ + s₂v₂ + · · · + s_nv_n = x

are actually the same. Therefore, each x ∈ span{v₁, v₂, . . . , v_n} can be written uniquely in terms of {v₁, v₂, . . . , v_n}, and thus {v₁, v₂, . . . , v_n} is a linearly independent set.

Because of Theorem 6.3, an alternative definition of linear independence of a set of vectors {v₁, v₂, . . . , v_n} is that the vector equation

x₁v₁ + x₂v₂ + · · · + x_nv_n = 0

has only the trivial solution, i.e., the solution x₁ = x₂ = · · · = x_n = 0. Thus, if {v₁, v₂, . . . , v_n} is linearly dependent, then there exist scalars x₁, x₂, . . . , x_n not all zero such that

x₁v₁ + x₂v₂ + · · · + x_nv_n = 0.

Hence, if we suppose for instance that x_n ≠ 0 then we can write v_n in terms of the vectors v₁, . . . , v_{n−1} as follows:

\[
\mathbf{v}_n = -\frac{x_1}{x_n}\mathbf{v}_1 - \frac{x_2}{x_n}\mathbf{v}_2 - \cdots - \frac{x_{n-1}}{x_n}\mathbf{v}_{n-1}
\]

In other words, v_n ∈ span{v₁, v₂, . . . , v_{n−1}}.

According to Theorem 6.3, the set of vectors {v₁, v₂, . . . , v_n} is linearly independent if and only if the equation

x₁v₁ + x₂v₂ + · · · + x_nv_n = 0 (6.1)

has only the trivial solution. Now, the vector equation (6.1) is a homogeneous linear system of equations with coefficient matrix

A = [v₁ v₂ · · · v_n].

Therefore, the set {v₁, v₂, . . . , v_n} is linearly independent if and only if the homogeneous system Ax = 0 has only the trivial solution. But the homogeneous system Ax = 0 has only the trivial solution if and only if there are no free parameters in its solution set. We therefore have the following.

Theorem 6.4: The set {v₁, v₂, . . . , v_n} is linearly independent if and only if the rank of A = [v₁ v₂ · · · v_n] is r = n, that is, if the number of leading entries r in the REF (or RREF) of A is exactly n.

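Example 6.5. Are the vectors below linearly independent?

\[ v_1 = \begin{bmatrix} 0 \\ 1 \\ 5 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 2 \\ 8 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 4 \\ -1 \\ 0 \end{bmatrix} \]

Solution. Let A be the matrix

\[ A = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 4 \\ 1 & 2 & -1 \\ 5 & 8 & 0 \end{bmatrix}. \]

Performing elementary row operations we obtain

\[ A \sim \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 4 \\ 0 & 0 & 13 \end{bmatrix}. \]

Clearly, r = rank(A) = 3, which is equal to the number of vectors n = 3. Therefore, {v₁, v₂, v₃} is linearly independent.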

Example 6.6. Are the vectors below linearly independent?

\[ v_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \]

Solution. Let A be the matrix

\[ A = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 5 & 1 \\ 3 & 6 & 0 \end{bmatrix}. \]

Performing elementary row operations we obtain

\[ A \sim \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 0 & 0 & 0 \end{bmatrix}. \]

Clearly, r = rank(A) = 2, which is not equal to the number of vectors, n = 3. Therefore, {v₁, v₂, v₃} is linearly dependent. We will find a nontrivial linear combination of the vectors v₁, v₂, v₃ that gives the zero vector 0. The REF of A = [v₁ v₂ v₃] is

\[ A \sim \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 0 & 0 & 0 \end{bmatrix}. \]

Since r = 2, the solution set of the linear system Ax = 0 has d = n − r = 1 free parameter. Using back substitution on the REF above, we find that the general solution of Ax = 0 written in parametric form is

\[ x = t \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}. \]

The vector

\[ v = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} \]

spans the solution set of the system Ax = 0. Choosing for instance t = 2 we obtain the solution

\[ x = t \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \\ 2 \end{bmatrix}. \]

Therefore,

4v₁ − 2v₂ + 2v₃ = 0

is a non-trivial linear combination of v₁, v₂, v₃ that gives the zero vector 0. And, for instance,

v₃= −2v₁+ v₂

that is, v₃∈ span{v₁, v₂}.
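The rank test of Theorem 6.4 can be sketched in code. The following Python snippet is a minimal illustration and not part of the original notes; the helper names `rank` and `is_independent` are hypothetical, and exact rational arithmetic via `fractions.Fraction` is an assumption made so that tests for zero are reliable. It checks the vectors of Examples 6.5 and 6.6.

```python
from fractions import Fraction

def rank(rows):
    """Number of leading entries after forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # look for a non-zero entry in column c at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_independent(vectors):
    """Theorem 6.4: independent iff rank of A = [v1 ... vn] equals n."""
    A = [list(row) for row in zip(*vectors)]  # vectors become columns
    return rank(A) == len(vectors)

ex_6_5 = [(0, 1, 5), (1, 2, 8), (4, -1, 0)]
ex_6_6 = [(1, 2, 3), (4, 5, 6), (2, 1, 0)]
print(is_independent(ex_6_5))  # True
print(is_independent(ex_6_6))  # False
```

Floating-point elimination would also work for small examples, but deciding whether an entry is "zero" then needs a tolerance, which is why the sketch uses fractions.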

Below we record some simple observations on the linear independence of simple sets:

A set consisting of a single non-zero vector {v₁} is linearly independent. Indeed, if v₁

is non-zero then

tv₁= 0

is true if and only if t = 0.

A set consisting of two non-zero vectors {v₁, v₂} is linearly independent if and only if neither of the vectors is a multiple of the other. For example, if v₂ = tv₁ then

tv₁ − v₂ = 0

is a non-trivial linear combination of v₁, v₂ giving the zero vector 0.

Any set {v₁, v₂, . . . , v_p} containing the zero vector, say that v_p = 0, is linearly dependent. For example, the linear combination

0v₁+ 0v₂+ · · · + 0v_p₋₁+ 2v_p= 0

is a non-trivial linear combination giving the zero vector 0.

6.2 The maximum size of a linearly independent set

The next theorem puts a constraint on the maximum size of a linearly independent set in Rⁿ.

Theorem 6.7: Let {v₁, v₂, . . . , v_p} be a set of vectors in Rⁿ. If p > n then v₁, v₂, . . . , v_p are linearly dependent. Equivalently, if the vectors v₁, v₂, . . . , v_p in Rⁿ are linearly independent then p ≤ n.

Proof. Let A = [v₁ v₂ · · · v_p]. Thus, A is an n × p matrix. Since A has n rows, the maximum rank of A is n, that is, r ≤ n. Therefore, the number of free parameters d = p − r is always positive because p > n ≥ r. Thus, the homogeneous system Ax = 0 has non-trivial solutions. In other words, there is some non-zero vector x ∈ R^p such that

Ax = x₁v₁+ x₂v₂+ · · · + x_pv_p= 0

and therefore {v₁, v₂, . . . , v_p} is linearly dependent.

Theorem 6.7 will be used when we discuss the notion of the dimension of a space. Although we have not discussed the meaning of dimension, the above theorem says that in n-dimensional space Rⁿ, a set of vectors {v₁, v₂, . . . , v_p} consisting of more than n vectors is automatically linearly dependent.
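To make Theorem 6.7 concrete, the sketch below row reduces A = [v₁ v₂ v₃] for three vectors in R² and reads off one non-trivial solution of Ax = 0 by setting the first free variable to 1. This is an illustration only: the vectors (1, 2), (3, 4), (5, 6) are hypothetical and do not come from the notes, and the helper names `rref` and `nontrivial_solution` are assumptions.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the RREF of the matrix and its pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]  # scale pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def nontrivial_solution(vectors):
    """Return x != 0 with x1*v1 + ... + xp*vp = 0, or None if none exists."""
    A = [list(row) for row in zip(*vectors)]  # vectors become columns
    R, pivots = rref(A)
    p = len(vectors)
    free = [c for c in range(p) if c not in pivots]
    if not free:
        return None  # only the trivial solution
    f = free[0]
    x = [Fraction(0)] * p
    x[f] = Fraction(1)  # first free variable set to 1, other free variables stay 0
    for row_idx, c in enumerate(pivots):
        x[c] = -R[row_idx][f]  # back out each pivot variable
    return x

vs = [(1, 2), (3, 4), (5, 6)]  # p = 3 vectors in R^2, so p > n
x = nontrivial_solution(vs)
combo = [sum(xi * v[k] for xi, v in zip(x, vs)) for k in range(2)]
print(x, combo)
```

Because p = 3 exceeds n = 2, at least one free variable must exist, so the function cannot return `None` for this input.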

Example 6.8. Are the vectors below linearly independent?

 ₈  ₄ ₂  ₃

v =

 

 

, v =

3 11

−4

 

 

, v₃=

 

 

, v₄=

−9

−5

−2 6 1 3

 

 

, v =

 ₀

−2

−7

 

 

Solution. The vectors v₁, v₂, v₃, v₄, v₅ are in R⁴. Therefore, by Theorem 6.7, the set {v₁, . . . , v₅} is linearly dependent. To see this explicitly, let A = [v₁ v₂ v₃ v₄ v₅]. Then

\[ A \sim \begin{bmatrix} 1 & 0 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & -2 \end{bmatrix}. \]

One solution to the linear system Ax = 0 is x = (−1, 1, 0, −2, −1) and therefore

(−1)v₁ + (1)v₂ + (0)v₃ + (−2)v₄ + (−1)v₅ = 0.

Example 6.9. Suppose that the set {v₁, v₂, v₃, v₄} is linearly independent. Show that the set {v₁, v₂, v₃} is also linearly independent.

Solution. We must argue that if there exist scalars x₁, x₂, x₃ such that

x₁v₁ + x₂v₂ + x₃v₃ = 0

then necessarily x₁, x₂, x₃ are all zero. Suppose then that there exist scalars x₁, x₂, x₃ such that

x₁v₁ + x₂v₂ + x₃v₃ = 0.

Then clearly it holds that

x₁v₁ + x₂v₂ + x₃v₃ + 0v₄ = 0.

But the set {v₁, v₂, v₃, v₄} is linearly independent, and therefore, it is necessary that x₁, x₂, x₃ are all zero. This proves that v₁, v₂, v₃ are also linearly independent.

The previous example can be generalized as follows: If {v₁, v₂, . . . , v_d} is linearly independent then any (non-empty) subset of the set {v₁, v₂, . . . , v_d} is also linearly independent.

After this lecture you should know the following:

the definition of linear independence and be able to explain it to a colleague
how to test if a given set of vectors are linearly independent (Theorem 6.4)

the relationship between the linear independence of {v₁, v₂, . . . , v_p} and the solution set of the homogeneous system Ax = 0, where A = [v₁ v₂ · · · v_p]

that in Rⁿ, any set of vectors consisting of more than n vectors is automatically linearly dependent (Theorem 6.7)


Lecture 7

Introduction to Linear Mappings

7.1 Vector mappings

By a vector mapping we mean simply a function

T : Rⁿ→ R^m.

The domain of T is Rⁿ and the co-domain of T is R^m. The case n = m is allowed of course. In engineering or physics, the domain is sometimes called the input space and the co-domain is called the output space. Using this terminology, the points x in the domain are called the inputs and the points T(x) produced by the mapping are called the outputs.

Definition 7.1: The vector b ∈ R^m is in the range of T, or in the image of T, if there exists some x ∈ Rⁿ such that T(x) = b.

In other words, b is in the range of T if there is an input x in the domain of T that outputs b = T(x). In general, not every point in the co-domain of T is in the range of T. For example, consider the vector mapping T : R² → R² defined as

\[ T(x) = \begin{bmatrix} x_1 \sin(x_2) - \cos(x_1^2 - 1) \\ x_1^2 + x_2^2 + 1 \end{bmatrix}. \]

The vector b = (3, −1) is not in the range of T because the second component of T(x) is always positive. On the other hand, b = (−1, 2) is in the range of T because

\[ T\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 1 \cdot \sin(0) - \cos(1^2 - 1) \\ 1^2 + 0^2 + 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix} = b. \]

Hence, a corresponding input for this particular b is x = (1, 0). In Figure 7.1 we illustrate the general setup of how the domain, co-domain, and range of a mapping are related. A crucial idea is that the range of T may not equal the co-domain.

Figure 7.1: The domain, co-domain, and range of a mapping.

7.2 Linear mappings

For our purposes, vector mappings T : Rⁿ → R^m can be organized into two categories: (1) linear mappings and (2) nonlinear mappings.

Definition 7.2: The vector mapping T : Rⁿ → R^m is said to be linear if the following conditions hold:

(1) For any u, v ∈ Rⁿ, it holds that T(u + v) = T(u) + T(v).
(2) For any u ∈ Rⁿ and any scalar c, it holds that T(cu) = cT(u).

If T is not linear then it is said to be nonlinear.

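As an example, the mapping

\[ T(x) = \begin{bmatrix} x_1 \sin(x_2) - \cos(x_1^2 - 1) \\ x_1^2 + x_2^2 + 1 \end{bmatrix} \]

is nonlinear. To see this, previously we computed that

\[ T\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} -1 \\ 2 \end{bmatrix}. \]

If T were linear then by property (2) of Definition 7.2 the following must hold:

\[ T\left( \begin{bmatrix} 3 \\ 0 \end{bmatrix} \right) = T\left( 3 \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) = 3\, T\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) = 3 \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} -3 \\ 6 \end{bmatrix}. \]

However,

\[ T\left( \begin{bmatrix} 3 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 3 \sin(0) - \cos(3^2 - 1) \\ 3^2 + 0^2 + 1 \end{bmatrix} = \begin{bmatrix} -\cos(8) \\ 10 \end{bmatrix} \neq \begin{bmatrix} -3 \\ 6 \end{bmatrix}. \]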
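The failure of property (2) can also be checked numerically. The short Python sketch below is an illustration added here, not part of the original notes; it evaluates the mapping T from the example at (1, 0) and (3, 0) and compares T(3x) with 3T(x).

```python
import math

def T(x1, x2):
    """The nonlinear mapping from the example: R^2 -> R^2."""
    return (x1 * math.sin(x2) - math.cos(x1**2 - 1), x1**2 + x2**2 + 1)

t_x = T(1, 0)                      # image of x = (1, 0)
t_3x = T(3, 0)                     # image of 3x = (3, 0)
three_t_x = tuple(3 * y for y in t_x)

print(t_x)        # first component -1, second component 2
print(t_3x)       # first component -cos(8), second component 10
print(three_t_x)  # what T(3x) would have to be if T were linear
```

A single pair x, c with T(cx) ≠ cT(x) is enough to conclude that T is nonlinear; no number of agreeing samples would prove linearity.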

Example 7.3. Is the vector mapping T : R² → R³ linear?

\[ T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} 2x_1 - x_2 \\ x_1 + x_2 \\ -x_1 - 3x_2 \end{bmatrix} \]

Solution. We must verify that the two conditions in Definition 7.2 hold. For the first condition, take arbitrary vectors u = (u₁, u₂) and v = (v₁, v₂). We compute:

\begin{align*}
T(u + v) &= T\left( \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \end{bmatrix} \right) \\
&= \begin{bmatrix} 2(u_1 + v_1) - (u_2 + v_2) \\ (u_1 + v_1) + (u_2 + v_2) \\ -(u_1 + v_1) - 3(u_2 + v_2) \end{bmatrix} \\
&= \begin{bmatrix} 2u_1 + 2v_1 - u_2 - v_2 \\ u_1 + v_1 + u_2 + v_2 \\ -u_1 - v_1 - 3u_2 - 3v_2 \end{bmatrix} \\
&= \begin{bmatrix} 2u_1 - u_2 + 2v_1 - v_2 \\ u_1 + u_2 + v_1 + v_2 \\ -u_1 - 3u_2 - v_1 - 3v_2 \end{bmatrix} \\
&= \begin{bmatrix} 2u_1 - u_2 \\ u_1 + u_2 \\ -u_1 - 3u_2 \end{bmatrix} + \begin{bmatrix} 2v_1 - v_2 \\ v_1 + v_2 \\ -v_1 - 3v_2 \end{bmatrix} \\
&= T(u) + T(v)
\end{align*}

Therefore, for arbitrary u, v ∈ R², it holds that

T(u + v) = T(u) + T(v).

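To prove the second condition, let c ∈ R be an arbitrary scalar. Then:

\begin{align*}
T(cu) &= T\left( \begin{bmatrix} cu_1 \\ cu_2 \end{bmatrix} \right) = \begin{bmatrix} 2(cu_1) - (cu_2) \\ (cu_1) + (cu_2) \\ -(cu_1) - 3(cu_2) \end{bmatrix} \\
&= \begin{bmatrix} c(2u_1 - u_2) \\ c(u_1 + u_2) \\ c(-u_1 - 3u_2) \end{bmatrix} \\
&= c \begin{bmatrix} 2u_1 - u_2 \\ u_1 + u_2 \\ -u_1 - 3u_2 \end{bmatrix} \\
&= cT(u)
\end{align*}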

Therefore, both conditions of Definition 7.2 hold, and thus T is a linear map.
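The two conditions of Definition 7.2 can be spot-checked numerically for the mapping of Example 7.3. The Python sketch below is only an illustration: the helper names `add` and `scale` and the random integer test inputs are assumptions, and a passing spot-check is evidence, not a proof (the proof is the computation above).

```python
import random

def T(x):
    """The mapping of Example 7.3: R^2 -> R^3."""
    x1, x2 = x
    return (2 * x1 - x2, x1 + x2, -x1 - 3 * x2)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

random.seed(0)  # reproducible test inputs
checked = 0
for _ in range(1000):
    u = (random.randint(-9, 9), random.randint(-9, 9))
    v = (random.randint(-9, 9), random.randint(-9, 9))
    c = random.randint(-9, 9)
    assert T(add(u, v)) == add(T(u), T(v))   # condition (1)
    assert T(scale(c, u)) == scale(c, T(u))  # condition (2)
    checked += 1
print("conditions held on", checked, "random samples")
```

Integer inputs keep the arithmetic exact, so the equality tests do not need a tolerance.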

Example 7.4. Let α ≥ 0 and define the mapping T : Rⁿ→ Rⁿby the formula T(x) = αx. If 0 ≤ α ≤ 1 then T is called a contraction and if α > 1 then T is called a dilation. In either case, show that T is a linear mapping.

Solution. Let u and v be arbitrary. Then

T(u + v) = α(u + v) = αu + αv = T(u) + T(v).

This shows that condition (1) in Definition 7.2 holds. To show that the second condition holds, let c be any number. Then

T(cx) = α(cx) = αcx = c(αx) = cT(x).

Therefore, both conditions of Definition 7.2 hold, and thus T is a linear mapping. To see a

particular example, consider the case α = ¹and n = 3. Then,

 

 

 

T(x) = x = x .

7.3 Matrix mappings

Given a matrix A ∈ R^{m×n} and a vector x ∈ Rⁿ, in Lecture 4 we defined matrix-vector multiplication between A and x as an operation that produces a new output vector Ax ∈ R^m. We discussed that we could interpret A as a mapping that takes the input vector x ∈ Rⁿ and produces the output vector Ax ∈ R^m. We can therefore associate to each matrix A a vector mapping T : Rⁿ → R^m defined by

T(x) = Ax.

Such a mapping T will be called a matrix mapping corresponding to A and when convenient we will use the notation T_A to indicate that T_A is associated to A. We proved in Lecture 4 (Theorem 4.3) that for any u, v ∈ Rⁿ and scalar c, matrix-vector multiplication satisfies the properties:

A(u + v) = Au + Av

A(cu) = cAu.

The following theorem is therefore immediate.

Theorem 7.5: To a given matrix A ∈ R^{m×n} associate the mapping T : Rⁿ → R^m defined by the formula T(x) = Ax. Then T is a linear mapping.

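Example 7.6. Is the vector mapping T : R² → R³ linear?

\[ T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} 2x_1 - x_2 \\ x_1 + x_2 \\ -x_1 - 3x_2 \end{bmatrix} \]

Solution. In Example 7.3 we showed that T is a linear mapping using Definition 7.2. Alternatively, we observe that T is a mapping defined using matrix-vector multiplication because

\[ T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} 2x_1 - x_2 \\ x_1 + x_2 \\ -x_1 - 3x_2 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 1 & 1 \\ -1 & -3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \]

Therefore, T is a matrix mapping corresponding to the matrix

\[ A = \begin{bmatrix} 2 & -1 \\ 1 & 1 \\ -1 & -3 \end{bmatrix}, \]

that is, T(x) = Ax. By Theorem 7.5, T is a linear mapping.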


Let T : Rⁿ → R^m be a vector mapping. Recall that b ∈ R^m is in the range of T if there is some input vector x ∈ Rⁿ such that T(x) = b. In this case, we say that b is the image of x under T or that x is mapped to b under T. If T is a nonlinear mapping, finding a specific vector x such that T(x) = b is generally a difficult problem. However, if T(x) = Ax is a matrix mapping, then it is clear that finding such a vector x is equivalent to solving the matrix equation Ax = b. In summary, we have the following theorem.

Theorem 7.7: Let T : Rⁿ → R^m be a matrix mapping corresponding to A, that is, T(x) = Ax. Then b ∈ R^m is in the range of T if and only if the matrix equation Ax = b has a solution.

Let T_A : Rⁿ → R^m be a matrix mapping, that is, T_A(x) = Ax. We proved that the output vector Ax is a linear combination of the columns of A where the coefficients in the linear combination are the components of x. Explicitly, if A = [v₁ v₂ · · · v_n] and x = (x₁, x₂, . . . , x_n) then

Ax = x₁v₁ + x₂v₂ + · · · + x_nv_n.

Therefore, the range of the matrix mapping T_A(x) = Ax is

Range(T_A) = span{v₁, v₂, . . . , v_n}.

In words, the range of a matrix mapping is the span of its columns. Therefore, if v₁, v₂, . . . , v_n span all of R^m then every vector b ∈ R^m is in the range of T_A.

Example 7.8. Let

\[ A = \begin{bmatrix} 1 & 3 & -4 \\ 1 & 5 & 2 \\ -3 & -7 & -6 \end{bmatrix}, \quad b = \begin{bmatrix} -2 \\ 4 \\ 12 \end{bmatrix}. \]

Is the vector b in the range of the matrix mapping T(x) = Ax?

Solution. From Theorem 7.7, b is in the range of T if and only if the matrix equation Ax = b has a solution. To solve the system Ax = b, row reduce the augmented matrix [A b]:

\[ \begin{bmatrix} 1 & 3 & -4 & -2 \\ 1 & 5 & 2 & 4 \\ -3 & -7 & -6 & 12 \end{bmatrix} \sim \begin{bmatrix} 1 & 3 & -4 & -2 \\ 0 & 1 & 3 & 3 \\ 0 & 0 & -12 & 0 \end{bmatrix} \]

The system is consistent and the (unique) solution is x = (−11, 3, 0). Therefore, b is in the range of T.
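Theorem 7.7 turns the range question into a consistency check, which can be sketched in Python as follows. This is an illustration under stated assumptions: the helpers `rank`, `in_range`, and `matvec` are hypothetical names, and consistency is tested by comparing rank(A) with the rank of the augmented matrix [A b], a standard criterion that is equivalent to the row reduction carried out in Example 7.8. The second matrix `A2` and vector `b2` are made-up inputs chosen to show an inconsistent case.

```python
from fractions import Fraction

def rank(rows):
    """Number of leading entries after forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def in_range(A, b):
    """b is in the range of T(x) = Ax iff Ax = b is consistent."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 3, -4], [1, 5, 2], [-3, -7, -6]]
b = [-2, 4, 12]
print(in_range(A, b))           # True
print(matvec(A, [-11, 3, 0]))   # [-2, 4, 12], confirming the solution

A2 = [[1, 2], [2, 4]]           # hypothetical matrix with dependent columns
b2 = [1, 0]
print(in_range(A2, b2))         # False
```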

7.4 Examples

If T : Rⁿ → R^m is a linear mapping, then for any vectors v₁, v₂, . . . , v_p and scalars c₁, c₂, . . . , c_p, it holds that

T(c₁v₁ + c₂v₂ + · · · + c_pv_p) = c₁T(v₁) + c₂T(v₂) + · · · + c_pT(v_p). (⋆)

Therefore, if all you know are the values T(v₁), T(v₂), . . . , T(v_p) and T is linear, then you can compute T(v) for every

v ∈ span{v₁, v₂, . . . , v_p}.

Example 7.9. Let T : R² → R² be a linear transformation that maps u to T(u) = (3, 4) and maps v to T(v) = (−2, 5). Find T(2u + 3v).

Solution. Because T is a linear mapping we have that

T(2u + 3v) = T(2u) + T(3v) = 2T(u) + 3T(v).

We know that T(u) = (3, 4) and T(v) = (−2, 5). Therefore,

\[ T(2u + 3v) = 2T(u) + 3T(v) = 2 \begin{bmatrix} 3 \\ 4 \end{bmatrix} + 3 \begin{bmatrix} -2 \\ 5 \end{bmatrix} = \begin{bmatrix} 0 \\ 23 \end{bmatrix}. \]

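Example 7.10. (Rotations) Let T_θ : R² → R² be the mapping on the 2D plane that rotates every v ∈ R² by an angle θ. Write down a formula for T_θ and show that T_θ is a linear mapping.

Solution. If v = (cos(α), sin(α)) then

\[ T_\theta(v) = \begin{bmatrix} \cos(\alpha + \theta) \\ \sin(\alpha + \theta) \end{bmatrix}. \]

Then from the angle sum trigonometric identities:

\[ T_\theta(v) = \begin{bmatrix} \cos(\alpha + \theta) \\ \sin(\alpha + \theta) \end{bmatrix} = \begin{bmatrix} \cos(\alpha)\cos(\theta) - \sin(\alpha)\sin(\theta) \\ \cos(\alpha)\sin(\theta) + \sin(\alpha)\cos(\theta) \end{bmatrix}. \]

But

\[ T_\theta(v) = \begin{bmatrix} \cos(\alpha)\cos(\theta) - \sin(\alpha)\sin(\theta) \\ \cos(\alpha)\sin(\theta) + \sin(\alpha)\cos(\theta) \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \underbrace{\begin{bmatrix} \cos(\alpha) \\ \sin(\alpha) \end{bmatrix}}_{v}. \]

If we scale v by any c > 0 then performing the same computation as above we obtain that T_θ(cv) = cT_θ(v). Therefore, T_θ is a matrix mapping with corresponding matrix

\[ A = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}. \]

Thus, T_θ is a linear mapping.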
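The rotation formula can be tried out numerically. The Python sketch below is illustrative only; the function names `rotation_matrix` and `apply` and the sample angles are assumptions. It builds the matrix A from the example and checks that a unit vector at angle α lands at angle α + θ.

```python
import math

def rotation_matrix(theta):
    """Standard matrix of the rotation by angle theta (in radians)."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def apply(A, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Rotating e1 = (1, 0) by 90 degrees should give (0, 1) up to rounding.
w = apply(rotation_matrix(math.pi / 2), [1.0, 0.0])
print(w)

# A unit vector at angle alpha is sent to the unit vector at angle alpha + theta.
alpha, theta = 0.3, 1.1   # arbitrary sample angles
v = [math.cos(alpha), math.sin(alpha)]
rotated = apply(rotation_matrix(theta), v)
expected = [math.cos(alpha + theta), math.sin(alpha + theta)]
print(rotated, expected)
```

Because the entries are floating-point values, comparisons should use a tolerance (for example `math.isclose`) rather than exact equality.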

Example 7.11. (Projections) Let T : R³ → R³ be the vector mapping

\[ T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix}. \]

Show that T is a linear mapping and describe the range of T.

Solution. First notice that

\[ T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \]

Thus, T is a matrix mapping corresponding to the matrix

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]

Therefore, T is a linear mapping. Geometrically, T takes the vector x and projects it to the (x₁, x₂) plane, see Figure 7.2. What is the range of T? The range of T consists of all vectors in R³ of the form

\[ b = \begin{bmatrix} t \\ s \\ 0 \end{bmatrix} \]

where the numbers t and s are arbitrary. For each b in the range of T, there are infinitely many x's such that T(x) = b.

Figure 7.2: Projection onto the (x₁, x₂) plane


After this lecture you should know the following:

what a vector mapping is
what the range of a vector mapping is
that the co-domain and range of a vector mapping are generally not the same
what a linear mapping is and how to check when a given mapping is linear
what a matrix mapping is and that they are linear mappings
how to determine if a vector b is in the range of a matrix mapping
the formula for a rotation in R² by an angle θ

Lecture 8

Onto and One-to-One Mappings, and the Matrix of a Linear Mapping

8.1 Onto Mappings

We have seen through examples that the range of a vector mapping (linear or nonlinear) is not always the entire co-domain. For example, if T_A(x) = Ax is a matrix mapping and b is such that the equation Ax = b has no solutions then the range of T_A does not contain b and thus the range is not the whole co-domain.

Definition 8.1: A vector mapping T : Rⁿ → R^m is said to be onto if for each b ∈ R^m there is at least one x ∈ Rⁿ such that T(x) = b.

For a matrix mapping T_A(x) = Ax, the range of T_A is the span of the columns of A. Therefore:

Theorem 8.2: Let T_A : Rⁿ → R^m be the matrix mapping T_A(x) = Ax, where A ∈ M_{m×n}. Then T_A is onto if and only if the columns of A span all of R^m.

Combining Theorem 4.11 and Theorem 8.2 we have:

Theorem 8.3: Let T_A : Rⁿ → R^m be the matrix mapping T_A(x) = Ax, where A ∈ R^{m×n}. Then T_A is onto if and only if r = rank(A) = m.

Example 8.4. Let T_A : R³ → R³ be the matrix mapping with corresponding matrix

\[ A = \begin{bmatrix} 1 & 2 & -1 \\ -3 & -4 & 2 \\ 5 & 2 & 3 \end{bmatrix} \]

Is T_A onto?

Solution. The rref(A) is

\[ \begin{bmatrix} 1 & 2 & -1 \\ -3 & -4 & 2 \\ 5 & 2 & 3 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

Therefore, r = rank(A) = 3. The dimension of the co-domain is m = 3 and therefore T_A is onto. Therefore, the columns of A span all of R³, that is, every b ∈ R³ can be written as a linear combination of the columns of A:

\[ \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}, \begin{bmatrix} 2 \\ -4 \\ 2 \end{bmatrix}, \begin{bmatrix} -1 \\ 2 \\ 3 \end{bmatrix} \right\} = R^3 \]

Example 8.5. Let T_A : R⁴ → R³ be the matrix mapping with corresponding matrix

\[ A = \begin{bmatrix} 1 & 2 & -1 & 4 \\ -1 & 4 & 1 & 8 \\ 2 & 0 & -2 & 0 \end{bmatrix} \]

Is T_A onto?

Solution. The rref(A) is

\[ A = \begin{bmatrix} 1 & 2 & -1 & 4 \\ -1 & 4 & 1 & 8 \\ 2 & 0 & -2 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

Therefore, r = rank(A) = 2. The dimension of the co-domain is m = 3 and therefore T_A is not onto. Notice that v₃ = −v₁ and v₄ = 2v₂. Thus, v₃ and v₄ are already in the span of the columns v₁, v₂. Therefore,

span{v₁, v₂, v₃, v₄} = span{v₁, v₂} ≠ R³.

Below is a theorem which places restrictions on the size of the domain of an onto mapping.

Theorem 8.6: Suppose that T_A : Rⁿ → R^m is a matrix mapping corresponding to A ∈ M_{m×n}. If T_A is onto then m ≤ n.

Proof. If T_A is onto then the rref(A) has r = m leading 1's. Therefore, A has at least m columns. The number of columns of A is n. Therefore, m ≤ n.

An equivalent way of stating Theorem 8.6 is the following.

Corollary 8.7: If T_A : Rⁿ → R^m is a matrix mapping corresponding to A ∈ M_{m×n} and n < m then T_A cannot be onto.

Intuitively, if the domain Rⁿ is "smaller" than the co-domain R^m and T_A : Rⁿ → R^m is linear then T_A cannot be onto. For example, a matrix mapping T_A : R → R² cannot be onto. Linearity plays a key role in this. In fact, there exists a continuous (nonlinear) function f : R → R² whose range is a square! In this case, the domain is 1-dimensional and the range is 2-dimensional. This situation cannot happen when the mapping is linear.

Example 8.8. Let T_A : R² → R³ be the matrix mapping with corresponding matrix

\[ A = \begin{bmatrix} 1 & 4 \\ -3 & 2 \\ 2 & 1 \end{bmatrix} \]

Is T_A onto?

Solution. T_A is not onto because the domain is R² and the co-domain is R³ (Corollary 8.7 with n = 2 < m = 3). Intuitively, two vectors are not enough to span R³. Geometrically, two vectors in R³ span a 2D plane going through the origin. The vectors not on the plane span{v₁, v₂} are not in the range of T_A.

8.2 One-to-One Mappings

Given a linear mapping T : Rⁿ → R^m, the question of whether b ∈ R^m is in the range of T is an existence question. Indeed, if b ∈ Range(T) then there exists an x ∈ Rⁿ such that T(x) = b. We now want to look at the problem of whether x is unique. That is, does there exist a distinct y such that T(y) = b?

Definition 8.9: A vector mapping T : Rⁿ → R^m is said to be one-to-one if for each b ∈ Range(T) there exists only one x ∈ Rⁿ such that T(x) = b.

When T is a linear mapping, we have all the tools necessary to give a complete description of when T is one-to-one. To do this, we use the fact that if T : Rⁿ → R^m is linear then T(0) = 0. Here is one proof: T(0) = T(x − x) = T(x) − T(x) = 0.

Theorem 8.10: Let T : Rⁿ → R^m be linear. Then T is one-to-one if and only if T(x) = 0 implies that x = 0.

If T_A : Rⁿ → R^m is a matrix mapping then according to Theorem 8.10, T_A is one-to-one if and only if the only solution to Ax = 0 is x = 0. We gather these facts in the following theorem.

Theorem 8.11: Let T_A : Rⁿ → R^m be a matrix mapping, where A = [v₁ v₂ · · · v_n] ∈ M_{m×n}. The following statements are equivalent:

T_A is one-to-one.
The rank of A is r = rank(A) = n.
The columns v₁, v₂, . . . , v_n are linearly independent.

Example 8.12. Let T_A : R⁴ → R³ be the matrix mapping with matrix

\[ A = \begin{bmatrix} 3 & -2 & 6 & 4 \\ -1 & 0 & -2 & -1 \\ 2 & -2 & 0 & 2 \end{bmatrix}. \]

Is T_A one-to-one?

Solution. By Theorem 8.11, T_A is one-to-one if and only if the columns of A are linearly independent. The columns of A lie in R³ and there are n = 4 columns. From Lecture 6 (Theorem 6.7), we know then that the columns are not linearly independent. Therefore, T_A is not one-to-one. Alternatively, A will have rank at most r = 3 (why?). Therefore, the solution set to Ax = 0 will have at least one free parameter, and thus there exist infinitely many solutions to Ax = 0. Intuitively, because R⁴ is "larger" than R³, the linear mapping T_A will have to project R⁴ onto R³ and thus infinitely many vectors in R⁴ will be mapped to the same vector in R³.

Example 8.13. Let T_A : R² → R³ be the matrix mapping with matrix

\[ A = \begin{bmatrix} 1 & 0 \\ 3 & -1 \\ 2 & 0 \end{bmatrix} \]

Is T_A one-to-one?

Solution. By inspection, we see that the columns of A are linearly independent. Therefore, T_A is one-to-one. Alternatively, one can compute that

\[ \operatorname{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \]

Therefore, r = rank(A) = 2, which is equal to the number of columns of A.
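Theorems 8.3 and 8.11 reduce both questions to a rank computation: T_A is onto when rank(A) = m and one-to-one when rank(A) = n. The Python sketch below illustrates this on the matrices of Examples 8.4, 8.5, and 8.13. It is not part of the original notes; the helper names `rank`, `is_onto`, and `is_one_to_one` are assumptions, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def rank(rows):
    """Number of leading entries after forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_onto(A):
    """Theorem 8.3: onto iff rank(A) equals the number of rows m."""
    return rank(A) == len(A)

def is_one_to_one(A):
    """Theorem 8.11: one-to-one iff rank(A) equals the number of columns n."""
    return rank(A) == len(A[0])

A_8_4 = [[1, 2, -1], [-3, -4, 2], [5, 2, 3]]
A_8_5 = [[1, 2, -1, 4], [-1, 4, 1, 8], [2, 0, -2, 0]]
A_8_13 = [[1, 0], [3, -1], [2, 0]]

for name, A in [("8.4", A_8_4), ("8.5", A_8_5), ("8.13", A_8_13)]:
    print(name, "onto:", is_onto(A), "one-to-one:", is_one_to_one(A))
```

Note that a 3 × 2 matrix such as the one in Example 8.13 can be one-to-one but never onto, and a 3 × 4 matrix can be onto but never one-to-one, in line with Corollary 8.7 and Theorem 6.7.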


8.3 Standard Matrix of a Linear Mapping

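We have shown that all matrix mappings T_A are linear mappings. We now want to answer the reverse question: Are all linear mappings matrix mappings in disguise? If T : Rⁿ → R^m is a linear mapping, then to show that T is in fact a matrix mapping we must show that there is some matrix A ∈ M_{m×n} such that T(x) = Ax. To that end, introduce the standard unit vectors e₁, e₂, . . . , e_n in Rⁿ:

\[ e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \cdots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

Every x ∈ Rⁿ is in span{e₁, e₂, . . . , e_n} because:

\[ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_n \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n \]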

With this notation we prove the following.

Theorem 8.14: Every linear mapping is a matrix mapping.

Proof. Let T : Rⁿ → R^m be a linear mapping. Let

v₁ = T(e₁), v₂ = T(e₂), . . . , v_n = T(e_n).

The co-domain of T is R^m, and thus v_i ∈ R^m. Now, for arbitrary x ∈ Rⁿ we can write

x = x₁e₁ + x₂e₂ + · · · + x_ne_n.

Then by linearity of T, we have

T(x) = T(x₁e₁ + x₂e₂ + · · · + x_ne_n)

= x₁T(e₁) + x₂T(e₂) + · · · + x_nT(e_n)

= x₁v₁ + x₂v₂ + · · · + x_nv_n

= [v₁ v₂ · · · v_n] x.

Define the matrix A ∈ M_{m×n} by A = [v₁ v₂ · · · v_n]. Then our computation above shows that

T(x) = x₁v₁ + x₂v₂ + · · · + x_nv_n = Ax.

Therefore, T is a matrix mapping with the matrix A ∈ M_{m×n}.

If T : Rⁿ → R^m is a linear mapping, the matrix

A = [T(e₁) T(e₂) · · · T(e_n)]

is called the standard matrix of T. In words, the columns of A are the images of the standard unit vectors e₁, e₂, . . . , e_n under T. The punchline is that if T is a linear mapping, then to derive properties of T we need only know the standard matrix A corresponding to T.
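The recipe "apply T to each standard unit vector and use the results as columns" translates directly into code. The Python sketch below is an illustration only; the function name `standard_matrix` is hypothetical, and the mapping used is the one from Example 7.3, for which the notes already found the matrix in Example 7.6.

```python
def standard_matrix(T, n):
    """Build A = [T(e_1) ... T(e_n)] for a linear mapping T on R^n.

    T is assumed to take a list of n numbers and return a sequence of m numbers.
    The result is returned as a list of m rows.
    """
    columns = []
    for i in range(n):
        e = [0] * n
        e[i] = 1                      # the i-th standard unit vector
        columns.append(list(T(e)))
    # transpose: the list of columns becomes a list of rows
    return [list(row) for row in zip(*columns)]

def T(x):
    """The linear mapping of Example 7.3: R^2 -> R^3."""
    x1, x2 = x
    return (2 * x1 - x2, x1 + x2, -x1 - 3 * x2)

A = standard_matrix(T, 2)
print(A)  # [[2, -1], [1, 1], [-1, -3]]
```

The construction only yields a matrix that represents T when T is linear; applied to a nonlinear function it still returns a matrix, but Ax will generally differ from T(x).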

Example 8.15. Let T_θ : R² → R² be the linear mapping that rotates every vector by an angle θ. Use the standard unit vectors e₁ = (1, 0) and e₂ = (0, 1) in R² to write down the matrix A ∈ R^{2×2} corresponding to T_θ.

Solution. We have

\[ A = \begin{bmatrix} T_\theta(e_1) & T_\theta(e_2) \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \]

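Example 8.16. Let T : R³ → R³ be a dilation of factor k = 2. Find the standard matrix A of T.

Solution. The mapping is T(x) = 2x. Then

\[ T(e_1) = 2 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}, \quad T(e_2) = 2 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}, \quad T(e_3) = 2 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} \]

Therefore,

\[ A = \begin{bmatrix} T(e_1) & T(e_2) & T(e_3) \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]

is the standard matrix of T.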

After this lecture you should know the following:


the relationship between the range of a matrix mapping T(x) = Ax and the span of the columns of A
what it means for a mapping to be onto and one-to-one
how to verify if a linear mapping is onto and one-to-one
that all linear mappings are matrix mappings
what the standard unit vectors are
how to compute the standard matrix of a linear mapping


Lecture 9

Matrix Algebra

9.1 Sums of Matrices

We begin with the definition of matrix addition.

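Definition 9.1: Given matrices

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{bmatrix}, \]

both of the same dimension m × n, the sum A + B is defined as

\[ A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{bmatrix}. \]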

Next is the definition of scalar-matrix multiplication.

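Definition 9.2: For a scalar α we define αA by

\[ \alpha A = \alpha \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \alpha a_{11} & \alpha a_{12} & \cdots & \alpha a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha a_{m1} & \alpha a_{m2} & \cdots & \alpha a_{mn} \end{bmatrix}. \]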

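Example 9.3. Given A and B below, find 3A − 2B.

\[ A = \begin{bmatrix} 1 & -2 & 5 \\ 0 & -3 & 9 \\ 4 & -6 & 7 \end{bmatrix}, \quad B = \begin{bmatrix} 5 & 0 & -11 \\ 3 & -5 & 1 \\ -1 & -9 & 0 \end{bmatrix} \]

Solution. We compute:

\[ 3A - 2B = \begin{bmatrix} 3 & -6 & 15 \\ 0 & -9 & 27 \\ 12 & -18 & 21 \end{bmatrix} - \begin{bmatrix} 10 & 0 & -22 \\ 6 & -10 & 2 \\ -2 & -18 & 0 \end{bmatrix} = \begin{bmatrix} -7 & -6 & 37 \\ -6 & 1 & 25 \\ 14 & 0 & 21 \end{bmatrix} \]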

Below are some basic algebraic properties of matrix addition/scalar multiplication.

Theorem 9.4: Let A, B, C be matrices of the same size and let α, β be scalars. Then

A + B = B + A
(A + B) + C = A + (B + C)
A + 0 = A

α(A + B) = αA + αB
(α + β)A = αA + βA
α(βA) = (αβ)A

9.2 Matrix Multiplication

Let T_B : R^p → Rⁿ and let T_A : Rⁿ → R^m be linear mappings. If x ∈ R^p then T_B(x) ∈ Rⁿ and thus we can apply T_A to T_B(x). The resulting vector T_A(T_B(x)) is in R^m. Hence, each x ∈ R^p can be mapped to a point in R^m, and because T_B and T_A are linear mappings the resulting mapping is also linear. This resulting mapping is called the composition of T_A and T_B, and is usually denoted by T_A ◦ T_B : R^p → R^m (see Figure 9.1). Hence,

(T_A ◦ T_B)(x) = T_A(T_B(x)).

Because (T_A ◦ T_B) : R^p → R^m is a linear mapping it has an associated standard matrix, which we denote for now by C. From Lecture 8, to compute the standard matrix of any linear mapping, we must compute the images of the standard unit vectors e₁, e₂, . . . , e_p under the linear mapping. Now, for any x ∈ R^p,

T_A(T_B(x)) = T_A(Bx) = A(Bx).

Applying this to x = e_i for all i = 1, 2, . . . , p, we obtain the standard matrix of T_A ◦ T_B:

C = [A(Be₁) A(Be₂) · · · A(Be_p)].

Figure 9.1: Illustration of the composition of two mappings.

Now Be₁ is

Be₁ = [b₁ b₂ · · · b_p] e₁ = b₁.

And similarly Be_i = b_i for all i = 1, 2, . . . , p. Therefore,

C = [Ab₁ Ab₂ · · · Ab_p]

is the standard matrix of T_A ◦ T_B. This computation motivates the following definition.

Definition 9.5: For A ∈ R^{m×n} and B ∈ R^{n×p}, with B = [b₁ b₂ · · · b_p], we define the product AB by the formula

AB = [Ab₁ Ab₂ · · · Ab_p].

The product AB is defined only when the number of columns of A equals the number of rows of B. The following diagram is useful for remembering this:

(m × n) · (n × p) → m × p

From our definition of AB, the standard matrix of the composite mapping T_A ◦ T_B is

C = AB.

In other words, composition of linear mappings corresponds to matrix multiplication.
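Definition 9.5 builds AB one column at a time, and the claim that AB represents the composition can be checked directly. The Python sketch below is an illustration, not the notes' own code: the helper names `matvec` and `matmul` are assumptions, the matrices are those of Example 9.6, and the test vector x is a hypothetical input chosen here.

```python
def matvec(A, x):
    """Matrix-vector product Ax for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    """AB = [A b_1  A b_2  ...  A b_p], following Definition 9.5."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    columns = [matvec(A, list(b)) for b in zip(*B)]  # A applied to each column of B
    return [list(row) for row in zip(*columns)]      # columns back to rows

A = [[1, 2, -2],
     [1, 1, -3]]
B = [[-4, 2, 4, -4],
     [-1, -5, -3, 3],
     [-4, -4, -3, -1]]

AB = matmul(A, B)
print(AB)  # [[2, 0, 4, 4], [7, 9, 10, 2]]

x = [1, 2, -1, 3]               # hypothetical input vector in R^4
print(matvec(A, matvec(B, x)))  # T_A(T_B(x))
print(matvec(AB, x))            # (AB)x, the same vector
```

Calling `matmul(B, A)` raises `ValueError`, which mirrors the dimension rule (m × n) · (n × p) → m × p.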

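Example 9.6. For A and B below compute AB and BA.

\[ A = \begin{bmatrix} 1 & 2 & -2 \\ 1 & 1 & -3 \end{bmatrix}, \quad B = \begin{bmatrix} -4 & 2 & 4 & -4 \\ -1 & -5 & -3 & 3 \\ -4 & -4 & -3 & -1 \end{bmatrix} \]

Solution. First AB = [Ab₁ Ab₂ Ab₃ Ab₄], computed one column at a time:

\[ AB = \begin{bmatrix} 1 & 2 & -2 \\ 1 & 1 & -3 \end{bmatrix} \begin{bmatrix} -4 & 2 & 4 & -4 \\ -1 & -5 & -3 & 3 \\ -4 & -4 & -3 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 4 & 4 \\ 7 & 9 & 10 & 2 \end{bmatrix} \]

On the other hand, BA is not defined! B has 4 columns and A has 2 rows.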

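Example 9.7. For A and B below compute AB and BA.

\[ A = \begin{bmatrix} -4 & 4 & 3 \\ 3 & -3 & -1 \\ -2 & -1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & -1 & 0 \\ -3 & 0 & -2 \\ -2 & 1 & -2 \end{bmatrix} \]

Solution. First AB = [Ab₁ Ab₂ Ab₃]:

\[ AB = \begin{bmatrix} -4 & 4 & 3 \\ 3 & -3 & -1 \\ -2 & -1 & 1 \end{bmatrix} \begin{bmatrix} -1 & -1 & 0 \\ -3 & 0 & -2 \\ -2 & 1 & -2 \end{bmatrix} = \begin{bmatrix} -14 & 7 & -14 \\ 8 & -4 & 8 \\ 3 & 3 & 0 \end{bmatrix} \]

Next BA = [Ba₁ Ba₂ Ba₃]:

\[ BA = \begin{bmatrix} -1 & -1 & 0 \\ -3 & 0 & -2 \\ -2 & 1 & -2 \end{bmatrix} \begin{bmatrix} -4 & 4 & 3 \\ 3 & -3 & -1 \\ -2 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -1 & -2 \\ 16 & -10 & -11 \\ 15 & -9 & -9 \end{bmatrix} \]

On the other hand:

\[ AB = \begin{bmatrix} -14 & 7 & -14 \\ 8 & -4 & 8 \\ 3 & 3 & 0 \end{bmatrix} \]

Therefore, in general AB ≠ BA, i.e., matrix multiplication is not commutative.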

An important matrix that arises frequently is the identity matrix I_n ∈ R^{n×n} of size n:

\[ I_n = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \]

You should verify that for any A ∈ R^{n×n} it holds that AI_n = I_nA = A. Below are some basic algebraic properties of matrix multiplication.

Theorem 9.8: Let A, B, C be matrices, of appropriate dimensions, and let α be a scalar.

Then

A(BC) = (AB)C
A(B + C) = AB + AC
(B + C)A = BA + CA
α(AB) = (αA)B = A(αB)
I_nA = AI_n= A

If A ∈ R^{n×n} is a square matrix, the kth power of A is

\[ A^k = \underbrace{A A A \cdots A}_{k \text{ times}} \]

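Example 9.9. Compute A³ if

\[ A = \begin{bmatrix} -2 & 3 \\ 1 & 0 \end{bmatrix} \]

Solution. Compute A²:

\[ A^2 = \begin{bmatrix} -2 & 3 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -2 & 3 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 7 & -6 \\ -2 & 3 \end{bmatrix} \]

And then A³:

\[ A^3 = A^2 A = \begin{bmatrix} 7 & -6 \\ -2 & 3 \end{bmatrix} \begin{bmatrix} -2 & 3 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} -20 & 21 \\ 7 & -6 \end{bmatrix} \]

We could also do:

\[ A^3 = A A^2 = \begin{bmatrix} -2 & 3 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 7 & -6 \\ -2 & 3 \end{bmatrix} = \begin{bmatrix} -20 & 21 \\ 7 & -6 \end{bmatrix} \]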
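Repeated multiplication is easy to sketch in code. The Python snippet below is an illustration only; the helper names `matmul` and `matpow` are hypothetical, and the convention A⁰ = I_n used as the starting value is an assumption that the notes do not state. It reproduces the powers from Example 9.9.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def matpow(A, k):
    """k-th power of a square matrix by repeated multiplication (k >= 0)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity I_n
    for _ in range(k):
        result = matmul(result, A)
    return result

A = [[-2, 3],
     [1, 0]]
A2 = matpow(A, 2)
A3 = matpow(A, 3)
print(A2)  # [[7, -6], [-2, 3]]
print(A3)  # [[-20, 21], [7, -6]]
print(matmul(A2, A) == matmul(A, A2))  # True: A^2 A and A A^2 agree
```

Powers of a single matrix commute with each other by associativity (Theorem 9.8), even though matrix multiplication is not commutative in general.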

9.3 Matrix Transpose

We begin with the definition of the transpose of a matrix.

Definition 9.10: Given a matrix A ∈ R^{m×n}, the transpose of A is the matrix A^T whose ith column is the ith row of A.

If A is m × n then A^T is n × m. For example, if

\[ A = \begin{bmatrix} 0 & -1 & 8 & -7 & -4 \\ -4 & 6 & -10 & -9 & 6 \\ 9 & 5 & -2 & -3 & 5 \\ -8 & 8 & 4 & 7 & 7 \end{bmatrix} \]

then

\[ A^T = \begin{bmatrix} 0 & -4 & 9 & -8 \\ -1 & 6 & 5 & 8 \\ 8 & -10 & -2 & 4 \\ -7 & -9 & -3 & 7 \\ -4 & 6 & 5 & 7 \end{bmatrix} \]

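Example 9.11. Compute (AB)^T and B^TA^T if

\[ A = \begin{bmatrix} -2 & 1 & 0 \\ 3 & -1 & -3 \end{bmatrix}, \quad B = \begin{bmatrix} -2 & 1 & 2 \\ -1 & -2 & 0 \\ 0 & 0 & -1 \end{bmatrix} \]

Solution. Compute AB:

\[ AB = \begin{bmatrix} -2 & 1 & 0 \\ 3 & -1 & -3 \end{bmatrix} \begin{bmatrix} -2 & 1 & 2 \\ -1 & -2 & 0 \\ 0 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 3 & -4 & -4 \\ -5 & 5 & 9 \end{bmatrix} \]

Next compute B^TA^T:

\[ B^T A^T = \begin{bmatrix} -2 & -1 & 0 \\ 1 & -2 & 0 \\ 2 & 0 & -1 \end{bmatrix} \begin{bmatrix} -2 & 3 \\ 1 & -1 \\ 0 & -3 \end{bmatrix} = \begin{bmatrix} 3 & -5 \\ -4 & 5 \\ -4 & 9 \end{bmatrix} = (AB)^T \]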
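The identity (AB)^T = B^TA^T from this example can be confirmed with a few lines of code. The Python sketch below is illustrative; the helper names `transpose` and `matmul` are assumptions, and the matrices are those of Example 9.11.

```python
def transpose(A):
    """Rows of A become the columns of the result."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[-2, 1, 0],
     [3, -1, -3]]
B = [[-2, 1, 2],
     [-1, -2, 0],
     [0, 0, -1]]

lhs = transpose(matmul(A, B))             # (AB)^T
rhs = matmul(transpose(B), transpose(A))  # B^T A^T
print(lhs)         # [[3, -5], [-4, 5], [-4, 9]]
print(lhs == rhs)  # True
```

The order reversal matters for the shapes as well: here A^T is 3 × 2 and B^T is 3 × 3, so A^TB^T is not even defined.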

The following theorem summarizes properties of the transpose.

Theorem 9.12: Let A and B be matrices of appropriate sizes. The following hold:

(A^T)^T= A
(A + B)^T= A^T+ B^T
(αA)^T= αA^T
(AB)^T= B^TA^T

A consequence of property (4) is that

\[ (A_1 A_2 \cdots A_k)^T = A_k^T A_{k-1}^T \cdots A_2^T A_1^T \]

and as a special case

(A^k)^T = (A^T)^k.

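Example 9.13. Let T : R² → R² be the linear mapping that first contracts vectors by a factor of k = 3 and then rotates by an angle θ. What is the standard matrix A of T?

Solution. Let e₁ = (1, 0) and e₂ = (0, 1) denote the standard unit vectors in R². From Lecture 8, the standard matrix of T is A = [T(e₁) T(e₂)]. Recall that the standard matrix of a rotation by θ is

\[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}. \]

Contracting e₁ by a factor of k = 3 results in (1/3, 0) and then rotation by θ results in

\[ \begin{bmatrix} \frac{1}{3}\cos(\theta) \\ \frac{1}{3}\sin(\theta) \end{bmatrix} = T(e_1). \]

Contracting e₂ by a factor of k = 3 results in (0, 1/3) and then rotation by θ results in

\[ \begin{bmatrix} -\frac{1}{3}\sin(\theta) \\ \frac{1}{3}\cos(\theta) \end{bmatrix} = T(e_2). \]

Therefore,

\[ A = \begin{bmatrix} T(e_1) & T(e_2) \end{bmatrix} = \begin{bmatrix} \frac{1}{3}\cos(\theta) & -\frac{1}{3}\sin(\theta) \\ \frac{1}{3}\sin(\theta) & \frac{1}{3}\cos(\theta) \end{bmatrix}. \]

On the other hand, the standard matrix corresponding to a contraction by a factor k = 1/3 is

\[ \begin{bmatrix} \frac{1}{3} & 0 \\ 0 & \frac{1}{3} \end{bmatrix}. \]

Therefore,

\[ \underbrace{\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}}_{\text{rotation}} \underbrace{\begin{bmatrix} \frac{1}{3} & 0 \\ 0 & \frac{1}{3} \end{bmatrix}}_{\text{contraction}} = \begin{bmatrix} \frac{1}{3}\cos(\theta) & -\frac{1}{3}\sin(\theta) \\ \frac{1}{3}\sin(\theta) & \frac{1}{3}\cos(\theta) \end{bmatrix} = A \]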

After this lecture you should know the following:

how to add and multiply matrices
that matrix multiplication corresponds to composition of linear mappings
the algebraic properties of matrix multiplication (Theorem 9.8)
how to compute the transpose of a matrix
the properties of matrix transposition (Theorem 9.12)


Lecture 10

Invertible Matrices

10.1 Inverse of a Matrix

The inverse of a square matrix $A \in \mathbb{R}^{n \times n}$ generalizes the notion of the reciprocal of a non-zero number a ∈ R. Formally speaking, the inverse of a non-zero number a ∈ R is the unique number c ∈ R such that ac = ca = 1. The inverse of a ≠ 0, usually denoted by a⁻¹ = 1/a, can be used to solve the equation ax = b:

ax = b ⇒ a⁻¹ax = a⁻¹b ⇒ x = a⁻¹b.

This motivates the following definition.

Definition 10.1: A matrix $A \in \mathbb{R}^{n \times n}$ is called invertible if there exists a matrix $C \in \mathbb{R}^{n \times n}$ such that AC = I_n and CA = I_n.

If A is invertible, can it have more than one inverse? Suppose that there exist C₁, C₂ such that AC_i = C_iA = I_n for i = 1, 2. Then

C₂= C₂(AC₁) = (C₂A)C₁= I_nC₁= C₁.

Thus, if A is invertible, it can have only one inverse. This motivates the following definition.

Definition 10.2: If A is invertible then we denote the inverse of A by A⁻¹. Thus,

AA⁻¹= A⁻¹A = I_n.

Example 10.3. Given A and C below, show that C is the inverse of A.

$$A = \begin{bmatrix} 1 & -3 & 0 \\ -1 & 2 & -2 \\ -2 & 6 & 1 \end{bmatrix}, \qquad C = \begin{bmatrix} -14 & -3 & -6 \\ -5 & -1 & -2 \\ 2 & 0 & 1 \end{bmatrix}$$


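Solution. Compute AC:

$$AC = \begin{bmatrix} 1 & -3 & 0 \\ -1 & 2 & -2 \\ -2 & 6 & 1 \end{bmatrix} \begin{bmatrix} -14 & -3 & -6 \\ -5 & -1 & -2 \\ 2 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Compute CA:

$$CA = \begin{bmatrix} -14 & -3 & -6 \\ -5 & -1 & -2 \\ 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -3 & 0 \\ -1 & 2 & -2 \\ -2 & 6 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$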

Therefore, by definition C = A⁻¹.

Theorem 10.4: Let $A \in \mathbb{R}^{n \times n}$ and suppose that A is invertible. Then for any b ∈ Rⁿ the matrix equation Ax = b has a unique solution given by A⁻¹b.

Proof: Let b ∈ Rⁿ be arbitrary. Multiplying the equation Ax = b by A⁻¹ from the left we obtain

$$A^{-1}Ax = A^{-1}b \;\Rightarrow\; I_n x = A^{-1}b \;\Rightarrow\; x = A^{-1}b.$$

Therefore, with x = A⁻¹b we have that

Ax = A(A⁻¹b) = AA⁻¹b = I_n b = b

and thus x = A⁻¹b is a solution. If x̃ is another solution of the equation, that is, Ax̃ = b, then multiplying both sides by A⁻¹ we obtain that x̃ = A⁻¹b. Thus, x = x̃.

Example 10.5. Use the result of Example 10.3 to solve the linear system Ax = b if

$$A = \begin{bmatrix} 1 & -3 & 0 \\ -1 & 2 & -2 \\ -2 & 6 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ -3 \\ -1 \end{bmatrix}.$$

Solution. We showed in Example 10.3 that

$$A^{-1} = \begin{bmatrix} -14 & -3 & -6 \\ -5 & -1 & -2 \\ 2 & 0 & 1 \end{bmatrix}.$$

Therefore, the unique solution to the linear system Ax = b is

$$A^{-1}b = \begin{bmatrix} -14 & -3 & -6 \\ -5 & -1 & -2 \\ 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$


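Verify:

$$\begin{bmatrix} 1 & -3 & 0 \\ -1 & 2 & -2 \\ -2 & 6 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -3 \\ -1 \end{bmatrix}$$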
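The computation in Example 10.5 can be sketched in a few lines of plain Python, using the matrices A and A⁻¹ from Example 10.3 and the right-hand side b = (1, −3, −1). The helper name `matvec` is an illustrative assumption.

```python
def matvec(M, v):
    """Multiply the matrix M (list of rows) by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, -3, 0], [-1, 2, -2], [-2, 6, 1]]
A_inv = [[-14, -3, -6], [-5, -1, -2], [2, 0, 1]]  # inverse found in Example 10.3
b = [1, -3, -1]

x = matvec(A_inv, b)   # the unique solution x = A^{-1} b
print(x)
print(matvec(A, x) == b)  # substitute back into Ax = b
```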

    

The following theorem summarizes the relationship between the matrix inverse and matrix multiplication and matrix transpose.

Theorem 10.6: Let A and B be invertible matrices. Then:

(1) The matrix A⁻¹ is invertible and its inverse is A:

(A⁻¹)⁻¹ = A.

(2) The matrix AB is invertible and its inverse is B⁻¹A⁻¹:

(AB)⁻¹ = B⁻¹A⁻¹.

(3) The matrix A^T is invertible and its inverse is (A⁻¹)^T:

(A^T)⁻¹ = (A⁻¹)^T.

Proof: To prove (2) we compute

(AB)(B⁻¹A⁻¹) = ABB⁻¹A⁻¹ = AI_nA⁻¹ = AA⁻¹ = I_n.

To prove (3) we compute

A^T(A⁻¹)^T = (A⁻¹A)^T = I_n^T = I_n.

10.2 Computing the Inverse of a Matrix

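If $A \in M_{n \times n}$ is invertible, how do we find A⁻¹? Let $A^{-1} = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \end{bmatrix}$ and we will find expressions for the columns c_i. First note that $AA^{-1} = \begin{bmatrix} Ac_1 & Ac_2 & \cdots & Ac_n \end{bmatrix}$. On the other hand, we also have $AA^{-1} = I_n = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}$. Therefore, we want to find c₁, c₂, . . . , c_n such that

$$\underbrace{\begin{bmatrix} Ac_1 & Ac_2 & \cdots & Ac_n \end{bmatrix}}_{AA^{-1}} = \underbrace{\begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}}_{I_n}.$$

To find c_i we therefore need to solve the linear system Ax = e_i. Here the image vector "b" is e_i. To find c₁ we form the augmented matrix $\begin{bmatrix} A & e_1 \end{bmatrix}$ and find its RREF:

$$\begin{bmatrix} A & e_1 \end{bmatrix} \sim \begin{bmatrix} I_n & c_1 \end{bmatrix}.$$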


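We will need to do this for each of c₂, . . . , c_n, so we might as well form the combined augmented matrix $\begin{bmatrix} A & e_1 & e_2 & \cdots & e_n \end{bmatrix}$ and find the RREF all at once:

$$\begin{bmatrix} A & e_1 & e_2 & \cdots & e_n \end{bmatrix} \sim \begin{bmatrix} I_n & c_1 & c_2 & \cdots & c_n \end{bmatrix}.$$

In summary, to determine if A⁻¹ exists and to simultaneously compute it, we compute the RREF of the augmented matrix

$$\begin{bmatrix} A & I_n \end{bmatrix},$$

that is, A augmented with the n × n identity matrix. If the RREF of A is I_n, that is

$$\begin{bmatrix} A & I_n \end{bmatrix} \sim \begin{bmatrix} I_n & c_1 & c_2 & \cdots & c_n \end{bmatrix},$$

then

$$A^{-1} = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \end{bmatrix}.$$

If the RREF of A is not I_n then A is not invertible.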
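The procedure just summarized can be sketched in code. The function below is a minimal illustration, not a definitive implementation: the name `inverse` and the use of exact `Fraction` arithmetic are choices made here to avoid rounding error. It row reduces [A | I_n] and returns the right-hand block, or `None` when a pivot is missing.

```python
from fractions import Fraction

def inverse(A):
    """Return A^{-1} as a list of rows of Fractions, or None if A is not invertible."""
    n = len(A)
    # Build the augmented matrix [A | I_n] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot in this column, so rref(A) is not I_n
        M[col], M[pivot] = M[pivot], M[col]       # Type 1: interchange two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]           # Type 2: scale the pivot row to get a leading 1
        for r in range(n):
            if r != col and M[r][col] != 0:        # Type 3: clear the rest of the column
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                  # the block to the right of I_n

print(inverse([[1, 3], [-1, -2]]))
print(inverse([[1, 0, 1], [1, 1, -2], [-2, 0, -2]]))
```

The two calls use the matrices of Examples 10.7 and 10.9 below; the second matrix has rank 2, so the function reports that no inverse exists.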

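Example 10.7. Find the inverse of $A = \begin{bmatrix} 1 & 3 \\ -1 & -2 \end{bmatrix}$ if it exists.

Solution. Form the augmented matrix $\begin{bmatrix} A & I_2 \end{bmatrix}$ and row reduce:

$$\begin{bmatrix} A & I_2 \end{bmatrix} = \left[\begin{array}{cc|cc} 1 & 3 & 1 & 0 \\ -1 & -2 & 0 & 1 \end{array}\right]$$

Add rows R₁ and R₂:

$$\left[\begin{array}{cc|cc} 1 & 3 & 1 & 0 \\ -1 & -2 & 0 & 1 \end{array}\right] \xrightarrow{R_1 + R_2} \left[\begin{array}{cc|cc} 1 & 3 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{array}\right]$$

Perform the operation −3R₂ + R₁:

$$\left[\begin{array}{cc|cc} 1 & 3 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{array}\right] \xrightarrow{-3R_2 + R_1} \left[\begin{array}{cc|cc} 1 & 0 & -2 & -3 \\ 0 & 1 & 1 & 1 \end{array}\right]$$

Thus, rref(A) = I₂, and therefore A is invertible. The inverse is

$$A^{-1} = \begin{bmatrix} -2 & -3 \\ 1 & 1 \end{bmatrix}$$

Verify:

$$AA^{-1} = \begin{bmatrix} 1 & 3 \\ -1 & -2 \end{bmatrix} \begin{bmatrix} -2 & -3 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$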

Example 10.8. Find the inverse of $A = \begin{bmatrix} 1 & 0 & 3 \\ 1 & 1 & 0 \\ -2 & 0 & -7 \end{bmatrix}$ if it exists.


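Solution. Form the augmented matrix $\begin{bmatrix} A & I_3 \end{bmatrix}$ and row reduce:

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ -2 & 0 & -7 & 0 & 0 & 1 \end{array}\right] \xrightarrow{-R_1 + R_2,\; 2R_1 + R_3} \left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -1 & 1 & 0 \\ 0 & 0 & -1 & 2 & 0 & 1 \end{array}\right]$$

−R₃:

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -1 & 1 & 0 \\ 0 & 0 & -1 & 2 & 0 & 1 \end{array}\right] \xrightarrow{-R_3} \left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -1 & 1 & 0 \\ 0 & 0 & 1 & -2 & 0 & -1 \end{array}\right]$$

3R₃ + R₂ and −3R₃ + R₁:

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -1 & 1 & 0 \\ 0 & 0 & 1 & -2 & 0 & -1 \end{array}\right] \xrightarrow{3R_3 + R_2,\; -3R_3 + R_1} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 7 & 0 & 3 \\ 0 & 1 & 0 & -7 & 1 & -3 \\ 0 & 0 & 1 & -2 & 0 & -1 \end{array}\right]$$

Therefore, rref(A) = I₃, and therefore A is invertible. The inverse is

$$A^{-1} = \begin{bmatrix} 7 & 0 & 3 \\ -7 & 1 & -3 \\ -2 & 0 & -1 \end{bmatrix}$$

Verify:

$$AA^{-1} = \begin{bmatrix} 1 & 0 & 3 \\ 1 & 1 & 0 \\ -2 & 0 & -7 \end{bmatrix} \begin{bmatrix} 7 & 0 & 3 \\ -7 & 1 & -3 \\ -2 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$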

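Example 10.9. Find the inverse of $A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & -2 \\ -2 & 0 & -2 \end{bmatrix}$ if it exists.

Solution. Form the augmented matrix $\begin{bmatrix} A & I_3 \end{bmatrix}$ and row reduce:

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & -2 & 0 & 1 & 0 \\ -2 & 0 & -2 & 0 & 0 & 1 \end{array}\right] \xrightarrow{-R_1 + R_2,\; 2R_1 + R_3} \left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & -3 & -1 & 1 & 0 \\ 0 & 0 & 0 & 2 & 0 & 1 \end{array}\right]$$

We need not go further since rref(A) is not I₃ (rank(A) = 2). Therefore, A is not invertible.

10.3 Invertible Linear Mappings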

Let $T_A : \mathbb{R}^n \to \mathbb{R}^n$ be a matrix mapping with standard matrix A and suppose that A is invertible. Let $T_{A^{-1}} : \mathbb{R}^n \to \mathbb{R}^n$ be the matrix mapping with standard matrix A⁻¹. Then the standard matrix of the composite mapping $T_{A^{-1}} \circ T_A : \mathbb{R}^n \to \mathbb{R}^n$ is

A⁻¹A = I_n.


Therefore, $(T_{A^{-1}} \circ T_A)(x) = I_n x = x$. Let's unravel $(T_{A^{-1}} \circ T_A)(x)$ to see this:

$$(T_{A^{-1}} \circ T_A)(x) = T_{A^{-1}}(T_A(x)) = T_{A^{-1}}(Ax) = A^{-1}Ax = x.$$

Similarly, the standard matrix of $T_A \circ T_{A^{-1}}$ is also I_n. Intuitively, the linear mapping $T_{A^{-1}}$ undoes what $T_A$ does, and conversely. Moreover, since Ax = b always has a solution, $T_A$ is onto. And, because the solution to Ax = b is unique, $T_A$ is one-to-one.

The following theorem summarizes equivalent conditions for matrix invertibility.

Theorem 10.10: Let A ∈ Rⁿ^×ⁿ. The following statements are equivalent:

A is invertible.
A is row equivalent to I_n, that is, rref(A) = I_n.
The equation Ax = 0 has only the trivial solution.
The linear transformation T_A(x) = Ax is one-to-one.
The linear transformation T_A(x) = Ax is onto.
The matrix equation Ax = b is always solvable.
The columns of A span Rⁿ.
The columns of A are linearly independent.
A^Tis invertible.

Proof: This is a summary of all the statements we have proved about matrices and matrix mappings specialized to the case of square matrices A ∈ Rⁿ^×ⁿ. Note that for non-square matrices, one-to-one does not imply ontoness, and conversely.

Example 10.11. Without doing any arithmetic, write down the inverse of the dilation matrix

$$A = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}.$$

Example 10.12. Without doing any arithmetic, write down the inverse of the rotation matrix

$$A = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}.$$

After this lecture you should know the following:

how to compute the inverse of a matrix
properties of matrix inversion and matrix multiplication
how invertibility of a matrix relates to properties of the associated linear mapping (one-to-one, onto)
the characterizations of invertible matrices (Theorem 10.10)


Lecture 11

Determinants

11.1 Determinants of 2 × 2 and 3 × 3 Matrices

Consider a general 2 × 2 linear system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1 \\ a_{21}x_1 + a_{22}x_2 &= b_2. \end{aligned}$$

Using elementary row operations, it can be shown that the solution is

$$x_1 = \frac{b_1 a_{22} - b_2 a_{12}}{a_{11}a_{22} - a_{12}a_{21}}, \qquad x_2 = \frac{b_2 a_{11} - b_1 a_{21}}{a_{11}a_{22} - a_{12}a_{21}},$$

provided that a₁₁a₂₂ − a₁₂a₂₁ ≠ 0. Notice the denominator is the same in both expressions. The number a₁₁a₂₂ − a₁₂a₂₁ then completely characterizes when a 2 × 2 linear system has a unique solution. This motivates the following definition.

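Definition 11.1: Given a 2 × 2 matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

we define the determinant of A as

$$\det A = \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$

An alternative notation for det A is using vertical bars:

$$\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.$$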


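Example 11.2. Compute the determinant of A.

(i) $A = \begin{bmatrix} 3 & -1 \\ 8 & 2 \end{bmatrix}$

(ii) $A = \begin{bmatrix} 3 & 1 \\ -6 & -2 \end{bmatrix}$

(iii) $A = \begin{bmatrix} -110 & 0 \\ 568 & 0 \end{bmatrix}$

Solution. For (i):

$$\det(A) = \begin{vmatrix} 3 & -1 \\ 8 & 2 \end{vmatrix} = (3)(2) - (8)(-1) = 14$$

For (ii):

$$\det(A) = \begin{vmatrix} 3 & 1 \\ -6 & -2 \end{vmatrix} = (3)(-2) - (-6)(1) = 0$$

For (iii):

$$\det(A) = \begin{vmatrix} -110 & 0 \\ 568 & 0 \end{vmatrix} = (-110)(0) - (568)(0) = 0$$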

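As in the 2 × 2 case, the solution of a 3 × 3 linear system Ax = b can be shown to be

$$x_1 = \frac{\text{Numerator}_1}{D}, \qquad x_2 = \frac{\text{Numerator}_2}{D}, \qquad x_3 = \frac{\text{Numerator}_3}{D}$$

where

$$D = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).$$

Notice that the terms of D in the parentheses are determinants of 2 × 2 submatrices of A:

$$D = a_{11}\underbrace{(a_{22}a_{33} - a_{23}a_{32})}_{\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}} - a_{12}\underbrace{(a_{21}a_{33} - a_{23}a_{31})}_{\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}} + a_{13}\underbrace{(a_{21}a_{32} - a_{22}a_{31})}_{\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}}.$$

Let

$$A_{11} = \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}, \quad A_{12} = \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}, \quad \text{and} \quad A_{13} = \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}.$$

Then we can write

D = a₁₁ det(A₁₁) − a₁₂ det(A₁₂) + a₁₃ det(A₁₃).

The matrix $A_{11} = \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}$ is obtained from A by deleting the 1st row and the 1st column:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \longrightarrow A_{11} = \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}.$$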


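Similarly, the matrix $A_{12} = \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}$ is obtained from A by deleting the 1st row and the 2nd column:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \longrightarrow A_{12} = \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}.$$

Finally, the matrix $A_{13} = \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}$ is obtained from A by deleting the 1st row and the 3rd column:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \longrightarrow A_{13} = \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}.$$

Notice also that the signs in front of the coefficients a₁₁, a₁₂, and a₁₃ alternate. This motivates the following definition.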

Definition 11.3: Let A be a 3 × 3 matrix. Let A_jk be the 2 × 2 matrix obtained from A by deleting the jth row and kth column. Define the cofactor of a_jk to be the number $C_{jk} = (-1)^{j+k} \det A_{jk}$. Define the determinant of A to be

det A = a₁₁C₁₁ + a₁₂C₁₂ + a₁₃C₁₃.

This definition of the determinant is called the expansion of the determinant along the first row. In the cofactor $C_{jk} = (-1)^{j+k} \det A_{jk}$, the expression $(-1)^{j+k}$ will evaluate to either 1 or −1, depending on whether j + k is even or odd. For example, the cofactor of a₁₂ is

$$C_{12} = (-1)^{1+2} \det A_{12} = -\det A_{12}$$

and the cofactor of a₁₃ is

$$C_{13} = (-1)^{1+3} \det A_{13} = \det A_{13}.$$

We can also compute the cofactor of the other entries of A in the obvious way. For example, the cofactor of a₂₃ is

$$C_{23} = (-1)^{2+3} \det A_{23} = -\det A_{23}.$$

A helpful way to remember the sign $(-1)^{j+k}$ of a cofactor is to use the matrix

$$\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}.$$

This works not just for 3 × 3 matrices but for any square n × n matrix.

Example 11.4. Compute the determinant of the matrix

$$A = \begin{bmatrix} 4 & -2 & 3 \\ 2 & 3 & 5 \\ 1 & 0 & 6 \end{bmatrix}$$


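Solution. From the definition of the determinant

$$\begin{aligned} \det A &= a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} \\ &= (4)\det A_{11} - (-2)\det A_{12} + (3)\det A_{13} \\ &= 4\begin{vmatrix} 3 & 5 \\ 0 & 6 \end{vmatrix} + 2\begin{vmatrix} 2 & 5 \\ 1 & 6 \end{vmatrix} + 3\begin{vmatrix} 2 & 3 \\ 1 & 0 \end{vmatrix} \\ &= 4(3 \cdot 6 - 5 \cdot 0) + 2(2 \cdot 6 - 1 \cdot 5) + 3(2 \cdot 0 - 1 \cdot 3) \\ &= 72 + 14 - 9 \\ &= 77 \end{aligned}$$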

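We can compute the determinant of a matrix A by expanding along any row or column. For example, the expansion of the determinant for the matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

along the 3rd row is

$$\det A = a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} - a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} + a_{33}\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.$$

And along the 2nd column:

$$\det A = -a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{22}\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} - a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}.$$

The punchline is that any way you choose to expand (row or column) you will get the same answer. If a particular row or column contains zeros, say entry a_jk, then the computation of the determinant is simplified if you expand along either row j or column k because a_jk C_jk = 0 and we need not compute C_jk.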

Example 11.5. Compute the determinant of the matrix

$$A = \begin{bmatrix} 4 & -2 & 3 \\ 2 & 3 & 5 \\ 1 & 0 & 6 \end{bmatrix}$$

Solution. In Example 11.4, we computed det(A) = 77 by expanding along the 1st row.


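Notice that a₃₂ = 0. Expanding along the 3rd row:

$$\begin{aligned} \det A &= (1)\det A_{31} - (0)\det A_{32} + (6)\det A_{33} \\ &= 1\begin{vmatrix} -2 & 3 \\ 3 & 5 \end{vmatrix} + 6\begin{vmatrix} 4 & -2 \\ 2 & 3 \end{vmatrix} \\ &= 1(-2 \cdot 5 - 3 \cdot 3) + 6(4 \cdot 3 - (-2) \cdot 2) \\ &= -19 + 96 \\ &= 77 \end{aligned}$$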

11.2 Determinants of n × n Matrices

Using the 3 × 3 case as a guide, we define the determinant of a general n × n matrix as follows.

Definition 11.6: Let A be an n × n matrix. Let A_jk be the (n − 1) × (n − 1) matrix obtained from A by deleting the jth row and kth column, and let $C_{jk} = (-1)^{j+k} \det A_{jk}$ be the (j, k)-cofactor of A. The determinant of A is defined to be

$$\det A = a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n}.$$

The next theorem tells us that we can compute the determinant by expanding along any row or column.

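Theorem 11.7: Let A be an n × n matrix. Then det A may be obtained by a cofactor expansion along any row or any column of A. Along the jth row,

$$\det A = a_{j1}C_{j1} + a_{j2}C_{j2} + \cdots + a_{jn}C_{jn},$$

and along the kth column,

$$\det A = a_{1k}C_{1k} + a_{2k}C_{2k} + \cdots + a_{nk}C_{nk}.$$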

We obtain two immediate corollaries.

Corollary 11.8: If A has a row or column containing all zeros then det A = 0.

Proof. If the jth row contains all zeros then a_j₁= a_j₂= · · · = a_jn= 0:

det A = a_j₁C_j₁+ a_j₂C_j₂+ · · · + a_jnC_jn= 0.


Corollary 11.9: For any square matrix A it holds that det A = det A^T.

Sketch of the proof. Expanding along the jth row of A is equivalent to expanding along the jth column of A^T.

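Example 11.10. Compute the determinant of

$$A = \begin{bmatrix} 1 & 3 & 0 & -2 \\ 1 & 2 & -2 & -1 \\ 0 & 0 & 2 & 1 \\ -1 & -3 & 1 & 0 \end{bmatrix}$$

Solution. The third row contains two zeros, so expand along this row:

$$\begin{aligned} \det A &= 0\det A_{31} - 0\det A_{32} + 2\det A_{33} - \det A_{34} \\ &= 2\begin{vmatrix} 1 & 3 & -2 \\ 1 & 2 & -1 \\ -1 & -3 & 0 \end{vmatrix} - \begin{vmatrix} 1 & 3 & 0 \\ 1 & 2 & -2 \\ -1 & -3 & 1 \end{vmatrix} \\ &= 2\left( 1\begin{vmatrix} 2 & -1 \\ -3 & 0 \end{vmatrix} - 3\begin{vmatrix} 1 & -1 \\ -1 & 0 \end{vmatrix} - 2\begin{vmatrix} 1 & 2 \\ -1 & -3 \end{vmatrix} \right) - \left( 1\begin{vmatrix} 2 & -2 \\ -3 & 1 \end{vmatrix} - 3\begin{vmatrix} 1 & -2 \\ -1 & 1 \end{vmatrix} \right) \\ &= 2\big((0 - 3) - 3(0 - 1) - 2(-3 + 2)\big) - \big((2 - 6) - 3(1 - 2)\big) \\ &= 5 \end{aligned}$$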

Example 11.11. Compute the determinant of

$$A = \begin{bmatrix} 1 & 3 & 0 & -2 \\ 1 & 2 & -2 & -1 \\ 0 & 0 & 2 & 1 \\ -1 & -3 & 1 & 0 \end{bmatrix}$$


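Solution. Expanding along the second row:

$$\begin{aligned} \det A &= -\det A_{21} + 2\det A_{22} - (-2)\det A_{23} - 1\det A_{24} \\ &= -\begin{vmatrix} 3 & 0 & -2 \\ 0 & 2 & 1 \\ -3 & 1 & 0 \end{vmatrix} + 2\begin{vmatrix} 1 & 0 & -2 \\ 0 & 2 & 1 \\ -1 & 1 & 0 \end{vmatrix} + 2\begin{vmatrix} 1 & 3 & -2 \\ 0 & 0 & 1 \\ -1 & -3 & 0 \end{vmatrix} - \begin{vmatrix} 1 & 3 & 0 \\ 0 & 0 & 2 \\ -1 & -3 & 1 \end{vmatrix} \\ &= -1(-3 - 12) + 2(-1 - 4) + 2(0) - (0) \\ &= 5 \end{aligned}$$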
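The fact that every row (and every column) gives the same value can be tested with a short recursive routine. This is a minimal sketch of cofactor expansion; the function names and the 0-based `row` argument are choices made here for illustration, and the approach is only practical for small matrices.

```python
def minor(A, j, k):
    """Matrix obtained from A by deleting row j and column k (0-based indices)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(A) if i != j]

def det(A, row=0):
    """Determinant by cofactor expansion along the given row (0-based)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # Zero entries are skipped, since a_jk * C_jk = 0 without computing C_jk.
    return sum((-1) ** (row + k) * A[row][k] * det(minor(A, row, k))
               for k in range(n) if A[row][k] != 0)

def transpose(A):
    return [list(col) for col in zip(*A)]

A3 = [[4, -2, 3], [2, 3, 5], [1, 0, 6]]                               # Example 11.4
A4 = [[1, 3, 0, -2], [1, 2, -2, -1], [0, 0, 2, 1], [-1, -3, 1, 0]]    # Example 11.10

print([det(A3, r) for r in range(3)])             # expansion along each row
print([det(A4, r) for r in range(4)])
print([det(transpose(A4), k) for k in range(4)])  # expansion along each column of A4
```

Expanding along column k of A is done here by expanding along row k of the transpose, in line with Corollary 11.9.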

11.3 Triangular Matrices

Below we introduce a class of matrices for which the determinant computation is trivial.

Definition 11.12: A square matrix $A \in \mathbb{R}^{n \times n}$ is called upper triangular if a_jk = 0 whenever j > k. In other words, all the entries of A below the diagonal entries a_ii are zero. It is called lower triangular if a_jk = 0 whenever j < k.

For example, a 4 × 4 upper triangular matrix takes the form

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}.$$

Expanding along the first column, we compute

$$\det A = a_{11}\begin{vmatrix} a_{22} & a_{23} & a_{24} \\ 0 & a_{33} & a_{34} \\ 0 & 0 & a_{44} \end{vmatrix} = a_{11}a_{22}\begin{vmatrix} a_{33} & a_{34} \\ 0 & a_{44} \end{vmatrix} = a_{11}a_{22}a_{33}a_{44}.$$

The general n × n case is similar and is summarized in the following theorem.

Theorem 11.13: The determinant of a triangular matrix is the product of its diagonal entries.

After this lecture you should know the following:

how to compute the determinant of any sized matrix

that the determinant of A is equal to the determinant of A^T

the determinant of a triangular matrix is the product of its diagonal entries


Lecture 12

Properties of the Determinant

12.1 ERO and Determinants

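Recall that for a matrix $A \in \mathbb{R}^{n \times n}$ we defined

$$\det A = a_{j1}C_{j1} + a_{j2}C_{j2} + \cdots + a_{jn}C_{jn}$$

where the number $C_{jk} = (-1)^{j+k} \det A_{jk}$ is called the (j, k)-cofactor of A and

$$a_j = \begin{bmatrix} a_{j1} & a_{j2} & \cdots & a_{jn} \end{bmatrix}$$

denotes the jth row of A. Notice that

$$\det A = \begin{bmatrix} a_{j1} & a_{j2} & \cdots & a_{jn} \end{bmatrix} \begin{bmatrix} C_{j1} \\ C_{j2} \\ \vdots \\ C_{jn} \end{bmatrix}.$$

If we let $c_j = \begin{bmatrix} C_{j1} & C_{j2} & \cdots & C_{jn} \end{bmatrix}$ then

$$\det A = a_j \cdot c_j.$$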

In this lecture, we will establish properties of the determinant under elementary row operations and some consequences. The following theorem describes how the determinant behaves under elementary row operations of Type 1.

Theorem 12.1: Suppose that $A \in \mathbb{R}^{n \times n}$ and let B be the matrix obtained by interchanging two rows of A. Then det B = − det A.

Proof. Consider the 2 × 2 case. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ and let $B = \begin{bmatrix} a_{21} & a_{22} \\ a_{11} & a_{12} \end{bmatrix}$. Then

det B = a₁₂a₂₁ − a₁₁a₂₂ = −(a₁₁a₂₂ − a₁₂a₂₁) = − det A.

The general case is proved by induction.

This theorem leads to the following corollary.


Corollary 12.2: If A ∈ Rⁿ^×ⁿhas two rows (or two columns) that are equal then det(A) = 0.

Proof. Suppose that A has rows j and k that are equal. Let B be the matrix obtained by interchanging rows j and k. Then by the previous theorem det B = − det A. But clearly B = A, and therefore det B = det A. Therefore, det(A) = − det(A) and thus det A = 0.

Now we consider how the determinant behaves under elementary row operations of Type 2.

Theorem 12.3: Let $A \in \mathbb{R}^{n \times n}$ and let B be the matrix obtained by multiplying a row of A by β. Then det B = β det A.

Proof. Suppose that B is obtained from A by multiplying the jth row by β. The rows of A and B different from j are equal, and therefore

B_jk = A_jk, for k = 1, 2, . . . , n.

In particular, the (j, k) cofactors of A and B are equal. The jth row of B is βa_j. Then, expanding det B along the jth row:

$$\det B = (\beta a_j) \cdot c_j = \beta (a_j \cdot c_j) = \beta \det A.$$

Lastly we consider Type 3 elementary row operations.

Theorem 12.4: Let $A \in \mathbb{R}^{n \times n}$ and let B be the matrix obtained from A by adding β times the kth row to the jth row. Then det B = det A.

Proof. For any matrix A and any row vector r = [r₁ r₂ · · · r_n] the expression

$$r \cdot c_j = r_1 C_{j1} + r_2 C_{j2} + \cdots + r_n C_{jn}$$

is the determinant of the matrix obtained from A by replacing the jth row with the row r. Therefore, if k ≠ j then

$$a_k \cdot c_j = 0$$


since then rows k and j are equal. The jth row of B is b_j = a_j + βa_k. Therefore, expanding det B along the jth row:

$$\det B = (a_j + \beta a_k) \cdot c_j = a_j \cdot c_j + \beta (a_k \cdot c_j) = \det A.$$

Example 12.5. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. If B is obtained from A by interchanging rows 2 and 4, what is det B?

Solution. Interchanging (or swapping) rows changes the sign of the determinant. Therefore, det B = −11.

Example 12.6. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. Let a₁, a₂, a₃, a₄denote the rows of A. If B is obtained from A by replacing row a₃by 3a₁+ a₃, what is det B?

Solution. This is a Type 3 elementary row operation, which preserves the value of the determinant. Therefore,

det B = 11.

Example 12.7. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. Let a₁, a₂, a₃, a₄denote the rows of A. If B is obtained from A by replacing row a₃by 3a₁+ 7a₃, what is det B?

Solution. This is not quite a Type 3 elementary row operation because a₃ is multiplied by 7. The third row of B is b₃ = 3a₁ + 7a₃. Therefore, expanding det B along the third row

$$\begin{aligned} \det B &= (3a_1 + 7a_3) \cdot c_3 \\ &= 3a_1 \cdot c_3 + 7a_3 \cdot c_3 \\ &= 7(a_3 \cdot c_3) \\ &= 7\det A \\ &= 77 \end{aligned}$$


Example 12.8. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. Let a₁, a₂, a₃, a₄denote the rows of A. If B is obtained from A by replacing row a₃by 4a₁+ 5a₂, what is det B?

Solution. Again, this is not a Type 3 elementary row operation. The third row of B is b₃ = 4a₁ + 5a₂. Therefore, expanding det B along the third row

$$\begin{aligned} \det B &= (4a_1 + 5a_2) \cdot c_3 \\ &= 4a_1 \cdot c_3 + 5a_2 \cdot c_3 \\ &= 0 + 0 \\ &= 0 \end{aligned}$$
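The four row replacements in Examples 12.5 to 12.8 can be tried on a concrete matrix. The examples only assume det A = 11; the sketch below instead uses the 4 × 4 matrix of Example 11.10 (whose determinant is 5) purely as an illustrative stand-in, so the expected values are −d, d, 7d and 0 with d = 5. The helper names are hypothetical.

```python
def minor(A, j, k):
    """Delete row j and column k (0-based)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(A) if i != j]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det(minor(A, 0, k)) for k in range(len(A)))

def combo(alpha, r, beta, s):
    """The row alpha*r + beta*s."""
    return [alpha * x + beta * y for x, y in zip(r, s)]

A = [[1, 3, 0, -2], [1, 2, -2, -1], [0, 0, 2, 1], [-1, -3, 1, 0]]
a1, a2, a3, a4 = A
d = det(A)

swapped = [a1, a4, a3, a2]                   # interchange rows 2 and 4 (Type 1)
type3 = [a1, a2, combo(3, a1, 1, a3), a4]    # a3 -> 3a1 + a3 (Type 3)
scaled = [a1, a2, combo(3, a1, 7, a3), a4]   # a3 -> 3a1 + 7a3
dependent = [a1, a2, combo(4, a1, 5, a2), a4]  # a3 -> 4a1 + 5a2

print(d, det(swapped), det(type3), det(scaled), det(dependent))
```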

12.2 Determinants and Invertibility of Matrices

The following theorem characterizes invertibility of matrices with the determinant.

Theorem 12.9: A square matrix A is invertible if and only if det A ≠ 0.

Proof. Beginning with the matrix A, perform elementary row operations and generate a sequence of matrices A₁, A₂, . . . , A_p such that A_p is in row echelon form and thus triangular:

A ~ A₁ ~ A₂ ~ · · · ~ A_p.

Thus, matrix A_i is obtained from A_{i−1} by performing one of the elementary row operations. By Theorems 12.1, 12.3, and 12.4, each operation multiplies the determinant by −1, by a non-zero scalar β, or by 1, so det A_{i−1} ≠ 0 if and only if det A_i ≠ 0. In particular, det A = 0 if and only if det A_p = 0. Now, A_p is triangular and therefore its determinant is the product of its diagonal entries. If all the diagonal entries are non-zero then det A_p ≠ 0 and hence det A ≠ 0. In this case, A is invertible because there are r = n leading entries in A_p. If a diagonal entry of A_p is zero then det A_p = 0 and hence det A = 0. In this case, A is not invertible because there are r < n leading entries in A_p. Therefore, A is invertible if and only if det A ≠ 0.
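The proof also suggests a practical way to compute a determinant: reduce A to triangular form using only Type 1 and Type 3 operations, keep track of the sign changes, and multiply the diagonal entries. The following is a minimal sketch under those assumptions; the function name and the use of exact `Fraction` arithmetic are choices made here.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via reduction to upper triangular form."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)      # a zero diagonal entry in the echelon form
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign            # Type 1 changes the sign (Theorem 12.1)
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            # Type 3 leaves the determinant unchanged (Theorem 12.4).
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]           # product of the diagonal entries (Theorem 11.13)
    return result

print(det_by_elimination([[1, 3, 0, -2], [1, 2, -2, -1], [0, 0, 2, 1], [-1, -3, 1, 0]]))
print(det_by_elimination([[1, 0, 1], [1, 1, -2], [-2, 0, -2]]))
```

The first matrix is the one from Example 11.10; the second is the non-invertible matrix of Example 10.9, for which the routine returns 0.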

12.3 Properties of the Determinant

The following theorem characterizes how the determinant behaves under scalar multiplication of matrices.

Theorem 12.10: Let $A \in \mathbb{R}^{n \times n}$ and let B = βA, that is, B is obtained by multiplying every entry of A by β. Then det B = βⁿ det A.


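Proof. Consider the 2 × 2 case:

$$\begin{aligned} \det(\beta A) &= \begin{vmatrix} \beta a_{11} & \beta a_{12} \\ \beta a_{21} & \beta a_{22} \end{vmatrix} \\ &= \beta a_{11} \cdot \beta a_{22} - \beta a_{12} \cdot \beta a_{21} \\ &= \beta^2 (a_{11}a_{22} - a_{12}a_{21}) \\ &= \beta^2 \det A. \end{aligned}$$

Thus, the statement holds for 2 × 2 matrices. Consider a 3 × 3 matrix A. Then

$$\begin{aligned} \det(\beta A) &= \beta a_{11}|\beta A_{11}| - \beta a_{12}|\beta A_{12}| + \beta a_{13}|\beta A_{13}| \\ &= \beta a_{11}\beta^2|A_{11}| - \beta a_{12}\beta^2|A_{12}| + \beta a_{13}\beta^2|A_{13}| \\ &= \beta^3 (a_{11}|A_{11}| - a_{12}|A_{12}| + a_{13}|A_{13}|) \\ &= \beta^3 \det A. \end{aligned}$$

The general case can be treated using mathematical induction on n.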

Example 12.11. Suppose that A is a 4 × 4 matrix and suppose that det A = 11. What is det(3A)?

Solution. We have

det(3A) = 3⁴det A

= 81 · 11

= 891

The following theorem characterizes how the determinant behaves under matrix multiplication.

Theorem 12.12: Let A and B be n × n matrices. Then

det(AB) = det(A) det(B).

Corollary 12.13: For any square matrix det(A^k) = (det A)^k.


Corollary 12.14: If A is invertible then

$$\det(A^{-1}) = \frac{1}{\det A}.$$

Proof. From AA⁻¹ = I_n we have that det(AA⁻¹) = 1. But also

det(AA⁻¹) = det(A) det(A⁻¹).

Therefore

det(A) det(A⁻¹) = 1

or equivalently

$$\det A^{-1} = \frac{1}{\det A}.$$

Example 12.15. Let A, B, C be n × n matrices. Suppose that det A = 3, det B = 0, and det C = 7.

(i) Is AC invertible?
(ii) Is AB invertible?
(iii) Is ACB invertible?

Solution. (i): We have det(AC) = det A det C = 3 · 7 = 21. Thus, AC is invertible.

(ii): We have det(AB) = det A det B = 3 · 0 = 0. Thus, AB is not invertible.

(iii): We have det(ACB) = det A det C det B = 3 · 7 · 0 = 0. Thus, ACB is not invertible.
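To see Theorems 12.10 and 12.12 at work on actual numbers, the sketch below picks three 2 × 2 matrices with determinants 3, 0, and 7. These particular matrices are assumptions chosen for illustration; Example 12.15 only specifies the determinants.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[3, 1], [0, 1]]   # det A = 3
B = [[1, 2], [2, 4]]   # det B = 0
C = [[7, 0], [5, 1]]   # det C = 7

AC = matmul(A, C)
AB = matmul(A, B)
ACB = matmul(AC, B)
print(det2(AC), det2(AB), det2(ACB))

three_A = [[3 * x for x in row] for row in A]
print(det2(three_A) == 3 ** 2 * det2(A))   # det(beta A) = beta^n det A with n = 2
```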

After this lecture you should know the following:

how the determinant behaves under elementary row operations
that A is invertible if and only if det A ≠ 0
that det(AB) = det(A) det(B)


Lecture 13

Applications of the Determinant

13.1 The Cofactor Method

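Recall that for $A \in \mathbb{R}^{n \times n}$ we defined

$$\det A = a_{j1}C_{j1} + a_{j2}C_{j2} + \cdots + a_{jn}C_{jn}$$

where $C_{jk} = (-1)^{j+k} \det A_{jk}$ is called the (j, k)-cofactor of A and

$$a_j = \begin{bmatrix} a_{j1} & a_{j2} & \cdots & a_{jn} \end{bmatrix}$$

is the jth row of A. If $c_j = \begin{bmatrix} C_{j1} & C_{j2} & \cdots & C_{jn} \end{bmatrix}$ then

$$\det A = \begin{bmatrix} a_{j1} & a_{j2} & \cdots & a_{jn} \end{bmatrix} \begin{bmatrix} C_{j1} \\ C_{j2} \\ \vdots \\ C_{jn} \end{bmatrix} = a_j \cdot c_j.$$

Suppose that B is the matrix obtained from A by replacing row a_j with a distinct row a_k. Then rows j and k of B are equal, so det B = 0 by Corollary 12.2. The cofactors along the jth row do not involve the jth row itself, so they are the same for A and B. Expanding det B along its jth row b_j = a_k therefore gives

$$\det B = a_k \cdot c_j = 0.$$

The Cofactor Method is an alternative method to find the inverse of an invertible matrix. Recall that for any matrix $A \in \mathbb{R}^{n \times n}$, if we expand along the jth row then

$$\det A = a_j \cdot c_j.$$

On the other hand, if j ≠ k then

$$a_j \cdot c_k = 0.$$

In summary,

$$a_j \cdot c_k = \begin{cases} \det A, & \text{if } j = k \\ 0, & \text{if } j \neq k. \end{cases}$$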


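Form the Cofactor matrix

$$\operatorname{Cof}(A) = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.$$

Then,

$$A(\operatorname{Cof}(A))^T = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \begin{bmatrix} c_1^T & c_2^T & \cdots & c_n^T \end{bmatrix} = \begin{bmatrix} a_1 c_1^T & a_1 c_2^T & \cdots & a_1 c_n^T \\ a_2 c_1^T & a_2 c_2^T & \cdots & a_2 c_n^T \\ \vdots & \vdots & \ddots & \vdots \\ a_n c_1^T & a_n c_2^T & \cdots & a_n c_n^T \end{bmatrix} = \begin{bmatrix} \det A & 0 & \cdots & 0 \\ 0 & \det A & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \det A \end{bmatrix}.$$

This can be written succinctly as

$$A(\operatorname{Cof}(A))^T = \det(A) I_n.$$

Now if det A ≠ 0 then we can divide by det A to obtain

$$A \left( \frac{1}{\det A} (\operatorname{Cof}(A))^T \right) = I_n.$$

This leads to the following formula for the inverse:

$$A^{-1} = \frac{1}{\det A} (\operatorname{Cof}(A))^T$$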

Although this is an explicit and elegant formula for A⁻¹, it is computationally intensive, even for 3 × 3 matrices. However, for the 2 × 2 case it provides a useful formula to compute the matrix inverse. Indeed, if $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ we have $\operatorname{Cof}(A) = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}$ and therefore

$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
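The Cofactor Method translates almost line by line into code. The sketch below is illustrative only: it assumes an integer matrix (so that `Fraction` can divide exactly), the function names are hypothetical, and the recursive determinant is far too slow for large n.

```python
from fractions import Fraction

def minor(A, j, k):
    """Delete row j and column k (0-based)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(A) if i != j]

def det(A):
    """Cofactor expansion along the first row; the empty matrix has determinant 1."""
    if not A:
        return 1
    return sum((-1) ** k * A[0][k] * det(minor(A, 0, k)) for k in range(len(A)))

def cofactor_inverse(A):
    """Return (1/det A) * Cof(A)^T for an integer matrix A, or None if det A = 0."""
    n = len(A)
    d = det(A)
    if d == 0:
        return None
    cof = [[(-1) ** (j + k) * det(minor(A, j, k)) for k in range(n)] for j in range(n)]
    # Entry (j, k) of the inverse is C_kj / det A, i.e. the transpose of Cof(A) divided by det A.
    return [[Fraction(cof[k][j], d) for k in range(n)] for j in range(n)]

print(cofactor_inverse([[1, 3], [-1, -2]]))
print(cofactor_inverse([[1, 0, 3], [1, 1, 0], [-2, 0, -7]]))
```

Both calls reuse matrices whose inverses were found by row reduction in Examples 10.7 and 10.8, so the two methods can be compared directly.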

When does an integer matrix have an integer inverse? We can answer this question using the Cofactor Method. Let us first be clear about what we mean by an integer matrix.

Definition 13.1: A matrix A ∈ R^m^×ⁿis called an integer matrix if every entry of A is an integer.

Suppose that $A \in \mathbb{R}^{n \times n}$ is an invertible integer matrix. Then det(A) is a non-zero integer and (Cof(A))^T is an integer matrix. If A⁻¹ is also an integer matrix then det(A⁻¹) is also an integer. Now det(A) det(A⁻¹) = 1, thus it must be the case that det(A) = ±1. Suppose on the other hand that det(A) = ±1. Then by the Cofactor Method

$$A^{-1} = \frac{1}{\det(A)} (\operatorname{Cof}(A))^T = \pm (\operatorname{Cof}(A))^T$$

and therefore A⁻¹ is also an integer matrix. We have proved the following.

Theorem 13.2: An invertible integer matrix A ∈ Rⁿ^×ⁿhas an integer inverse A⁻¹if and only if det A = ±1.

We can use the previous theorem to generate integer matrices with an integer inverse as follows. Begin with an upper triangular matrix M₀ having integer entries and whose diagonal entries are either 1 or −1. By construction, det(M₀) = ±1. Perform any sequence of elementary row operations of Type 1 and Type 3, using integer multiples in the Type 3 operations. This generates a sequence of matrices M₁, . . . , M_p whose entries are integers. Moreover,

M₀ ~ M₁ ~ · · · ~ M_p.

A Type 1 operation only changes the sign of the determinant and a Type 3 operation leaves it unchanged. Therefore,

det(M₀) = ±1, det(M₁) = ±1, . . . , det(M_p) = ±1,

and by Theorem 13.2 each M_i has an integer inverse.


13.2 Cramer’s Rule

The Cofactor Method can be used to give an explicit formula for the solution of a linear system where the coefficient matrix is invertible. The formula is known as Cramer's Rule. To derive this formula, recall that if A is invertible then the solution to Ax = b is x = A⁻¹b. Using the Cofactor Method, $A^{-1} = \frac{1}{\det A}(\operatorname{Cof}(A))^T$, and therefore

$$x = \frac{1}{\det A} \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$$

Consider the first component x₁ of x:

$$x_1 = \frac{1}{\det A} (b_1 C_{11} + b_2 C_{21} + \cdots + b_n C_{n1}).$$

The expression b₁C₁₁ + b₂C₂₁ + · · · + b_nC_n₁ is the expansion of the determinant along the first column of the matrix obtained from A by replacing the first column with b:

$$\det \begin{bmatrix} b_1 & a_{12} & \cdots & a_{1n} \\ b_2 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_n & a_{n2} & \cdots & a_{nn} \end{bmatrix} = b_1 C_{11} + b_2 C_{21} + \cdots + b_n C_{n1}.$$

Similarly,

$$x_2 = \frac{1}{\det A} (b_1 C_{12} + b_2 C_{22} + \cdots + b_n C_{n2})$$

and (b₁C₁₂ + b₂C₂₂ + · · · + b_nC_n₂) is the expansion of the determinant along the second column of the matrix obtained from A by replacing the second column with b. In summary:

Theorem 13.3: (Cramer's Rule) Let $A \in \mathbb{R}^{n \times n}$ be an invertible matrix. Let b ∈ Rⁿ and let A_i be the matrix obtained from A by replacing the ith column with b. Then the solution to Ax = b is

$$x = \frac{1}{\det A} \begin{bmatrix} \det A_1 \\ \det A_2 \\ \vdots \\ \det A_n \end{bmatrix}.$$

Although this is an explicit and elegant formula for x, it is computationally intensive, and used mainly for theoretical purposes.
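For small systems Cramer's Rule is easy to sketch in code. The example below reuses the system of Example 10.5 (with A from Example 10.3 and b = (1, −3, −1)); the function names are hypothetical, and the routine assumes integer entries so that `Fraction` gives exact quotients.

```python
from fractions import Fraction

def minor(A, j, k):
    """Delete row j and column k (0-based)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(A) if i != j]

def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det(minor(A, 0, k)) for k in range(len(A)))

def cramer(A, b):
    """Solve Ax = b by Cramer's Rule; A must be an invertible integer matrix."""
    d = det(A)
    if d == 0:
        raise ValueError("A is not invertible, so Cramer's Rule does not apply")
    solution = []
    for i in range(len(A)):
        # A_i is A with its ith column replaced by b.
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution

A = [[1, -3, 0], [-1, 2, -2], [-2, 6, 1]]
b = [1, -3, -1]
x = cramer(A, b)
print(x)
```

Each component costs one n × n determinant, which is why the rule is rarely used for numerical work.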


13.3 Volumes

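The volume of the parallelepiped determined by the vectors v₁, v₂, v₃ is

$$\operatorname{Vol}(v_1, v_2, v_3) = \operatorname{abs}(v_1 \cdot (v_2 \times v_3)) = \operatorname{abs}(\det \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix})$$

where abs(x) denotes the absolute value of the number x. Let A be an invertible matrix and let w₁ = Av₁, w₂ = Av₂, w₃ = Av₃. How are Vol(v₁, v₂, v₃) and Vol(w₁, w₂, w₃) related? Compute:

$$\begin{aligned} \operatorname{Vol}(w_1, w_2, w_3) &= \operatorname{abs}(\det \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix}) \\ &= \operatorname{abs}(\det \begin{bmatrix} Av_1 & Av_2 & Av_3 \end{bmatrix}) \\ &= \operatorname{abs}(\det (A \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix})) \\ &= \operatorname{abs}(\det A \cdot \det \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}) \\ &= \operatorname{abs}(\det A) \cdot \operatorname{Vol}(v_1, v_2, v_3). \end{aligned}$$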

Therefore, the number abs(det A) is the factor by which volume is changed under the linear transformation with matrix A. In summary:

Theorem 13.4: Suppose that v₁, v₂, v₃are vectors in R³that determine a parallelepiped of non-zero volume. Let A be the matrix of a linear transformation and let w₁, w₂, w₃be the images of v₁, v₂, v₃under A, respectively. Then

Vol(w₁, w₂, w₃) = abs(det A) · Vol(v₁, v₂, v₃).

Example 13.5. Consider the data

$$A = \begin{bmatrix} 4 & 1 & -1 \\ 2 & 4 & 1 \\ 1 & 1 & 4 \end{bmatrix}, \quad v_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}, \quad v_3 = \begin{bmatrix} -1 \\ 5 \\ 1 \end{bmatrix}$$

and let w₁ = Av₁, w₂ = Av₂, and w₃ = Av₃. Find the volume of the parallelepiped spanned by the vectors {w₁, w₂, w₃}.

Solution. We compute:

Vol(v₁, v₂, v₃) = abs(det([v₁ v₂ v₃])) = abs(−7) = 7

We compute:

det(A) = 55.

Therefore, the volume of the parallelepiped spanned by the vectors {w₁, w₂, w₃} is

Vol(w₁, w₂, w₃) = abs(55) × 7 = 385.
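The result of Example 13.5 can be cross-checked by computing the images w_i directly and taking the determinant of [w₁ w₂ w₃]. The sketch below assumes the data A = [[4, 1, −1], [2, 4, 1], [1, 1, 4]], v₁ = (1, −1, 0), v₂ = (0, 1, 2), v₃ = (−1, 5, 1), which reproduce the stated values det A = 55 and det [v₁ v₂ v₃] = −7; the helper names are hypothetical.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[4, 1, -1], [2, 4, 1], [1, 1, 4]]
V = [[1, 0, -1], [-1, 1, 5], [0, 2, 1]]   # columns are v1, v2, v3
W = matmul(A, V)                          # columns are w1 = A v1, w2 = A v2, w3 = A v3

vol_v = abs(det3(V))
vol_w = abs(det3(W))
print(vol_v, det3(A), vol_w)
print(vol_w == abs(det3(A)) * vol_v)
```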


After this lecture you should know the following:

what the Cofactor Method is
what Cramer’s Rule is
the geometric interpretation of the determinant (volume)


Lecture 14

Vector Spaces

14.1 Vector Spaces

When you read or hear the word vector you may immediately think of two points in R² (or R³) connected by an arrow. Mathematically speaking, a vector is just an element of a vector space. This raises the question: What is a vector space? Roughly speaking, a vector space is a set of objects that can be added and multiplied by scalars. You have already worked with several types of vector spaces. Examples of vector spaces that you have already encountered are:

the set Rⁿ,
the set of all n × n matrices,
the set of all functions from [a, b] to R, and
the set of all sequences.

In all of these sets, there is an operation of “addition“ and “multiplication by scalars”. Let’s formalize then exactly what we mean by a vector space.

Definition 14.1: A vector space is a set V of objects, called vectors, on which two operations called addition and scalar multiplication have been defined satisfying the following properties. If u, v, w are in V and if α, β ∈ R are scalars:

(1) The sum u + v is in V. (closure under addition)
(2) u + v = v + u (addition is commutative)
(3) (u + v) + w = u + (v + w) (addition is associative)
(4) There is a vector in V called the zero vector, denoted by 0, satisfying v + 0 = v.
(5) For each v there is a vector −v in V such that v + (−v) = 0.
(6) The scalar multiple of v by α, denoted αv, is in V. (closure under scalar multiplication)
(7) α(u + v) = αu + αv
(8) (α + β)v = αv + βv
(9) α(βv) = (αβ)v
(10) 1v = v

It can be shown that 0 · v = 0 for any vector v in V. To better understand the definition of a vector space, we first consider a few elementary examples.

Example 14.2. Let V be the unit disc in R²:

V = {(x, y) ∈ R²| x²+ y²≤ 1}

Is V a vector space?

Solution. The disc is not closed under scalar multiplication. For example, take u = (1, 0) ∈ V and multiply by, say, α = 2. Then αu = (2, 0) is not in V. Therefore, property (6) of the definition of a vector space fails, and consequently the unit disc is not a vector space.

Example 14.3. Let V be the graph of the quadratic function f(x) = x²:

V = {(x, y) ∈ R² | y = x²}.

Is V a vector space?

Solution. The set V is not closed under scalar multiplication. For example, u = (1, 1) is a point in V but 2u = (2, 2) is not. You may also notice that V is not closed under addition either. For example, both u = (1, 1) and v = (2, 4) are in V but u + v = (3, 5) and (3, 5) is not a point on the parabola V. Therefore, the graph of f(x) = x²is not a vector space.


Example 14.4. Let V be the graph of the function f(x) = 2x:

V = {(x, y) ∈ R²| y = 2x}.

Is V a vector space?

Solution. We will show that V is a vector space. First, we verify that V is closed under addition. We first note that an arbitrary point in V can be written as u = (x, 2x). Let then u = (a, 2a) and v = (b, 2b) be points in V. Then

u + v = (a + b, 2a + 2b) = (a + b, 2(a + b)).

Therefore V is closed under addition. Verify that V is closed under scalar multiplication:

αu = α(a, 2a) = (αa, α2a) = (αa, 2(αa)).

Therefore V is closed under scalar multiplication. There is a zero vector 0 = (0, 0) in V:

u + 0 = (a, 2a) + (0, 0) = (a, 2a).

All the other properties of a vector space can be verified to hold; for example, addition is commutative and associative in V because addition in R²is commutative/associative, etc. Therefore, the graph of the function f(x) = 2x is a vector space.

The following example is important (it will appear frequently) and is our first example of what we could say is an “abstract vector space”. To emphasize, a vector space is a set that comes equipped with an operation of addition and scalar multiplication and these two operations satisfy the list of properties above.

Example 14.5. Let V = P_n[t] be the set of all polynomials in the variable t and of degree at most n:

P_n[t] = {a₀ + a₁t + a₂t² + · · · + a_ntⁿ | a₀, a₁, . . . , a_n ∈ R}.

Is V a vector space?

Solution. Let u(t) = u₀+ u₁t + · · · + u_ntⁿand let v(t) = v₀+ v₁t + · · · + v_ntⁿbe polynomials in V. We define the addition of u and v as the new polynomial (u + v) as follows:

(u + v)(t) = u(t) + v(t) = (u₀+ v₀) + (u₁+ v₁)t + · · · + (u_n+ v_n)tⁿ.

Then u + v is a polynomial of degree at most n and thus (u + v) ∈ P_n[t], which shows that P_n[t] is closed under addition. Now let α be a scalar and define a new polynomial (αu) as follows:

(αu)(t) = (αu₀) + (αu₁)t + · · · + (αu_n)tⁿ

Then (αu) is a polynomial of degree at most n and thus (αu) ∈ P_n[t]; hence, P_n[t] is closed under scalar multiplication. The 0 vector in P_n[t] is the zero polynomial 0(t) = 0. One can verify that all other properties of the definition of a vector space also hold; for example, addition is commutative and associative, etc. Thus P_n[t] is a vector space.
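One concrete way to picture P_n[t] is to store a polynomial as its list of n + 1 coefficients. The sketch below assumes that representation (an illustration choice, not something the lecture prescribes); `poly_add` and `poly_scale` are hypothetical helper names.

```python
def poly_add(u, v):
    """Add two polynomials of P_n[t] given as coefficient lists [a0, a1, ..., an]."""
    assert len(u) == len(v), "both polynomials must live in the same P_n[t]"
    return [ui + vi for ui, vi in zip(u, v)]


def poly_scale(alpha, u):
    """Multiply every coefficient of u by the scalar alpha."""
    return [alpha * ui for ui in u]


# Two elements of P_2[t]: u(t) = 1 + 2t + 3t^2 and v(t) = 4 - 2t - 3t^2.
u = [1, 2, 3]
v = [4, -2, -3]

s = poly_add(u, v)      # the t and t^2 terms cancel; the sum is still in P_2[t]
w = poly_scale(-2, u)   # -2 - 4t - 6t^2
zero = [0, 0, 0]        # the zero polynomial
```

Note that the sum has degree 0, which is why the definition says "degree at most n" rather than "degree n".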

Example 14.6. Let V = M_{m×n} be the set of all m × n matrices. Under the usual operations of addition of matrices and scalar multiplication, is M_{m×n} a vector space?

Solution. Given matrices A, B ∈ M_{m×n} and a scalar α, we defined the sum A + B by adding entry-by-entry, and αA by multiplying each entry of A by α. It is clear that the space M_{m×n} is closed under these two operations. The 0 vector in M_{m×n} is the matrix of size m × n having all entries equal to zero. It can be verified that all other properties of the definition of a vector space also hold. Thus, the set M_{m×n} is a vector space.

Example 14.7. The n-dimensional Euclidean space V = Rⁿ under the usual operations of addition and scalar multiplication is a vector space.

Example 14.8. Let V = C[a, b] denote the set of functions with domain [a, b] and co-domain R that are continuous. Is V a vector space?

Subspaces of Vector Spaces

Frequently, one encounters a vector space W that is a subset of a larger vector space V. In this case, we would say that W is a subspace of V. Below is the formal definition.

Definition 14.9: Let V be a vector space. A subset W of V is called a subspace of V if it satisfies the following properties:

The zero vector of V is also in W.
W is closed under addition, that is, if u and v are in W then u + v is in W.
W is closed under scalar multiplication, that is, if u is in W and α is a scalar then αu is in W.

Example 14.10. Let W be the graph of the function f(x) = 2x:

W = {(x, y) ∈ R²| y = 2x}.

Is W a subspace of V = R²?

Solution. If x = 0 then y = 2 · 0 = 0 and therefore 0 = (0, 0) is in W. Let u = (a, 2a) and v = (b, 2b) be elements of W. Then

u + v = (a, 2a) + (b, 2b) = (a + b, 2a + 2b) = (a + b, 2(a + b)).

Because the x and y components of u + v satisfy y = 2x, the sum u + v is in W. Thus, W is closed under addition. Let α be any scalar and let u = (a, 2a) be an element of W. Then

αu = (αa, α2a) = (αa, 2(αa)).

Because the x and y components of αu satisfy y = 2x then αu is an element of W, and thus W is closed under scalar multiplication. All three conditions of a subspace are satisfied for W and therefore W is a subspace of V.

Example 14.11. Let W be the first quadrant in R²:

W = {(x, y) ∈ R²| x ≥ 0, y ≥ 0}.

Is W a subspace?

Solution. The set W contains the zero vector and the sum of two vectors in W is again in W; you may want to verify this explicitly as follows: if u₁ = (x₁, y₁) is in W then x₁ ≥ 0 and y₁ ≥ 0, and similarly if u₂ = (x₂, y₂) is in W then x₂ ≥ 0 and y₂ ≥ 0. Then the sum u₁ + u₂ = (x₁ + x₂, y₁ + y₂) has components x₁ + x₂ ≥ 0 and y₁ + y₂ ≥ 0 and therefore u₁ + u₂ is in W. However, W is not closed under scalar multiplication. For example, if u = (1, 1) and α = −1 then αu = (−1, −1) is not in W because the components of αu are negative. Therefore, W is not a subspace of R².

Example 14.12. Let V = M_{n×n} be the vector space of all n × n matrices. We define the trace of a matrix A ∈ M_{n×n} as the sum of its diagonal entries:

tr(A) = a₁₁+ a₂₂+ · · · + a_nn.

Let W be the set of all n × n matrices whose trace is zero:

W = {A ∈ M_n_×_n| tr(A) = 0}.

Is W a subspace of V?

Solution. If 0 is the n × n zero matrix then clearly tr(0) = 0, and thus 0 ∈ W. Suppose that A and B are in W. Then necessarily tr(A) = 0 and tr(B) = 0. Consider the matrix C = A + B. Then

tr(C) = tr(A + B) = (a₁₁+ b₁₁) + (a₂₂+ b₂₂) + · · · + (a_nn+ b_nn)

= (a₁₁+ · · · + a_nn) + (b₁₁+ · · · + b_nn)

= tr(A) + tr(B)

= 0


Therefore, tr(C) = 0 and consequently C = A + B ∈ W, in other words, W is closed under addition. Now let α be a scalar and let C = αA. Then

tr(C) = tr(αA) = (αa₁₁) + (αa₂₂) + · · · + (αa_nn) = α tr(A) = 0.

Thus, tr(C) = 0, that is, C = αA ∈ W, and consequently W is closed under scalar multiplication. All three conditions of a subspace hold, and therefore the set W is a subspace of V.
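The computation above rests on the identities tr(A + B) = tr(A) + tr(B) and tr(αA) = α tr(A). A small numerical spot check is sketched below; the matrices A and B are made-up trace-zero examples and the helper names are hypothetical.

```python
def trace(A):
    """Sum of the diagonal entries of a square matrix given as a list of rows."""
    return sum(A[i][i] for i in range(len(A)))


def mat_add(A, B):
    """Entry-by-entry sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_scale(alpha, A):
    """Multiply every entry of A by the scalar alpha."""
    return [[alpha * a for a in row] for row in A]


# Two 3 x 3 matrices chosen so that their traces vanish.
A = [[1, 2, 0], [4, -3, 5], [7, 8, 2]]    # 1 - 3 + 2 = 0
B = [[-6, 1, 1], [0, 4, 9], [3, 3, 2]]    # -6 + 4 + 2 = 0

sum_trace = trace(mat_add(A, B))
scaled_trace = trace(mat_scale(7, A))
```

A check on two matrices is of course not a proof; it only illustrates the closure properties established in the solution.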

Example 14.13. Let V = P_n[t] and consider the subset W of V:

W = {u ∈ P_n[t] | u′(1) = 0}

In other words, W consists of the polynomials of degree at most n in the variable t whose derivative at t = 1 is zero. Is W a subspace of V?

Solution. The zero polynomial 0(t) = 0 clearly has derivative at t = 1 equal to zero, that is, 0′(1) = 0, and thus the zero polynomial is in W. Now suppose that u(t) and v(t) are two polynomials in W. Then u′(1) = 0 and also v′(1) = 0. To verify whether or not W is closed under addition, we must determine whether the sum polynomial (u + v)(t) has a derivative at t = 1 equal to zero. From the rules of differentiation, we compute

(u + v)′(1) = u′(1) + v′(1) = 0 + 0 = 0.

Therefore, the polynomial (u + v) is in W, and thus W is closed under addition. Now let α be any scalar and let u(t) be a polynomial in W. Then u′(1) = 0. To determine whether or not the scalar multiple αu(t) is in W we must determine if αu(t) has a derivative of zero at t = 1. Using the rules of differentiation, we compute that

(αu)′(1) = αu′(1) = α · 0 = 0.

Therefore, the polynomial (αu)(t) is in W and thus W is closed under scalar multiplication. All three properties of a subspace hold for W and therefore W is a subspace of P_n[t].

Example 14.14. Let V = P_n[t] and consider the subset W of V:

W = {u ∈ P_n[t] | u(2) = −1}

In other words, W consists of the polynomials of degree at most n in the variable t whose value at t = 2 is −1. Is W a subspace of V?

Solution. The zero polynomial 0(t) = 0 clearly does not equal −1 at t = 2. Therefore, W does not contain the zero polynomial and, because all three conditions of a subspace must be satisfied for W to be a subspace, then W is not a subspace of P_n[t]. As an exercise, you may want to investigate whether or not W is closed under addition and scalar multiplication.


Example 14.15. A square matrix A is said to be symmetric if A^T = A. For example, here is a 3 × 3 symmetric matrix:

\[
A = \begin{bmatrix} 1 & 2 & -3 \\ 2 & 4 & 5 \\ -3 & 5 & 7 \end{bmatrix}
\]

Verify for yourself that we do indeed have that A^T = A. Let W be the set of all symmetric n × n matrices. Is W a subspace of V = M_{n×n}?

Example 14.16. For any vector space V, there are two trivial subspaces in V, namely, V itself is a subspace of V and the set consisting of the zero vector W = {0} is a subspace of V.

There is one particular way to generate a subspace of any given vector space V using the span of a set of vectors. Recall that we defined the span of a set of vectors in Rⁿbut we can define the same notion on a general vector space V.

Definition 14.17: Let V be a vector space and let v₁, v₂, . . . , v_p be vectors in V. The span of {v₁, . . . , v_p} is the set of all linear combinations of v₁, . . . , v_p:

span{v₁, v₂, . . . , v_p} = {t₁v₁ + t₂v₂ + · · · + t_pv_p | t₁, t₂, . . . , t_p ∈ R}.

We now show that the span of a set of vectors in V is a subspace of V.

Theorem 14.18: If v₁, v₂, . . . , v_p are vectors in V then span{v₁, . . . , v_p} is a subspace of V.

Proof. Let u = t₁v₁ + · · · + t_pv_p and w = s₁v₁ + · · · + s_pv_p be two vectors in span{v₁, v₂, . . . , v_p}. Then

u + w = (t₁v₁+ · · · + t_pv_p) + (s₁v₁+ · · · + s_pv_p) = (t₁+ s₁)v₁+ · · · + (t_p+ s_p)v_p.

Therefore u + w is also in the span of v₁, . . . , v_p. Now consider αu:

αu = α(t₁v₁+ · · · + t_pv_p) = (αt₁)v₁+ · · · + (αt_p)v_p.

Therefore, αu is in the span of v₁, . . . , v_p. Lastly, since 0v₁ + 0v₂ + · · · + 0v_p = 0, the zero vector 0 is in the span of v₁, v₂, . . . , v_p. Therefore, span{v₁, v₂, . . . , v_p} is a subspace of V.

Given a general subspace W of V, if w₁, w₂, . . . , w_pare vectors in W such that

span{w₁, w₂, . . . , w_p} = W

then we say that {w₁, w₂, . . . , w_p} is a spanning set of W. Hence, every vector in W can be written as a linear combination of the vectors w₁, w₂, . . . , w_p.

After this lecture you should know the following:

what a vector space/subspace is
how to give some examples of vector spaces/subspaces
that the span of a set of vectors in V is a subspace of V

Lecture 15

Linear Maps

Before we begin this lecture, we review subspaces. Recall that W is a subspace of a vector space V if W is a subset of V and

the zero vector 0 in V is also in W,
for any vectors u, v in W the sum u + v is also in W, and
for any vector u in W and any scalar α the vector αu is also in W.

In the previous lecture we gave several examples of subspaces. For example, we showed that a line through the origin in R² is a subspace of R² and we gave examples of subspaces of P_n[t] and M_{n×n}. We also showed that if v₁, . . . , v_p are vectors in a vector space V then

W = span{v₁, v₂, . . . , v_p}

is a subspace of V.

15.1 Linear Maps on Vector Spaces

In Lecture 7, we defined what it meant for a vector mapping T : Rⁿ → R^m to be a linear mapping. We now want to introduce linear mappings on general vector spaces; you will notice that the definition is essentially the same but the key point to remember is that the underlying spaces are not Rⁿ but general vector spaces.

Definition 15.1: Let T : V → U be a mapping of vector spaces. Then T is called a linear mapping if

for any u, v in V it holds that T(u + v) = T(u) + T(v), and
for any scalar α and v in V it holds that T(αv) = αT(v).

Example 15.2. Let V = M_{n×n} be the vector space of n × n matrices and let T : V → V be the mapping

T(A) = A + A^T.

Is T a linear mapping?

Solution. Let A and B be matrices in V. Then using the properties of the transpose and regrouping we obtain:

T(A + B) = (A + B) + (A + B)^T

= A + B + A^T+ B^T

= (A + A^T) + (B + B^T)

= T(A) + T(B).

Similarly, if α is any scalar then

T(αA) = (αA) + (αA)^T

= αA + αA^T

= α(A + A^T)

= αT(A).

This proves that T satisfies both conditions of Definition 15.1 and thus T is a linear mapping.
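The two conditions of Definition 15.1 can also be spot-checked numerically for T(A) = A + A^T. The sketch below uses made-up 2 × 2 matrices and hypothetical helper names; passing such a check illustrates the proof above but does not replace it.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_add(A, B):
    """Entry-by-entry sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_scale(alpha, A):
    """Multiply every entry of A by the scalar alpha."""
    return [[alpha * a for a in row] for row in A]


def T(A):
    """The mapping T(A) = A + A^T of Example 15.2."""
    return mat_add(A, transpose(A))


A = [[1, 2], [3, 4]]
B = [[0, -5], [7, 6]]
alpha = 3

additive = T(mat_add(A, B)) == mat_add(T(A), T(B))
homogeneous = T(mat_scale(alpha, A)) == mat_scale(alpha, T(A))
```

Observe also that every output T(A) is a symmetric matrix, since (A + A^T)^T = A^T + A.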

Example 15.3. Let V = M_{n×n} be the vector space of n × n matrices, where n ≥ 2, and let T : V → R be the mapping

T(A) = det(A).

Is T a linear mapping?

Solution. If T is a linear mapping then, according to Definition 15.1, we must have T(A + B) = T(A) + T(B), that is, det(A + B) = det(A) + det(B), and also T(αA) = αT(A) for any scalar α. Do these properties actually hold though? We know from the properties of the determinant that det(αA) = αⁿ det(A). Since n ≥ 2, this agrees with α det(A) only in special cases (for instance when α = 0, α = 1, or det(A) = 0), so T(αA) = αT(A) does not hold in general. Therefore, T is not a linear mapping. Also, it does not hold in general that det(A + B) = det(A) + det(B); in fact it rarely holds. For example, if

\[
A = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 1 \\ 0 & 3 \end{bmatrix}
\]

then det(A) = 2, det(B) = −3 and therefore det(A) + det(B) = −1. On the other hand,

\[
A + B = \begin{bmatrix} 1 & 1 \\ 0 & 4 \end{bmatrix}
\]

and thus det(A + B) = 4. Thus, det(A + B) ≠ det(A) + det(B).
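The failure of both linearity conditions for the determinant is easy to reproduce. The sketch below takes A and B to be the 2 × 2 matrices with rows (2, 0), (0, 1) and (−1, 1), (0, 3), which is an assumed reading of the example consistent with the quoted values det(A) = 2 and det(B) = −3; `det2` is a hypothetical helper.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix given as a list of two rows."""
    (a, b), (c, d) = M
    return a * d - b * c


A = [[2, 0], [0, 1]]
B = [[-1, 1], [0, 3]]

# Entry-by-entry sum A + B.
S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# Scalar multiple 3A, used to compare det(3A) with 3 det(A).
alpha = 3
scaled = [[alpha * entry for entry in row] for row in A]

additivity_holds = det2(S) == det2(A) + det2(B)
homogeneity_holds = det2(scaled) == alpha * det2(A)
```

Here det(3A) equals 3² · det(A) rather than 3 · det(A), in line with det(αA) = αⁿ det(A) for n = 2.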

Example 15.4. Let V = P_n[t] be the vector space of polynomials in the variable t of degree no more than n ≥ 1. Consider the mapping T : V → V defined as

T(f(t)) = 2f(t) + f′(t).

For example, if f(t) = 3t⁶ − t² + 5 then

T(f(t)) = 2f(t) + f′(t)

= 2(3t⁶ − t² + 5) + (18t⁵ − 2t)

= 6t⁶ + 18t⁵ − 2t² − 2t + 10.

Is T a linear mapping?

Solution. Let f(t) and g(t) be polynomials of degree no more than n ≥ 1. Then

T(f(t) + g(t)) = 2(f(t) + g(t)) + (f(t) + g(t))′

= 2f(t) + 2g(t) + f′(t) + g′(t)

= (2f(t) + f′(t)) + (2g(t) + g′(t))

= T(f(t)) + T(g(t)).

Therefore, T(f(t) + g(t)) = T(f(t)) + T(g(t)). Now let α be any scalar. Then

T(αf(t)) = 2(αf(t)) + (αf(t))′

= 2αf(t) + αf′(t)

= α(2f(t) + f′(t))

= αT(f(t)).

Therefore, T(αf(t)) = αT(f(t)). Therefore, T is a linear mapping.
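To experiment with T(f) = 2f + f′, one can again store a polynomial as a coefficient list [a₀, a₁, . . . , a_n]. This representation and the helper names below are assumptions made for illustration only.

```python
def derivative(f):
    """Coefficients of f', padded with a trailing zero so the length stays n + 1."""
    return [k * f[k] for k in range(1, len(f))] + [0]


def T(f):
    """The mapping T(f) = 2f + f' acting on coefficient lists."""
    return [2 * fk + dk for fk, dk in zip(f, derivative(f))]


def poly_add(f, g):
    """Coefficient-wise sum of two polynomials of the same length."""
    return [a + b for a, b in zip(f, g)]


# f(t) = 3t^6 - t^2 + 5 and a second, made-up polynomial g(t) = 1 + t - t^6 in P_6[t].
f = [5, 0, -1, 0, 0, 0, 3]
g = [1, 1, 0, 0, 0, 0, -1]
alpha = 4

Tf = T(f)   # coefficients of 10 - 2t - 2t^2 + 18t^5 + 6t^6
additive = T(poly_add(f, g)) == poly_add(T(f), T(g))
homogeneous = T([alpha * c for c in f]) == [alpha * c for c in T(f)]
```

Padding the derivative with a zero keeps every output in the same space P_6[t], matching the statement that T maps V to V.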

We now introduce two important subsets associated to a linear mapping.

Definition 15.5: Let T : V → U be a linear mapping.

The kernel of T is the set of vectors v in the domain V that get mapped to the zero vector, that is, T(v) = 0. We denote the kernel of T by ker(T):

ker(T) = {v ∈ V | T(v) = 0}.

The range of T is the set of vectors b in the codomain U for which there exists at least one v in V such that T(v) = b. We denote the range of T by Range(T):

Range(T) = {b ∈ U | there exists some v ∈ V such that T(v) = b}.

You may have noticed that the definition of the range of a linear mapping on an abstract vector space is the usual definition of the range of a function. Not surprisingly, the kernel and range are subspaces of the domain and codomain, respectively.


Theorem 15.6: Let T : V → U be a linear mapping. Then ker(T) is a subspace of V and Range(T) is a subspace of U.

Proof. Suppose that v and u are in ker(T). Then T(v) = 0 and T(u) = 0. Then by linearity of T it holds that

T(v + u) = T(v) + T(u) = 0 + 0 = 0.

Therefore, since T(u + v) = 0 then u + v is in ker(T). This shows that ker(T) is closed under addition. Now suppose that α is any scalar and v is in ker(T). Then T(v) = 0 and thus by linearity of T it holds that

T(αv) = αT(v) = α0 = 0.

Therefore, since T(αv) = 0 then αv is in ker(T) and this proves that ker(T) is closed under scalar multiplication. Lastly, by linearity of T it holds that

T(0) = T(v − v) = T(v) − T(v) = 0

that is, T(0) = 0. Therefore, the zero vector 0 is in ker(T). This proves that ker(T) is a subspace of V. The proof that Range(T) is a subspace of U is left as an exercise.

Example 15.7. Let V = M_n_×_nbe the vector space of n × n matrices and let T : V → V be the mapping

T(A) = A + A^T.

Describe the kernel of T.

Solution. A matrix A is in the kernel of T if T(A) = A + A^T= 0, that is, if A^T= −A. Hence,

ker(T) = {A ∈ M_{n×n} | A^T = −A}.

What type of matrix A satisfies A^T = −A? For example, consider the case that A is the 2 × 2 matrix

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

and A^T = −A. Then

\[
\begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} = \begin{bmatrix} -a_{11} & -a_{12} \\ -a_{21} & -a_{22} \end{bmatrix}.
\]

Therefore, it must hold that a₁₁ = −a₁₁, a₂₁ = −a₁₂ and a₂₂ = −a₂₂. Then necessarily a₁₁ = 0 and a₂₂ = 0, and a₁₂ can be arbitrary. For example, the matrix

\[
A = \begin{bmatrix} 0 & 7 \\ -7 & 0 \end{bmatrix}
\]

satisfies A^T = −A. Using a similar computation as above, a 3 × 3 matrix satisfies A^T = −A if A is of the form

\[
A = \begin{bmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{bmatrix}
\]

where a, b, c are arbitrary constants. In general, a matrix A that satisfies A^T = −A is called skew-symmetric.

Example 15.8. Let V be the vector space of differentiable functions on the interval [a, b]. That is, f is an element of V if f : [a, b] → R is differentiable. Describe the kernel of the linear mapping T : V → V defined as

T(f(x)) = f(x) + f′(x).

Solution. A function f is in the kernel of T if T(f(x)) = 0, that is, if f(x) + f′(x) = 0, or equivalently, if f′(x) = −f(x). Which functions f do you know that satisfy f′(x) = −f(x)? How about f(x) = e^{−x}? It is clear that f′(x) = −e^{−x} = −f(x) and thus f(x) = e^{−x} is in ker(T). How about g(x) = 2e^{−x}? We compute that g′(x) = −2e^{−x} = −g(x) and thus g is also in ker(T). It turns out that the elements of ker(T) are exactly the functions of the form f(x) = Ce^{−x} for a constant C.
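The last claim, that nothing else lies in ker(T), can be justified with a short integrating-factor argument. The following is a sketch; the auxiliary function g is introduced here only for this purpose and does not appear in the lecture.

```latex
Suppose $f \in \ker(T)$, so that $f(x) + f'(x) = 0$ on $[a,b]$, and set $g(x) = e^{x} f(x)$. Then
\[
g'(x) = e^{x} f(x) + e^{x} f'(x) = e^{x}\bigl(f(x) + f'(x)\bigr) = 0
\quad \text{for all } x \in [a,b].
\]
A differentiable function with zero derivative on an interval is constant, so $g(x) = C$
for some constant $C$, and hence
\[
f(x) = e^{-x} g(x) = C e^{-x}.
\]
```

In the language of the next lecture, this says that ker(T) = span{e^{−x}}.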

15.2 Null space and Column space

In the previous section, we introduced the kernel and range of a general linear mapping T : V → U. In this section, we consider the particular case of matrix mappings T_A : Rⁿ → R^m for some m × n matrix A. In this case, v is in the kernel of T_A if and only if T_A(v) = Av = 0. In other words, v ∈ ker(T_A) if and only if v is a solution to the homogeneous system Ax = 0. Because the case when T is a matrix mapping arises so frequently, we give a name to the set of vectors v such that Av = 0.

Definition 15.9: The null space of a matrix A ∈ M_m_×_n, denoted by Null(A), is the subset of Rⁿconsisting of vectors v such that Av = 0. In other words, v ∈ Null(A) if and only if Av = 0. Using set notation:

Null(A) = {v ∈ Rⁿ| Av = 0}.

Hence, the following holds

ker(T_A) = Null(A).

Because the kernel of a linear mapping is a subspace we obtain the following.

Theorem 15.10: If A ∈ M_m_×_nthen Null(A) is a subspace of Rⁿ.

Hence, by Theorem 15.10, if u and v are two solutions to the linear system Ax = 0 then αu + βv is also a solution:

A(αu + βv) = αAu + βAv = α · 0 + β · 0 = 0.


Example 15.11. Let V = R⁴and consider the following subset of V:

W = {(x₁, x₂, x₃, x₄) ∈ R⁴| 2x₁− 3x₂+ x₃− 7x₄= 0}.

Is W a subspace of V?

Solution. The set W is the null space of the 1 × 4 matrix A given by

A = [2 −3 1 −7].

Hence, W = Null(A) and consequently W is a subspace.

From our previous remarks, the null space of a matrix A ∈ M_m_×_nis just the solution set of the homogeneous system Ax = 0. Therefore, one way to explicitly describe the null space of A is to solve the system Ax = 0 and write the general solution in parametric vector form. From our previous work on solving linear systems, if the rref(A) has r leading 1’s then the number of parameters in the solution set is d = n − r. Therefore, after performing back substitution, we will obtain vectors v₁, . . . , v_dsuch that the general solution in parametric vector form can be written as

x = t₁v₁+ t₂v₂+ · · · + t_dv_d

where t₁, t₂, . . . , t_dare arbitrary numbers. Therefore,

Null(A) = span{v₁, v₂, . . . , v_d}.

Hence, the vectors v₁, v₂, . . . , v_d form a spanning set for Null(A).

Example 15.12. Find a spanning set for the null space of the matrix

\[
A = \begin{bmatrix} -3 & 6 & -1 & 1 & -7 \\ 1 & -2 & 2 & 3 & -1 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix}.
\]

Solution. The null space of A is the solution set of the homogeneous system Ax = 0. Performing elementary row operations one obtains

\[
A \sim \begin{bmatrix} 1 & -2 & 0 & -1 & 3 \\ 0 & 0 & 1 & 2 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]

Clearly r = rank(A) = 2 and since n = 5 we will have d = n − r = 3 vectors in a spanning set for Null(A). Letting x₅ = t₁ and x₄ = t₂, then from the 2nd row we obtain

x₃= −2t₂+ 2t₁.

Letting x₂= t₃, then from the 1st row we obtain

x₁= 2t₃+ t₂− 3t₁.

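Writing the general solution in parametric vector form we obtain

\[
x = t_1 \begin{bmatrix} -3 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}
  + t_2 \begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{bmatrix}
  + t_3 \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
  = t_1 v_1 + t_2 v_2 + t_3 v_3.
\]

Therefore,

\[
\mathrm{Null}(A) = \mathrm{span}\left\{
\begin{bmatrix} -3 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix},
\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{bmatrix},
\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\right\}.
\]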

You can verify that Av₁= Av₂= Av₃= 0.
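The suggested verification can be done in a few lines. In the sketch below, the labelling of the spanning vectors (v₁ paired with t₁ = x₅, v₂ with t₂ = x₄, v₃ with t₃ = x₂) follows the parametrisation in the solution and is an assumption about the intended ordering; `mat_vec` is a hypothetical helper.

```python
def mat_vec(A, x):
    """Matrix-vector product Ax for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[-3, 6, -1, 1, -7],
     [1, -2, 2, 3, -1],
     [2, -4, 5, 8, -4]]

# Spanning vectors read off from x1 = 2*t3 + t2 - 3*t1 and x3 = -2*t2 + 2*t1.
v1 = [-3, 0, 2, 0, 1]
v2 = [1, 0, -2, 1, 0]
v3 = [2, 1, 0, 0, 0]

# An arbitrary linear combination t1*v1 + t2*v2 + t3*v3 must also solve Ax = 0.
t1, t2, t3 = 2, -1, 5
x = [t1 * a + t2 * b + t3 * c for a, b, c in zip(v1, v2, v3)]

images = [mat_vec(A, v) for v in (v1, v2, v3, x)]
```

Every entry of `images` is the zero vector of R³, as Theorem 15.10 predicts for linear combinations of solutions.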

Now we consider the range of a matrix mapping T_A : Rⁿ → R^m. Recall that a vector b in the co-domain R^m is in the range of T_A if there exists some vector x in the domain Rⁿ such that T_A(x) = b. Since T_A(x) = Ax, this means that Ax = b. Now, if A has columns A = [v₁ v₂ · · · v_n] and x = (x₁, x₂, . . . , x_n) then recall that

Ax = x₁v₁ + x₂v₂ + · · · + x_nv_n

and thus Ax = x₁v₁ + x₂v₂ + · · · + x_nv_n = b. Thus, a vector b is in the range of T_A if it can be written as a linear combination of the columns v₁, v₂, . . . , v_n of A. This motivates the following definition.

Definition 15.13: Let A ∈ M_{m×n} be a matrix. The span of the columns of A is called the column space of A. The column space of A is denoted by Col(A). Explicitly, if A = [v₁ v₂ · · · v_n] then

Col(A) = span{v₁, v₂, . . . , v_n}.

In summary, we can write that

Range(T_A) = Col(A),

and since Range(T_A) is a subspace of R^m then so is Col(A).

Theorem 15.14: The column space of a m × n matrix is a subspace of R^m.


Example 15.15. Let

\[
A = \begin{bmatrix} 2 & 4 & -2 & 1 \\ -2 & -5 & 7 & 3 \\ 3 & 7 & -8 & 6 \end{bmatrix}, \quad
b = \begin{bmatrix} 3 \\ -1 \\ 3 \end{bmatrix}.
\]

Is b in the column space Col(A)?

Solution. The vector b is in the column space of A if there exists x ∈ R⁴ such that Ax = b. Hence, we must determine if Ax = b has a solution. Performing elementary row operations on the augmented matrix [A b] we obtain

\[
[A \; b] \sim \begin{bmatrix} 2 & 4 & -2 & 1 & 3 \\ 0 & 1 & -5 & -4 & -2 \\ 0 & 0 & 0 & 17 & 1 \end{bmatrix}.
\]

The system is consistent and therefore Ax = b will have a solution. Therefore, b is in Col(A).
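The membership test used above (b is in Col(A) exactly when [A b] has no pivot in its last column) can be sketched in code with exact rational arithmetic. The functions `rref` and `in_column_space` are hypothetical helpers, and the entries of A and b are taken as read from the example, which is an assumption where the source layout is hard to read.

```python
from fractions import Fraction


def rref(M):
    """Return (R, pivots): the reduced row echelon form of M and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(cols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]
        # Clear column c in every other row.
        for i in range(rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [x - factor * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


def in_column_space(A, b):
    """True when Ax = b is consistent, i.e. [A b] has no pivot in the last column."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(augmented)
    return len(A[0]) not in pivots


A = [[2, 4, -2, 1], [-2, -5, 7, 3], [3, 7, -8, 6]]
b = [3, -1, 3]
b_in_col = in_column_space(A, b)

# A made-up inconsistent system for contrast: the second row of A2 is twice the first.
A2 = [[1, 2], [2, 4]]
b2 = [1, 3]
b2_in_col = in_column_space(A2, b2)
```

Using `Fraction` avoids the rounding issues that floating-point elimination can introduce when deciding whether an entry is exactly zero.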

After this lecture you should know the following:

what the null space of a matrix is and how to compute it
what the column space of a matrix is and how to determine if a given vector is in the column space
what the range and kernel of a linear mapping are


Lecture 16

Linear Independence, Bases, and Dimension

16.1 Linear Independence

Roughly speaking, the concept of linear independence revolves around the idea of working with “efficient” spanning sets for a subspace. For instance, the set of directions

{EAST, NORTH, NORTH-EAST}

is redundant since a total displacement in the NORTH-EAST direction can be obtained by combining individual NORTH and EAST displacements. With these vague statements out of the way, we introduce the formal definition of what it means for a set of vectors to be “efficient”.

Definition 16.1: Let V be a vector space and let {v₁, v₂, . . . , v_p} be a set of vectors in V. Then {v₁, v₂, . . . , v_p} is linearly independent if the only scalars c₁, c₂, . . . , c_p that satisfy the equation

c₁v₁+ c₂v₂+ · · · + c_pv_p= 0

are the trivial scalars c₁= c₂= · · · = c_p= 0. If the set {v₁, . . . , v_p} is not linearly independent then we say that it is linearly dependent.

We now describe the redundancy in a set of linearly dependent vectors. If {v₁, . . . , v_p} is linearly dependent, it follows that there are scalars c₁, c₂, . . . , c_p, at least one of which is nonzero, such that

c₁v₁+ c₂v₂+ · · · + c_pv_p= 0. (⋆)

For example, suppose that {v₁, v₂, v₃, v₄} is linearly dependent. Then there are scalars c₁, c₂, c₃, c₄, not all of them zero, such that equation (⋆) holds. Suppose, for the sake of argument, that c₃ ≠ 0. Then,

\[
v_3 = -\frac{c_1}{c_3} v_1 - \frac{c_2}{c_3} v_2 - \frac{c_4}{c_3} v_4.
\]

Therefore, when a set of vectors is linearly dependent, it is possible to write one of the vectors as a linear combination of the others. It is in this sense that a set of linearly dependent vectors is redundant. In fact, if a set of vectors is linearly dependent we can say even more, as the following theorem states.

Theorem 16.2: A set of vectors {v₁, v₂, . . . , v_p}, with v₁ ≠ 0, is linearly dependent if and only if some v_j is a linear combination of the preceding vectors v₁, . . . , v_{j−1}.

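Example 16.3. Show that the following set of 2 × 2 matrices is linearly dependent:

\[
A_1 = \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} -1 & 3 \\ 1 & 0 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 5 & 0 \\ -2 & -3 \end{bmatrix}.
\]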

Solution. It is clear that A₁ and A₂ are linearly independent, i.e., A₁ cannot be written as a scalar multiple of A₂, and vice-versa. Since the (2, 1) entry of A₁ is zero, the only way to get the −2 in the (2, 1) entry of A₃ is to multiply A₂ by −2. Similarly, since the (2, 2) entry of A₂ is zero, the only way to get the −3 in the (2, 2) entry of A₃ is to multiply A₁ by 3. Hence, we suspect that 3A₁ − 2A₂ = A₃. Verify:

\[
3A_1 - 2A_2 = \begin{bmatrix} 3 & 6 \\ 0 & -3 \end{bmatrix} - \begin{bmatrix} -2 & 6 \\ 2 & 0 \end{bmatrix}
= \begin{bmatrix} 5 & 0 \\ -2 & -3 \end{bmatrix} = A_3.
\]

Therefore, 3A₁− 2A₂− A₃= 0 and thus we have found scalars c₁, c₂, c₃not all zero such that c₁A₁+ c₂A₂+ c₃A₃= 0.
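The dependence relation can be confirmed entry by entry. The sketch below takes A₁, A₂, A₃ with rows (1, 2), (0, −1); (−1, 3), (1, 0); and (5, 0), (−2, −3), which is an assumed reading of the example consistent with the entries discussed in the solution; `combo` is a hypothetical helper.

```python
def combo(coeffs, mats):
    """Linear combination c1*M1 + c2*M2 + ... of equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(c * M[i][j] for c, M in zip(coeffs, mats)) for j in range(cols)]
            for i in range(rows)]


A1 = [[1, 2], [0, -1]]
A2 = [[-1, 3], [1, 0]]
A3 = [[5, 0], [-2, -3]]

# The nontrivial coefficients (c1, c2, c3) = (3, -2, -1) found in the solution.
relation = combo([3, -2, -1], [A1, A2, A3])
```

Because the coefficients are not all zero and the combination is the zero matrix, the set {A₁, A₂, A₃} is linearly dependent by Definition 16.1.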

16.2 Bases

We now introduce the important concept of a basis. Given a set of vectors {v₁, . . . , v_{p−1}, v_p} in V, we showed that W = span{v₁, v₂, . . . , v_p} is a subspace of V. If, say, v_p is linearly dependent on v₁, v₂, . . . , v_{p−1} then we can remove v_p and the smaller set {v₁, . . . , v_{p−1}} still spans all of W:

W = span{v₁, v₂, . . . , v_{p−1}, v_p} = span{v₁, . . . , v_{p−1}}.

Intuitively, v_p does not provide an independent “direction” in generating W. If some other vector v_j is linearly dependent on the remaining vectors then we can remove v_j and the resulting smaller set of vectors still spans W. We can continue removing vectors until we obtain a minimal set of vectors that are linearly independent and still span W. The preceding remarks motivate the following important definition.

Definition 16.4: Let W be a subspace of a vector space V. A set of vectors B = {v₁, . . . , v_k} in W is said to be a basis for W if

(a) the set B spans all of W, that is, W = span{v₁, . . . , v_k}, and
(b) the set B is linearly independent.

A basis is therefore a minimal spanning set for a subspace. Indeed, if B = {v₁, . . . , v_p} is a basis for W and we remove, say, v_p, then B̃ = {v₁, . . . , v_{p−1}} cannot be a basis for W. Why? If B = {v₁, . . . , v_p} is a basis then it is linearly independent and therefore v_p cannot be written as a linear combination of the others. In other words, v_p ∈ W is not in the span of B̃ = {v₁, . . . , v_{p−1}} and therefore B̃ is not a basis for W because a basis must be a spanning set. If, on the other hand, we start with a basis B = {v₁, . . . , v_p} for W and we add a new vector u from W then B̃ = {v₁, . . . , v_p, u} is not a basis for W. Why? We still have that span B̃ = W but now B̃ is not linearly independent. Indeed, because B = {v₁, . . . , v_p} is a basis for W, the vector u can be written as a linear combination of {v₁, . . . , v_p}, and thus B̃ is not linearly independent.

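Example 16.5. Show that the standard unit vectors form a basis for V = R³:

\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\]

Solution. Any vector x ∈ R³ can be written as a linear combination of e₁, e₂, e₃:

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= x_1 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
= x_1 e_1 + x_2 e_2 + x_3 e_3
\]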

Therefore, span{e₁, e₂, e₃} = R³. The set B = {e₁, e₂, e₃} is linearly independent. Indeed, if there are scalars c₁, c₂, c₃such that

c₁e₁+ c₂e₂+ c₃e₃= 0

then clearly they must all be zero, c₁= c₂= c₃= 0. Therefore, by definition, B = {e₁, e₂, e₃} is a basis for R³. This basis is called the standard basis for R³. Analogous arguments hold for {e₁, e₂, . . . , e_n} in Rⁿ.

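Example 16.6. Is B = {v₁, v₂, v₃} a basis for R³?

\[
v_1 = \begin{bmatrix} 2 \\ 0 \\ -4 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} -4 \\ -2 \\ 8 \end{bmatrix}, \quad
v_3 = \begin{bmatrix} 4 \\ -6 \\ -6 \end{bmatrix}
\]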

Solution. Form the matrix A = [v₁ v₂ v₃] and row reduce:

\[
A \sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Therefore, the only solution to Ax = 0 is the trivial solution. Therefore, B is linearly independent. Moreover, for any b ∈ R³, the linear system with augmented matrix [A b] is consistent. Therefore, the columns of A span all of R³:

Col(A) = span{v₁, v₂, v₃} = R³.

Therefore, B is a basis for R³.

Example 16.7. In V = R⁴, consider the vectors

v =

 ₁  ₂

 

 

, v =

3 −1

−2

 

 

, v =

^−1^

 

 

−2 1 −3

Let W = span{v₁, v₂, v₃}. Is B = {v₁, v₂, v₃} a basis for W?

Solution. By definition, B is a spanning set for W, so we need only determine if B is linearly independent. Form the matrix A = [v₁ v₂ v₃] and row reduce to obtain

\[
A \sim \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

Hence, rank(A) = 2 and thus B is linearly dependent. Notice v₁− v₂= v₃. Therefore, B is not a basis of W.

Example 16.8. Find a basis for the vector space of 2 × 2 matrices.

Example 16.9. Recall that an n × n matrix A is skew-symmetric if A^T = −A. We proved that the set of n × n skew-symmetric matrices is a subspace. Find a basis for the set of 3 × 3 skew-symmetric matrices.

16.3 Dimension of a Vector Space

The following theorem will lead to the definition of the dimension of a vector space.

Theorem 16.10: Let V be a vector space. Then all bases of V have the same number of vectors.

Proof: We will prove the theorem for the case that V = Rⁿ. We already know that the standard unit vectors {e₁, e₂, . . . , e_n} form a basis of Rⁿ. Let {u₁, u₂, . . . , u_p} be nonzero vectors in Rⁿ and suppose first that p > n. In Lecture 6, Theorem 6.7, we proved that any set of vectors in Rⁿ containing more than n vectors is automatically linearly dependent. The reason is that the RREF of A = [u₁ u₂ · · · u_p] will contain at most r = n leading ones, and therefore d = p − r ≥ p − n > 0. Therefore, the solution set of Ax = 0 contains non-trivial solutions. On the other hand, suppose instead that p < n. In Lecture 4, Theorem 4.11, we proved that a set of vectors {u₁, . . . , u_p} in Rⁿ spans Rⁿ if and only if the RREF of A has exactly r = n leading ones. The largest possible value of r is r = p < n. Therefore, if p < n then {u₁, u₂, . . . , u_p} cannot span Rⁿ and hence cannot be a basis for Rⁿ. Thus, in either case (p > n or p < n), the set {u₁, u₂, . . . , u_p} cannot be a basis for Rⁿ. Hence, any basis in Rⁿ must contain n vectors.

The previous theorem does not say that every set {v₁, v₂, . . . , v_n} of nonzero vectors in Rⁿ containing n vectors is automatically a basis for Rⁿ. For example,

₁ ₀ ₂

v = 0 ,

   

v = 1 , v = 3

 

do not form a basis for R³because

x = 0

₀

 

is not in the span of {v₁, v₂, v₃}. All that we can say is that a set of vectors in Rⁿ containing fewer or more than n vectors is automatically not a basis for Rⁿ. From Theorem 16.10, any basis in Rⁿ must have exactly n vectors. In fact, on a general abstract vector space V, if {v₁, v₂, . . . , v_n} is a basis for V then any other basis for V must have exactly n vectors also. Because of this result, we can make the following definition.

Definition 16.11: Let V be a vector space. The dimension of V, denoted dim V, is the number of vectors in any basis of V. The dimension of the trivial vector space V = {0} is defined to be zero.

There is one subtle issue we are sweeping under the rug: Does every vector space have a basis? The answer is yes but we will not prove this result here.

Moving on, suppose that we have a set B = {v₁, v₂, . . . , v_n} in Rⁿ containing exactly n vectors. For B = {v₁, v₂, . . . , v_n} to be a basis of Rⁿ, the set B must be linearly independent and span B = Rⁿ. In fact, it can be shown that if B is linearly independent then the spanning condition span B = Rⁿ is automatically satisfied, and vice-versa. For example, say the vectors {v₁, v₂, . . . , v_n} in Rⁿ are linearly independent, and put A = [v₁ v₂ · · · v_n]. Then A⁻¹ exists and therefore Ax = b is always solvable. Hence, Col(A) = span{v₁, v₂, . . . , v_n} = Rⁿ. In summary, we have the following theorem.

Theorem 16.12: Let B = {v₁, . . . , v_n} be vectors in Rⁿ. If B is linearly independent then B is a basis for Rⁿ. Or if span{v₁, v₂, . . . , v_n} = Rⁿthen B is a basis for Rⁿ.


Example 16.13. Do the columns of the matrix A form a basis for R⁴?

\[
A = \begin{bmatrix} 2 & 3 & 3 & -2 \\ 4 & 7 & 8 & -6 \\ 0 & 0 & 1 & 0 \\ -4 & -6 & -6 & 3 \end{bmatrix}
\]

Solution. Let v₁, v₂, v₃, v₄denote the columns of A. Since we have n = 4 vectors in Rⁿ, we need only check that they are linearly independent. Compute

det A = −2 ≠ 0.

Hence, rank(A) = 4 and thus the columns of A are linearly independent. Therefore, the vectors v₁, v₂, v₃, v₄form a basis for R⁴.
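The determinant quoted above can be recomputed with a short cofactor expansion. The recursive `det` below is a hypothetical helper that is adequate for small matrices only (its cost grows factorially with the size); it is meant as a check, not as a recommended algorithm.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


A = [[2, 3, 3, -2],
     [4, 7, 8, -6],
     [0, 0, 1, 0],
     [-4, -6, -6, 3]]

columns_form_basis = det(A) != 0

# A made-up singular example: three vectors of R^3 (as columns) with zero third components.
S = [[1, 0, 2],
     [0, 1, 3],
     [0, 0, 0]]
singular = det(S) == 0
```

By Theorem 16.12, a nonzero determinant is enough here because the number of vectors equals the dimension of the space.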

A subspace W of a vector space V is a vector space in its own right, and therefore also has a dimension. By definition, if B = {v₁, . . . , v_k} is a linearly independent set in W and span{v₁, . . . , v_k} = W, then B is a basis for W and in this case the dimension of W is k. Since an n-dimensional vector space V requires exactly n vectors in any basis, if W is a proper subspace of V then

dim W < dim V.

As an example, in V = R³ subspaces can be classified by dimension:

The zero dimensional subspace in R³ is W = {0}.
The one dimensional subspaces in R³ are lines through the origin. These are spanned by a single non-zero vector.
The two dimensional subspaces in R³ are planes through the origin. These are spanned by two linearly independent vectors.
The only three dimensional subspace in R³ is R³ itself. Any set {v₁, v₂, v₃} in R³ that is linearly independent is a basis for R³.

Example 16.14. Find a basis for Null(A) and dim Null(A) if

\[
A = \begin{bmatrix} -2 & 4 & -2 & -4 \\ 2 & -6 & -3 & 1 \\ -3 & 8 & 2 & -3 \end{bmatrix}.
\]

Solution. By definition, Null(A) is the solution set of the homogeneous system Ax = 0. Row reducing we obtain

\[
A \sim \begin{bmatrix} 1 & 0 & 6 & 5 \\ 0 & 1 & 5/2 & 3/2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The general solution to Ax = 0 in parametric form is

\[
x = t \begin{bmatrix} -5 \\ -3/2 \\ 0 \\ 1 \end{bmatrix} + s \begin{bmatrix} -6 \\ -5/2 \\ 1 \\ 0 \end{bmatrix} = t v_1 + s v_2
\]

By construction, the vectors

\[
v_1 = \begin{bmatrix} -5 \\ -3/2 \\ 0 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -6 \\ -5/2 \\ 1 \\ 0 \end{bmatrix}
\]

span Null(A) and they are linearly independent. Therefore, B = {v₁, v₂} is a basis for Null(A) and dim Null(A) = 2. In general, the dimension of Null(A) is the number of free parameters in the solution set of the system Ax = 0, that is,

dim Null(A) = d = n − rank(A)
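The procedure of Example 16.14 (row reduce, then read off one basis vector per free variable) can be sketched in code. This is an illustrative sketch only; the function names `rref` and `null_space_basis` are assumptions introduced here, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and its pivot columns."""
    a = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(a), len(a[0])
    pivots, r = [], 0
    for c in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue  # no leading entry here: column c is a free column
        a[r], a[pivot] = a[pivot], a[r]
        lead = a[r][c]
        a[r] = [x / lead for x in a[r]]  # scale the row to get a leading 1
        for i in range(rows):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    return a, pivots

def null_space_basis(matrix):
    """One basis vector of Null(A) for each free column of rref(A)."""
    R, pivots = rref(matrix)
    n = len(matrix[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)         # this free variable is 1, the others are 0
        for row, p in enumerate(pivots):
            v[p] = -R[row][f]      # pivot variables are read off the RREF rows
        basis.append(v)
    return basis

# Matrix of Example 16.14.
A = [[-2, 4, -2, -4],
     [2, -6, -3, 1],
     [-3, 8, 2, -3]]

basis = null_space_basis(A)
rank = len(rref(A)[1])
print(basis, len(basis) == len(A[0]) - rank)
```

The basis vectors come out in the order of the free columns, so here the vector with free variable x₃ = 1 is listed first; the set is the same as {v₁, v₂} above.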

Example 16.15. Find a basis for Col(A) and dim Col(A) if

\[
A = \begin{bmatrix} 1 & 2 & 3 & -4 & 8 \\ 1 & 2 & 0 & 2 & 8 \\ 2 & 4 & -3 & 10 & 9 \\ 3 & 6 & 0 & 6 & 9 \end{bmatrix}
\]

Solution. By definition, the column space of A is the span of the columns of A, which we denote by v₁, v₂, v₃, v₄, v₅, so that A = [v₁ v₂ v₃ v₄ v₅]. Thus, to find a basis for Col(A), by trial and error we could determine the largest subset of the columns of A that is linearly independent. For example, first we determine if {v₁, v₂} is linearly independent. If yes, then add v₃ and determine if {v₁, v₂, v₃} is linearly independent. If {v₁, v₂} is not linearly independent then discard v₂ and determine if {v₁, v₃} is linearly independent. We continue this process until we have determined the largest subset of the columns of A that is linearly independent, and this will yield a basis for Col(A). Instead, we can use the fact that row equivalent matrices induce the same solution set for the associated homogeneous system, so that their columns satisfy the same linear relations. Hence, let B be the RREF of A:

\[
B = \operatorname{rref}(A) = \begin{bmatrix} 1 & 2 & 0 & 2 & 0 \\ 0 & 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]


By inspection, the columns b₁, b₃, b₅ of B are linearly independent. It is easy to see that b₂ = 2b₁ and b₄ = 2b₁ − 2b₃. These same linear relations hold for the columns of A:

\[
A = \begin{bmatrix} 1 & 2 & 3 & -4 & 8 \\ 1 & 2 & 0 & 2 & 8 \\ 2 & 4 & -3 & 10 & 9 \\ 3 & 6 & 0 & 6 & 9 \end{bmatrix}
\]

By inspection, v₂ = 2v₁ and v₄ = 2v₁ − 2v₃. Thus, because b₁, b₃, b₅ are linearly independent columns of B = rref(A), the columns v₁, v₃, v₅ are linearly independent columns of A. Therefore, we have

\[
\operatorname{Col}(A) = \operatorname{span}\{v_1, v_3, v_5\} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ -3 \\ 0 \end{bmatrix}, \begin{bmatrix} 8 \\ 8 \\ 9 \\ 9 \end{bmatrix} \right\}
\]

and consequently dim Col(A) = 3. This procedure works in general: to find a basis for Col(A), row reduce A ~ B until you can determine which columns of B are linearly independent. The columns of A in the same positions as the linearly independent columns of B form a basis for Col(A).

WARNING: Do not take the linearly independent columns of B as a basis for Col(A). Always go back to the original matrix A to select the columns.
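The warning above can be made concrete in code: locate the pivot positions by elimination, then select the corresponding columns from the original matrix A. The sketch below is illustrative; the helper name `pivot_columns` is an assumption, and forward elimination is used because any echelon form has the same pivot positions as the RREF.

```python
from fractions import Fraction

def pivot_columns(matrix):
    """Indices (0-based) of the pivot columns, found by forward elimination."""
    a = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(a), len(a[0])
    pivots, r = [], 0
    for c in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue  # column c depends on the earlier columns
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, rows):
            factor = a[i][c] / a[r][c]
            a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    return pivots

# Matrix of Example 16.15.
A = [[1, 2, 3, -4, 8],
     [1, 2, 0, 2, 8],
     [2, 4, -3, 10, 9],
     [3, 6, 0, 6, 9]]

pivots = pivot_columns(A)
# Go back to the ORIGINAL matrix A to select the basis columns.
col_basis = [[row[c] for row in A] for c in pivots]
print(pivots, col_basis)
```

Note that the basis vectors are read from A, not from the reduced matrix, exactly as the warning requires.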

After this lecture you should know the following:

what it means for a set to be linearly independent/dependent
what a basis is (a spanning set that is linearly independent)
what is the meaning of the dimension of a vector space
how to determine if a given set in Rⁿis linearly independent
how to find a basis for the null space and column space of a matrix A

Lecture 17

The Rank Theorem

17.1 The Rank of a Matrix

We now define the rank of a matrix.

Definition 17.1: The rank of a matrix A is the dimension of its column space. We will use rank(A) to denote the rank of A.

Recall that Col(A) = Range(T_A), and thus the rank of A is the dimension of the range of the linear mapping T_A. The range of a mapping is sometimes called the image.

We now define the nullity of a matrix.

Definition 17.2: The nullity of a matrix A is the dimension of its nullspace Null(A). We will use nullity(A) to denote the nullity of A.

Recall that Null(A) = ker(T_A), and thus the nullity of A is the dimension of the kernel of the linear mapping T_A.

The rank and nullity of a matrix are connected via the following fundamental theorem known as the Rank Theorem.

Theorem 17.3: (Rank Theorem) Let A be an m × n matrix. The rank of A is the number of leading 1’s in its RREF. Moreover, the following equation holds:

n = rank(A) + nullity(A).

Proof. A basis for the column space is obtained by computing rref(A) and identifying the columns that contain a leading 1. Each column of A corresponding to a column of rref(A) with a leading 1 is a basis vector for the column space of A. Therefore, if r is the number of leading 1’s then r = rank(A). Now let d = n − r. The number of free parameters in the solution set of Ax = 0 is d and therefore a basis for Null(A) will contain d vectors, that is, nullity(A) = d. Therefore,

nullity(A) = n − rank(A).

Example 17.4. Find the rank and nullity of the matrix

\[
A = \begin{bmatrix} 1 & -2 & 2 & 3 & -6 \\ 0 & -1 & -3 & 1 & 1 \\ -2 & 4 & -3 & -6 & 11 \end{bmatrix}.
\]

Solution. Row reduce far enough to identify where the leading entries are:

\[
A \xrightarrow{2R_1 + R_3} \begin{bmatrix} 1 & -2 & 2 & 3 & -6 \\ 0 & -1 & -3 & 1 & 1 \\ 0 & 0 & 1 & 0 & -1 \end{bmatrix}
\]

There are r = 3 leading entries and therefore rank(A) = 3. The nullity is therefore nullity(A) = 5 − rank(A) = 2.

Example 17.5. Find the rank and nullity of the matrix

\[
A = \begin{bmatrix} 1 & -3 & -1 \\ -1 & 4 & 2 \\ -1 & 3 & 0 \end{bmatrix}
\]

Solution. Row reduce far enough to identify where the leading entries are:

\[
A \xrightarrow{R_1 + R_2,\ R_1 + R_3} \begin{bmatrix} 1 & -3 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{bmatrix}
\]

There are r = 3 leading entries and therefore rank(A) = 3. The nullity is therefore nullity(A) = 3 − rank(A) = 0. Another way to see that nullity(A) = 0 is as follows. From the above computation, A is invertible. Therefore, Ax = 0 has only the trivial solution, that is, Null(A) = {0}. The subspace {0} has dimension zero.

Using the rank and nullity of a matrix, we now provide further characterizations of invertible matrices.

Theorem 17.6: Let A be an n × n matrix. The following statements are equivalent:

(a) The columns of A form a basis for Rⁿ.
(b) Col(A) = Rⁿ
(c) rank(A) = n
(d) Null(A) = {0}
(e) nullity(A) = 0
(f) A is an invertible matrix.

After this lecture you should know the following:

what the rank of a matrix is and how to compute it
what the nullity of a matrix is and how to compute it
the Rank Theorem


Lecture 18

Coordinate Systems

18.1 Coordinates

Recall that a basis of a vector space V is a set of vectors B = {v₁, v₂, . . . , v_n} in V such that

the set B spans all of V, that is, V = span(B), and
the set B is linearly independent.

Hence, if B is a basis for V, each vector x^∗∈ V can be written as a linear combination of B:

x^∗= c₁v₁+ c₂v₂+ · · · + c_nv_n.

Moreover, from the definition of linear independence given in Definition 6.1, any vector x ∈ span(B) can be written in only one way as a linear combination of v₁, . . . , v_n. In other words, for the x^∗ above, there do not exist other scalars t₁, . . . , t_n such that also

x^∗= t₁v₁+ t₂v₂+ · · · + t_nv_n.

To see this, suppose that we can write x^∗in two different ways using B:

x^∗= c₁v₁+ c₂v₂+ · · · + c_nv_n

x^∗= t₁v₁+ t₂v₂+ · · · + t_nv_n.

Then

0 = x^∗− x^∗= (c₁− t₁)v₁+ (c₂− t₂)v₂+ · · · + (c_n− t_n)v_n.

Since B = {v₁, . . . , v_n} is linearly independent, the only linear combination of v₁, . . . , v_n that gives the zero vector 0 is the trivial linear combination. Therefore, it must be the case that c_i − t_i = 0, or equivalently that c_i = t_i for all i = 1, 2, . . . , n. Thus, there is only one way to write x^∗ in terms of B = {v₁, . . . , v_n}. Hence, relative to the basis B = {v₁, v₂, . . . , v_n}, the scalars c₁, c₂, . . . , c_n uniquely determine the vector x^∗, and vice versa.

Our preceding discussion on the unique representation property of vectors in a given basis leads to the following definition.

Definition 18.1: Let B = {v₁, . . . , v_n} be a basis for V and let x ∈ V. The coordinates of x relative to the basis B are the unique scalars c₁, c₂, . . . , c_n such that

x = c₁v₁ + c₂v₂ + · · · + c_nv_n.

In vector notation, the B-coordinates of x will be denoted by

\[
[x]_B = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
\]

and we will call [x]_B the coordinate vector of x relative to B.

The notation [x]_B indicates that these are coordinates of x with respect to the basis B. If it is clear what basis we are working with, we will omit the subscript B and simply write [x] for the coordinates of x relative to B.

Example 18.2. One can verify that

B = {(1, 1), (−1, 1)}

is a basis for R². Find the coordinates of v = (3, 1) relative to B.

Solution. Let v₁ = (1, 1) and let v₂ = (−1, 1). By definition, the coordinates of v with respect to B are the scalars c₁, c₂ such that

\[
v = c_1 v_1 + c_2 v_2 = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}
\]

If we put P = [v₁ v₂], and let [v]_B = (c₁, c₂), then we need to solve the linear system

v = P[v]_B

Solving the linear system, one finds that the solution is [v]_B = (2, −1), and therefore this is the B-coordinate vector of v, that is, the coordinates of v relative to B.

It is clear how the procedure of the previous example can be generalized. Let B = {v₁, v₂, . . . , v_n} be a basis for Rⁿ and let v be any vector in Rⁿ. Put P = [v₁ v₂ · · · v_n]. Then the B-coordinate vector of v is the unique column vector [v]_B solving the linear system

Px = v

that is, x = [v]_B is the unique solution to Px = v. Because v₁, v₂, . . . , v_n are linearly independent, the matrix P is invertible and the solution to Px = v is

[v]_B = P⁻¹v.

We remark that if an inconsistent row arises when you row reduce the augmented matrix [P v], then you have made an error in your row reduction. In summary, to find coordinates with respect to a basis B in Rⁿ, we need to solve a square linear system.

Example 18.3. Let

v₁ = (3, 6, 2), v₂ = (−1, 0, 1), x = (3, 12, 7)

and let B = {v₁, v₂}. One can show that B is linearly independent and therefore a basis for W = span{v₁, v₂}. Determine if x is in W, and if so, find the coordinate vector of x relative to B.

Solution. By definition, x is in W = span{v₁, v₂} if we can write x as a linear combination of v₁, v₂:

x = c₁v₁ + c₂v₂

Form the associated augmented matrix and row reduce:

\[
\begin{bmatrix} 3 & -1 & 3 \\ 6 & 0 & 12 \\ 2 & 1 & 7 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}
\]

The system is consistent with solution c₁ = 2 and c₂ = 3. Therefore, x is in W, and the B-coordinates of x are

[x]_B = (2, 3).

Example 18.4. What are the coordinates of

v = (3, 11, −7)

in the standard basis E = {e₁, e₂, e₃}?

Solution. Clearly,

v = (3, 11, −7) = 3(1, 0, 0) + 11(0, 1, 0) − 7(0, 0, 1)

Therefore, the coordinate vector of v relative to {e₁, e₂, e₃} is

[v]_E = (3, 11, −7)

 

Example 18.5. Let P₃[t] be the vector space of polynomials of degree at most 3.

(a) Show that B = {1, t, t², t³} is a basis for P₃[t].
(b) Find the coordinates of v(t) = 3 − t² − 7t³ relative to B.

Solution. The set B = {1, t, t², t³} is a spanning set for P₃[t]. Indeed, any polynomial u(t) = c₀ + c₁t + c₂t² + c₃t³ is clearly a linear combination of 1, t, t², t³. Is B linearly independent? Suppose that there exist scalars c₀, c₁, c₂, c₃ such that

c₀ + c₁t + c₂t² + c₃t³ = 0.

Since the above equality must hold for all values of t, we conclude that c₀ = c₁ = c₂ = c₃ = 0. Therefore, B is linearly independent, and consequently a basis for P₃[t]. In the basis B, the coordinates of v(t) = 3 − t² − 7t³ are

[v(t)]_B = (3, 0, −1, −7)

The basis B = {1, t, t², t³} is called the standard basis in P₃[t].

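Example 18.6. Show that

\[
B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}
\]

is a basis for M₂×₂. Find the coordinates of

\[
A = \begin{bmatrix} 3 & 0 \\ -4 & -1 \end{bmatrix}
\]

relative to B.

Solution. Any matrix M = [m₁₁ m₁₂; m₂₁ m₂₂] can be written as a linear combination of the matrices in B:

\[
\begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} = m_{11} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + m_{12} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + m_{21} \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + m_{22} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\]

Hence B spans M₂×₂. If c₁, c₂, c₃, c₄ are scalars such that

\[
c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

then clearly c₁ = c₂ = c₃ = c₄ = 0. Therefore, B is linearly independent, and consequently a basis for M₂×₂. The coordinates of A in the basis B are

[A]_B = (3, 0, −4, −1)

The basis B above is the standard basis of M₂×₂.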

18.2 Coordinate Mappings

Let B = {v₁, v₂, . . . , v_n} be a basis of Rⁿ and let P = [v₁ v₂ · · · v_n] ∈ M_n×n. If x ∈ Rⁿ and [x]_B is the coordinate vector of x relative to B then

x = P[x]_B. (⋆)

Hence, thinking of P : Rⁿ → Rⁿ as a linear mapping, P maps B-coordinate vectors to coordinate vectors relative to the standard basis of Rⁿ. For this reason, we call P the change-of-coordinates matrix from the basis B to the standard basis in Rⁿ. If we need to emphasize that P is constructed from the basis B we will write P_B instead of just P. Multiplying equation (⋆) by P⁻¹ we obtain

P⁻¹x = [x]_B.

Therefore, P⁻¹maps coordinate vectors in the standard basis to coordinates relative to B.
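The two directions of this mapping can be illustrated with the basis B = {(1, 1), (−1, 1)} of Example 18.2. The sketch below is only an illustration under stated assumptions: the helper names `solve` and `matvec` are hypothetical, and the system Pc = x is solved by Gauss-Jordan elimination with exact fractions rather than by forming P⁻¹ explicitly.

```python
from fractions import Fraction

def solve(P, v):
    """Solve P c = v for a square invertible P by Gauss-Jordan elimination."""
    n = len(P)
    # Augmented matrix [P | v] with exact rational entries.
    a = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(P, v)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if a[i][c] != 0), None)
        if pivot is None:
            raise ValueError("columns of P are linearly dependent, not a basis")
        a[c], a[pivot] = a[pivot], a[c]
        lead = a[c][c]
        a[c] = [x / lead for x in a[c]]
        for i in range(n):
            if i != c:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[c])]
    return [row[n] for row in a]

def matvec(P, c):
    """Matrix-vector product P c."""
    return [sum(Fraction(p) * Fraction(x) for p, x in zip(row, c)) for row in P]

# The basis vectors (1, 1) and (-1, 1) are the columns of P.
P = [[1, -1],
     [1, 1]]

x = matvec(P, [2, -1])   # B-coordinates -> standard coordinates
c = solve(P, x)          # standard coordinates -> B-coordinates
print(x, c)
```

Solving the system instead of inverting P mirrors the row reduction of the augmented matrix [P v] described earlier.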

Example 18.7. The columns of the matrix P form a basis B for R³:

\[
P = \begin{bmatrix} 1 & 3 & 3 \\ -1 & -4 & -2 \\ 0 & 0 & -1 \end{bmatrix}.
\]

(a) What vector x ∈ R³ has B-coordinates [x]_B = (1, 0, −1)?
(b) Find the B-coordinates of v = (2, −1, 0).

Solution. (a) The matrix P maps B-coordinates to standard coordinates in R³. Therefore,

x = P[x]_B = (−2, 1, 1).

(b) On the other hand, the inverse matrix P⁻¹ maps standard coordinates in R³ to B-coordinates. One can verify that

\[
P^{-1} = \begin{bmatrix} 4 & 3 & 6 \\ -1 & -1 & -1 \\ 0 & 0 & -1 \end{bmatrix}
\]

Therefore, the B-coordinates of v are

\[
[v]_B = P^{-1} v = \begin{bmatrix} 4 & 3 & 6 \\ -1 & -1 & -1 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 0 \end{bmatrix}
\]

    

When V is an abstract vector space, e.g. P_n[t] or M_n×n, the notion of a coordinate mapping is similar to the case when V = Rⁿ. If V is an n-dimensional vector space and B = {v₁, v₂, . . . , v_n} is a basis for V, we define the coordinate mapping P : V → Rⁿ relative to B as the mapping

P(v) = [v]_B.

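Example 18.8. Let V = M₂×₂ and let B = {A₁, A₂, A₃, A₄} be the standard basis for M₂×₂. What is P : M₂×₂ → R⁴?

Solution. Recall,

\[
B = \{A_1, A_2, A_3, A_4\} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}
\]

Then for any A = [a₁₁ a₁₂; a₂₁ a₂₂] we have

\[
P\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right) = \begin{bmatrix} a_{11} \\ a_{12} \\ a_{21} \\ a_{22} \end{bmatrix}
\]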

 

 

18.3 Matrix Representation of a Linear Map

Let V and W be vector spaces and let T : V → W be a linear mapping. Then by definition of a linear mapping, T(v + u) = T(v) + T(u) and T(αv) = αT(v) for every v, u ∈ V and α ∈ R. Let B = {v₁, v₂, . . . , v_n} be a basis of V and let γ = {w₁, w₂, . . . , w_m} be a basis of W. Then for any v ∈ V there exist scalars c₁, c₂, . . . , c_n such that

v = c₁v₁+ c₂v₂+ · · · + c_nv_n

and thus [v]_B = (c₁, c₂, . . . , c_n) are the coordinates of v in the basis B. By linearity of the mapping T we have

T(v) = T(c₁v₁+ c₂v₂+ · · · + c_nv_n)

= c₁T(v₁) + c₂T(v₂) + · · · + c_nT(v_n)

Now each vector T(v_j) is in W and therefore, because γ is a basis of W, there are scalars a_{1,j}, a_{2,j}, . . . , a_{m,j} such that

T(v_j) = a_{1,j}w₁ + a_{2,j}w₂ + · · · + a_{m,j}w_m

In other words,

[T(v_j)]_γ = (a_{1,j}, a_{2,j}, . . . , a_{m,j})

Substituting T(v_j) = a_{1,j}w₁ + a_{2,j}w₂ + · · · + a_{m,j}w_m for each j = 1, 2, . . . , n into T(v) = c₁T(v₁) + c₂T(v₂) + · · · + c_nT(v_n) and then simplifying we get

\[
T(v) = \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} c_j a_{i,j} \Big) w_i
\]

Therefore,

[T(v)]_γ = A[v]_B

where A is the m × n matrix given by

A = [ [T(v₁)]_γ  [T(v₂)]_γ  · · ·  [T(v_n)]_γ ]

The matrix A is the matrix representation of the linear mapping T in the bases B and γ.

Example 18.9. Consider the vector space V = P₂[t] of polynomials of degree no more than two and let T : V → V be defined by

T(v(t)) = 4v′(t) − 2v(t)

It is straightforward to verify that T is a linear mapping. Let

B = {v₁, v₂, v₃} = {t − 1, 3 + 2t, t² + 1}.

(a) Verify that B is a basis of V.
(b) Find the coordinates of v(t) = −t² + 3t + 1 in the basis B.
(c) Find the matrix representation of T in the basis B.


Solution. (a) Suppose that there are scalars c₁, c₂, c₃ such that

c₁v₁ + c₂v₂ + c₃v₃ = 0

Then expanding and collecting like terms we obtain

c₃t² + (c₁ + 2c₂)t + (−c₁ + 3c₂ + c₃) = 0

Since the above holds for all t ∈ R we must have

c₃ = 0, c₁ + 2c₂ = 0, −c₁ + 3c₂ + c₃ = 0

Solving for c₁, c₂, c₃ we obtain c₁ = 0, c₂ = 0, c₃ = 0. Hence, the only linear combination of the vectors in B that produces the zero vector is the trivial linear combination. This proves by definition that B is linearly independent. Since we already know that dim(P₂) = 3 and B contains 3 vectors, B is a basis for P₂.

(b) The coordinates of v(t) = −t² + 3t + 1 are the unique scalars (c₁, c₂, c₃) such that

c₁v₁ + c₂v₂ + c₃v₃ = v

In this case the linear system is

c₃ = −1, c₁ + 2c₂ = 3, −c₁ + 3c₂ + c₃ = 1

and solving yields c₁ = 1, c₂ = 1, and c₃ = −1. Hence,

[v]_B = (1, 1, −1)

(c) The matrix representation A of T is

A = [ [T(v₁)]_B  [T(v₂)]_B  [T(v₃)]_B ]

Now we compute directly that

T(v₁) = −2t + 6, T(v₂) = −4t + 2, T(v₃) = −2t² + 8t − 2

Solving the same kind of linear system as in part (b) for each of these polynomials, one computes that

[T(v₁)]_B = (−18/5, 4/5, 0), [T(v₂)]_B = (−16/5, −2/5, 0), [T(v₃)]_B = (24/5, 8/5, −2)

And therefore

\[
A = \begin{bmatrix} -18/5 & -16/5 & 24/5 \\ 4/5 & -2/5 & 8/5 \\ 0 & 0 & -2 \end{bmatrix}
\]
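The computation in Example 18.9 can be checked with a short script. This is a sketch under stated assumptions: a polynomial a₀ + a₁t + a₂t² is stored as the coefficient triple (a0, a1, a2), and the helper names `solve`, `T`, and `matvec` are introduced here only for illustration.

```python
from fractions import Fraction

def solve(P, v):
    """Solve P c = v for a square invertible P by Gauss-Jordan elimination."""
    n = len(P)
    a = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(P, v)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if a[i][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        lead = a[c][c]
        a[c] = [x / lead for x in a[c]]
        for i in range(n):
            if i != c:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[c])]
    return [row[n] for row in a]

def T(p):
    """T(v) = 4 v' - 2 v acting on coefficient triples (a0, a1, a2)."""
    a0, a1, a2 = p
    derivative = (a1, 2 * a2, 0)
    return tuple(4 * d - 2 * a for d, a in zip(derivative, p))

def matvec(M, c):
    return [sum(m * x for m, x in zip(row, c)) for row in M]

basis = [(-1, 1, 0), (3, 2, 0), (1, 0, 1)]   # t - 1, 3 + 2t, t^2 + 1
P = [list(row) for row in zip(*basis)]       # basis vectors as columns

# Column j of A is the coordinate vector [T(v_j)]_B.
columns = [solve(P, T(b)) for b in basis]
A = [list(row) for row in zip(*columns)]

v = (1, 3, -1)                               # v(t) = -t^2 + 3t + 1
v_B = solve(P, v)
lhs = solve(P, T(v))                         # [T(v)]_B computed directly
rhs = matvec(A, v_B)                         # A [v]_B
print(A, v_B, lhs == rhs)
```

The final comparison checks the identity [T(v)]_B = A[v]_B for the polynomial of part (b).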






After this lecture you should know the following:

what coordinates are (you need a basis)
how to find coordinates relative to a basis
the interpretation of the change-of-coordinates matrix as a mapping that transforms one set of coordinates to another


Lecture 19

Change of Basis

19.1 Review of Coordinate Mappings on Rⁿ

Let B = {v₁, . . . , v_n} be a basis for Rⁿand let

P_B= [v₁v₂· · · v_n].

If x ∈ Rⁿand [x]_Bis the coordinate vector of x in the basis B then

x = P_B[x]_B.

The components of the vector x are the coordinates of x in the standard basis E = {e₁, . . . , e_n}. In other words,

[x]_E= x.

Therefore,

[x]_E= P_B[x]_B.

We can therefore interpret P_B as the matrix mapping that maps the B-coordinates of x to the E-coordinates of x. To make this more explicit, we sometimes use the notation

_EP_B

to indicate that _EP_B maps B-coordinates to E-coordinates:

[x]_E = (_EP_B)[x]_B.

If we multiply the equation

[x]_E= (_EP_B)[x]_B

on the left by the inverse of _EP_Bwe obtain

(_EP_B)⁻¹[x]_E= [x]_B

Hence, the matrix (_EP_B)⁻¹maps standard coordinates to B-coordinates, see Figure 19.1. It is natural then to introduce the notation

_BP_E= (_EP_B)⁻¹

Figure 19.1: The matrix _BP_E maps E-coordinates to B-coordinates.

Example 19.1. Let

v₁ = (1, 0, 0), v₂ = (−3, 4, 0), v₃ = (3, −6, 3), x = (−8, 2, 3).

(a) Show that the set of vectors B = {v₁, v₂, v₃} forms a basis for R³.
(b) Find the change-of-coordinates matrix from B to standard coordinates.
(c) Find the coordinate vector [x]_B for the given x.

Solution. Let

\[
P_B = [v_1\ v_2\ v_3] = \begin{bmatrix} 1 & -3 & 3 \\ 0 & 4 & -6 \\ 0 & 0 & 3 \end{bmatrix}
\]

It is clear that det(P_B) = 12, and therefore v₁, v₂, v₃ are linearly independent. Therefore, B is a basis for R³. The matrix P_B takes B-coordinates to standard coordinates. The B-coordinate vector [x]_B = (c₁, c₂, c₃) is the unique solution to the linear system

x = P_B[x]_B

Solving the linear system with augmented matrix [P_B x] we obtain

[x]_B = (−5, 2, 1)

We verify that [x]_B = (−5, 2, 1) are indeed the coordinates of x = (−8, 2, 3) in the basis B = {v₁, v₂, v₃}:

(−5)v₁ + (2)v₂ + (1)v₃ = −5(1, 0, 0) + 2(−3, 4, 0) + (3, −6, 3)
= (−5, 0, 0) + (−6, 8, 0) + (3, −6, 3)
= (−8, 2, 3) = x

19.2 Change of Basis

We saw in the previous section that the matrix

_EP_B

takes as input the B-coordinates [x]_B of a vector x and returns the coordinates of x in the standard basis. We now consider the situation of dealing with two bases B and C where neither is assumed to be the standard basis E. Hence let B = {v₁, v₂, . . . , v_n} and let C = {w₁, . . . , w_n} be two bases of Rⁿ and let

_EP_B= [v₁v₂· · · v_n]

_EP_C= [w₁w₂· · · w_n].

Then if [x]_Cis the coordinate vector of x in the basis C then

x = (_EP_C)[x]_C.

How do we transform B-coordinates of x to C-coordinates of x, and vice-versa? To answer this question, start from the relations

x = (_EP_B)[x]_B

x = (_EP_C)[x]_C.

Then

(_EP_C)[x]_C= (_EP_B)[x]_B

and because _EP_Cis invertible we have that

[x]_C= (_EP_C)⁻¹(_EP_B)[x]_B.


Hence, the matrix (_EP_C)⁻¹(_EP_B) maps the B-coordinates of x to the C-coordinates of x. For this reason, it is natural to use the notation (see Figure 19.2)

_CP_B= (_EP_C)⁻¹(_EP_B).

Figure 19.2: The matrix _CP_B maps B-coordinates to C-coordinates.

If we expand (_EP_C)⁻¹(_EP_B) we obtain that

(_EP_C)⁻¹(_EP_B) = [ (_EP_C)⁻¹v₁  (_EP_C)⁻¹v₂  · · ·  (_EP_C)⁻¹v_n ].

Therefore, the ith column of (_EP_C)⁻¹(_EP_B), namely

(_EP_C)⁻¹v_i,

is the coordinate vector of v_i in the basis C = {w₁, w₂, . . . , w_n}. To compute _CP_B we augment _EP_C with _EP_B and row reduce fully:

[ _EP_C | _EP_B ] ~ [ I_n | _CP_B ].

Example 19.2. Let

B = {(1, −3), (−2, 4)}, C = {(−7, 9), (−5, 7)}.

It can be verified that B = {v₁, v₂} and C = {w₁, w₂} are bases for R².

(a) Find the matrix that takes B-coordinates to C-coordinates.
(b) Find the matrix that takes C-coordinates to B-coordinates.
(c) Let x = (0, −2). Find [x]_B and [x]_C.

Solution. The matrix _EP_B= [v₁v₂] maps B-coordinates to standard E-coordinates. The matrix _EP_C= [w₁w₂] maps C-coordinates to standard E-coordinates. As we just showed, the matrix that maps B-coordinates to C-coordinates is

_CP_B= (_EP_C)⁻¹(_EP_B)


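It is straightforward to compute that

\[
({}_E P_C)^{-1} = \begin{bmatrix} -7/4 & -5/4 \\ 9/4 & 7/4 \end{bmatrix}
\]

Therefore,

\[
{}_C P_B = ({}_E P_C)^{-1} ({}_E P_B) = \begin{bmatrix} -7/4 & -5/4 \\ 9/4 & 7/4 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ -3 & 4 \end{bmatrix} = \begin{bmatrix} 2 & -3/2 \\ -3 & 5/2 \end{bmatrix}
\]

To compute _BP_C, we can simply invert _CP_B. One finds that

\[
({}_C P_B)^{-1} = \begin{bmatrix} 5 & 3 \\ 6 & 4 \end{bmatrix}
\]

and therefore

\[
{}_B P_C = \begin{bmatrix} 5 & 3 \\ 6 & 4 \end{bmatrix}
\]

Given that x = (0, −2), to find [x]_B we must solve the linear system

_EP_B[x]_B = x

Row reducing the augmented matrix [_EP_B x] we obtain

[x]_B = (2, 1)

Next, to find [x]_C we can solve the linear system

_EP_C[x]_C = x

Alternatively, since we now know [x]_B and _CP_B has been computed, to find [x]_C we simply multiply _CP_B by [x]_B:

\[
[x]_C = {}_C P_B [x]_B = \begin{bmatrix} 2 & -3/2 \\ -3 & 5/2 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 5/2 \\ -7/2 \end{bmatrix}
\]

Let’s verify that [x]_C = (5/2, −7/2) are indeed the C-coordinates of x = (0, −2):

\[
{}_E P_C [x]_C = \begin{bmatrix} -7 & -5 \\ 9 & 7 \end{bmatrix} \begin{bmatrix} 5/2 \\ -7/2 \end{bmatrix} = \begin{bmatrix} 0 \\ -2 \end{bmatrix}
\]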
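The row reduction [ _EP_C | _EP_B ] ~ [ I_n | _CP_B ] from this section can be sketched in code using the data of Example 19.2. The function name `change_of_basis` is an assumption made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def change_of_basis(PC, PB):
    """Row reduce [PC | PB] to [I | C_P_B] and return the right-hand block."""
    n = len(PC)
    # Build the augmented matrix row by row.
    a = [[Fraction(x) for x in rc + rb] for rc, rb in zip(PC, PB)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if a[i][c] != 0), None)
        if pivot is None:
            raise ValueError("the columns of PC do not form a basis")
        a[c], a[pivot] = a[pivot], a[c]
        lead = a[c][c]
        a[c] = [x / lead for x in a[c]]
        for i in range(n):
            if i != c:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[c])]
    return [row[n:] for row in a]

PB = [[1, -2], [-3, 4]]   # columns are the vectors of B
PC = [[-7, -5], [9, 7]]   # columns are the vectors of C

C_P_B = change_of_basis(PC, PB)
x_B = [2, 1]
x_C = [sum(p * c for p, c in zip(row, x_B)) for row in C_P_B]
print(C_P_B, x_C)
```

Swapping the two arguments, `change_of_basis(PB, PC)`, gives the matrix _BP_C that maps C-coordinates to B-coordinates.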

After this lecture you should know the following:

how to compute a change of basis matrix
and how to use the change of basis matrix to map one set of coordinates into another


Lecture 20

Inner Products and Orthogonality

20.1 Inner Product on Rⁿ

The inner product on Rⁿ generalizes the notion of the dot product of vectors in R² and R³ that you may already be familiar with.

Definition 20.1: Let u = (u₁, u₂, . . . , u_n) and let v = (v₁, v₂, . . . , v_n) be vectors in Rⁿ. The inner product of u and v is

u • v = u₁v₁ + u₂v₂ + · · · + u_nv_n.

Notice that the inner product u • v can be computed as a matrix multiplication as follows:

\[
u \bullet v = u^T v = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
\]

The following theorem summarizes the basic algebraic properties of the inner product.

Theorem 20.2: Let u, v, w be vectors in Rⁿ and let α be a scalar. Then

(a) u • v = v • u
(b) (u + v) • w = u • w + v • w
(c) (αu) • v = α(u • v) = u • (αv)
(d) u • u ≥ 0, and u • u = 0 if and only if u = 0


Example 20.3. Let u = (2, −5, −1) and let v = (3, 2, −3). Compute u • v, v • u, u • u, and v • v.

Solution. By definition:

u • v = (2)(3) + (−5)(2) + (−1)(−3) = −1

v • u = (3)(2) + (2)(−5) + (−3)(−1) = −1

u • u = (2)(2) + (−5)(−5) + (−1)(−1) = 30

v • v = (3)(3) + (2)(2) + (−3)(−3) = 22.

We now define the length or norm of a vector in Rⁿ.

Definition 20.4: The length or norm of a vector u ∈ Rⁿ is defined as

‖u‖ = √(u • u) = √(u₁² + u₂² + · · · + u_n²).

A vector u ∈ Rⁿ with norm 1 will be called a unit vector:

‖u‖ = 1.

Below is an important property of the norm.

Theorem 20.5: Let u ∈ Rⁿ and let α be a scalar. Then

‖αu‖ = |α| ‖u‖.

Proof. We have

‖αu‖ = √((αu) • (αu))

= √(α²(u • u))

= |α| √(u • u)

= |α| ‖u‖.

By Theorem 20.5, any non-zero vector u ∈ Rⁿ can be scaled to obtain a new unit vector in the same direction as u. Indeed, suppose that u is non-zero so that ‖u‖ ≠ 0. Define the new vector

v = (1/‖u‖) u

Notice that α = 1/‖u‖ is just a scalar and thus v is a scalar multiple of u. Then by Theorem 20.5 we have that

‖v‖ = ‖αu‖ = |α| · ‖u‖ = (1/‖u‖) · ‖u‖ = 1

and therefore v is a unit vector, see Figure 20.1. The process of taking a non-zero vector u and creating the new vector v = (1/‖u‖) u is sometimes called normalization of u.

Figure 20.1: Normalizing a non-zero vector.

Example 20.6. Let u = (2, 3, 6). Compute ‖u‖ and find the unit vector v in the same direction as u.

Solution. By definition,

‖u‖ = √(u • u) = √(2² + 3² + 6²) = √49 = 7.

Then the unit vector that is in the same direction as u is

v = (1/‖u‖) u = (1/7)(2, 3, 6) = (2/7, 3/7, 6/7)

Verify that ‖v‖ = 1:

‖v‖ = √((2/7)² + (3/7)² + (6/7)²) = √(4/49 + 9/49 + 36/49) = √(49/49) = √1 = 1.

Now that we have the definition of the length of a vector, we can define the notion of distance between two vectors.

Definition 20.7: Let u and v be vectors in Rⁿ. The distance between u and v is the length of the vector u − v. We will denote the distance between u and v by d(u, v). In other words,

d(u, v) = ‖u − v‖.

Example 20.8. Find the distance between u = (3, −2) and v = (7, −9).

Solution. We compute:

d(u, v) = ‖u − v‖ = √((3 − 7)² + (−2 + 9)²) = √65.
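The definitions of this section translate directly into code. The following minimal sketch uses only the Python standard library; the function names `dot`, `norm`, `normalize`, and `distance` are chosen here for illustration, and the inputs are those of Examples 20.3, 20.6, and 20.8.

```python
import math

def dot(u, v):
    """Inner product u . v = u1*v1 + ... + un*vn."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Length of u, the square root of u . u."""
    return math.sqrt(dot(u, u))

def normalize(u):
    """Unit vector in the direction of a non-zero vector u."""
    length = norm(u)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [a / length for a in u]

def distance(u, v):
    """Distance d(u, v), the norm of u - v."""
    return norm([a - b for a, b in zip(u, v)])

print(dot((2, -5, -1), (3, 2, -3)))   # inner product of Example 20.3
print(norm((2, 3, 6)))                # norm of Example 20.6
print(normalize((2, 3, 6)))           # unit vector of Example 20.6
print(distance((3, -2), (7, -9)))     # distance of Example 20.8
```

Because `norm` uses floating point square roots, results such as the normalized vector are approximations of 2/7, 3/7, 6/7 rather than exact fractions.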


20.2 Orthogonality

In the context of vectors in R²and R³, orthogonality is synonymous with perpendicularity. Below is the general definition.

Definition 20.9: Two vectors u and v in Rⁿ are said to be orthogonal if u • v = 0.

In R² and R³, the notion of orthogonality should be familiar to you. In fact, using the Law of Cosines in R² or R³, one can prove that

u • v = ‖u‖ ‖v‖ cos(θ) (20.1)

where θ is the angle between u and v. If θ = π/2 then clearly u • v = 0. In higher dimensions, i.e., n ≥ 4, we can use equation (20.1) to define the angle between vectors u and v. In other words, the angle between any two non-zero vectors u and v in Rⁿ is defined to be

θ = arccos( (u • v) / (‖u‖ ‖v‖) )

The general notion of orthogonality in Rⁿleads to the following theorem from grade school.

Theorem 20.10: (Pythagorean Theorem) Two vectors u and v are orthogonal if and only if ‖u + v‖² = ‖u‖² + ‖v‖².

Proof. First recall that ‖u + v‖ = √((u + v) • (u + v)) and therefore

‖u + v‖² = (u + v) • (u + v)

= u • u + u • v + v • u + v • v

= ‖u‖² + 2(u • v) + ‖v‖².

Therefore, ‖u + v‖² = ‖u‖² + ‖v‖² if and only if u • v = 0.

We now introduce orthogonal sets.

Definition 20.11: A set of vectors {u₁, u₂, . . . , u_p} is said to be an orthogonal set if any pair of distinct vectors u_i, u_j are orthogonal, that is, u_i • u_j = 0 whenever i ≠ j.

In the following theorem we prove that orthogonal sets are linearly independent.

Theorem 20.12: Let {u₁, u₂, . . . , u_p} be an orthogonal set of non-zero vectors in Rⁿ. Then the set {u₁, u₂, . . . , u_p} is linearly independent. In particular, if p = n then the set {u₁, u₂, . . . , u_n} is a basis for Rⁿ.

Proof. Suppose that there are scalars c₁, c₂, . . . , c_p such that

c₁u₁ + c₂u₂ + · · · + c_pu_p = 0.

Take the inner product of u₁ with both sides of the above equation:

c₁(u₁ • u₁) + c₂(u₂ • u₁) + · · · + c_p(u_p • u₁) = 0 • u₁.

Since the set is orthogonal, the left-hand side of the last equation simplifies to c₁(u₁ • u₁). The right-hand side simplifies to 0. Hence,

c₁(u₁ • u₁) = 0.

But u₁ • u₁ = ‖u₁‖² is not zero and therefore the only way that c₁(u₁ • u₁) = 0 is if c₁ = 0. Repeat the above steps using u₂, u₃, . . . , u_p and conclude that c₂ = 0, c₃ = 0, . . . , c_p = 0. Therefore, {u₁, . . . , u_p} is linearly independent. If p = n, then the set {u₁, . . . , u_p} is automatically a basis for Rⁿ.

Example 20.13. Is the set {u₁, u₂, u₃} an orthogonal set?

u₁ = (1, −2, 1), u₂ = (0, 1, 2), u₃ = (−5, −2, 1)

Solution. Compute

u₁ • u₂ = (1)(0) + (−2)(1) + (1)(2) = 0

u₁ • u₃ = (1)(−5) + (−2)(−2) + (1)(1) = 0

u₂ • u₃ = (0)(−5) + (1)(−2) + (2)(1) = 0

Therefore, {u₁, u₂, u₃} is an orthogonal set. By Theorem 20.12, the set {u₁, u₂, u₃} is linearly independent. To verify linear independence directly, one can compute that det([u₁ u₂ u₃]) = 30, which is non-zero.


We now introduce orthonormal sets.

Definition 20.14: A set of vectors {u₁, u₂, . . . , u_p} is said to be an orthonormal set if it is an orthogonal set and if each vector u_iin the set is a unit vector.

Consider the previous orthogonal set in R³:

{u₁, u₂, u₃} = {(1, −2, 1), (0, 1, 2), (−5, −2, 1)}.

It is not an orthonormal set because none of u₁, u₂, u₃ are unit vectors. Explicitly, ‖u₁‖ = √6, ‖u₂‖ = √5, and ‖u₃‖ = √30. However, from an orthogonal set we can create an orthonormal set by normalizing each vector. Hence, the set

{v₁, v₂, v₃} = {(1/√6, −2/√6, 1/√6), (0, 1/√5, 2/√5), (−5/√30, −2/√30, 1/√30)}

is an orthonormal set.

20.3 Coordinates in an Orthonormal Basis

As we will see in this section, a basis B = {u₁, u₂, . . . , u_n} of Rⁿ that is also an orthonormal set is highly desirable when performing computations with coordinates. To see why, let x be any vector in Rⁿ and suppose we want to find the coordinates of x in the basis B, that is, we seek [x]_B = (c₁, c₂, . . . , c_n). By definition, the coordinates c₁, c₂, . . . , c_n satisfy the equation

x = c₁u₁ + c₂u₂ + · · · + c_nu_n.

Taking the inner product of u₁ with both sides of the above equation and using the fact that u₁ • u₂ = 0, u₁ • u₃ = 0, . . . , u₁ • u_n = 0, we obtain

u₁ • x = c₁(u₁ • u₁) = c₁(1) = c₁

where we also used the fact that u₁ is a unit vector. Thus, c₁ = u₁ • x! Repeating this procedure with u₂, u₃, . . . , u_n we obtain the remaining coefficients c₂, . . . , c_n:

c₂ = u₂ • x

c₃ = u₃ • x

⋮

c_n = u_n • x.

Our previous computation proves the following theorem.

Theorem 20.15: Let B = {u₁, u₂, . . . , u_n} be an orthonormal basis for Rⁿ. The coordinate vector of x in the basis B is

\[
[x]_B = \begin{bmatrix} u_1 \bullet x \\ u_2 \bullet x \\ \vdots \\ u_n \bullet x \end{bmatrix}
\]

Hence, computing coordinates with respect to an orthonormal basis can be done without performing any row operations and all we need to do is compute inner products! We make the important observation that an alternate expression for [x]_B is

\[
[x]_B = \begin{bmatrix} u_1 \bullet x \\ u_2 \bullet x \\ \vdots \\ u_n \bullet x \end{bmatrix} = \begin{bmatrix} u_1^T x \\ u_2^T x \\ \vdots \\ u_n^T x \end{bmatrix} = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{bmatrix} x = U^T x
\]

where U = [u₁ u₂ · · · u_n]. On the other hand, recall that by definition [x]_B satisfies U[x]_B = x, and therefore [x]_B = U⁻¹x. If we compare the two identities

[x]_B = U⁻¹x and [x]_B = Uᵀx

we suspect then that U⁻¹ = Uᵀ. This is indeed the case. To see this, let B = {u₁, u₂, . . . , u_n} be an orthonormal basis for Rⁿ and put

U = [u₁ u₂ · · · u_n].

Consider the matrix product UᵀU. Recalling that u_i • u_j = u_iᵀu_j, we obtain

\[
U^T U = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} u_1^T u_1 & u_1^T u_2 & \cdots & u_1^T u_n \\ u_2^T u_1 & u_2^T u_2 & \cdots & u_2^T u_n \\ \vdots & \vdots & \ddots & \vdots \\ u_n^T u_1 & u_n^T u_2 & \cdots & u_n^T u_n \end{bmatrix} = I_n.
\]


Therefore,

U⁻¹= U^T.

A matrix U ∈ Rⁿ×ⁿ such that

UᵀU = UUᵀ = I_n

is called an orthogonal matrix. Hence, if B = {u₁, u₂, . . . , u_n} is an orthonormal basis of Rⁿ then the matrix

U = [u₁ u₂ · · · u_n]

is an orthogonal matrix.
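The two facts of this section, namely that coordinates in an orthonormal basis are inner products and that UᵀU = I_n, can be checked numerically. The sketch below normalizes the orthogonal set of Example 20.13; the test vector x is an arbitrary choice made for illustration and does not come from the text, and floating point comparisons are therefore made up to a small tolerance.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    length = math.sqrt(dot(u, u))
    return [a / length for a in u]

# Orthogonal set of Example 20.13, normalized to an orthonormal basis of R^3.
B = [normalize(u) for u in [(1, -2, 1), (0, 1, 2), (-5, -2, 1)]]

# U^T U has entries u_i . u_j, so it should be the identity matrix.
gram = [[dot(ui, uj) for uj in B] for ui in B]

# Coordinates need only inner products: c_i = u_i . x.
x = (3, -1, 4)   # arbitrary test vector (an assumption, not from the text)
coords = [dot(u, x) for u in B]

# Rebuild x from its coordinates: x = c_1 u_1 + c_2 u_2 + c_3 u_3.
rebuilt = [sum(c * u[k] for c, u in zip(coords, B)) for k in range(3)]
print(gram, coords, rebuilt)
```

No linear system is solved anywhere in this sketch, which is the practical advantage of working with an orthonormal basis.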

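Example 20.16. Consider the vectors

\[
v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} -1 \\ 4 \\ 1 \end{bmatrix}, \quad
v_3 = \begin{bmatrix} 2 \\ 1 \\ -2 \end{bmatrix}, \quad
x = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}.
\]

(a) Show that {v₁, v₂, v₃} is an orthogonal basis for R³.
(b) Then, if necessary, normalize the basis vectors v_i to obtain an orthonormal basis B = {u₁, u₂, u₃} for R³.
(c) For the given x find [x]_B.

Solution. (a) We compute that v₁ • v₂ = 0, v₁ • v₃ = 0, and v₂ • v₃ = 0, and thus {v₁, v₂, v₃} is an orthogonal set. Since orthogonal sets of non-zero vectors are linearly independent and {v₁, v₂, v₃} consists of three vectors, {v₁, v₂, v₃} is a basis for R³.

(b) We compute that ‖v₁‖ = √2, ‖v₂‖ = √18, and ‖v₃‖ = 3. Then let

\[
u_1 = \begin{bmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}, \quad
u_2 = \begin{bmatrix} -1/\sqrt{18} \\ 4/\sqrt{18} \\ 1/\sqrt{18} \end{bmatrix}, \quad
u_3 = \begin{bmatrix} 2/3 \\ 1/3 \\ -2/3 \end{bmatrix}.
\]

Then B = {u₁, u₂, u₃} is an orthonormal set, and since B consists of three vectors, B is an orthonormal basis of R³.

(c) By Theorem 20.15, with x = (1, 2, −1),

\[
[x]_B = \begin{bmatrix} u_1 \bullet x \\ u_2 \bullet x \\ u_3 \bullet x \end{bmatrix}
= \begin{bmatrix} (1 - 1)/\sqrt{2} \\ (-1 + 8 - 1)/\sqrt{18} \\ (2 + 2 + 2)/3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 6/\sqrt{18} \\ 2 \end{bmatrix}.
\]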
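The computation in Example 20.16 can be checked numerically. The following is a minimal sketch in plain Python; the helper names `dot` and `normalize` are our own, and the entries of v₁, v₂, v₃ and x are assumed to be as read from the example. It computes the coordinates as inner products and then rebuilds x from them.

```python
import math

def dot(a, b):
    """Inner product of two vectors given as equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def normalize(v):
    """Return v scaled to unit length (v is assumed to be non-zero)."""
    length = math.sqrt(dot(v, v))
    return [vi / length for vi in v]

# Vectors as read from Example 20.16 (assumed values).
v1, v2, v3 = [1, 0, 1], [-1, 4, 1], [2, 1, -2]
x = [1, 2, -1]

# Orthonormal basis B = {u1, u2, u3}.
basis = [normalize(v) for v in (v1, v2, v3)]

# Coordinates with respect to an orthonormal basis are inner products.
coords = [dot(u, x) for u in basis]

# Rebuild x from its coordinates: x = c1*u1 + c2*u2 + c3*u3.
rebuilt = [sum(c * u[i] for c, u in zip(coords, basis)) for i in range(3)]

print(coords)
print(rebuilt)
```

No row reduction is needed: each coordinate costs a single inner product, which is the advantage Theorem 20.15 describes.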

   

Example 20.17. The standard unit basis

\[
E = \{e_1, e_2, e_3\} = \left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\right\}
\]

in R³ is an orthonormal basis. Given any x = (x₁, x₂, x₃), we have [x]_E = x. On the other hand, clearly

x₁ = x • e₁, x₂ = x • e₂, x₃ = x • e₃.

Example 20.18. (Orthogonal Complements) Let W be a subspace of Rⁿ. The orthogonal complement of W, which we denote by W^⊥, consists of the vectors in Rⁿ that are orthogonal to every vector in W. Using set notation:

W^⊥ = {u ∈ Rⁿ : u • w = 0 for every w ∈ W}.

(a) Show that W^⊥ is a subspace.
(b) Let w₁ = (0, 1, 1, 0), let w₂ = (1, 0, −1, 0), and let W = span{w₁, w₂}. Find a basis for W^⊥.

Solution. (a) The vector 0 is orthogonal to every vector in Rⁿ and therefore it is certainly orthogonal to every vector in W. Thus, 0 ∈ W^⊥. Now suppose that u₁, u₂ are two vectors in W^⊥. Then for any vector w ∈ W it holds that

(u₁ + u₂) • w = u₁ • w + u₂ • w = 0 + 0 = 0.

Therefore, u₁ + u₂ is also orthogonal to w, and since w is an arbitrary vector in W, (u₁ + u₂) ∈ W^⊥. Lastly, let α be any scalar and let u ∈ W^⊥. Then for any vector w in W we have that

(αu) • w = α(u • w) = α · 0 = 0.

Therefore, αu is orthogonal to w, and since w is an arbitrary vector in W, (αu) ∈ W^⊥. This proves that W^⊥ is a subspace of Rⁿ.

(b) A vector u = (u₁, u₂, u₃, u₄) is in W^⊥ if and only if u • w₁ = 0 and u • w₂ = 0, because every vector in W is a linear combination of w₁ and w₂. In other words, if

u₂ + u₃ = 0
u₁ − u₃ = 0

This is a linear system for the unknowns u₁, u₂, u₃, u₄, with free variables u₃ = t and u₄ = s. The general solution to the linear system is

u = t(1, −1, 1, 0) + s(0, 0, 0, 1).

Therefore, a basis for W^⊥ is {(1, −1, 1, 0), (0, 0, 0, 1)}.

After this lecture you should know the following:


how to compute inner products, norms, and distances
how to normalize vectors to unit length
what orthogonality is and how to check for it
what an orthogonal and orthonormal basis is
the advantages of working with an orthonormal basis when computing coordinate vectors


Lecture 21

Eigenvalues and Eigenvectors

21.1 Eigenvectors and Eigenvalues

An n × n matrix A can be thought of as the linear mapping that takes any arbitrary vector x ∈ Rⁿ and outputs a new vector Ax. In some cases, the new output vector Ax is simply a scalar multiple of the input vector x, that is, there exists a scalar λ such that Ax = λx. This case is so important that we make the following definition.

Definition 21.1: Let A be an n × n matrix and let v be a non-zero vector. If Av = λv for some scalar λ then we call the vector v an eigenvector of A and we call the scalar λ an eigenvalue of A corresponding to v.

Hence, an eigenvector v of A is simply scaled by a scalar λ under multiplication by A. Eigenvectors are by definition non-zero vectors because A0 = 0 = λ0 for every scalar λ, and so it would not be clear what the corresponding eigenvalue should be.

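Example 21.2. Determine if the given vectors v and u are eigenvectors of A. If yes, find the eigenvalue of A associated to the eigenvector.

\[
A = \begin{bmatrix} 4 & -1 & 6 \\ 2 & 1 & 6 \\ 2 & -1 & 8 \end{bmatrix}, \quad
v = \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}, \quad
u = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}.
\]

Solution. Compute

\[
Av = \begin{bmatrix} 4 & -1 & 6 \\ 2 & 1 & 6 \\ 2 & -1 & 8 \end{bmatrix}
\begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} -6 \\ 0 \\ 2 \end{bmatrix}
= 2 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}
= 2v
\]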


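Hence, Av = 2v and thus v is an eigenvector of A with corresponding eigenvalue λ = 2. On the other hand,

\[
Au = \begin{bmatrix} 4 & -1 & 6 \\ 2 & 1 & 6 \\ 2 & -1 & 8 \end{bmatrix}
\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 6 \\ 4 \end{bmatrix}.
\]

There is no scalar λ such that

\[
\begin{bmatrix} 0 \\ 6 \\ 4 \end{bmatrix} = \lambda \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}.
\]

Therefore, u is not an eigenvector of A.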
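The test used in Example 21.2 is mechanical: compute Av and check whether it is a scalar multiple of v. Here is a small sketch in plain Python; the function name `eigenvalue_if_eigenvector` is our own, and the entries of A, v and u are assumed to be as read from the example. Exact fractions avoid any rounding issues.

```python
from fractions import Fraction

def mat_vec(A, v):
    """Matrix-vector product, with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def eigenvalue_if_eigenvector(A, v):
    """Return the scalar lam with Av = lam*v if v is an eigenvector, else None."""
    if all(x == 0 for x in v):
        return None  # the zero vector is never an eigenvector
    Av = mat_vec(A, v)
    # The first non-zero entry of v determines the only possible lam.
    k = next(i for i, x in enumerate(v) if x != 0)
    lam = Fraction(Av[k], v[k])
    return lam if all(y == lam * x for x, y in zip(v, Av)) else None

A = [[4, -1, 6], [2, 1, 6], [2, -1, 8]]          # assumed entries of A
print(eigenvalue_if_eigenvector(A, [-3, 0, 1]))  # the vector v
print(eigenvalue_if_eigenvector(A, [-1, 2, 1]))  # the vector u
```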

Example 21.3. Is v an eigenvector of A? If yes, find the eigenvalue of A associated to v:

\[
A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -4 & 2 & 2 \end{bmatrix}, \quad
v = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
\]

Solution. We compute

\[
Av = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = 0.
\]

Hence, if λ = 0 then λv = 0 and thus Av = λv. Therefore, v is an eigenvector of A with corresponding eigenvalue λ = 0.

How does one find the eigenvectors/eigenvalues of a matrix A? The general procedure is to first find the eigenvalues of A and then for each eigenvalue find the corresponding eigenvectors. In this section, however, we will instead suppose that we have already found the eigenvalues of A and concern ourselves with finding the associated eigenvectors. Suppose then that λ is known to be an eigenvalue of A. How do we find an eigenvector v corresponding to the eigenvalue λ? To answer this question, we note that if v is to be an eigenvector of A with eigenvalue λ then v must satisfy the equation

Av = λv.

We can rewrite this equation as

Av − λv = 0

which, after using the distributive property of matrix multiplication, is equivalent to

(A − λI)v = 0.

The last equation says that if v is to be an eigenvector of A with eigenvalue λ then v must be in the null space of A − λI:

v ∈ Null(A − λI).


In summary, if λ is known to be an eigenvalue of A, then to find the eigenvectors corresponding to λ we must solve the homogeneous system

(A − λI)x = 0.

Recall that the null space of any matrix is a subspace and for this reason we call the subspace Null(A − λI) the eigenspace of A corresponding to λ.

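Example 21.4. It is known that λ = 4 is an eigenvalue of

\[
A = \begin{bmatrix} -4 & 6 & 3 \\ 1 & 7 & 9 \\ 8 & -6 & 1 \end{bmatrix}.
\]

Find a basis for the eigenspace of A corresponding to λ = 4.

Solution. First compute

\[
A - 4I = \begin{bmatrix} -4 & 6 & 3 \\ 1 & 7 & 9 \\ 8 & -6 & 1 \end{bmatrix}
- \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix}
= \begin{bmatrix} -8 & 6 & 3 \\ 1 & 3 & 9 \\ 8 & -6 & -3 \end{bmatrix}.
\]

Find a basis for the null space of A − 4I:

\[
\begin{bmatrix} -8 & 6 & 3 \\ 1 & 3 & 9 \\ 8 & -6 & -3 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 1 & 3 & 9 \\ -8 & 6 & 3 \\ 8 & -6 & -3 \end{bmatrix}
\xrightarrow[-8R_1 + R_3]{8R_1 + R_2}
\begin{bmatrix} 1 & 3 & 9 \\ 0 & 30 & 75 \\ 0 & -30 & -75 \end{bmatrix}
\]

Finally,

\[
\begin{bmatrix} 1 & 3 & 9 \\ 0 & 30 & 75 \\ 0 & -30 & -75 \end{bmatrix}
\xrightarrow{R_2 + R_3}
\begin{bmatrix} 1 & 3 & 9 \\ 0 & 30 & 75 \\ 0 & 0 & 0 \end{bmatrix}.
\]

Hence, the general solution to the homogeneous system (A − 4I)x = 0 is

\[
x = t \begin{bmatrix} -3/2 \\ -5/2 \\ 1 \end{bmatrix}
\]

where t is an arbitrary scalar. Therefore, the eigenspace of A corresponding to λ = 4 is

\[
\operatorname{span}\left\{ \begin{bmatrix} -3/2 \\ -5/2 \\ 1 \end{bmatrix} \right\}
= \operatorname{span}\left\{ \begin{bmatrix} -3 \\ -5 \\ 2 \end{bmatrix} \right\}
= \operatorname{span}\{v\}
\]

and {v} is a basis for the eigenspace. The vector v is of course an eigenvector of A with eigenvalue λ = 4, and any non-zero multiple of v is also an eigenvector of A with λ = 4.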


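Example 21.5. It is known that λ = 3 is an eigenvalue of

\[
A = \begin{bmatrix} 11 & -4 & -8 \\ 4 & 1 & -4 \\ 8 & -4 & -5 \end{bmatrix}.
\]

Find the eigenspace of A corresponding to λ = 3.

Solution. First compute

\[
A - 3I = \begin{bmatrix} 11 & -4 & -8 \\ 4 & 1 & -4 \\ 8 & -4 & -5 \end{bmatrix}
- \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}
= \begin{bmatrix} 8 & -4 & -8 \\ 4 & -2 & -4 \\ 8 & -4 & -8 \end{bmatrix}.
\]

Now find the null space of A − 3I:

\[
\begin{bmatrix} 8 & -4 & -8 \\ 4 & -2 & -4 \\ 8 & -4 & -8 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 4 & -2 & -4 \\ 8 & -4 & -8 \\ 8 & -4 & -8 \end{bmatrix}
\xrightarrow[-2R_1 + R_3]{-2R_1 + R_2}
\begin{bmatrix} 4 & -2 & -4 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]

Hence, A − 3I is row equivalent to

\[
\begin{bmatrix} 4 & -2 & -4 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

and any vector in the null space of A − 3I can be written as

\[
x = t_1 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + t_2 \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}.
\]

Therefore, the eigenspace of A corresponding to λ = 3 is

\[
\operatorname{Null}(A - 3I) = \operatorname{span}\{v_1, v_2\}
= \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \right\}.
\]

The vectors v₁ and v₂ are two linearly independent eigenvectors of A with eigenvalue λ = 3. Therefore {v₁, v₂} is a basis for the eigenspace of A with eigenvalue λ = 3. You can verify that Av₁ = 3v₁ and Av₂ = 3v₂.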
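Finding an eigenspace always reduces to finding a basis of Null(A − λI). The sketch below, in plain Python with exact fractions, row reduces a matrix to reduced row echelon form and reads off one basis vector per free variable. The function name `null_space_basis` is our own, and the entries of A are assumed to be as read from Example 21.5.

```python
from fractions import Fraction

def null_space_basis(M):
    """Basis of Null(M), obtained from the reduced row echelon form of M."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0])
    pivots = []   # pivot column of each pivot row, in order
    r = 0         # index of the next pivot row
    for c in range(n_cols):
        # Look for a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [x / pivot_value for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)          # set this free variable to 1
        for i, pc in enumerate(pivots):
            v[pc] = -rows[i][free]     # solve for the pivot variables
        basis.append(v)
    return basis

A = [[11, -4, -8], [4, 1, -4], [8, -4, -5]]   # assumed entries of A
lam = 3
shifted = [[A[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]
basis = null_space_basis(shifted)
print(basis)
```

The basis returned may differ from {v₁, v₂} in the example, since a basis of a subspace is not unique, but it spans the same two-dimensional eigenspace.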

As shown in the last example, there may exist more than one linearly independent eigenvector of A corresponding to the same eigenvalue. In other words, it is possible that the dimension of the eigenspace Null(A − λI) is greater than one. What can be said about the eigenvectors of A corresponding to different eigenvalues?


Theorem 21.6: Let v₁, . . . , v_k be eigenvectors of A corresponding to distinct eigenvalues λ₁, . . . , λ_k of A. Then {v₁, . . . , v_k} is a linearly independent set.

Proof. Suppose by contradiction that {v₁, . . . , v_k} is linearly dependent while λ₁, . . . , λ_k are distinct. Since v₁ is non-zero, there is an index p < k such that {v₁, . . . , v_p} is linearly independent and v_p₊₁ is a linear combination of v₁, . . . , v_p:

v_p₊₁ = c₁v₁ + c₂v₂ + · · · + c_pv_p. (21.1)

Applying A to both sides we obtain

Av_p₊₁ = c₁Av₁ + c₂Av₂ + · · · + c_pAv_p

and since Av_i = λ_iv_i we can simplify this to

λ_p₊₁v_p₊₁ = c₁λ₁v₁ + c₂λ₂v₂ + · · · + c_pλ_pv_p. (21.2)

On the other hand, multiplying (21.1) by λ_p₊₁ gives

λ_p₊₁v_p₊₁ = c₁λ_p₊₁v₁ + c₂λ_p₊₁v₂ + · · · + c_pλ_p₊₁v_p. (21.3)

Now subtract equation (21.3) from equation (21.2):

0 = c₁(λ₁ − λ_p₊₁)v₁ + c₂(λ₂ − λ_p₊₁)v₂ + · · · + c_p(λ_p − λ_p₊₁)v_p.

Now {v₁, . . . , v_p} is linearly independent and thus c_i(λ_i − λ_p₊₁) = 0 for each i. But the eigenvalues λ₁, . . . , λ_k are all distinct and so we must have c₁ = c₂ = · · · = c_p = 0. From (21.1) this implies that v_p₊₁ = 0, which is a contradiction because eigenvectors are by definition non-zero. This proves that {v₁, v₂, . . . , v_k} is a linearly independent set.

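Example 21.7. It is known that λ₁ = 1 and λ₂ = −1 are eigenvalues of

\[
A = \begin{bmatrix} -4 & 6 & 3 \\ 1 & 7 & 9 \\ 8 & -6 & 1 \end{bmatrix}.
\]

Find bases for the eigenspaces corresponding to λ₁ and λ₂ and show that any two vectors from these distinct eigenspaces are linearly independent.

Solution. Compute

\[
A - \lambda_1 I = \begin{bmatrix} -5 & 6 & 3 \\ 1 & 6 & 9 \\ 8 & -6 & 0 \end{bmatrix}
\]

and one finds that

\[
\operatorname{Null}(A - \lambda_1 I) = \operatorname{span}\left\{ \begin{bmatrix} -3 \\ -4 \\ 3 \end{bmatrix} \right\}.
\]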


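Hence, v₁ = (−3, −4, 3) is an eigenvector of A with eigenvalue λ₁ = 1, and {v₁} forms a basis for the corresponding eigenspace. Next, compute

\[
A - \lambda_2 I = \begin{bmatrix} -4 & 6 & 3 \\ 1 & 7 & 9 \\ 8 & -6 & 1 \end{bmatrix}
+ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} -3 & 6 & 3 \\ 1 & 8 & 9 \\ 8 & -6 & 2 \end{bmatrix}
\]

and one finds that

\[
\operatorname{Null}(A - \lambda_2 I) = \operatorname{span}\left\{ \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} \right\}.
\]

Hence, v₂ = (−1, −1, 1) is an eigenvector of A with eigenvalue λ₂ = −1, and {v₂} forms a basis for the corresponding eigenspace. Now verify that v₁ and v₂ are linearly independent:

\[
\begin{bmatrix} v_1 & v_2 \end{bmatrix}
= \begin{bmatrix} -3 & -1 \\ -4 & -1 \\ 3 & 1 \end{bmatrix}
\xrightarrow{R_1 + R_3}
\begin{bmatrix} -3 & -1 \\ -4 & -1 \\ 0 & 0 \end{bmatrix}.
\]

The last matrix has rank r = 2, and thus v₁, v₂ are indeed linearly independent.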

When λ = 0 is an eigenvalue

What can we say about A if λ = 0 is an eigenvalue of A? Suppose then that A has eigenvalue λ = 0. Then by definition, there exists a non-zero vector v such that

Av = 0 · v = 0.

In other words, v is in the null space of A. Thus, A is not invertible (Why?).

Theorem 21.8: The matrix A ∈ Rⁿˣⁿ is invertible if and only if λ = 0 is not an eigenvalue of A.

In fact, later we will see that det(A) is the product of its eigenvalues.

After this lecture you should know the following:

what eigenvalues are
what eigenvectors are and how to find them when eigenvalues are known
the behavior of a discrete dynamical system when the initial condition is set to an eigenvector of the system matrix


Lecture 22

The Characteristic Polynomial

22.1 The Characteristic Polynomial of a Matrix

Recall that a number λ is an eigenvalue of A ∈ Rⁿˣⁿ if there exists a non-zero vector v such that

Av = λv

or equivalently if v ∈ Null(A − λI). In other words, λ is an eigenvalue of A if and only if the subspace Null(A − λI) contains a vector other than the zero vector. We know that any square matrix M has a non-trivial null space if and only if M is non-invertible, which holds if and only if det(M) = 0. Hence, λ is an eigenvalue of A if and only if λ satisfies det(A − λI) = 0. Let's compute the expression det(A − λI) for a generic 2 × 2 matrix:

\[
\det(A - \lambda I) = \begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix}
= (a_{11} - \lambda)(a_{22} - \lambda) - a_{12} a_{21}
= \lambda^2 - (a_{11} + a_{22})\lambda + a_{11} a_{22} - a_{12} a_{21}.
\]

Thus, if A is 2 × 2 then

det(A − λI) = λ² − (a₁₁ + a₂₂)λ + a₁₁a₂₂ − a₁₂a₂₁

is a polynomial in the variable λ of degree n = 2. This motivates the following definition.

Definition 22.1: Let A be an n × n matrix. The polynomial

p(λ) = det(A − λI)

is called the characteristic polynomial of A.


In summary, to find the eigenvalues of A we must find the roots of the characteristic polynomial:

p(λ) = det(A − λI).

The following theorem asserts that what we observed for the case n = 2 is indeed true for all n.

Theorem 22.2: The characteristic polynomial p(λ) = det(A − λI) of an n × n matrix A is an nth degree polynomial.

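Proof. Recall that for the case n = 2 we computed that

det(A − λI) = λ² − (a₁₁ + a₂₂)λ + a₁₁a₂₂ − a₁₂a₂₁.

Therefore, the claim holds for n = 2. By induction, suppose that the claim holds for some n ≥ 2. If A is an (n + 1) × (n + 1) matrix then expanding det(A − λI) along the first row gives

\[
\det(A - \lambda I) = (a_{11} - \lambda)\det(A_{11} - \lambda I) + \sum_{k=2}^{n+1} (-1)^{1+k} a_{1k} \det\big((A - \lambda I)_{1k}\big),
\]

where A₁₁ is the n × n matrix obtained from A by deleting its first row and first column, and (A − λI)₁_k is the n × n matrix obtained from A − λI by deleting its first row and kth column. By the induction hypothesis, det(A₁₁ − λI) is an nth degree polynomial, and hence (a₁₁ − λ) det(A₁₁ − λI) is an (n + 1)th degree polynomial. For k ≥ 2, only n − 1 entries of (A − λI)₁_k involve λ, so each determinant in the sum is a polynomial of degree at most n − 1. Therefore det(A − λI) is an (n + 1)th degree polynomial. This ends the proof.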

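Example 22.3. Find the characteristic polynomial of

\[
A = \begin{bmatrix} -2 & 4 \\ -6 & 8 \end{bmatrix}.
\]

What are the eigenvalues of A?

Solution. Compute

\[
A - \lambda I = \begin{bmatrix} -2 & 4 \\ -6 & 8 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}
= \begin{bmatrix} -2 - \lambda & 4 \\ -6 & 8 - \lambda \end{bmatrix}.
\]

Therefore,

\[
p(\lambda) = \det(A - \lambda I)
= \begin{vmatrix} -2 - \lambda & 4 \\ -6 & 8 - \lambda \end{vmatrix}
= (-2 - \lambda)(8 - \lambda) + 24
= \lambda^2 - 6\lambda + 8
= (\lambda - 4)(\lambda - 2).
\]

The roots of p(λ) are clearly λ₁ = 4 and λ₂ = 2. Therefore, the eigenvalues of A are λ₁ = 4 and λ₂ = 2.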
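For a 2 × 2 matrix the whole procedure fits in a few lines. The sketch below, in plain Python with function names of our own choosing, builds the coefficients of λ² − (a₁₁ + a₂₂)λ + (a₁₁a₂₂ − a₁₂a₂₁) and finds the real roots with the quadratic formula, using the matrix of Example 22.3.

```python
import math

def char_poly_2x2(A):
    """Coefficients (1, -trace, det) of the characteristic polynomial of a 2 x 2 matrix."""
    (a11, a12), (a21, a22) = A
    return 1, -(a11 + a22), a11 * a22 - a12 * a21

def real_eigenvalues_2x2(A):
    """Real roots of the characteristic polynomial, largest first."""
    _, b, c = char_poly_2x2(A)
    disc = b * b - 4 * c
    if disc < 0:
        return []  # the roots are complex, so there are no real eigenvalues
    root = math.sqrt(disc)
    return [(-b + root) / 2, (-b - root) / 2]

A = [[-2, 4], [-6, 8]]  # matrix of Example 22.3
print(char_poly_2x2(A))
print(real_eigenvalues_2x2(A))
```

For larger matrices there is no comparable closed formula in general, which is why the examples that follow rely on factoring the characteristic polynomial by hand.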


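Example 22.4. Find the eigenvalues of

\[
A = \begin{bmatrix} -4 & -6 & -7 \\ 3 & 5 & 3 \\ 0 & 0 & 3 \end{bmatrix}.
\]

Solution. Compute

\[
A - \lambda I = \begin{bmatrix} -4 & -6 & -7 \\ 3 & 5 & 3 \\ 0 & 0 & 3 \end{bmatrix}
- \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}
= \begin{bmatrix} -4 - \lambda & -6 & -7 \\ 3 & 5 - \lambda & 3 \\ 0 & 0 & 3 - \lambda \end{bmatrix}.
\]

Then, expanding along the first column,

\[
\begin{aligned}
\det(A - \lambda I) &= (-4 - \lambda) \begin{vmatrix} 5 - \lambda & 3 \\ 0 & 3 - \lambda \end{vmatrix}
- 3 \begin{vmatrix} -6 & -7 \\ 0 & 3 - \lambda \end{vmatrix} \\
&= (-4 - \lambda)(5 - \lambda)(3 - \lambda) + 18(3 - \lambda) \\
&= (3 - \lambda)(\lambda^2 - \lambda - 2) \\
&= -\lambda^3 + 4\lambda^2 - \lambda - 6.
\end{aligned}
\]

Factor the characteristic polynomial:

p(λ) = −λ³ + 4λ² − λ − 6 = −(λ − 2)(λ − 3)(λ + 1)

Therefore, the eigenvalues of A are

λ₁ = 2, λ₂ = 3, λ₃ = −1.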
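A quick way to double check a list of eigenvalues is to evaluate det(A − λI) at each candidate and confirm that the result is zero. The sketch below does this in plain Python for the matrix of Example 22.4; the helper names `det3` and `char_poly_value` are our own.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_value(A, lam):
    """Evaluate p(lam) = det(A - lam*I) for a 3 x 3 matrix A."""
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)

A = [[-4, -6, -7], [3, 5, 3], [0, 0, 3]]  # matrix of Example 22.4

# The three eigenvalues should give 0; lam = 0 gives det(A) for comparison.
for lam in (2, 3, -1, 0):
    print(lam, char_poly_value(A, lam))
```

Evaluating at λ = 0 returns det(A), which for this matrix is non-zero, in agreement with the fact that 0 is not among its eigenvalues.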

Now that we know how to find eigenvalues, we can combine our work from the previous lecture to find both the eigenvalues and eigenvectors of a given matrix A.

Example 22.5. For each eigenvalue of A from Example 22.4, find a basis for the corresponding eigenspace.

Solution. Start with λ₁ = 2:

\[
A - 2I = \begin{bmatrix} -6 & -6 & -7 \\ 3 & 3 & 3 \\ 0 & 0 & 1 \end{bmatrix}.
\]

After basic row reduction and back substitution, one finds that the null space of A − 2I is spanned by

\[
v_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}.
\]


Therefore, v₁ is an eigenvector of A with eigenvalue λ₁. For λ₂ = 3:

\[
A - 3I = \begin{bmatrix} -7 & -6 & -7 \\ 3 & 2 & 3 \\ 0 & 0 & 0 \end{bmatrix}.
\]

The null space of A − 3I is spanned by

\[
v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}
\]

and therefore v₂ is an eigenvector of A with eigenvalue λ₂. Finally, for λ₃ = −1 we compute

\[
A - \lambda_3 I = \begin{bmatrix} -3 & -6 & -7 \\ 3 & 6 & 3 \\ 0 & 0 & 4 \end{bmatrix}
\]

and the null space of A − λ₃I is spanned by

\[
v_3 = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}
\]

and therefore v₃ is an eigenvector of A with eigenvalue λ₃. Notice that in this case, the 3 × 3 matrix A has three distinct eigenvalues and the eigenvectors

\[
\{v_1, v_2, v_3\} = \left\{
\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix},
\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},
\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}
\right\}
\]

correspond to the distinct eigenvalues λ₁, λ₂, λ₃, respectively. Therefore, the set β = {v₁, v₂, v₃} is linearly independent (by Theorem 21.6), and therefore β is a basis for R³. You can verify, for instance, that det([v₁ v₂ v₃]) ≠ 0.

By Theorem 21.6, the previous example has the following generalization.

Theorem 22.6: Suppose that A is an n × n matrix and has n distinct eigenvalues λ₁, λ₂, . . . , λ_n. Let v_i be an eigenvector of A corresponding to λ_i. Then {v₁, v₂, . . . , v_n} is a basis for Rⁿ.

Hence, if A has n distinct eigenvalues, we are guaranteed the existence of a basis of Rⁿ consisting of eigenvectors of A. In forthcoming lectures, we will see that it is very convenient to work with matrices A that have a set of eigenvectors that form a basis of Rⁿ; this is one of the main motivations for studying eigenvalues and eigenvectors in the first place. However, we will see that not every matrix has a set of eigenvectors that form a basis of Rⁿ. For example, what if A does not have n distinct eigenvalues? In this case, does there exist a basis for Rⁿ of eigenvectors of A? In some cases, the answer is yes as the next example demonstrates.

Example 22.7. Find the eigenvalues of A and a basis for each eigenspace.

\[
A = \begin{bmatrix} 2 & 0 & 0 \\ 4 & 2 & 2 \\ -2 & 0 & 1 \end{bmatrix}
\]

Does R³ have a basis of eigenvectors of A?

Solution. The characteristic polynomial of A is

p(λ) = det(A − λI) = −λ³ + 5λ² − 8λ + 4 = −(λ − 1)(λ − 2)²

and therefore the eigenvalues are λ₁ = 1 and λ₂ = 2. Notice that although p(λ) is a polynomial of degree n = 3, it has only two distinct roots and hence A has only two distinct eigenvalues. The eigenvalue λ₂ = 2 is said to be repeated and λ₁ = 1 is said to be a simple eigenvalue. For λ₁ = 1 one finds that the eigenspace Null(A − λ₁I) is spanned by

\[
v_1 = \begin{bmatrix} 0 \\ -2 \\ 1 \end{bmatrix}
\]

and thus v₁ is an eigenvector of A with eigenvalue λ₁ = 1. Now consider λ₂ = 2:

\[
A - 2I = \begin{bmatrix} 0 & 0 & 0 \\ 4 & 0 & 2 \\ -2 & 0 & -1 \end{bmatrix}.
\]

Row reducing A − 2I one obtains

\[
A - 2I = \begin{bmatrix} 0 & 0 & 0 \\ 4 & 0 & 2 \\ -2 & 0 & -1 \end{bmatrix}
\sim \begin{bmatrix} -2 & 0 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]

Therefore, rank(A − 2I) = 1 and thus by the Rank Theorem it follows that Null(A − 2I) is a 2-dimensional eigenspace. Performing back substitution, one finds the following basis for the λ₂-eigenspace:

\[
\{v_2, v_3\} = \left\{ \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}.
\]

Therefore, the eigenvectors

\[
\{v_1, v_2, v_3\} = \left\{
\begin{bmatrix} 0 \\ -2 \\ 1 \end{bmatrix},
\begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\right\}
\]

form a basis for R³. Hence, for the repeated eigenvalue λ₂ = 2 we were able to find two linearly independent eigenvectors.


Before moving further with more examples, we need to introduce some notation regarding the factorization of the characteristic polynomial. In the previous Example 22.7, the characteristic polynomial was factored as p(λ) = −(λ − 1)(λ − 2)² and we found a basis for R³ of eigenvectors despite the presence of a repeated eigenvalue. In general, if the characteristic polynomial p(λ) = det(A − λI) of an n × n matrix can be completely factored into linear terms, then p(λ) can be written in the form

p(λ) = (−1)ⁿ (λ − λ₁)^k₁ (λ − λ₂)^k₂ · · · (λ − λ_p)^k_p

where k₁, k₂, . . . , k_p are positive integers and the distinct roots of p(λ) are then λ₁, λ₂, . . . , λ_p. Because p(λ) is of degree n, we must have that k₁ + k₂ + · · · + k_p = n. Motivated by this, we introduce the following definition.

Definition 22.8: Suppose that A ∈ Rⁿˣⁿ has characteristic polynomial p(λ) that can be factored as

p(λ) = (−1)ⁿ (λ − λ₁)^k₁ (λ − λ₂)^k₂ · · · (λ − λ_p)^k_p.

The exponent k_i is called the algebraic multiplicity of the eigenvalue λ_i. The dimension of the eigenspace Null(A − λ_iI) associated to λ_i is called the geometric multiplicity of λ_i.

For simplicity and whenever it is convenient, we will denote the geometric multiplicity of the eigenvalue λ_i as

g_i = dim(Null(A − λ_iI)).

Example 22.9. A 6 × 6 matrix A has characteristic polynomial

p(λ) = λ⁶ − 4λ⁵ − 12λ⁴.

Find the eigenvalues of A and their algebraic multiplicities.

Solution. Factoring p(λ) we obtain

p(λ) = λ⁴(λ² − 4λ − 12) = λ⁴(λ − 6)(λ + 2)

Therefore, the eigenvalues of A are λ₁ = 0, λ₂ = 6, and λ₃ = −2. Their algebraic multiplicities are k₁ = 4, k₂ = 1, and k₃ = 1, respectively. The eigenvalue λ₁ = 0 is repeated, while λ₂ = 6 and λ₃ = −2 are simple eigenvalues.

In Example 22.7, we had p(λ) = −(λ − 1)(λ − 2)² and thus λ₁ = 1 has algebraic multiplicity k₁ = 1 and λ₂ = 2 has algebraic multiplicity k₂ = 2. For λ₁ = 1, we found one linearly independent eigenvector, and therefore λ₁ has geometric multiplicity g₁ = 1. For λ₂ = 2, we found two linearly independent eigenvectors, and therefore λ₂ has geometric multiplicity g₂ = 2. However, as we will see in the next example, the geometric multiplicity g_i can be strictly less than the algebraic multiplicity k_i. In general,

g_i ≤ k_i


Example 22.10. Find the eigenvalues of A and a basis for each eigenspace:

\[
A = \begin{bmatrix} 2 & 4 & 3 \\ -4 & -6 & -3 \\ 3 & 3 & 1 \end{bmatrix}
\]

For each eigenvalue of A, find its algebraic and geometric multiplicity. Does R³ have a basis of eigenvectors of A?

Solution. One computes

p(λ) = −λ³ − 3λ² + 4 = −(λ − 1)(λ + 2)²

and therefore the eigenvalues of A are λ₁ = 1 and λ₂ = −2. The algebraic multiplicity of λ₁ is k₁ = 1 and that of λ₂ is k₂ = 2. For λ₁ = 1 we compute

\[
A - I = \begin{bmatrix} 1 & 4 & 3 \\ -4 & -7 & -3 \\ 3 & 3 & 0 \end{bmatrix}
\]

and then one finds that

\[
v_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}
\]

is a basis for the λ₁-eigenspace. Therefore, the geometric multiplicity of λ₁ is g₁ = 1. For λ₂ = −2 we compute

\[
A - \lambda_2 I = \begin{bmatrix} 4 & 4 & 3 \\ -4 & -4 & -3 \\ 3 & 3 & 3 \end{bmatrix}
\sim \begin{bmatrix} 4 & 4 & 3 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\sim \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
\]

Therefore, since rank(A − λ₂I) = 2, the geometric multiplicity of λ₂ = −2 is g₂ = 1, which is less than the algebraic multiplicity k₂ = 2. An eigenvector corresponding to λ₂ = −2 is

\[
v_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}.
\]

Therefore, for the repeated eigenvalue λ₂ = −2, we are able to find only one linearly independent eigenvector. Therefore, it is not possible to construct a basis for R³ consisting of eigenvectors of A.

Hence, in the previous example, there does not exist a basis of R³ of eigenvectors of A because for one of the eigenvalues (namely λ₂) the geometric multiplicity was less than the algebraic multiplicity:

g₂ < k₂.

In the next lecture, we will elaborate on this situation further.
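By the Rank Theorem, the geometric multiplicity of an eigenvalue λ of an n × n matrix equals n − rank(A − λI), so it can be computed without finding any eigenvectors. The following is a minimal sketch in plain Python with exact fractions; the function names are our own, and the matrices are assumed to be those of Examples 22.7 and 22.10 as read from the text.

```python
from fractions import Fraction

def rank(M):
    """Rank of M, computed by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def geometric_multiplicity(A, lam):
    """dim Null(A - lam*I) = n - rank(A - lam*I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A_full = [[2, 0, 0], [4, 2, 2], [-2, 0, 1]]        # Example 22.7 (assumed entries)
A_short = [[2, 4, 3], [-4, -6, -3], [3, 3, 1]]     # Example 22.10 (assumed entries)

print(geometric_multiplicity(A_full, 1), geometric_multiplicity(A_full, 2))
print(geometric_multiplicity(A_short, 1), geometric_multiplicity(A_short, -2))
```

For the first matrix the geometric multiplicities add up to 3, while for the second they add up to only 2, which is why R³ has a basis of eigenvectors in one case and not in the other.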

Example 22.11. Find the algebraic and geometric multiplicities of each eigenvalue of the matrix

\[
A = \begin{bmatrix} -7 & 1 & 0 \\ 0 & -7 & 1 \\ 0 & 0 & -7 \end{bmatrix}.
\]


Eigenvalues and Similarity Transformations

To end this lecture, we will define a notion of similarity between matrices that plays an important role in linear algebra and that will be used in the next lecture when we discuss diagonalization of matrices. In mathematics, there are many cases where one is interested in classifying objects into categories or classes. Classifying mathematical objects into classes/categories is similar to how some physical objects are classified. For example, all fruits are classified into categories: apples, pears, bananas, oranges, avocados, etc. Given a piece of fruit A, how do you decide what category it is in? What are the properties that uniquely classify the piece of fruit A? In linear algebra, there are many objects of interest. We have spent a lot of time working with matrices and we have now reached a point in our study where we would like to begin classifying matrices. How should we decide if matrices A and B are of the same type or, in other words, are similar? Below is how we will decide.

Definition 22.12: Let A and B be n × n matrices. We will say that A is similar to B

if there exists an invertible matrix P such that

A = PBP⁻¹.

If A is similar to B then B is similar to A, because from the equation A = PBP⁻¹ we can multiply on the left by P⁻¹ and on the right by P to obtain that

P⁻¹AP = B.

Hence, with Q = P⁻¹, we have that B = QAQ⁻¹ and thus B is similar to A. Therefore we simply say that A and B are similar. Matrices that are similar are clearly not necessarily equal. However, there is a reason why the word similar is used. Here are a few reasons why.

Theorem 22.13: If A and B are similar matrices then the following are true:

(a) rank(A) = rank(B)
(b) det(A) = det(B)
(c) A and B have the same eigenvalues

Proof. We will prove part (c). If A and B are similar then A = PBP⁻¹ for some invertible matrix P. Then

det(A − λI) = det(A − λPP⁻¹)
= det(PBP⁻¹ − λPP⁻¹)
= det(P(B − λI)P⁻¹)
= det(P) det(B − λI) det(P⁻¹)
= det(B − λI)

where the last step uses det(P) det(P⁻¹) = 1. Thus, A and B have the same characteristic polynomial, and hence the same eigenvalues.
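The statement can be illustrated with a small numerical experiment. In the sketch below (plain Python), the matrices B and P are arbitrary choices of ours and do not come from the text; P has determinant 1 so that its inverse has integer entries. We form A = PBP⁻¹ and compare the characteristic polynomials of A and B.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly_2x2(M):
    """Coefficients (1, -trace, det) of det(M - lam*I) for a 2 x 2 matrix."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

B = [[1, 2], [3, 4]]          # illustrative matrix (our choice)
P = [[2, 1], [1, 1]]          # det(P) = 1, so P is invertible
P_inv = [[1, -1], [-1, 2]]    # inverse of P

A = matmul(matmul(P, B), P_inv)   # A = P B P^{-1} is similar to B

print(A)
print(char_poly_2x2(A), char_poly_2x2(B))
```

The matrices A and B have different entries, yet their characteristic polynomials agree, so they share the same eigenvalues, trace and determinant.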

In the next lecture, we will see that if Rⁿhas a basis of eigenvectors of A then A is similar to a diagonal matrix.

After this lecture you should know the following:

what the characteristic polynomial is and how to compute it
how to compute the eigenvalues of a matrix

that when an n × n matrix A has n distinct eigenvalues, we are guaranteed a basis of Rⁿ consisting of eigenvectors of A

that when a matrix A has repeated eigenvalues, it is still possible that there exists a basis of Rⁿconsisting of the eigenvectors of A
what is the algebraic multiplicity and geometric multiplicity of an eigenvalue
that eigenvalues of a matrix do not change under similarity transformations


Lecture 23

Diagonalization

23.1 Eigenvalues of Triangular Matrices

Before discussing diagonalization, we first consider the eigenvalues of triangular matrices.

Theorem 23.1: Let A be a triangular matrix (either upper or lower). Then the eigenvalues of A are its diagonal entries.

Proof. We will prove the theorem for the case n = 3 and A upper triangular; the general case is similar. Suppose then that A is a 3 × 3 upper triangular matrix:

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}.
\]

Then

\[
A - \lambda I = \begin{bmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ 0 & a_{22} - \lambda & a_{23} \\ 0 & 0 & a_{33} - \lambda \end{bmatrix}
\]

and thus the characteristic polynomial of A is

p(λ) = det(A − λI) = (a₁₁ − λ)(a₂₂ − λ)(a₃₃ − λ)

and the roots of p(λ) are

λ₁ = a₁₁, λ₂ = a₂₂, λ₃ = a₃₃.

In other words, the eigenvalues of A are simply the diagonal entries of A.

Example 23.2. Consider the following matrix

_₆





0	0	0
0	0	0
0	7	0
0	0	−4
−2	3	0

₀

−1

A = 0

−1





0 .


Find the characteristic polynomial and the eigenvalues of A.
Find the geometric and algebraic multiplicity of each eigenvalue of A.

We now introduce a very special type of a triangular matrix, namely, a diagonal matrix.

Definition 23.3: A matrix D whose off-diagonal entries are all zero is called a diagonal matrix.

For example, here is a 3 × 3 diagonal matrix

\[
D = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & -8 \end{bmatrix}
\]

and here is a 5 × 5 diagonal matrix





₆







0 0 0 0

0 0 0 0 0

D = ^0 0 − ⁷0 0 .

0 0 0 2 0

0 0 0 0 −

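A diagonal matrix is clearly also a triangular matrix and therefore the eigenvalues of a diagonal matrix D are simply the diagonal entries of D. Moreover, the powers of a diagonal matrix are easy to compute. For example, if

\[
D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\]

then

\[
D^2 = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
= \begin{bmatrix} \lambda_1^2 & 0 \\ 0 & \lambda_2^2 \end{bmatrix}
\]

and similarly for any integer k = 1, 2, 3, . . ., we have that

\[
D^k = \begin{bmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{bmatrix}.
\]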

23.2 Diagonalization

Recall that two matrices A and B are said to be similar if there exists an invertible matrix

P such that

A = PBP⁻¹.

A very simple type of matrix is a diagonal matrix since many computations with diagonal matrices are trivial. The problem of diagonalization is thus concerned with answering the question of whether a given matrix is similar to a diagonal matrix. Below is the formal definition.


Definition 23.4: A matrix A is called diagonalizable if it is similar to a diagonal matrix D. In other words, if there exists an invertible P such that

A = PDP⁻¹.

How do we determine when a given matrix A is diagonalizable? Let us first determine what conditions need to be met for a matrix A to be diagonalizable. Suppose then that A is diagonalizable. Then by Definition 23.4, there exists an invertible matrix P = [v₁ v₂ · · · v_n] and a diagonal matrix

\[
D = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
\]

such that A = PDP⁻¹. Multiplying both sides of the equation A = PDP⁻¹ on the right by the matrix P we obtain that

AP = PD.

Now

AP = [Av₁ Av₂ · · · Av_n]

while on the other hand

PD = [λ₁v₁ λ₂v₂ · · · λ_nv_n].

Therefore, since it holds that AP = PD,

[Av₁ Av₂ · · · Av_n] = [λ₁v₁ λ₂v₂ · · · λ_nv_n]

or, if we compare columns, we must have that

Av_i = λ_iv_i.

Thus, the columns v₁, v₂, . . . , v_n of P are eigenvectors of A and form a basis for Rⁿ because P is invertible. In conclusion, if A is diagonalizable then Rⁿ has a basis consisting of eigenvectors of A.

Suppose instead that {v₁, v₂, . . . , v_n} is a basis of Rⁿ consisting of eigenvectors of A. Let λ₁, λ₂, . . . , λ_n be the eigenvalues of A associated to v₁, v₂, . . . , v_n, respectively, and set

P = [v₁ v₂ · · · v_n].

Then P is invertible because {v₁, v₂, . . . , v_n} is linearly independent. Let

\[
D = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}.
\]

Now, since Av_i = λ_iv_i we have that

AP = A[v₁ v₂ · · · v_n]
= [Av₁ Av₂ · · · Av_n]
= [λ₁v₁ λ₂v₂ · · · λ_nv_n].

On the other hand,

\[
PD = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
\begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
= \begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \cdots & \lambda_n v_n \end{bmatrix}.
\]

Therefore, AP = PD, and since P is invertible we have that

A = PDP⁻¹.

Thus, if Rⁿ has a basis consisting of eigenvectors of A then A is diagonalizable. We have therefore proved the following theorem.

Theorem 23.5: A matrix A is diagonalizable if and only if there is a basis {v₁, v₂, . . . , v_n} of Rⁿ consisting of eigenvectors of A.

The punchline of Theorem 23.5 is that the problem of diagonalizing a matrix A is equivalent to finding a basis of Rⁿ consisting of eigenvectors of A. We will see in some of the examples below that it is not always possible to diagonalize a matrix.

23.3 Conditions for Diagonalization

We first consider the simplest case when we conclude that a given matrix is diagonalizable, namely, the case when all eigenvalues are distinct.

Theorem 23.6: Suppose that A ∈ Rⁿˣⁿ has n distinct eigenvalues λ₁, λ₂, . . . , λ_n. Then A is diagonalizable.

Proof. Each eigenvalue λ_i produces an eigenvector v_i. The set of eigenvectors {v₁, v₂, . . . , v_n} is linearly independent because the eigenvectors correspond to distinct eigenvalues (Theorem 21.6). Therefore, {v₁, v₂, . . . , v_n} is a basis of Rⁿ consisting of eigenvectors of A and then by Theorem 23.5 we conclude that A is diagonalizable.

What if A does not have distinct eigenvalues? Can A still be diagonalizable? The following theorem completely answers this question.


Theorem 23.7: A matrix A is diagonalizable if and only if the algebraic and geometric multiplicities of each eigenvalue are equal.

Proof. Let A be an n × n matrix and let λ₁, λ₂, . . . , λ_p denote the distinct eigenvalues of A. Suppose that k₁, k₂, . . . , k_p are the algebraic multiplicities and g₁, g₂, . . . , g_p are the geometric multiplicities of the eigenvalues, respectively. Suppose first that the algebraic and geometric multiplicities of each eigenvalue are equal, that is, suppose that g_i = k_i for each i = 1, 2, . . . , p. Since k₁ + k₂ + · · · + k_p = n, we must also have that g₁ + g₂ + · · · + g_p = n. Choosing a basis for each eigenspace and collecting these vectors gives n eigenvectors of A, which one can show are linearly independent using Theorem 21.6. Therefore, there exist n linearly independent eigenvectors of A and consequently A is diagonalizable. On the other hand, suppose that A is diagonalizable. Then by Theorem 23.5, Rⁿ has a basis of eigenvectors of A, and therefore g₁ + g₂ + · · · + g_p = n. Since the geometric multiplicity is at most the algebraic multiplicity and k₁ + k₂ + · · · + k_p = n, the only way that g₁ + g₂ + · · · + g_p = n is if g_i = k_i for each i, that is, the geometric and algebraic multiplicities are equal.

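Example 23.8. Determine if A is diagonalizable. If yes, find a matrix P that diagonalizes A.

\[
A = \begin{bmatrix} -4 & -6 & -7 \\ 3 & 5 & 3 \\ 0 & 0 & 3 \end{bmatrix}
\]

Solution. The characteristic polynomial of A is

p(λ) = det(A − λI) = −(λ − 2)(λ − 3)(λ + 1)

and therefore λ₁ = 2, λ₂ = 3, and λ₃ = −1 are the eigenvalues of A. Since A has n = 3 distinct eigenvalues, by Theorem 23.6 A is diagonalizable. Eigenvectors v₁, v₂, v₃ corresponding to λ₁, λ₂, λ₃ are found to be

\[
v_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad
v_3 = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}.
\]

Therefore, a matrix that diagonalizes A is

\[
P = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} -1 & -1 & -2 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.
\]

You can verify that

\[
P \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} P^{-1} = A.
\]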
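The verification suggested in Example 23.8 can be done without computing P⁻¹: since P is invertible, A = PDP⁻¹ is equivalent to AP = PD. The sketch below checks this in plain Python, with helper names of our own and with the entries of A and P assumed to be as read from the example. It also checks the consequence A³P = PD³, which shows how powers of A reduce to powers of the diagonal matrix D.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[-4, -6, -7], [3, 5, 3], [0, 0, 3]]
P = [[-1, -1, -2], [1, 0, 1], [0, 1, 0]]   # columns are v1, v2, v3
D = [[2, 0, 0], [0, 3, 0], [0, 0, -1]]     # eigenvalues in the same order

# P is invertible exactly when det(P) is non-zero.
print(det3(P) != 0)

# A = P D P^{-1} is equivalent to A P = P D.
AP, PD = matmul(A, P), matmul(P, D)
print(AP == PD)

# Powers: A^3 P = P D^3, where D^3 just cubes the diagonal entries.
A3 = matmul(A, matmul(A, A))
D3 = [[D[i][j] ** 3 for j in range(3)] for i in range(3)]
print(matmul(A3, P) == matmul(P, D3))
```

The order of the columns of P must match the order of the eigenvalues on the diagonal of D; permuting one without the other breaks the identity AP = PD.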

The following example demonstrates that it is possible for a matrix to be diagonalizable even though the matrix does not have distinct eigenvalues.


Example 23.9. Determine if A is diagonalizable. If yes, find a matrix P that diagonalizes

\[ A = \begin{bmatrix} 2 & 0 & 0 \\ 4 & 2 & 2 \\ -2 & 0 & 1 \end{bmatrix}. \]


Solution. The characteristic polynomial of A is

p(λ) = det(A − λI) = −(λ − 1)(λ − 2)²

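and therefore λ₁ = 1, λ₂ = 2. An eigenvector corresponding to λ₁ = 1 is

\[ v_1 = \begin{bmatrix} 0 \\ -2 \\ 1 \end{bmatrix}. \]

One finds that g₂ = dim(Null(A − λ₂I)) = 2, and two linearly independent eigenvectors for λ₂ are

\[ \{v_2, v_3\} = \left\{ \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}. \]

Therefore, A is diagonalizable, and a matrix that diagonalizes A is

\[ P = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} 0 & -1 & 0 \\ -2 & 0 & 1 \\ 1 & 2 & 0 \end{bmatrix}. \]
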

You can verify that

\[ P \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_2 \end{bmatrix} P^{-1} = A. \]

Example 23.10. Determine if A is diagonalizable. If yes, find a matrix P that diagonalizes

\[ A = \begin{bmatrix} 2 & 4 & 3 \\ -4 & -6 & -3 \\ 3 & 3 & 1 \end{bmatrix}. \]

Solution. The characteristic polynomial of A is

p(λ) = det(A − λI) = −λ³ − 3λ² + 4 = −(λ − 1)(λ + 2)²

and therefore the eigenvalues of A are λ₁ = 1 and λ₂ = −2. For λ₂ = −2 one computes

\[ A - \lambda_2 I \sim \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}. \]

We see that the dimension of the eigenspace of λ₂ = −2 is g₂ = 1, which is less than the algebraic multiplicity k₂ = 2. Therefore, from Theorem 23.7 we can conclude that it is not possible to construct a basis of eigenvectors of A, and therefore A is not diagonalizable.
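The multiplicity comparison used in Examples 23.9 and 23.10 can be checked mechanically. The following is a minimal sketch, not part of the original notes: it computes the geometric multiplicity as n − rank(A − λI) using exact fractions. The helper names `rank` and `geometric_multiplicity` are illustrative, and the matrices are the ones of the two examples.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def geometric_multiplicity(A, lam):
    """dim Null(A - lam*I) = n - rank(A - lam*I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A9 = [[2, 0, 0], [4, 2, 2], [-2, 0, 1]]       # Example 23.9
A10 = [[2, 4, 3], [-4, -6, -3], [3, 3, 1]]    # Example 23.10

# Both repeated eigenvalues have algebraic multiplicity 2.
print(geometric_multiplicity(A9, 2))    # equals 2, so A9 is diagonalizable
print(geometric_multiplicity(A10, -2))  # equals 1, so A10 is not
```

Exact fractions are used so that the rank is not affected by floating-point round-off.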


Example 23.11. Suppose that A has eigenvector v with corresponding eigenvalue λ. Show that if A is invertible then v is an eigenvector of A⁻¹ with corresponding eigenvalue 1/λ.
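One possible argument for Example 23.11 is sketched below; it is not taken from the original notes. It uses the observation that λ ≠ 0 whenever A is invertible, since λ = 0 would give Av = 0 with v ≠ 0.

```latex
\begin{aligned}
A v = \lambda v
  &\;\Longrightarrow\; A^{-1}(A v) = A^{-1}(\lambda v) \\
  &\;\Longrightarrow\; v = \lambda A^{-1} v \\
  &\;\Longrightarrow\; A^{-1} v = \tfrac{1}{\lambda}\, v .
\end{aligned}
```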

Example 23.12. Suppose that A and B are n × n matrices such that AB = BA. Show that if v is an eigenvector of A with corresponding eigenvalue λ and Bv ≠ 0, then Bv is also an eigenvector of A with corresponding eigenvalue λ.

After this lecture you should know the following:

Determine if a matrix is diagonalizable or not
Find the algebraic and geometric multiplicities of an eigenvalue
Apply the theorems introduced in this lecture


Lecture 24

Diagonalization of Symmetric Matrices

24.1 Symmetric Matrices

Recall that a square matrix A is said to be symmetric if A^T = A. As an example, here is a 3 × 3 symmetric matrix:

\[ A = \begin{bmatrix} 1 & -3 & 7 \\ -3 & 2 & 8 \\ 7 & 8 & 4 \end{bmatrix}. \]

Symmetric matrices are ubiquitous in mathematics. For example, let f(x₁, x₂, . . . , x_n) be a function having continuous second order partial derivatives. Then Clairaut’s Theorem from multivariable calculus says that

\[ \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}. \]

Therefore, the Hessian matrix of f is symmetric:

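\[
\operatorname{Hess}(f) =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n}
\end{bmatrix}
\]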

The Second Derivative Test of multivariable calculus then says that if P = (a₁, a₂, . . . , a_n) is a critical point of f, that is

\[ \frac{\partial f}{\partial x_1}(P) = \frac{\partial f}{\partial x_2}(P) = \cdots = \frac{\partial f}{\partial x_n}(P) = 0 \]

then

(i) P is a local minimum point of f if the matrix Hess(f) has all positive eigenvalues,
(ii) P is a local maximum point of f if the matrix Hess(f) has all negative eigenvalues, and
(iii) P is a saddle point of f if the matrix Hess(f) has negative and positive eigenvalues.
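For functions of two variables the test above can be sketched in a few lines. This is an illustration rather than part of the original notes: the Hessian at a critical point is a symmetric 2 × 2 matrix [[a, b], [b, d]], whose eigenvalues have a closed form. The function names and the three sample functions are assumptions chosen for the example.

```python
import math

def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius

def classify(a, b, d):
    """Apply the Second Derivative Test to the Hessian [[a, b], [b, d]]."""
    low, high = symmetric_2x2_eigenvalues(a, b, d)
    if low > 0:
        return "local minimum"
    if high < 0:
        return "local maximum"
    if low < 0 < high:
        return "saddle point"
    return "inconclusive"  # a zero eigenvalue: the test gives no information

# Hessians at the critical point (0, 0) of three sample functions.
print(classify(2, 1, 2))    # f = x^2 + xy + y^2
print(classify(-2, 0, -2))  # f = -x^2 - y^2
print(classify(2, 0, -2))   # f = x^2 - y^2
```

The eigenvalues are always real here, in line with Theorem 24.1 for symmetric matrices.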

In general, the eigenvalues of a matrix with real entries can be complex numbers. For example, the matrix

\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]

has characteristic polynomial

p(λ) = λ² + 1

the roots of which are clearly λ₁ = √−1 = i and λ₂ = −√−1 = −i. Thus, in general, a matrix whose entries are all real numbers may have complex eigenvalues. However, for symmetric matrices we have the following.

Theorem 24.1: If A is a symmetric matrix then all of its eigenvalues are real numbers.

The proof is easy but we will omit it.

24.2 Eigenvectors of Symmetric Matrices

We proved earlier that if {v₁, v₂, . . . , v_k} are eigenvectors of a matrix A corresponding to distinct eigenvalues λ₁, λ₂, . . . , λ_k then the set {v₁, v₂, . . . , v_k} is linearly independent (Theorem 21.6). For symmetric matrices we can say even more, as the next theorem states.

Theorem 24.2: Let A be a symmetric matrix. If v₁ and v₂ are eigenvectors of A corresponding to distinct eigenvalues then v₁ and v₂ are orthogonal, that is, v₁ • v₂ = 0.

Proof. Recall that v₁ • v₂ = v₁^T v₂. Let λ₁ ≠ λ₂ be the eigenvalues associated to v₁ and v₂. Then

\[
\begin{aligned}
\lambda_1 v_1^T v_2 &= (\lambda_1 v_1)^T v_2 \\
&= (A v_1)^T v_2 \\
&= v_1^T A^T v_2 \\
&= v_1^T A v_2 \\
&= v_1^T (\lambda_2 v_2) \\
&= \lambda_2 v_1^T v_2.
\end{aligned}
\]

Therefore, λ₁ v₁^T v₂ = λ₂ v₁^T v₂, which implies that (λ₁ − λ₂) v₁^T v₂ = 0. But since (λ₁ − λ₂) ≠ 0, we must have v₁^T v₂ = 0, that is, v₁ and v₂ are orthogonal.

24.3 Symmetric Matrices are Diagonalizable

As we have seen, the main criterion for diagonalization is that for each eigenvalue the geometric and algebraic multiplicities are equal; not all matrices satisfy this condition and thus not all matrices are diagonalizable. As it turns out, any symmetric A is diagonalizable and moreover (and perhaps more importantly) there exists an orthogonal eigenvector matrix P that diagonalizes A. The full statement is below.

Theorem 24.3: If A is a symmetric matrix then A is diagonalizable. In fact, there is an orthonormal basis of Rⁿ of eigenvectors {v₁, v₂, . . . , v_n} of A. In other words, the matrix P = [v₁ v₂ · · · v_n] is orthogonal, P^T P = I, and A = PDP^T, where D is the diagonal matrix of the corresponding eigenvalues.

The proof of the theorem is not hard but we will omit it. The punchline of Theorem 24.3 is that, for the case of a symmetric matrix, we will never encounter the situation where the geometric multiplicity is strictly less than the algebraic multiplicity. Moreover, we are guaranteed to find an orthogonal matrix that diagonalizes a given symmetric matrix.

Example 24.4. Find an orthogonal matrix P that diagonalizes the symmetric matrix

\[ A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & 2 \end{bmatrix}. \]

Solution. The characteristic polynomial of A is

p(λ) = det(A − λI) = −λ³ + 4λ² − 3λ = −λ(λ − 1)(λ − 3)

The eigenvalues of A are λ₁ = 0, λ₂ = 1 and λ₃ = 3. Eigenvectors of A associated to λ₁, λ₂, λ₃ are

\[ u_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad u_3 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}. \]

As expected by Theorem 24.2, the eigenvectors u₁, u₂, u₃ form an orthogonal set:

u₁^T u₂ = 0, u₁^T u₃ = 0, u₂^T u₃ = 0.

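To find an orthogonal matrix P that diagonalizes A we must normalize the eigenvectors u₁, u₂, u₃ to obtain an orthonormal basis {v₁, v₂, v₃}. To that end, first compute u₁^T u₁ = 3, u₂^T u₂ = 2, and u₃^T u₃ = 6. Then let v₁ = (1/√3) u₁, let v₂ = (1/√2) u₂, and let v₃ = (1/√6) u₃. Therefore, an orthogonal matrix that diagonalizes A is

\[
P = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} =
\begin{bmatrix}
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} \\
-\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\
\frac{1}{\sqrt{3}} & 0 & \frac{2}{\sqrt{6}}
\end{bmatrix}.
\]

You can easily verify that P^T P = I, and that

\[ A = P \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix} P^T. \]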
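The two identities at the end of Example 24.4 can be checked numerically. The following is a minimal sketch, not part of the original notes: it normalizes the eigenvectors, places them as the columns of P, and compares P^T P with I and PDP^T with A up to a small floating-point tolerance. The helper names are illustrative.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def is_close(X, Y, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(a - b) <= tol
               for row_x, row_y in zip(X, Y)
               for a, b in zip(row_x, row_y))

A = [[1, 0, -1], [0, 1, 1], [-1, 1, 2]]
eigenvectors = [[1, -1, 1], [1, 1, 0], [-1, 1, 2]]  # u1, u2, u3
eigenvalues = [0, 1, 3]

# Normalize each eigenvector, then use the unit vectors as columns of P.
unit = [[x / math.sqrt(sum(y * y for y in u)) for x in u] for u in eigenvectors]
P = transpose(unit)
D = [[eigenvalues[i] if i == j else 0 for j in range(3)] for i in range(3)]
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

orthogonal = is_close(matmul(transpose(P), P), identity)
reconstructs = is_close(matmul(matmul(P, D), transpose(P)), A)
print(orthogonal, reconstructs)
```

Because P is orthogonal, no matrix inverse is needed: P^T plays the role of P⁻¹.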


Example 24.5. Let A and B be n × n matrices. Show that if A is symmetric then the matrix C = BAB^T is also a symmetric matrix.
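One way to see the claim of Example 24.5, sketched here rather than quoted from the notes, is to transpose the product, using (XY)^T = Y^T X^T and the assumption A^T = A:

```latex
C^T = (B A B^T)^T = (B^T)^T A^T B^T = B A^T B^T = B A B^T = C .
```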

After this lecture you should know the following:

a symmetric matrix is diagonalizable with an orthonormal set of eigenvectors


Lecture 25

The PageRank Algorithm

In this lecture, we will see how linear algebra is used in Google’s webpage ranking algorithm used in everyday Google searches.

25.1 Search Engine Retrieval Process

Search engines perform a two-stage process to retrieve search results¹. In Stage 1, traditional text processing is used to find all relevant pages (e.g. keywords in title, body) and produces a content score. After Stage 1, there is a large number of relevant pages. For example, the query “symmetric matrix” results in about 3,830,000 pages (03/31/15), and “homework help” results in 49,400,000 pages (03/31/15). How should the relevant pages be displayed? In Stage 2, the pages are sorted and displayed based on a pre-computed ranking that is query-independent; this is the popularity score. The ranking is based on the hyperlinked or networked structure of the web, and it works like a popularity contest: if many pages link to page P_i then P_i must be an important page and should therefore have a high popularity score.

In January 1998, Jon Kleinberg from IBM (now a CS professor at Cornell) presented the HITS algorithm² (e.g., www.teoma.com). At Stanford, doctoral students Sergey Brin and Larry Page were busy working on a similar project which they had begun in 1995. Below is the abstract of their paper³:

“In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/ .”

¹A.N. Langville and C.D. Meyer, Google’s PageRank and Beyond, Princeton University Press, 2006

²J. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of ACM, 46, 1999, 9th ACM- SIAM Symposium on Discrete Algorithms

³S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, 33:107-117, 1998


In both models, the web is defined as a directed graph, where the nodes represent webpages and the directed arcs represent hyperlinks, see Figure 25.1.

Figure 25.1: A tiny web represented as a directed graph.

25.2 A Description of the PageRank Algorithm

In the PageRank algorithm, each inlink is viewed as a recommendation (or vote). In general, pages with many inlinks are more important than pages with few inlinks. However, the quality of the inlink (vote) is important. The vote of each page should be divided by the total number of recommendations made by the page. The PageRank of page i, denoted x_i, is the sum of all the weighted PageRanks of all the pages pointing to i:

\[ x_i = \sum_{j \to i} \frac{x_j}{|N_j|} \]

where

|N_j| is the number of outlinks from page j
j → i means page j links to page i
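The formula above can be tried on a small example. The sketch below is an illustration under stated assumptions: the 4-page web in `links` is hypothetical (it is not the network of Figure 25.1), and `apply_formula` is an illustrative name. It evaluates the right-hand side of the formula for every page and confirms that a candidate PageRank vector is reproduced exactly.

```python
from fractions import Fraction

# Hypothetical web: links[j] lists the pages that page j links to.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

def apply_formula(x, links):
    """Return the vector with entries sum over j -> i of x_j / |N_j|."""
    new = {page: Fraction(0) for page in links}
    for j, outlinks in links.items():
        for i in outlinks:
            new[i] += x[j] / len(outlinks)  # page j splits its vote evenly
    return new

# Candidate PageRank vector for this web, scaled so the entries sum to 1.
x = {1: Fraction(12, 31), 2: Fraction(4, 31), 3: Fraction(9, 31), 4: Fraction(6, 31)}

print(apply_formula(x, links) == x)  # the formula reproduces x
print(sum(x.values()))               # the entries sum to 1
```

Page 1 ends up ranked highest even though page 3 has more inlinks, because page 3 gives its entire vote to page 1.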

Example 25.1. Find the PageRank of each page for the network in Figure 25.1.

From the previous example, we see that the PageRank of each page can be found by solving an eigenvalue/eigenvector problem. However, when dealing with large networks such as the internet, the size of the problem is in the billions (8.1 billion in 2006) and directly solving the equations is not feasible. Instead, an iterative method called the power method is used. One starts with an initial guess, say x₀ = (1/4, 1/4, 1/4, 1/4). Then one updates the guess by computing

x₁ = Hx₀,

where H is the hyperlink matrix with entries H_ij = 1/|N_j| if page j links to page i and H_ij = 0 otherwise.

In other words, we have a discrete dynamical system

x_k₊₁= Hx_k.

A natural question is under what conditions the limit of the sequence

\[ \lim_{k \to \infty} x_k = \lim_{k \to \infty} H^k x_0 = q \]

exists and is an equilibrium of H. Also, if lim_{k→∞} x_k exists, will it be a positive vector? And lastly, can x₀ ≠ 0 be chosen arbitrarily?

To see what situations may occur, consider the network displayed in Figure 25.2. Starting with x₀ = (1/5, . . . , 1/5) we obtain that for k ≥ 39, the vectors x_k = H^k x₀ cycle between (0, 0, 0, 0.28, 0.40) and (0, 0, 0, 0.40, 0.28). Therefore, the sequence x₀, x₁, x₂, . . . does not converge. The reason for this is that nodes 4 and 5 form a cycle.

[Network graph and its 5 × 5 hyperlink matrix H; of the matrix entries only the last row (0, 0, 0, 1, 0) is legible.]

Figure 25.2: Cycles present in the network

Now consider the network displayed in Figure 25.3. If we remove the cycle we are still left with a dangling node, namely node 1 (e.g. pdf file, image file). Starting with x₀ = (1/5, . . . , 1/5) results in

\[ \lim_{k \to \infty} x_k = 0. \]

Therefore, in this case the sequence x₀, x₁, x₂, . . . converges to a non-positive vector, which for the purposes of ranking pages would be an undesirable situation.

[Network graph and its 5 × 5 hyperlink matrix H; the matrix entries are not legible.]

Figure 25.3: Dangling node present in the network

To avoid the presence of dangling nodes and cycles, Brin and Page used the notion of a random surfer to adjust H. To deal with a dangling node, Brin and Page replaced the associated zero-column with the vector (1/n)1 = (1/n, 1/n, . . . , 1/n). The justification for this adjustment is that if a random surfer reaches a dangling node, the surfer will “teleport” to any page in the web with equal probability. The new updated hyperlink matrix H^∗ may still not have the desired properties. To deal with cycles, a surfer may abandon the hyperlink structure of the web by occasionally moving to a random page by typing its address in the browser. With these adjustments, a random surfer now spends only a proportion of his time using the hyperlink structure of the web to visit pages. Hence, let 0 < α < 1 be the proportion of time the random surfer uses the hyperlink structure. Then the transition matrix is

\[ G = \alpha H^* + (1 - \alpha) \frac{1}{n} J. \]

The matrix G goes by the name of the Google matrix, and it is reported that Google uses α = 0.85 (here J is the n × n all ones matrix). The Google matrix G is now a primitive and stochastic matrix. Stochastic means that all its columns are probability vectors, i.e., non-negative vectors whose components sum to 1. Primitive means that there exists k ≥ 1 such that G^k has all positive entries (k = 1 in our case). With these definitions, we now have the following theorem.
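The construction of G can be sketched directly from its definition. This is an illustration, not the notes' own code: the 3-page hyperlink matrix `H` is hypothetical (page 3 is a dangling node), `google_matrix` is an illustrative name, and α = 0.85 is written as the exact fraction 17/20 so the column sums can be checked exactly.

```python
from fractions import Fraction

def google_matrix(H, alpha):
    """Build G = alpha * H_star + (1 - alpha) * (1/n) * J from a hyperlink matrix H.

    H_star is H with every zero column (dangling node) replaced by (1/n, ..., 1/n).
    """
    n = len(H)
    G = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        column = [Fraction(H[i][j]) for i in range(n)]
        if sum(column) == 0:                 # dangling node: teleport uniformly
            column = [Fraction(1, n)] * n
        for i in range(n):
            G[i][j] = alpha * column[i] + (1 - alpha) * Fraction(1, n)
    return G

# Hypothetical web: page 1 links to pages 2 and 3, page 2 links to page 3,
# and page 3 has no outlinks (its column is zero).
H = [[0,              0, 0],
     [Fraction(1, 2), 0, 0],
     [Fraction(1, 2), 1, 0]]

G = google_matrix(H, Fraction(17, 20))
column_sums = [sum(G[i][j] for i in range(3)) for j in range(3)]
print(column_sums)                                  # each column sums to 1
print(all(entry > 0 for row in G for entry in row)) # G is positive
```

Every entry of G is at least (1 − α)/n > 0, which is why k = 1 already works in the definition of primitive.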

Theorem 25.2: If G is a primitive stochastic matrix then:

(i) There is a stochastic G^∗ such that lim_{k→∞} G^k = G^∗.
(ii) G^∗ = [q q · · · q] where q is a probability vector.
(iii) For any probability vector q₀ we have lim_{k→∞} G^k q₀ = q.
(iv) The vector q is the unique probability vector which is an eigenvector of G with eigenvalue λ₁ = 1.
(v) All other eigenvalues λ₂, . . . , λ_n have |λ_j| < 1.

Proof. We will prove a special case⁴. Assume for simplicity that G is positive (this is the case of the Google matrix). If x = Gx, and x has mixed signs, then for each i

\[ |x_i| = \Bigl| \sum_{j=1}^{n} G_{ij} x_j \Bigr| < \sum_{j=1}^{n} G_{ij} |x_j|. \]

Then, summing over i and using that each column of G sums to 1,

\[ \sum_{i=1}^{n} |x_i| < \sum_{i=1}^{n} \sum_{j=1}^{n} G_{ij} |x_j| = \sum_{j=1}^{n} |x_j| \]

which is a contradiction. Therefore, all the eigenvectors in the λ₁ = 1 eigenspace are either negative or positive. One then shows that the eigenspace corresponding to λ₁ = 1 is 1-dimensional. This proves that there is a unique probability vector q such that

q = Gq.

⁴K. Bryan, T. Leise, The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google, SIAM Review, 48(3), 569-581


Let λ₁, λ₂, . . . , λ_n be the eigenvalues of G. We know that λ₁ = 1 is a dominant eigenvalue:

|λ₁| > |λ_j|, j = 2, 3, . . . , n.

Let q₀ be a probability vector and let q be as above, and let v₂, . . . , v_n be the remaining eigenvectors of G. Then q₀ = q + c₂v₂ + · · · + c_nv_n and therefore

\[
\begin{aligned}
G^k q_0 &= G^k (q + c_2 v_2 + \cdots + c_n v_n) \\
&= G^k q + c_2 G^k v_2 + \cdots + c_n G^k v_n \\
&= q + c_2 \lambda_2^k v_2 + \cdots + c_n \lambda_n^k v_n.
\end{aligned}
\]

Since |λ_j| < 1 for j = 2, . . . , n, each λ_j^k tends to 0 as k → ∞. From this we see that

\[ \lim_{k \to \infty} G^k q_0 = q. \]

25.3 Computation of the PageRank Vector

The Google matrix G is completely dense, which is computationally undesirable. Fortunately,

\[
\begin{aligned}
G &= \alpha H^* + (1 - \alpha) \tfrac{1}{n} \mathbf{1}\mathbf{1}^T \\
&= \alpha \bigl( H + \tfrac{1}{n} \mathbf{1}\mathbf{a}^T \bigr) + (1 - \alpha) \tfrac{1}{n} \mathbf{1}\mathbf{1}^T \\
&= \alpha H + \tfrac{1}{n} \mathbf{1} \bigl( \alpha \mathbf{a} + (1 - \alpha) \mathbf{1} \bigr)^T
\end{aligned}
\]

where 1 is the vector of all ones (so that J = 11^T) and a is the vector whose jth entry is 1 if page j is a dangling node and 0 otherwise, and H is very sparse and requires minimal storage. A vector-matrix multiplication generally requires O(n²) computation (n ≈ 8,000,000,000 in 2006). Estimates show that the average webpage has about 10 outlinks, so H has about 10n non-zero entries. This means that multiplication with H reduces to O(n) computation. Aside from being very simple, the power method is a matrix-free method, i.e., no manipulation of the matrix H is done. Brin and Page, and others, have confirmed that only 50-100 iterations are needed for a satisfactory approximation of the PageRank vector q for the web.
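The matrix-free idea can be illustrated with a short sketch, under the following assumptions: the web is stored as adjacency lists (`links[j]` lists the pages that page j links to), the 3-page web is hypothetical, and `pagerank` is an illustrative name, not Google's implementation. Each iteration computes Gx as the sparse product αHx plus one constant added to every entry, so the dense matrix G is never formed.

```python
def pagerank(links, alpha=0.85, iterations=100):
    """Power method x_{k+1} = G x_k using only the sparse link structure."""
    n = len(links)
    x = [1.0 / n] * n
    for _ in range(iterations):
        new = [0.0] * n
        dangling_mass = 0.0
        for j, outlinks in enumerate(links):
            if outlinks:
                share = alpha * x[j] / len(outlinks)
                for i in outlinks:
                    new[i] += share       # sparse part: alpha * H x
            else:
                dangling_mass += x[j]     # page j is a dangling node
        # Rank-one part: alpha * (dangling mass) plus (1 - alpha) * (total mass),
        # spread uniformly over all n pages.
        uniform = (alpha * dangling_mass + (1 - alpha) * sum(x)) / n
        x = [value + uniform for value in new]
    return x

# Hypothetical web with pages 0, 1, 2; page 2 has no outlinks.
links = [[1, 2], [2], []]
q = pagerank(links)
q_next = pagerank(links, iterations=101)
print(q)
```

The work per iteration is proportional to the number of links plus the number of pages, which matches the O(n) estimate above when the average page has a bounded number of outlinks.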

After this lecture you should know the following:

Set up a Google matrix and compute the PageRank vector


Lecture 26

Discrete Dynamical Systems

26.1 Discrete Dynamical Systems

Many interesting problems in engineering, science, and mathematics can be studied within the framework of discrete dynamical systems. Dynamical systems are used to model systems that change over time. The state of the system (economic, ecological, engineering, etc.) is measured at discrete time intervals, producing a sequence of vectors x₀, x₁, x₂, . . . . The relationship between the vector x_k and the next vector x_{k+1} is what constitutes a model.

Definition 26.1: A linear discrete dynamical system on Rⁿ is an infinite sequence {x₀, x₁, x₂, . . .} of vectors in Rⁿ and a matrix A such that

x_{k+1} = Ax_k.

The vectors x_k are called the state of the dynamical system and x₀ is the initial condition of the system. Once the initial condition x₀ is fixed, the remaining state vectors x₁, x₂, . . . , can be found by iterating the equation x_{k+1} = Ax_k.

26.2 Population Model

Consider the dynamical system consisting of the population movement between a city and its suburbs. Let x ∈ R² be the state population vector whose first component is the population of the city and the second component is the population of the suburbs:

\[ x = \begin{bmatrix} c \\ s \end{bmatrix}. \]

For simplicity, we assume that c + s = 1, i.e., c and s are population percentages of the total population. Suppose that in the year 1900, the city population was c₀ and the suburban population was s₀. Suppose it is known that after each year 5% of the city’s population moves to the suburbs and that 3% of the suburban population moves to the city. Hence, the population in the city in year 1901 is

c₁= 0.95c₀+ 0.03s₀,

while the population in the suburbs in year 1901 is

s₁= 0.05c₀+ 0.97s₀.

The equations

c₁= 0.95c₀+ 0.03s₀

s₁= 0.05c₀+ 0.97s₀

can be written in matrix form as

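\[ \begin{bmatrix} c_1 \\ s_1 \end{bmatrix} = \begin{bmatrix} 0.95 & 0.03 \\ 0.05 & 0.97 \end{bmatrix} \begin{bmatrix} c_0 \\ s_0 \end{bmatrix}. \]

Performing the same analysis for the next year, the population in 1902 is

\[ \begin{bmatrix} c_2 \\ s_2 \end{bmatrix} = \begin{bmatrix} 0.95 & 0.03 \\ 0.05 & 0.97 \end{bmatrix} \begin{bmatrix} c_1 \\ s_1 \end{bmatrix}. \]

Hence, the population movement is a linear dynamical system with matrix and state vector

\[ A = \begin{bmatrix} 0.95 & 0.03 \\ 0.05 & 0.97 \end{bmatrix}, \quad x_k = \begin{bmatrix} c_k \\ s_k \end{bmatrix}. \]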

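Suppose that the initial population state vector is

\[ x_0 = \begin{bmatrix} 0.70 \\ 0.30 \end{bmatrix}. \]

Then,

\[ x_1 = A x_0 = \begin{bmatrix} 0.95 & 0.03 \\ 0.05 & 0.97 \end{bmatrix} \begin{bmatrix} 0.70 \\ 0.30 \end{bmatrix} = \begin{bmatrix} 0.674 \\ 0.326 \end{bmatrix}. \]

Then,

\[ x_2 = A x_1 = \begin{bmatrix} 0.95 & 0.03 \\ 0.05 & 0.97 \end{bmatrix} \begin{bmatrix} 0.674 \\ 0.326 \end{bmatrix} = \begin{bmatrix} 0.650 \\ 0.350 \end{bmatrix}. \]

In a similar fashion, one can compute that up to 3 decimal places:

\[ x_{500} = \begin{bmatrix} 0.375 \\ 0.625 \end{bmatrix}, \quad x_{1000} = \begin{bmatrix} 0.375 \\ 0.625 \end{bmatrix}. \]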

It seems as though the population distribution converges to a steady state or equilibrium. We predict that in the year 2400, 38% of the total population will live in the city and 62% in the suburbs.
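The iteration behind these numbers is easy to reproduce. The following is a minimal sketch, not part of the original notes, that applies x_{k+1} = Ax_k repeatedly to the population model; the names `step` and `history` are illustrative.

```python
def step(A, x):
    """One step of the dynamical system: return the product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[0.95, 0.03],
     [0.05, 0.97]]
x = [0.70, 0.30]          # initial condition x_0 (year 1900)

history = [x]
for _ in range(500):
    x = step(A, x)
    history.append(x)

for k in (1, 2, 500):
    city, suburbs = history[k]
    print(k, round(city, 3), round(suburbs, 3))
```

Storing the whole trajectory in `history` makes it easy to inspect how quickly the state settles near (0.375, 0.625).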

Our computations in the population model indicate that the population distribution is reaching a sort of steady state or equilibrium, which we now define.


Definition 26.2: Let x_{k+1} = Ax_k be a discrete dynamical system. An equilibrium state for A is a vector q such that Aq = q.

Hence, if q is an equilibrium for A and the initial condition is x₀= q then x₁= Ax₀= x₀, and x₂= Ax₁= x₀, and iteratively we have that x_k= x₀= q for all k. Thus, if the system starts at the equilibrium q then it remains at q for all time.

How do we find equilibrium states? If q is an equilibrium for A then from Aq = q we have that

Aq − q = 0

and therefore

(A − I)q = 0.

Therefore, q is an equilibrium for A if and only if q is in the nullspace of the matrix A − I:

q ∈ Null(A − I).

Example 26.3. Find the equilibrium states of the matrix from the population model

\[ A = \begin{bmatrix} 0.95 & 0.03 \\ 0.05 & 0.97 \end{bmatrix}. \]
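A possible solution to Example 26.3 is sketched below; it is a worked computation added for illustration, not the notes' own solution. It row reduces A − I to describe Null(A − I), and then picks the equilibrium whose entries sum to 1 (the free parameter is called t here).

```latex
A - I = \begin{bmatrix} -0.05 & 0.03 \\ 0.05 & -0.03 \end{bmatrix}
\sim \begin{bmatrix} 1 & -\tfrac{3}{5} \\ 0 & 0 \end{bmatrix}
\quad\Longrightarrow\quad
q = t \begin{bmatrix} 3 \\ 5 \end{bmatrix}, \; t \in \mathbb{R}.
\qquad
c + s = 1 \;\Longrightarrow\; t = \tfrac{1}{8}, \quad
q = \begin{bmatrix} 0.375 \\ 0.625 \end{bmatrix}.
```

This agrees with the values of x₅₀₀ and x₁₀₀₀ computed in the population model.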

Does the initial condition of the population x₀change the long term behavior of the discrete dynamical system? We will know the answer once we perform an eigenvalue analysis on A (Lecture 22). As a preview, we will use the fact that

x_k= A^kx₀

and then write x₀in an appropriate basis that reveals how A acts on x₀. To see how the last equation was obtained, notice that

x₁= Ax₀

and therefore

x₂= Ax₁= A(Ax₀) = A²x₀

and therefore

x₃= Ax₂= A(A²x₀) = A³x₀

etc.

26.3 Stability of Discrete Dynamical Systems

We first formally define the notion of stability of a discrete dynamical system.


Definition 26.4: Consider the discrete dynamical system x_{k+1} = Ax_k where A ∈ Rⁿˣⁿ. The origin 0 ∈ Rⁿ is said to be asymptotically stable if for any initial condition x₀ ∈ Rⁿ of the dynamical system we have

\[ \lim_{k \to \infty} x_k = \lim_{k \to \infty} A^k x_0 = 0. \]

The following theorem characterizes when a discrete linear dynamical system is asymptotically stable.

Theorem 26.5: Let λ₁, . . . , λ_n be the eigenvalues of A. If |λ_j| < 1 for all j = 1, 2, . . . , n then the origin 0 is asymptotically stable for x_{k+1} = Ax_k.

Proof. For simplicity, we suppose that A is diagonalizable. Let {v₁, . . . , v_n} be a basis of eigenvectors of A with eigenvalues λ₁, . . . , λ_n respectively. Then, for any vector x₀ ∈ Rⁿ, there exist constants c₁, . . . , c_n such that

x₀ = c₁v₁ + · · · + c_nv_n.

Now, for any integer k ≥ 1 we have that

\[ A^k v_i = \lambda_i^k v_i. \]

Then

\[
\begin{aligned}
x_k &= A^k x_0 \\
&= A^k (c_1 v_1 + \cdots + c_n v_n) \\
&= c_1 A^k v_1 + \cdots + c_n A^k v_n \\
&= c_1 \lambda_1^k v_1 + \cdots + c_n \lambda_n^k v_n.
\end{aligned}
\]

Since |λ_i| < 1 we have that lim_{k→∞} λ_i^k = 0. Therefore,

\[
\begin{aligned}
\lim_{k \to \infty} x_k &= \lim_{k \to \infty} (c_1 \lambda_1^k v_1 + \cdots + c_n \lambda_n^k v_n) \\
&= c_1 \bigl( \lim_{k \to \infty} \lambda_1^k \bigr) v_1 + \cdots + c_n \bigl( \lim_{k \to \infty} \lambda_n^k \bigr) v_n \\
&= 0 v_1 + \cdots + 0 v_n \\
&= 0.
\end{aligned}
\]

This completes the proof.


As an example of an asymptotically stable dynamical system, consider the 2D system

\[ x_{k+1} = \begin{bmatrix} 1.1 & -0.4 \\ 0.15 & 0.6 \end{bmatrix} x_k. \]

The eigenvalues of

\[ A = \begin{bmatrix} 1.1 & -0.4 \\ 0.15 & 0.6 \end{bmatrix} \]

are λ₁ = 0.8 and λ₂ = 0.9. Hence, by Theorem 26.5, for any initial condition x₀, the sequence {x₀, x₁, x₂, . . .} converges to the origin in R². In Figure 26.1, we plot four different state sequences {x₀, x₁, x₂, . . .} corresponding to the four distinct initial conditions x₀ = (3, 7), x₀ = (3, −7), x₀ = (−3, 7), and x₀ = (−3, −7). As expected, all trajectories converge to the origin.

Figure 26.1: A 2D asymptotically stable linear system
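The behaviour described above can be reproduced with a short simulation. This is a sketch for illustration only: the eigenvalues are obtained from the trace and determinant of the 2 × 2 matrix, and the four starting points in `starts` are illustrative choices rather than values confirmed by the figure.

```python
import math

A = [[1.1, -0.4],
     [0.15, 0.6]]

def step(A, x):
    """One step of x_{k+1} = A x_k."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def final_norm(x0, steps):
    """Euclidean norm of x_k after the given number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = step(A, x)
    return math.hypot(x[0], x[1])

# Eigenvalues of a 2 x 2 matrix are the roots of lambda^2 - (trace) lambda + det.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = ((trace - root) / 2, (trace + root) / 2)
print(eigenvalues)

# Illustrative initial conditions; every trajectory should shrink toward 0.
starts = [(3, 7), (3, -7), (-3, 7), (-3, -7)]
norms = [final_norm(x0, 200) for x0 in starts]
print(norms)
```

Since the larger eigenvalue is 0.9, the distance to the origin eventually shrinks by roughly 10% per step, so a few hundred steps are enough to see the convergence.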

After this lecture you should know the following:

what a dynamical system is and how to find its equilibrium states
how to determine if a discrete dynamical system has the origin as an asymptotically stable equilibrium