Linear-Algebra-Notes

@Author: Himanshu Gupta

Lecture#1: The Geometry of Linear Equations

Lecture#2: Elimination with Matrices

Lecture#3: Multiplication and Inverse Matrices

Lecture#4: Factorization into A = LU

Lecture#5: Transposes, Permutation and R^n spaces

Lecture#6: Column space and Null space

Lecture#7: Solving Ax = 0, pivot variables, special solutions

Lecture#8: Solving Ax = b, row reduced form R

Lecture#9: Independence, basis and dimension (of a subspace)

Lecture#10: The four fundamental subspaces

Lecture#11: Matrix spaces; rank 1; small world graphs

Lecture#14: Orthogonal vectors and Subspaces

Lecture#15: Projections onto subspaces

Lecture#16: Projection matrices and Least squares

Lecture#17: Orthogonal matrices and Gram-Schmidt

Lecture#18: Properties of Determinants

Lecture#19: Determinant Formulas and Cofactors

Lecture#20: Cramer’s Rule, Inverse Matrix and Volume

Lecture#21: Eigenvalues and eigenvectors

Lecture#22: Diagonalization and Powers of A

Lecture#25: Symmetric Matrices and Positive definiteness

Lecture#27: Positive definite matrices and minima

Lecture#28: Similar Matrices and Jordan form

Lecture#29: Singular Value Decomposition(SVD)

Lecture#30 Linear transformations and their matrices

I took the following notes while watching some of the video lectures from the MIT Linear Algebra course by Prof. Gilbert Strang.

Introduces 3 ways of looking at a system of linear equations.

1. Row Picture

2. Column Picture

3. Matrix picture

Then it is shown that the Row/Column pictures become very hard to manage in higher dimensions, while the Matrix picture scales well.

Also, you need to realize that Ax is a linear combination of the columns of A.
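This fact is easy to sanity-check in plain Python (the matrix and vector here are made up for illustration):

```python
# Ax computed two ways: (1) the usual row-times-x dot products,
# (2) as a linear combination of the columns of A weighted by x.
A = [[2, 5],
     [1, 3]]
x = [1, 2]

# (1) row picture: each component of Ax is a dot product
row_way = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

# (2) column picture: x1 * (column 1 of A) + x2 * (column 2 of A)
col_way = [x[0] * A[0][0] + x[1] * A[0][1],
           x[0] * A[1][0] + x[1] * A[1][1]]

print(row_way, col_way)  # both give [12, 7]
```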

Note: Here a system of linear equations means Ax = b, where A is the coefficient matrix, x is the vector of unknowns and b is the right-hand side.

Elimination is the systematic way of solving a system of linear equations. This is how software packages solve them.

Elimination basically brings a matrix (A) to upper triangular (U) form. The numbers on the diagonal of U are called the pivots. There should not be a zero pivot; sometimes a zero pivot can be fixed by exchanging rows with a row further down that has a non-zero entry in the same column.

(Note that when we’re solving we do same operations to b also. It is useful to just do elimination steps to the augmented matrix [ A b ])

Ideally, for a unique solution to exist, we should be able to get n non-zero pivots for n equations in n variables. Otherwise elimination fails with either “no solution” (0 = non-zero) or “many solutions” (0 = 0).

System of equations with unique solution are called non-singular or else they are called singular.
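The procedure above can be sketched in plain Python; the 3x3 system here is made up, and the code exchanges rows whenever it meets a zero pivot:

```python
# Forward elimination on the augmented matrix [A b]:
# produce an upper triangular U, exchanging rows on a zero pivot,
# then back-substitute.
M = [[0.0, 2.0, 1.0, 3.0],   # augmented rows [A | b]
     [1.0, 1.0, 1.0, 3.0],
     [2.0, 1.0, 3.0, 6.0]]
n = 3

for col in range(n):
    # row exchange if the pivot is zero and a non-zero entry sits below
    if M[col][col] == 0:
        for r in range(col + 1, n):
            if M[r][col] != 0:
                M[col], M[r] = M[r], M[col]
                break
    # eliminate entries below the pivot
    for r in range(col + 1, n):
        mult = M[r][col] / M[col][col]
        M[r] = [M[r][j] - mult * M[col][j] for j in range(n + 1)]

pivots = [M[i][i] for i in range(n)]

# back substitution
x = [0.0] * n
for i in range(n - 1, -1, -1):
    x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]

print(pivots, x)  # three non-zero pivots -> unique solution [1.0, 1.0, 1.0]
```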

Various steps of elimination can be done by multiplying the coefficient matrix (A) with elimination and row-exchange/permutation matrices, and that is why it is called elimination “with” matrices. Hence, all the steps of elimination are just multiplications of matrices, e.g. for a 3x3 case: E_32 E_31 E_21 P A = U

[first we’re doing necessary row-exchanges, though they can come in between Es where needed, and then eliminations]

Notice that E = E_32 E_31 E_21 P is also just a matrix, so all the operations of elimination can be recorded in one single matrix E, and E can be multiplied with b (in Ax = b) to apply all the elimination steps to b.

Next question: elimination converts A to U. What are the steps that take U back to A?

As EA = U, so A = E^-1 U

where E^-1 is such that E^-1 E = I and is called the inverse of the matrix E.

How to multiply 2 matrices? (C = AB) There are different ways to look at it.

1. Simple rule that you know..

C(i,j) = i-th row of A DOT j-th column of B (here DOT means dot product)

2. a column of C is a combination of the columns of A

3. a row of C is a combination of the rows of B

4. columns 1,..,n of A multiply rows 1,..,n of B and add up to C that is

C = (column-1 of A)(row-1 of B) + (column-2 of A)(row-2 of B) + .. (column-n of A)(row-n of B)
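View #4 can be checked in plain Python against the ordinary row-dot-column rule (matrices made up):

```python
# C = AB computed as a sum of (column k of A) times (row k of B),
# each term being a rank-1 matrix.
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

# standard row-dot-column product for comparison
C = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# sum of outer products: column k of A times row k of B
S = [[0, 0], [0, 0]]
for k in range(2):
    for i in range(2):
        for j in range(2):
            S[i][j] += A[i][k] * B[k][j]

print(C, S)  # identical: [[19, 22], [43, 50]]
```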

Multiplication of Block matrix: Addition and Multiplication can be carried out just by using the block elements of the matrices.

Inverse of A:

Questions to ask: does it exist (and when), and if it does, how to find it.

By definition, A A^-1 = A^-1 A = I [given that A is square]

For a non-square matrix you may have two one-sided inverses, a left inverse and a right inverse.

We can easily infer following properties..

1. Ax = b, then x = A^-1 b. Clearly, if you can find some non-zero x such that Ax = 0 then A^-1 does not exist, because if it did, x = A^-1 0 would force x to be the zero vector.

2. if A is not invertible then there is no unique solution to Ax = b, and A is called singular.

3. if A is invertible then a unique solution to Ax = b exists, and you can find n non-zero pivots.

4. A diagonal matrix has an inverse provided no diagonal entries are zero (because they are the pivots)

5. A is invertible if its determinant is non-zero

And,

Since A A^-1 = I, from the rules of matrix multiplication

A * (j-th column of A^-1) = j-th column of I

so essentially we can find the inverse by solving n systems of linear equations, one per column of the inverse (A stays the same for all of them).

Gauss-Jordan Elimination uses this insight: create the augmented matrix

[ A I ], apply elimination steps to the A part to get to U, and continue (Jordan's addition) the elimination until you reach the reduced echelon form (R), where only diagonal entries are non-zero (0's above *and* below each pivot); then divide each row by its pivot (so that all the pivots become 1) to finally convert A into I. After all these steps on the A part of the augmented matrix, the I part automatically becomes A^-1. This is one systematic way of finding the inverse of A.
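A minimal sketch of Gauss-Jordan in plain Python (the 2x2 invertible matrix is made up):

```python
# Gauss-Jordan on the augmented matrix [A I]: reduce the A part to I;
# the I part then turns into A^-1.
A = [[2.0, 1.0],
     [1.0, 1.0]]
n = 2
# build [A | I]
M = [A[i][:] + [1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]

for col in range(n):
    # scale the pivot row so the pivot becomes 1
    p = M[col][col]
    M[col] = [v / p for v in M[col]]
    # clear the column above AND below the pivot (Jordan's addition)
    for r in range(n):
        if r != col and M[r][col] != 0:
            mult = M[r][col]
            M[r] = [M[r][j] - mult * M[col][j] for j in range(2 * n)]

A_inv = [row[n:] for row in M]
print(A_inv)  # [[1.0, -1.0], [-1.0, 2.0]]
```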

It is important to understand why this happens. Remember that all the steps of converting A to I can be represented by multiplying A with one single matrix E. Since we converted A to I, that means EA = I, and the right side of the augmented matrix got converted into EI = E.

EA = I clearly means E is the inverse of A.

Let us first determine an important property:

A A^-1 = I

(take transpose of both sides)

(A^-1)^T A^T = I

Hence (A^T)^-1 = (A^-1)^T

That is, the inverse of the transpose is just the transpose of the inverse.

A can be factored into LU iff elimination to U can happen without needing any row-exchange operations. We will talk about the row-exchange issue in the next lecture.

By elimination we know that EA = U, and hence A = E^-1 U. Now if the full elimination happened without any row-exchange operations, then it turns out that E^-1 is lower triangular, so A = LU with L = E^-1.

Let us try to see it using a 3X3 case.

E_32 E_31 E_21 A = U => A = E_21^-1 E_31^-1 E_32^-1 U

All the individual E_ij^-1 are lower triangular (because i > j) with all diagonal entries 1, and hence their product is also a lower triangular matrix with all diagonal entries 1.

Also, briefly mentioned, that A can be factored into A = LDU where

L is lower triangular with all diagonal entries 1

D is diagonal matrix with all the pivots of A on diagonal

U is upper triangular with all diagonal entries 1
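A minimal sketch of the factorization in plain Python (the matrix is made up and needs no row exchanges): record each multiplier used during elimination and it drops straight into L.

```python
# A = LU by elimination: U is what elimination leaves behind, and L
# simply records the multipliers (with 1's on its diagonal).
A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
n = 3
U = [row[:] for row in A]
L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

for col in range(n):
    for r in range(col + 1, n):
        mult = U[r][col] / U[col][col]
        L[r][col] = mult                      # the recorded multiplier
        U[r] = [U[r][j] - mult * U[col][j] for j in range(n)]

# check that L*U rebuilds A
LU = [[sum(L[i][k] * U[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
print([U[i][i] for i in range(n)])  # the pivots: [2.0, 1.0, 2.0]
```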

Near the end it is shown by exploration that P^-1 = P^T, where P is a permutation (row-exchange) matrix, which is basically the identity matrix with its rows reordered. And, in n dimensions there are n! permutation matrices, including the identity.

A = LU is valid only when we don't need any row-exchanges. Let's say we do need them; in that case, apply all the individual row-exchanges first, so that

PA = (P_k ... P_1) A

Now the matrix PA can be brought to U by elimination without needing any row exchange, and hence in general PA can always be factored into LU:

PA = LU

A matrix S is symmetric iff S^T = S. It is clear to infer that S is always square.

Also, for any matrix A, A^T A is always symmetric.

Proof: (A^T A)^T = A^T (A^T)^T = A^T A

Similarly you can prove that A A^T is also symmetric.
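A quick numerical check in plain Python (the rectangular A is made up):

```python
# A^T A is symmetric for any rectangular A.
A = [[1, 2],
     [3, 4],
     [5, 6]]          # 3 x 2
At = [[A[i][j] for i in range(3)] for j in range(2)]   # 2 x 3 transpose

AtA = [[sum(At[i][k] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
print(AtA)  # [[35, 44], [44, 56]] -- equal off-diagonal entries
```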

After this we move on to chapter#3 in the book (Vector spaces and subspaces), and here we start to get into the heart of real linear algebra.

Vector Space:

A vector space is a set of “vectors” (it should be noted here that a vector is just some abstract entity and can be anything, as long as vector addition and scalar multiplication are defined) with well-defined rules of vector addition and multiplication by real numbers, so that linear combinations can be taken, satisfying certain (8) conditions. Those 8 conditions are as follows:

1. commutativity, x + y = y + x

2. associativity, x + (y + z) = (x + y) + z

3. existence of “zero vector” s.t. x + 0 = 0 + x = x

4. for all x there exists -x s.t. x + (-x) = 0

5. 1(as in constant 1) times x = x

6. (ab)x = a(bx)

7. a(x + y) = ax + ay

8. (a + b)x = ax + bx

Note that zero vector by itself alone is also a valid vector space.

A subspace of a vector space is a set of vectors s.t.

1. It contains the zero vector. (..surface has to go through the origin)

2. It is closed under linear combination. That is, if x and y are in the subspace then any linear combination of them, ax + by, is also in the same subspace.

Notice that a subspace is also a vector space.

Also, if you are given a set of vectors which is not a subspace, you can create/span a subspace by taking all the linear combinations of the vectors in the set (this automatically includes the zero vector).

Column space of A, C(A):

Remember Ax = b; Ax is just some linear combination of the columns of A. The set of all b's such that Ax = b is solvable is called the column space of A.

In other words set of all the linear combinations of columns of A is called the column space of A.

We are interested in C(A) because Ax = b is solvable iff b is a vector in C(A).

It is mentioned briefly that the “independent” columns of A are called pivot columns, and the number of independent columns is called the dimension (of the column space).

Null space of A, N(A):

In column space we were interested in b’s while here we are interested in x’s.

N(A) is the subspace, the set of all the vectors x s.t. Ax = 0 .

Near the end it is noted that the set of all solutions (possible x's) to Ax = b, with b non-zero, does not form a subspace (prove it: the zero vector is not a solution).

This lecture is basically about a systematic way (an algorithm) to find the null space of A. Clearly A has to be singular for the null space to contain anything besides the zero vector.

If we apply the elimination steps to A without worrying about zero pivots, we end up with a (loosely speaking) upper triangular, staircase matrix, which is called the echelon matrix. The number of non-zero pivots found is called the rank of A. And the columns that contain pivots are called pivot columns, while the rest are called free columns.

In Ax = 0, the components of x corresponding to free columns are called free variables (they can take any value). We find special solutions by giving the free variables special values, one of them being 1 and the rest 0, and solving for the remaining components of x by back substitution. All the special solutions are in N(A), and the rest of the solutions are linear combinations of the special solutions. This means the dimension of the null space equals the number of special solutions, and hence the number of free variables (or free columns).

[Note: We could have continued the elimination beyond getting the echelon form to reduced row echelon form(R) where all the pivots are equal to 1 and we have zeros below as well as above the pivots. With this form back substitution becomes much easier and also R has all the necessary information as clear as it can be. In fact, with a bit careful study you can simply read off special solutions from the R form.]
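A minimal sketch in plain Python (the rank-2 matrix is made up): reduce A to R, identify pivot and free columns, set one free variable to 1 and the rest to 0, and read each special solution off R.

```python
# Special solutions of Ax = 0 read off from the reduced form R.
A = [[1.0, 2.0, 2.0, 2.0],
     [2.0, 4.0, 6.0, 8.0],
     [3.0, 6.0, 8.0, 10.0]]
m, n = 3, 4
R = [row[:] for row in A]

pivot_cols = []
row = 0
for col in range(n):
    # find a pivot at or below `row` in this column
    pr = next((r for r in range(row, m) if abs(R[r][col]) > 1e-12), None)
    if pr is None:
        continue                      # free column
    R[row], R[pr] = R[pr], R[row]
    piv = R[row][col]
    R[row] = [v / piv for v in R[row]]            # pivot -> 1
    for r in range(m):                 # zeros above and below the pivot
        if r != row and R[r][col] != 0:
            mult = R[r][col]
            R[r] = [R[r][j] - mult * R[row][j] for j in range(n)]
    pivot_cols.append(col)
    row += 1

free_cols = [c for c in range(n) if c not in pivot_cols]
specials = []
for f in free_cols:                    # one special solution per free column
    x = [0.0] * n
    x[f] = 1.0
    for r, pc in enumerate(pivot_cols):
        x[pc] = -R[r][f]               # read off from R
    specials.append(x)

print(pivot_cols, free_cols)  # [0, 2] and [1, 3]: rank 2, nullity 2
```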

In the lecture, it is mentioned that elimination steps preserve the null space but not the column space.

Create the augmented matrix [ A b ] and bring the A part to reduced row echelon form. All the zero rows must have 0 in the b part as well, or else no solution exists. A particular solution (x_p) can be read off from the b part in R, and you combine it with the null space solutions (x_n) to get the complete solution x = x_p + x_n to Ax = b.

Rank tells you basically whether the solutions exists and if they do then how many. There is a comprehensive table regarding this on page-159 in the book. This same section also contains the algorithm for solving the said equation.

Let's say A is an m X n matrix and r is the rank. Clearly, r <= m and r <= n.

when r = m, it is called full row rank and if r = n, it is called full column rank.

From the general algorithm of solving Ax = b you will see that if r = m[no zero rows in R] then there is no constraint on the components of b and the system is always solvable. Hence when r = m, there is at least 1 solution always.

Independence:

A set of vectors is independent iff there is no linear combination of them that is zero vector. (zero combination, that is all coefficients being zero, is excluded of course)

Note that if m < n then Ax = 0 always has non-zero solutions, because r can be at most m, which is still less than n; that means free columns and hence free variables exist, so N(A) has more than just the zero vector. So the columns of A are always dependent if m < n.

This fact can be used to prove that in 2-d plane, you take any 3 vectors then they have got to be dependent.

Proof: aV1 + bV2 + cV3 = 0 (a,b,c are components of unknown column vector x)

av11 + bv21 + cv31 = 0

av12 + bv22 + cv32 = 0

clearly m < n (2 equations, 3 unknowns), hence non-zero solutions exist and V1, V2, V3 have to be dependent.

So, we can check if a set of vectors is dependent/independent by putting them as columns in matrix A and see if null space of A has just zero vector or more.

They are independent if null space is just zero vector or else dependent.

In other words, They are independent if r = n or else(r < n) dependent.

This gives us a general framework of analysis in linear algebra. Given a set of vectors, We are usually interested in putting them as columns to create matrix A and look at the solutions of

Ax = 0 or rank of A.

Spanning:

A set of vectors span a subspace, means that subspace contains all the linear combination of the vectors in the set.

If those vectors were dependent then we can remove some of them and still span the same space. This leads to the idea of basis.

Basis:

Basis for a space S is a set of vectors with 2 properties...

1. They are independent.

2. They span the whole space S.

So, you can give all the information about a subspace by just looking at a set of basis vectors.

Note that there can be multiple bases for a space, as long as they satisfy the 2 properties; but all of them have one thing in common, their size, and that size is called the dimension of the space.

One important understanding emerges here that merges concepts about a matrix and a subspace:

# of pivots in A = Rank(A) = # of pivot columns in A = dimension of the column space of A

From the algorithm of solving Ax = 0, it can be easily inferred that (A is an m X n matrix)..

Dimension of the null space of A = # of free columns in A = n - Rank(A)

Let us first take note of the four fundamental subspaces.

1. Column space of A, C(A)

2. Null space of A, N(A)

3. Row space of A = space spanned by the rows of A = column space of A^T, C(A^T)

4. Null space of A^T == N(A^T) [also called the left null space of A]

Given that A is m X n ..

What big space are they part/subspace of?

Clearly,

C(A) is in R^m (every column has m “components”)

N(A) is in R^n

C(A^T) is in R^n

N(A^T) is in R^m

We want to understand the systematic way to produce basis for each of them and to find dimension of each subspace.

C(A): Rank r is the dimension; the pivot columns of A are the basis. (But be careful: we mean the pivot columns of the original matrix, since the column space of R is not the same as that of A; we go to R just to see which columns are the pivot columns.)

C(A^T): the dimension of the row space is the same as that of the column space, r. The pivot columns of A^T form a basis, but a basis is easier to find here: row operations do not change the row space, so the set of non-zero rows of R (derived by elimination on A) is a basis.

N(A): clearly dimension is n - r and the basis is the set of special solutions

N(A^T): clearly the dimension is m - r. Before we discuss the basis, let us find out why it is called the left null space.

N(A^T) is the set of all column vectors y s.t. A^T y = 0

Take the transpose of both sides to get y^T A = 0^T, and now you can see why it is called the left null space.

Finally, we could find a basis for this the crude way, i.e. treating A^T as our matrix and finding a basis for its null space. But we can find it more smartly.

When we convert A to R, there is a set of steps that can be represented by a matrix, let say E

(Note that E can be found by eliminating the augmented matrix [ A I ]: the I part becomes E as elimination happens on the A part.)

So EA = R

Some rows at the bottom of R are zero rows, and the left null space consists of the vectors that combine the rows of A to produce zero. Clearly, the rows of E corresponding to the zero rows at the bottom of R form a basis for the left null space.
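A sketch of this smart way in plain Python (the rank-2 matrix is made up): row-reduce [A I]; when the A part reaches R, the I part has become E, and the row of E opposite the zero row of R combines the rows of A to zero.

```python
# Row-reduce the augmented matrix [A I].  When the A part reaches R,
# the I part has turned into E (so EA = R); the rows of E opposite the
# zero rows of R form a basis for the left null space N(A^T).
A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],    # = 2 * row 1, so the rank is 2
     [1.0, 1.0, 1.0]]
m, n = 3, 3
M = [A[i][:] + [1.0 if j == i else 0.0 for j in range(m)] for i in range(m)]

row = 0
for col in range(n):
    pr = next((r for r in range(row, m) if abs(M[r][col]) > 1e-12), None)
    if pr is None:
        continue
    M[row], M[pr] = M[pr], M[row]
    piv = M[row][col]
    M[row] = [v / piv for v in M[row]]
    for r in range(m):
        if r != row:
            mult = M[r][col]
            M[r] = [M[r][j] - mult * M[row][j] for j in range(n + m)]
    row += 1

R = [M[i][:n] for i in range(m)]
E = [M[i][n:] for i in range(m)]
# the bottom row of R is zero, so the bottom row of E kills the rows of A
y = E[m - 1]
check = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
print(y, check)  # check is the zero row: y^T A = 0
```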

Bottom line is that all the basis for all the fundamental spaces can be found out by doing elimination steps just once on A.

Near the end of the lecture, idea of vector spaces is stretched from being only a set of column vectors by giving example of another vector space where “vectors” are not as traditional.

Set of all 3 X 3 matrices is a vector space, call it M, because 3 X 3 matrix can be considered a “vector” because it fulfills both the requirement. They can be added and can be multiplied with constants as well, so you can take linear combinations. Also the set of all of them satisfies the 8 conditions of being a vector space.

Next, what are some of the subspaces of this vector space?

1. Set of all the 3 X 3 symmetric matrices (call it S)

2. Set of all the 3 X 3 upper/lower triangular matrices (call upper triangular space U)

3. Set of all the 3 X 3 diagonal matrices (call it D; D = S ∩ U)

continuing the end of last lecture...

dim(M) = 9

dim(S) = dim(U) = 6

dim(D) = 3

It is easy to write a standard basis for each of these. BTW, as we know, the union S ∪ U is not a subspace; but let us define another subspace spanned by the “vectors” in S and U, denoted S + U. This is a subspace of dimension 9 and is actually the same as M.

The point of this example and other vector spaces containing things not “looking” like vectors is that linear algebra is applicable in many fields and not just traditional vectors.

Rank 1 Matrices:

Key thing shown here by an example is that every m X n matrix A of rank 1 can be factored into

A = u v^T

where u is a column vector of size m X 1 and

v^T is a row vector of size 1 X n

In the example, u turns out to be a basis for the column space of A while v^T is a basis for the row space of A.

Rank 1 matrices are interesting to learn because of the simple properties they have, and because they are the building blocks for all matrices, i.e. a matrix of any rank can be represented by a linear combination of some rank 1 matrices.
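A tiny made-up example of the u v^T factorization in plain Python:

```python
# A rank-1 matrix built as A = u v^T: every row is a multiple of v^T
# and every column is a multiple of u.
u = [1, 2, 3]          # m x 1, spans the column space
v = [2, 5]             # v^T (1 x n) spans the row space

A = [[u[i] * v[j] for j in range(2)] for i in range(3)]
print(A)  # [[2, 5], [4, 10], [6, 15]] -- clearly rank 1
```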

More examples of subspaces:

Later there were some interesting examples of subspaces and about finding their basis by looking at them like the null space(or some other from fundamental 4) of some matrix.

For example set of all the column vectors of size 4 X 1, sum of whose components is 0.

That is, v = (v1, v2, v3, v4)^T s.t. v1 + v2 + v3 + v4 = 0

Now, to find the dimension and basis for this subspace, it helps to see that it is the null space of A = [1 1 1 1] (the solutions of Ax = 0); from this you can easily determine that N(A) is the same as the given subspace, and then it is a mechanical activity to find the basis and dimension.

This lecture is a quest about orthogonality: what it means for vectors to be orthogonal, what it means for subspaces to be orthogonal, what it means for bases to be orthogonal, etc.

Orthogonality of 2 vectors:

Any vectors x and y are orthogonal iff x^T y = 0 (their dot product is 0). This can be checked from the viewpoint of the Pythagorean theorem for right triangles:

||x||^2 + ||y||^2 = ||x + y||^2

=> x^T x + y^T y = (x + y)^T (x + y) = x^T x + y^T y + x^T y + y^T x

=> x^T y + y^T x = 0 (and x^T y = y^T x for real vectors)

=> x^T y = 0

by definition zero vector is orthogonal to every other vector.

Orthogonality of 2 subspaces:

He starts by giving a very intuitive example. In the 3-dimensional room we are standing in, let us talk about the floor (one subspace of R^3) and any line at the intersection of two walls (another subspace of R^3). Clearly they “look” orthogonal/perpendicular. So what should be the definition of subspace orthogonality that goes along with this intuition?

Subspaces S and T *of* a vector space are orthogonal iff every vector in S is orthogonal to every vector in T. Hence intersection of such S and T should contain *only* zero vector.

Note that a floor and a wall are really not orthogonal subspaces, because their intersection contains a line (i.e. many vectors), and a non-zero vector can't be orthogonal to itself.

Orthogonality of fundamental subspaces: (this is also called part 2 of fundamental theorem of linear algebra, part 1 is the facts about their dimensions)

1. Null space of A is orthogonal to row space of A. This can be easily proved by looking at Ax = 0.

2. The left null space of A is orthogonal to the column space of A. Similarly, this can be proved by looking at A^T y = 0.

There is one special fact about the null space and the row space: their dimensions add to n, which is the dimension of the vector space they are subspaces of. This means the null space contains *all* the vectors that are orthogonal to the vectors in the row space. This leads to the concept of the orthogonal complement.

The orthogonal complement of a subspace V is the subspace that contains *all* the vectors orthogonal to V. This is called V-perp and is denoted by V^⊥.

Rest of the lecture is more to set the context for next lecture.

Can we find approximate solutions to Ax = b when this system has no solutions?

The next lecture will show that you can instead solve A^T A x = A^T b; this will be solvable, and its solution x will be the closest possible.

Also note the interesting things about A^T A: it is always square and symmetric.

Discussion starts with the picture of a vector b being projected onto a vector a in 2 dimensions. It is easy to derive that...

p = a (a^T b / a^T a) = (a a^T / a^T a) b = Pb, where p is the projection of b onto a

P (= a a^T / a^T a) is called the projection matrix. Now, what are some of the properties of this matrix?

Clearly every possible p is in the column space of P. Hence C(P) is the line through a, and since it is a line, the rank of P is 1.

It can be easily proved that P^T = P, and hence P is symmetric.

What happens if I do the projection twice or thrice or multiple times?

From the picture, it seems you should get the same answer, and hence ideally

P^2 = P

This can also be proved using the value of P in terms of a.

Now let us move to higher dimensions. Why do we need projection?

Because many equations Ax = b can not be solved exactly (precisely when b is not in the column space of A), and in that case we want to find the closest possible solution.

In these cases we can instead solve Ay = p where p is the projection of b onto the column space of A. This is the best(“closest”) we can come to solving the original equation.

If we do the same thing in 3d, that is, find the projection (p) of a vector b onto a plane (a 2d subspace of R^3):

Let's say a1 and a2 are a basis for the plane and p is the projection; clearly

p = x1a1 + x2a2 (x1, x2 are our unknowns)

=> p = Ax ; where A = [ a1 a2 ] and x is the column vector (x1, x2)^T.

The error vector e = b - Ax is orthogonal to the subspace, and hence

a1^T (b - Ax) = 0 and a2^T (b - Ax) = 0

The above in matrix form reduces to A^T (b - Ax) = 0, i.e. A^T A x = A^T b

So x = (A^T A)^-1 A^T b

and hence the projection p = Ax = A (A^T A)^-1 A^T b

So, the projection matrix P = A (A^T A)^-1 A^T

You can easily verify that, as expected, P is symmetric and P^2 = P.

Above equation is also valid in n dimensions.
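Both properties are easy to verify numerically; a plain-Python sketch (the 3 X 2 matrix A is made up, and the 2x2 inverse uses the ad - bc formula):

```python
# P = A (A^T A)^-1 A^T for a 3x2 matrix A with independent columns;
# then verify that P is symmetric and P^2 = P.
A = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

At = [[A[i][j] for i in range(3)] for j in range(2)]
AtA = matmul(At, A)                       # 2x2, square and symmetric
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
AtA_inv = [[ AtA[1][1] / det, -AtA[0][1] / det],
           [-AtA[1][0] / det,  AtA[0][0] / det]]

P = matmul(matmul(A, AtA_inv), At)        # 3x3 projection matrix
P2 = matmul(P, P)
symmetric = all(abs(P[i][j] - P[j][i]) < 1e-9
                for i in range(3) for j in range(3))
idempotent = all(abs(P2[i][j] - P[i][j]) < 1e-9
                 for i in range(3) for j in range(3))
print(symmetric, idempotent)  # True True
```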

Note: We can not expand (A^T A)^-1 into A^-1 (A^T)^-1 because A is not square. Note that A is 3 X 2.

However, if A were square, invertible and 3 X 3, then b is a vector in 3d, its projection onto the 3d subspace (which is the original vector space itself) is b itself, and the projection matrix is the identity matrix.

Note: By definition, Pb is in the column space of A, hence Ax = Pb is solvable and its solution is the closest possible. This is equivalent to taking

Ax = b

and multiplying both sides with A^T to get

A^T A x = A^T b

That is why the above equation can be solved to get the closest possible solution to Ax = b when it is not solvable (i.e. b is not in C(A)).

(it may be worthwhile to watch the lecture if you don’t understand the derivations above).

This lecture concludes with introducing least-square problem.

Ax = b, if not solvable, that means b is not in C(A). Pb gives you a vector that is as close as possible to b and is in C(A). So, there are 2 extreme cases.

1. b is in C(A) : then b has to be some combination of columns of A, that is b = Ax for some x and then Pb = b

2. b is orthogonal to C(A): that means b is in N(A^T), the left null space of A. So A^T b = 0, and hence Pb = A (A^T A)^-1 A^T b = 0

If P projects b onto a subspace S, and T is its orthogonal complement, then (I - P) is another projection matrix, one that projects b onto T.

p + e = b

=> Pb + e = b

=> e = b - Pb = (I - P)b

Basically, the concept of projection “divides” a vector into its projections onto 2 orthogonal subspaces (in 2d, finding the x and y components of a vector), and the 2 projections are perpendicular.

note: In the last 2 lectures we've kind of assumed that A^T A is invertible; can we prove it? The reason is that the columns of A are basis vectors and hence independent. And,

Theorem: If the columns of a matrix A are independent then A^T A is invertible.

Proof: We will prove it by contradiction. Suppose A^T A is not invertible; then the null space of A^T A contains more than just the zero vector, i.e. there are non-zero solutions to the following equation (let x be an arbitrary such non-zero solution):

A^T A x = 0

=> x^T A^T A x = 0

=> (Ax)^T (Ax) = 0

LHS is clearly the squared length of the vector Ax, so this can be true iff

Ax = 0

which means there is a combination of the columns of A which is zero, and this is a contradiction since A's columns are independent. Hence A^T A is invertible.

note: watch the lecture to see application of projection matrices in solving least square problem.

Orthonormal vectors:

It is Orthogonal + Normalized. Two vectors x and y are orthonormal iff they are orthogonal unit vectors.

Orthonormal basis:

A basis in which each vector is orthogonal to every other vector and all vectors have unit length. We will denote such a basis for an n-dim space by q1, q2, q3, ..., qn.

So, q_i^T q_j = 0 if i != j, and q_i^T q_i = 1.

Let us put all the q's as column vectors in a matrix, call it Q = [q1 q2 ... qn]. It is easy to see that Q^T Q = I. Such a matrix Q is called an orthogonal matrix. Remember, Q doesn't have to be square to be orthogonal.

In other words, A matrix is orthogonal iff its columns are orthonormal.

This means, if Q is square and orthogonal, then Q^T = Q^-1.

Let us revisit the projection: Let say we want to get the projection of vector b onto the column space of an orthogonal matrix Q. Then

P = Q (Q^T Q)^-1 Q^T = Q Q^T (if Q is square then Q Q^T = I)

Also, solving Qx = b (when b is not in C(Q)) is easier because we only need to solve

Q^T Q x = Q^T b => x = Q^T b

=> x_i = q_i^T b

Since it is so much easier to work with an orthogonal matrix, the second half of this lecture is dedicated to a systematic way of producing an orthogonal matrix (Q) from any matrix (A) that has independent columns. This method is due to Gram-Schmidt.

This part starts with an example of 2 vectors(c and d) which are independent but not orthogonal. We want to find 2 independent vectors from those which are orthogonal. Then we can divide them by their length to get unit and hence orthonormal vectors. Central idea is that d - (projection of d on c) will be orthogonal to c. The idea is explained very nicely with pictures in the lecture. Please revisit it if you’ve forgotten what happens if there were 3 independent vectors to start with.

Note that the column space of Q is the same as that of A. In the end it is another factorization:

A = QR, where Q is an orthogonal matrix with C(Q) = C(A).

It turns out that R is upper triangular. (why? watch the lecture at 48:00).
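The two-vector construction described above (subtract the projection, then normalize) can be sketched in plain Python; the vectors c and d are made up:

```python
# Gram-Schmidt on two independent (but not orthogonal) vectors c and d:
# subtract from d its projection onto c, then normalize both.
c = [1.0, 1.0, 0.0]
d = [1.0, 0.0, 1.0]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# d minus its projection onto c is orthogonal to c
coeff = dot(c, d) / dot(c, c)
d_orth = [d[i] - coeff * c[i] for i in range(3)]

# divide by the lengths to get orthonormal q1, q2
q1 = [v / dot(c, c) ** 0.5 for v in c]
q2 = [v / dot(d_orth, d_orth) ** 0.5 for v in d_orth]

print(round(dot(q1, q2), 12), round(dot(q1, q1), 12))  # 0.0 and 1.0
```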

1. det(I) = 1

2. Exchange of 2 rows reverses the sign of determinant’s original value.

due to these 2 properties det(P) = +1 or -1 where P is some permutation matrix

Remaining property is related to the linear combination of a row in the determinant.

3a. If you multiply one row with number k, determinant’s value gets multiplied with k as well.

So, det(kA) = k^n det(A) for an n X n matrix A.

3b.

| a+a'  b+b' |   | a  b |   | a'  b' |
|  c     d   | = | c  d | + | c   d  |

In words, property 3 means that determinants are linear IN EACH ROW.

Note: above does NOT mean that det(A + B) = det(A) + det(B)

Above properties are most basic, all the other properties can be derived from them.

4. If two rows of a determinant are identical then its value is 0.

Proof: Let's say A is a matrix with two identical rows and det(A) = k. Now exchange the identical rows; the matrix doesn't change, so the determinant shouldn't either, but by property #2 det(A) = -k now, and hence k has to be 0.

5. The elimination step row_i := row_i - k * row_j (where row_i is the ith row) preserves the determinant's value.

Proof: I’m using 2X2 determinant but same logic can be applied to nXn

| a-kc  b-kd |   | a  b |     | c  d |
|  c     d   | = | c  d | - k | c  d |

where the equality follows from property 3, and the 2nd term has identical rows and hence equals 0.

6. If there exists a zero row in A, then det(A) = 0

Proof: By one elimination step you can make the zero row identical to another row, and then the determinant is 0 by property 4. Another method: multiply the zero row by a non-zero number k; the determinant stays the same (the matrix is unchanged), yet by property 3a its value should get multiplied by k, hence the value has to be 0.

7. If U is a triangular matrix, then det(U) = product of its diagonal elements

So, to get determinant of any matrix A, we should first bring it to triangular form(by elimination, however remember to multiply with -1 every time you do a row exchange in this process) and then just multiply the diagonal entries.

Proof: Let say diagonal elements are d1, d2, ….. dn (assume all of them are non zero)

Make every non diagonal element 0 by using elimination steps, and you’re just left with a diagonal matrix with d1, d2, ...etc on diagonal. Then factor all these out to get d1.d2.d3...dn *det(I) = d1.d2...dn

If there is any 0 diagonal element, then by elimination you can come to a matrix with zero row and determinant value becomes 0.
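This elimination recipe for det(A) can be sketched in plain Python (the 3x3 matrix is made up and forces one row exchange):

```python
# det(A) by elimination: bring A to triangular form, flip the sign on
# each row exchange, then multiply the pivots.
A = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 3.0]]
n = 3
U = [row[:] for row in A]
sign = 1.0

for col in range(n):
    if U[col][col] == 0:               # need a row exchange
        for r in range(col + 1, n):
            if U[r][col] != 0:
                U[col], U[r] = U[r], U[col]
                sign = -sign           # property 2: an exchange negates det
                break
    for r in range(col + 1, n):
        mult = U[r][col] / U[col][col]  # property 5: this step preserves det
        U[r] = [U[r][j] - mult * U[col][j] for j in range(n)]

det = sign
for i in range(n):
    det *= U[i][i]                      # property 7: product of the pivots
print(det)  # -3.0
```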

8. If A is singular, then det(A) = 0

Proof: If A is singular then triangular form contains a 0 pivot at the diagonal and by property 7 then det(A) becomes 0.

And so det(A) != 0 when A is invertible (non-singular).

9. det(AB) = det(A).det(B)

Interestingly this one is not proven in the class but just verified using two general 2X2 matrices.

Now, what is det(A^-1)?

det(A) det(A^-1) = det(A A^-1) = det(I) = 1

So, det(A^-1) = 1 / det(A)

This also makes sense: if A^-1 exists then A is invertible, and det(A) has got to be non-zero.

What is det(A^T)?

10. det(A^T) = det(A)

Proof: This is trivial if A is singular; then A^T is also singular and both determinants are 0.

Otherwise A can be factored into triangular matrices, A = LU.

Since L and U are both triangular, L^T and U^T are triangular as well, with the same diagonal entries. Hence det(L^T) = det(L) and det(U^T) = det(U).

So det(A^T) = det(U^T L^T) = det(U^T) det(L^T) = det(L) det(U) = det(LU) = det(A)

So, from this property it becomes evident that although we talked about rows in all the above properties as if they were special, all those properties apply just as well when we do the same operations to columns.

For example, if A has a zero column then det(A) = 0.

If you exchange 2 columns in A then determinant is negated. etc.

First half of the lecture is about finding formula for 2X2, then 3X3 and then extrapolate the results to find big formula for nXn determinant using the first 3 properties listed in last lecture. (revisit the video if you can’t remember how to do it!)

Next, the cofactor formula of finding determinant value is explained.

det(A) = a_i1 C_i1 + a_i2 C_i2 + ... + a_in C_in, where the cofactor of a_ij is C_ij = (-1)^(i+j) det(M_ij),

where M_ij is the matrix of order n-1 obtained by removing the ith row and jth column from A. Usually, we take i = 1.

This lecture starts with a formula for finding inverse of a matrix...

A^-1 = C^T / det(A), where C is the matrix of cofactors. The next part shows why this formula is true by proving the equivalent A C^T = det(A) I for the general nXn case. To prove it, just write the general representations of A and C; on multiplication, the LHS comes out to be the RHS. Unfortunately Google Docs has no support for writing matrices at this point, so I'm not able to do that here :( .. but watch the video if you can't recall how it's done.

Next comes the Cramer’s Formula which is an algebraic way of solving Ax = b (Prof. Strang recommends not to use this but elimination as this is too computation intensive).

If A is non-singular, Ax = b is solved by determinants:

x_j = det(B_j) / det(A), where B_j is the same as A but with the jth column replaced by the vector b.

Proof: hint...

Ax = b

=> x = A^-1 b

=> x = C^T b / det(A)

and the entries of C^T b turn out to contain nothing but the determinants of the various B_j's.
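A tiny made-up 2x2 example of Cramer's rule in plain Python:

```python
# Cramer's rule for a 2x2 system Ax = b:
# x_j = det(B_j) / det(A), with B_j = A with column j replaced by b.
A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [5.0, 10.0]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

B1 = [[b[0], A[0][1]], [b[1], A[1][1]]]   # first column replaced by b
B2 = [[A[0][0], b[0]], [A[1][0], b[1]]]   # second column replaced by b
x = [det2(B1) / det2(A), det2(B2) / det2(A)]
print(x)  # [1.0, 3.0]
```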

Third property covered in this lecture is that

|det(A)| = volume-of-a-box

which box?

Let’s say every row of A represents a vector in n dimensions. The box has these vectors as its edges (all of which start from the origin).

From here on we assume that A is a square matrix.

Eigenvectors:

A way to look at Ax is to say that A operates on the vector x (..like a function) and out comes another vector Ax. Now, the special vectors which, when operated on by A, result in a vector in the “same” direction (..parallel to) as the original vector are called eigenvectors of A.

So for any eigenvector x, Ax = λx, where λ is a constant and is called the eigenvalue.

Let us see a special case, before we discuss the general method of finding eigenvalues and eigenvectors.

Consider projection matrix P of a subspace S.

If x is any vector that is already in S then x’s projection onto S is x itself and hence Px = x

If x is any vector perpendicular to the subspace then its projection onto S is 0 and hence Px = 0

So, clearly P has 2 eigenvalues.. 0 and 1.
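This special case can be checked numerically; a NumPy sketch using projection onto the line through a = (1, 2) (the vector is my own example, not from the lecture):

```python
import numpy as np

# Projection onto the line through a: P = a a^T / (a^T a).
a = np.array([[1.0], [2.0]])
P = (a @ a.T) / (a.T @ a)
vals = np.sort(np.linalg.eigvals(P).real)
print(vals)                      # the only eigenvalues are 0 and 1
assert np.allclose(P @ a, a)     # a lies in S, so Pa = a (eigenvalue 1)
```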

Some Facts:

1. An n X n matrix has n eigenvalues (not necessarily distinct).

2. Trace(A) = sum of diagonal elements in A = sum of all the eigenvalues

Proof: a proof for a 2X2 matrix is done, basically using the quadratic equation’s sum-of-roots formula.

3. Det(A) = product of all eigenvalues

Proof: Let λ1, λ2, ..., λn be the eigenvalues. They are found by solving the equation

det(A - λI) = 0

So det(A - λI) = (λ1 - λ)(λ2 - λ)...(λn - λ) = 0

Now put λ = 0 above and get

det(A - 0·I) = det(A) = λ1 λ2 ... λn

Now to the general way of finding eigenvectors and eigenvalues.

Ax = λx => (A - λI)x = 0

If there is any non-zero solution to the above equation then the matrix (A - λI) has to be singular and hence

det(A - λI) = 0

From this we find the eigenvalues, and then the eigenvectors are just the nullspace of (A - λI).
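NumPy does exactly this under the hood; a sketch on an arbitrary 2X2 example of my own, checking that each eigenvector lies in the nullspace of (A - λI):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
vals, vecs = np.linalg.eig(A)          # solves det(A - lambda*I) = 0 for us
for lam, x in zip(vals, vecs.T):       # columns of `vecs` are the eigenvectors
    assert np.allclose((A - lam * np.eye(2)) @ x, 0)
print(sorted(vals))                    # the eigenvalues are 2 and 4
```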

Another fact:

By way of examples, it is shown that if A has eigenvalues λ1, ..., λn then (A + cI) has eigenvalues λ1 + c, ..., λn + c. That is, the eigenvalues are shifted by c if the diagonal entries in A are shifted by c. Also, the eigenvectors stay the same.

In general, it can be easily spotted by observing the following:

If Ax = λx then (A + cI)x = Ax + cx = (λ + c)x

Next, the example of a rotation matrix (rotates a vector by 90 deg when operated upon) is given, which has real numbers as its elements but whose eigenvalues come out to be complex numbers. Also, idea-wise: can there be a vector that is rotated by 90 deg and still remains parallel to the original?

More Facts:

1. Symmetric matrices have real eigenvalues. (proved in lecture 23)

2. Antisymmetric matrices have pure imaginary(complex number with no real part) eigenvalues.

3. Eigenvalues for a lower/upper triangular matrix are its diagonal elements. It happens because, for a triangular matrix, det(A - λI) is just the product of the diagonal entries (a_ii - λ).
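Facts 1 and 2 can be checked numerically; a NumPy sketch with an arbitrary random matrix (my own example, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M + M.T     # symmetric: eigenvalues should be real
K = M - M.T     # antisymmetric: eigenvalues should be pure imaginary
print(np.allclose(np.linalg.eigvals(A).imag, 0))   # True
print(np.allclose(np.linalg.eigvals(K).real, 0))   # True
```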

relook at the end... TODO

Suppose A has n independent eigenvectors. We put them as columns in a matrix and call it S (the eigenvector matrix). Then

AS = SΛ

where Λ is the diagonal matrix (..also called the eigenvalue matrix) made up of the eigenvalues corresponding to the eigenvectors in S, and hence

A = SΛS^-1

This is how we can diagonalize A.

Can you answer quickly, what are the eigenvalues and eigenvectors of A^2?

Method 1: Ax = λx => A^2 x = λAx = λ^2 x

Method 2: Also, if A has n independent eigenvectors then A^2 = SΛS^-1 SΛS^-1 = SΛ^2 S^-1, and in general A^k = SΛ^k S^-1.

So, the eigenvalues of A^k are the kth powers of the eigenvalues of A, while the eigenvectors are the same as A’s. Due to this simple property, eigenvalues provide a great way to explore powers of a matrix.
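A quick NumPy sketch of A^k = SΛ^k S^-1 on an arbitrary diagonalizable matrix of my own:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
vals, S = np.linalg.eig(A)                        # A S = S Lambda
A5 = S @ np.diag(vals ** 5) @ np.linalg.inv(S)    # S Lambda^5 S^-1
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))   # True
```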

Theorem: If A has n independent eigenvectors and |λ| < 1 for every eigenvalue λ, then A^k → 0 as k → ∞.

Proof: A^k = SΛ^k S^-1, and Λ^k → 0 as k → ∞, so the kth power of A goes to 0 as well.

Theorem: If A has n distinct eigenvalues then A is sure to have n independent eigenvectors and hence is surely diagonalizable.

While if A does not have n distinct eigenvalues then A may or may not have n independent eigenvectors.(e.g. the Identity matrix has n independent eigenvectors but just 1 as eigenvalue)

An application of eigenvalues:

Let A be an n X n matrix with n independent eigenvectors. You’re supposed to solve for u_k given that

u_{k+1} = A u_k and u_0 is known.

Clearly, u_k = A^k u_0

Let x1, x2, ..., xn be independent eigenvectors of A. Then

u_0 = c1 x1 + c2 x2 + ... + cn xn = Sc, where the c’s are some constants.

So clearly

u_k = A^k u_0 = S Λ^k c

where

Λ is the eigenvalue matrix,

S is the eigenvector matrix,

c is the column vector consisting of c1, c2, ..., cn.

The central idea here is that when a first-order system is evolving in time with a matrix A, starting from some known value, you can find the future values easily using the eigenvalues and eigenvectors of A.

This same idea can be used to very efficiently calculate the 100th or nth Fibonacci number F_n, as shown in the lecture. The trick is to realize that

u_k = [ F_{k+1}, F_k ]^T satisfies u_{k+1} = A u_k, where

A = [ 1 1 ]

[ 1 0 ]
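The trick above can be sketched in NumPy, assuming u_k = [F_{k+1}, F_k]: diagonalize A once, then every u_k comes from powers of the eigenvalues alone (the function name `fib` is my own):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0]])
vals, S = np.linalg.eig(A)          # eigenvalues are (1 +- sqrt(5)) / 2
S_inv = np.linalg.inv(S)

def fib(k):
    u0 = np.array([1.0, 0.0])       # u_0 = [F_1, F_0]
    uk = S @ np.diag(vals ** k) @ S_inv @ u0   # u_k = S Lambda^k S^-1 u_0
    return round(uk[1])             # second component of u_k is F_k

print([fib(k) for k in range(1, 11)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Note the floating-point eigenvalues limit how far k can go before rounding breaks down, but the point is that the cost of each u_k no longer grows with k.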

Symmetric Matrix: Square matrix A s.t. A^T = A.

For a symmetric matrix containing only real elements..

1. All eigenvalues are real.

Proof: Ax = λx; now take the complex conjugate of both sides to get (note that A contains real numbers only, so A’s complex conjugate remains A itself)

A x̄ = λ̄ x̄; now take the transpose of both sides to get

x̄^T A^T = λ̄ x̄^T. Since A^T = A,

x̄^T A = λ̄ x̄^T; multiply both sides with x on the right

=> x̄^T A x = λ̄ x̄^T x

=> λ x̄^T x = λ̄ x̄^T x (using Ax = λx on the left); x̄^T x is definitely positive for x ≠ 0, as can be shown by a little calculation... So

=> λ = λ̄, hence the eigenvalues are real.

Note that the above proof will still work if A were a complex matrix, provided that instead of being symmetric it fulfilled the condition Ā^T = A. So for the complex matrices just described, the eigenvalues are real again.

2. Eigenvectors can be chosen orthonormal.

Proof: We will do this in 2 parts.

Lemma1: Prove that eigenvectors of a real symmetric matrix (when they correspond to distinct eigenvalues) are always perpendicular.

Suppose Ax = λ1 x and Ay = λ2 y with λ1 ≠ λ2. Then

λ1 x^T y = (λ1 x)^T y = (Ax)^T y

=> λ1 x^T y = x^T A^T y

=> λ1 x^T y = x^T A y (since A^T = A)

=> λ1 x^T y = λ2 x^T y

Since λ1 ≠ λ2, hence x^T y = 0, and hence x and y are perpendicular.

So, from Lemma1, if a symmetric matrix has distinct eigenvalues then orthonormal eigenvectors can be chosen.

TODO: Lemma2: Even if an nXn symmetric matrix has duplicate eigenvalues, there always exist n independent eigenvectors. A proof for this is sketched at the end of section 6.4 in the book.

Both lemmas together prove that the eigenvectors of a symmetric matrix can be chosen orthonormal.

3. Pivots and eigenvalues have the same signs. That is, # of positive pivots = # of positive eigenvalues.

Also,

product of eigenvalues = product of pivots = determinant of matrix

Spectral Theorem:

If A is symmetric, then A can be factorized into A = QΛQ^T where

Q is the matrix containing orthonormal eigenvectors of A as columns (and hence an orthogonal matrix) and

Λ is the diagonal matrix containing the respective eigenvalues on the diagonal.

Proof: With a simple calculation:

A = SΛS^-1 => A = QΛQ^-1

=> A = QΛQ^T (because Q^-1 = Q^T for an orthogonal matrix Q)
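The spectral theorem can be verified with NumPy’s symmetric eigensolver; the matrix is an arbitrary example of my own:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
vals, Q = np.linalg.eigh(A)                      # eigh: for symmetric matrices
print(np.allclose(Q @ np.diag(vals) @ Q.T, A))   # True: A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))           # True: columns of Q are orthonormal
```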

Positive definite matrix:

Symmetric matrix with all the eigenvalues being positive.

Hence, all the pivots are also positive and all subdeterminants[42:00 in the lecture] are positive

Positive semidefinite matrix:

Symmetric matrix with positive or 0 eigenvalues.

Another definition(or a test) of Positive definiteness:

If x^T A x > 0 for all non-zero vectors x, then A is a positive definite matrix.

You can sorta “see” why positive pivots lead to x^T A x > 0.

You can see it with a 2X2 matrix example. Let’s say

A = [2 1]

[1 3]

Let x = (x1, x2) be an arbitrary column vector. Then

f(x) = x^T A x = 2x1^2 + 2x1x2 + 3x2^2 = 2(x1 + x2/2)^2 + (5/2)x2^2

By completing the squares, you can clearly see that f(x) > 0 for any non-zero column vector x, and hence A is positive definite.

But here we want to notice that matrix elimination is equivalent to completing the squares in f(x).

If you do elimination on A, you get

U = [2 1]

[0 5/2]

Notice how the two pivots above show up as coefficients in f(x) where the square completion is done. So if any pivot is negative, there will be a negative component in f(x) after completing the squares, and f(x) could be negative for some x.
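A numerical sketch of this example: the pivots 2 and 5/2 are exactly the coefficients of the completed squares, so x^T A x stays positive (the sampling is my own check, not from the lecture):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def f(x):
    return x @ A @ x                    # x^T A x

def completed_squares(x):
    return 2 * (x[0] + x[1] / 2) ** 2 + (5 / 2) * x[1] ** 2

rng = np.random.default_rng(1)
xs = rng.standard_normal((1000, 2))
print(all(abs(f(x) - completed_squares(x)) < 1e-9 for x in xs))  # True
print(all(f(x) > 0 for x in xs))                                 # True
```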

Now on to Minima.

In single-variable calculus, f(x) has a minimum at a point x = a if the first derivative of f at a is 0 and the 2nd derivative at a is > 0.

And, in multivariable calculus, f(x1, ..., xn) has a minimum when the first (partial) derivatives are 0 at a point in n dimensions and the matrix of 2nd (partial) derivatives is positive definite.

This lecture starts with a bit of leftover discussion of positive definiteness.

1. For any matrix A, A^T A is positive definite or at least semidefinite.

Proof: x^T (A^T A) x = (Ax)^T (Ax) = ||Ax||^2 ≥ 0

So if A’s nullspace is just the 0 vector then A^T A is positive definite, or else positive semidefinite.

2. If A is positive definite then A^-1 is also positive definite.

Proof: If A is positive definite, then all of its eigenvalues λ are positive. A^-1 has eigenvalues 1/λ, which are again positive, and hence A^-1 is positive definite.

3. If A and B are positive definite then so is A+B.

Proof: Hint: Is x^T (A + B) x = x^T A x + x^T B x > 0?
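Fact 1 above can be checked numerically; a NumPy sketch with a random tall matrix of my own:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))      # almost surely has independent columns
vals = np.linalg.eigvalsh(A.T @ A)   # eigenvalues of the symmetric A^T A
print((vals > 0).all())              # True: N(A) = {0}, so A^T A is positive definite
```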

Now on to the Similar matrices.

Two square matrices A and B are similar if there exists an invertible matrix M s.t.

B = M^-1 A M

Example: A = SΛS^-1 => Λ = S^-1 A S

where Λ is the eigenvalue matrix and

S is the eigenvector matrix.

So, Λ and A are similar matrices.

If two square matrices are similar then they have the same eigenvalues (and the same number of independent eigenvectors).

Let’s say A and B are two arbitrary similar matrices s.t. B = M^-1 A M for some matrix M.

Ax = λx => A M M^-1 x = λx => M^-1 A M M^-1 x = λ M^-1 x

=> B (M^-1 x) = λ (M^-1 x)

and hence A and B have the same eigenvalues (with eigenvectors M^-1 x).
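A quick NumPy check of this on arbitrary matrices of my own:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])            # triangular, so eigenvalues are 2 and 3
M = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # any invertible M works
B = np.linalg.inv(M) @ A @ M          # B is similar to A
print(np.allclose(np.sort(np.linalg.eigvals(B)), [2.0, 3.0]))   # True
```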

So, similarity defines a family of matrices which have common eigenvalues, and the most important matrix in the family is the eigenvalue matrix Λ.

But some matrices are not diagonalizable. Sometimes all the eigenvalues of a matrix are the same. Let us consider a 2X2 case. Let A be a 2X2 matrix s.t. both of its eigenvalues are the same and equal to 4. The eigenvalue matrix in this case is 4I. Now if you take any matrix M and try M^-1 (4I) M, you get 4 M^-1 M = 4I back,

so clearly the eigenvalue matrix 4I is in a family containing just this one matrix. All the other 2X2 matrices with both eigenvalues 4 are in another, different family (this may not be true in higher dimensions). Jordan form just means spotting the “best looking” matrix in a family: if possible it is the eigenvalue matrix, or else it is some triangular matrix with the eigenvalues on the diagonal.

Then there is a discussion of “Jordan blocks”; for a matrix, the # of Jordan blocks is the same as the # of independent eigenvectors. At this point this discussion does not seem important; I will look into it when necessary.

For any matrix A, SVD is a factorization of A into

A = UΣV^T

where U and V are orthogonal matrices and Σ is a diagonal matrix.

You can notice that it is a generalisation of spectral theorem for arbitrary matrices.

The concept:

Let A be an arbitrary mXn matrix with rank r.

Let v1, ..., vr be an orthonormal basis for the row space of A, C(A^T).

Let u1, ..., ur be an orthonormal basis for the column space of A, C(A), such that

for i in {1,..r}: A v_i = σ_i u_i, where the σ_i are constants.

Let v_{r+1}, ..., v_n be an orthonormal basis for the nullspace of A, N(A).

Let u_{r+1}, ..., u_m be an orthonormal basis for the left nullspace of A, N(A^T).

Define

V: the nXn matrix containing v1, ..., vn as columns.

U: the mXm matrix containing u1, ..., um as columns.

Σ: the mXn diagonal matrix with the first r diagonal entries being σ1, ..., σr and then 0s.

Clearly V and U are orthogonal matrices. And, you can check with a small calculation that

AV = UΣ => A = UΣV^T (since V is orthogonal, V^-1 = V^T)

So, SVD brings all 4 fundamental spaces together to diagonalize A.

How to find the SVD:

Let us try to remove U from the picture. Remember the magical symmetric positive (semi)definite matrix A^T A:

A^T A = (UΣV^T)^T (UΣV^T)

=> A^T A = V Σ^T U^T U Σ V^T

Since U is orthogonal, U^T U = I, and

Σ is diagonal, so Σ^T Σ is the nXn diagonal matrix with entries σ_i^2

So, A^T A = V (Σ^T Σ) V^T

Now take a look back at the “spectral theorem” for positive definite matrices and you can determine that V is the eigenvector matrix for A^T A and Σ^T Σ is the eigenvalue matrix for A^T A.

Now, using the knowledge from previous lectures, it is straightforward to determine V and Σ^T Σ (and hence Σ).

Then you can determine U from AV = UΣ, or it might be easier to use A A^T in the same way as A^T A:

A A^T = (UΣV^T)(UΣV^T)^T = U Σ Σ^T U^T, and hence U is the eigenvector matrix of A A^T.
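These relations can be verified with NumPy’s SVD; the matrix is an arbitrary example of my own:

```python
import numpy as np

A = np.array([[4.0, 4.0],
              [-3.0, 3.0]])
U, s, Vt = np.linalg.svd(A)                                 # A = U Sigma V^T
print(np.allclose(U @ np.diag(s) @ Vt, A))                  # True
# V diagonalizes A^T A: V^T (A^T A) V = Sigma^2
print(np.allclose(Vt @ (A.T @ A) @ Vt.T, np.diag(s ** 2)))  # True
```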

A transformation T is linear iff

T(u + v) = T(u) + T(v)

T(cu) = cT(u)

Clearly T(0) = 0 for a linear transformation, and any transformation T(x) = Ax (where A could be any matrix) is always linear.

The first half of the lecture covers many examples of linear and non-linear transformations.

Minimum information needed to describe a linear transformation T:

Let’s say v1, v2, ..., vn is a basis of R^n. If we describe the transformations T(v1), T(v2), ..., T(vn),

then we can find T(v) for any vector v in R^n (write v = c1 v1 + ... + cn vn and use linearity).

Also, every linear transformation can be described by a matrix A. And, here is the rule to find A.

Given a basis v1, ..., vn for the input space R^n

and a basis w1, ..., wm for the output space R^m,

the ith column of A is (a_1i, a_2i, ..., a_mi) s.t.

T(v_i) = a_1i w1 + a_2i w2 + ... + a_mi wm

(Can you figure this rule out by taking the standard basis for the input and output spaces?)

Now, Ax = y

is equivalent to T(x) = y

where x is a vector in R^n whose coordinates are described in the basis v1, ..., vn, and

y is a vector in R^m whose coordinates are described in the basis w1, ..., wm.
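The column rule can be sketched with standard bases, using the 90-degree rotation from Lecture 21 as T (the function names are my own):

```python
import numpy as np

def T(v):
    x, y = v
    return np.array([-y, x])            # rotate v by 90 degrees

# Column i of A is T(v_i) expressed in the (standard) output basis.
basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
A = np.column_stack([T(v) for v in basis])
print(A)                                            # the matrix [[0, -1], [1, 0]]
print(np.allclose(A @ np.array([2.0, 1.0]),
                  T(np.array([2.0, 1.0]))))         # True: Ax = T(x)
```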