1 of 25

18.06: Linear Algebra — course overview

Prof. Steven Johnson,

MIT Applied Math, Physics

Fall 2022

all learning materials: https://github.com/stevengj/1806

pset submission/grades/announcements only: MIT Canvas/Gradescope

Textbook: Strang, Introduction to Linear Algebra, 5th edition

+ supplementary notes

2 of 25

The textbook

Gil Strang, Introduction to Linear Algebra, 5th edition

  • Significant but incomplete overlap with Strang’s OCW video lectures: these are a useful supplement but not a replacement for my lectures!

  • Linear algebra is a vast subject, no single semester (or single book) covers all of it …

3 of 25

Administrative Details

Lectures MWF 11am, room 34-101 — video recordings & notes links posted to github

Tuesday recitations, various office hours — use Canvas to switch sections

13 weekly psets, due Fridays at 11am (except when Friday is an exam or holiday, in which case the due date is Wed or Mon). Submission via Gradescope.

— lowest pset score will be dropped, anything else requires S3 + Prof. J.

pset 1 posted today, due Friday 9/16

Grading: homework 15%, 3 in-class/in-person exams 45% (10/7, 11/14, & 12/9)

& final exam 40% — makeups require S3 + Prof. J (no class conflicts, sports ok)

Collaboration policy: talk to anyone you want, read anything you want, but:

• Make an effort on a problem before collaborating.

• Write up your solutions independently (from a “blank sheet of paper”).

• List your collaborators and external sources (not course materials).

18.06 piazza.com discussion forum: https://piazza.com/class/l7g5ixuivav3rm

Problem-set partnering: psetpartners.mit.edu

Tutoring: Math Learning Center, also TSR2

4 of 25

Syllabus and Calendar

  • Significant overlap with Strang’s OCW video lectures: these are a useful supplement but not a replacement for attending lecture. Likely topics:
  • Exam 1: 10/7. Matrices, Elimination, LU factorization, nullspaces and other subspaces, bases and dimensions, vector spaces, complexity. (Book: 1–3.5, 11.1)
  • Exam 2: 11/14. Orthogonality, projections, least-squares, QR, Gram–Schmidt, orthogonal functions, determinants, infinite dimensional vector spaces (Book: 1–5, 10.5). Concentrating on material since exam 1, but linear algebra is cumulative!
  • Other possible topics: defective matrices, SVD and principal-components analysis, sparse matrices and iterative methods, complex matrices, linear operators on functions, matrix calculus & optimization / root-finding.
  • Final exam: all of the above.

Before each exam, a definitive list of topics will be announced, along with a review session.

5 of 25

What is 18.06 about?

High school: 3 “linear” equations (only ± and × constants) in 3 unknowns.

Method: eliminate unknowns one at a time.

Equivalent matrix problem: Ax = b

Ax is a “linear operation”:

A(x + y) = Ax + Ay

A(3x) = 3Ax

[figure: A (matrix of coefficients) times x (vector of unknowns) = b (vector of right-hand sides) — take “dot products” of rows × columns]
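
A minimal sketch of the same idea in Julia (the course software introduced later in these slides), with made-up coefficients; the backslash operator solves Ax = b by elimination:

    A = [2 1 1; 4 -6 0; -2 7 2]   # matrix of coefficients (hypothetical numbers)
    b = [5, -2, 9]                # vector of right-hand sides
    x = A \ b                     # vector of unknowns, found by elimination
    A * x ≈ b                     # true
    A * (3x) ≈ 3 * (A * x)        # true: Ax is a linear operation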

6 of 25

What is 18.06 about?

Linear system of equations, in matrix form: Ax = b

Will we learn faster methods to solve this? No. (Except if A is special.) The standard “Gaussian elimination” (and “LU factorization”) matrix methods are just a slightly more organized version of the high-school algebra elimination technique.

Will we get better at doing these calculations by hand? Maybe, but who cares? Nowadays, all important matrix calculations are done by computers.

Will we learn more about the computer algorithms? A little. But mostly the techniques for “serious” numerical linear algebra are topics for another course (e.g. 18.335).
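
As a hedged aside, here is what that “slightly more organized elimination” looks like in Julia’s standard LinearAlgebra library: lu does Gaussian elimination once, and the resulting factorization can be reused to solve Ax = b (the matrix entries below are made up):

    using LinearAlgebra

    A = [2 1 1; 4 -6 0; -2 7 2]   # a hypothetical 3×3 matrix
    F = lu(A)                     # Gaussian elimination, recorded as A = P⁻¹LU
    F.L * F.U ≈ A[F.p, :]         # true: lower × upper triangular = row-permuted A
    F \ [5, -2, 9]                # reuse the factorization to solve Ax = b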

7 of 25

How do we think about linear systems? (imagine someone gives you a 10⁶×10⁶ matrix)

  • All the formulas for 2×2 and 3×3 matrices would fit on one piece of paper. They aren’t the reason why linear algebra is important (as a class or a field of study).
  • Large problems are solved by computers, but must be understood by human beings. (And we need to give computers the right tasks!)
  • Understand non-square problems: #equations > #unknowns or vice versa
  • Break up matrices into simpler pieces
    • Factorize matrices into products of simpler matrices: A = LU (triangular: Gauss), A = QR (orthogonal/triangular), A = XΛX⁻¹ (diagonal: eigenvectors/values), A = UΣV* (orthogonal/diagonal: SVD), … (see the sketch after this list)
    • Submatrices (matrices of matrices).
  • Break up vectors into simpler pieces: subspaces and basis choices.
  • Algebraic manipulations to turn harder/unfamiliar problems (e.g. minimization or differential equations) into simpler/familiar ones: algebra on whole matrices at once
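
A rough sketch of those factorizations in Julia’s LinearAlgebra standard library, applied to an arbitrary random matrix just to show the calls (not something the course requires you to run):

    using LinearAlgebra

    A = rand(4, 4)   # any square matrix, just for illustration
    lu(A)            # A = P⁻¹LU   (triangular factors: Gaussian elimination)
    qr(A)            # A = QR      (orthogonal × triangular)
    eigen(A)         # A = XΛX⁻¹   (diagonalization: eigenvectors/eigenvalues)
    svd(A)           # A = UΣV*    (orthogonal × diagonal × orthogonal: the SVD)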

8 of 25

Don’t expect a lot of “turn the crank” problems

on psets or exams of the form

“solve this system of equations.”

… we will turn it upside-down, give you the answer and ask the question, ask about properties of the solution from partial information, give you a factorization and ask how to use it … the general goal is to require you to understand the crank rather than just turn it.

Fundamentally more abstract than 18.01–18.03!

9 of 25

Where do big matrices come from? Lots of examples in many fields, but here are a couple that are relatively easy to understand…

10 of 25

Engineering & Scientific Modeling [ 18.303, 18.330, 6.7300, 6.7330, … ]

[ images: http://gmsh.info/, http://www.electronics-eetimes.com/, comsol.com ]

Unknown functions (fluid flow, mechanical stress, electromagnetic fields, …) approximated by values on a discrete mesh/grid

e.g. a 100×100×100 grid = 10⁶ unknowns!

11 of 25

Data analysis and Machine Learning

Image processing: images are just matrices of numbers (red/green/blue intensity) [ Mathworks, wikipedia ]

Regression (curve fitting) [ computationalculture.net ]

Google “page rank” problem (also for gene networks etc.): determine the “most important” web pages just from how they link.

matrix = (# web pages) × (# web pages), entry = 1 if they link, 0 otherwise
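
A toy illustration of building such a link matrix in Julia, for a made-up 4-page “web” (the convention that rows are the linking page is an assumption here):

    links = [(1, 2), (2, 3), (3, 1), (1, 4), (4, 1)]   # hypothetical: page i links to page j
    n = 4                      # number of web pages
    A = zeros(Int, n, n)       # (# web pages) × (# web pages)
    for (i, j) in links
        A[i, j] = 1            # entry = 1 if page i links to page j
    end
    A                          # the 0/1 link matrix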

Machine Learning

(6.3900 requires 18.06!)

12 of 25

Not just matrices of numbers

  • There are lots of surprising and important generalizations of the ideas in linear algebra.
  • Example: Instead of vectors with a finite number of unknowns, similar ideas apply to functions with an infinite number of unknowns.
    • Instead of matrices multiplying vectors, we can think about linear operators on functions

“A” = a linear operator, e.g. the Laplacian ∇²; “x” = an unknown function u(x,y,z); “b” = a right-hand side f(x,y,z): Poisson’s equation ∇²u = f (e.g. 18.303, 18.152)
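
As a quick check (assuming the operator is the Laplacian ∇², as reconstructed above), it obeys the same distributive rules as a matrix, which is exactly what “linear” means here:

    \nabla^2(\alpha u + \beta v) = \alpha\,\nabla^2 u + \beta\,\nabla^2 v ,
    \qquad \text{so } \nabla^2 u = f \text{ is an infinite-dimensional analogue of } Ax = b .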

13 of 25

18.06 vs. 18.C06

18.06 (“Linear Algebra”) vs. 18.C06 (“Linear Algebra and Optimization”; faster paced, fall only) — 60% overlap:

• more linear algebra (more depth in linear algebra, e.g. in factorizations, vector spaces, eigenstuff; … a little optimization) vs. more optimization (more depth in optimization algorithms, problem types, convexity, examples; less depth in matrix/vector concepts)

• more physics (linear ODEs, PDEs, functions as “vectors”) vs. more data science/ML (discrete data classification, data bias, robustness)

• some computers (depends on instructor!) vs. computer mini-projects (some programming experience assumed!)

(both fulfill the same course requirements and prereqs! … course 6-3 suggests 18.C06 for sufficiently advanced students)

14 of 25

18.06 vs. 18.700

“applied” vs. “pure” math

• few proofs (deduce patterns from examples, informal arguments) vs. formal proofs expected (definitions to lemmas to theorems … training in proof writing)

• more applications vs. more theorems

• more concrete (but still more abstract than other 18.0x!) vs. more abstract

• some computers (depends on instructor!) vs. only pencil-and-paper

(see also 18.701 for students with much more proof experience)

15 of 25

Computer software

[ image: Viral Shah ]

Lots of choices — this semester: Julia (julialang.org), a new-ish (2012) language that scales better to real problems.

No programming required for 18.06, just a “glorified calculator” to turn the crank.

Only on psets, not exams

Install it on your computer (preferred) or use it online — see tutorial link on github

Optional tutorial: Mon 5pm via Zoom

16 of 25

18.06 vs. 18.C06/18.700

furry friend vs. no furry friends?

“Cookie”

6-year-old mini labradoodle

17 of 25

18.06 warm-up material

We don’t want to think of matrices as just “bags of numbers” that we push around with some algorithms. We want to think algebraically and geometrically.

Many of the algebraic rules (for scalar numbers) you have used for years no longer work (for vectors and matrices)! Blindly pushing symbols around will mostly lead to nonsense.

Pro tip: always think of the shapes of matrices and vectors, and keep track of the order of operations:

— you can only add/multiply things if their shapes are compatible

— AB ≠ BA, and re-ordering is likely nonsensical for non-square objects.
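
A quick hedged Julia illustration of both warnings, with made-up matrices:

    A = [1 2; 3 4]; B = [0 1; 1 0]
    A * B == B * A        # false: AB ≠ BA in general
    C = rand(2, 3)        # a 2×3 matrix: 3 inputs, 2 outputs
    C * rand(3)           # ok: shapes are compatible, (2×3)(3-vector) = 2-vector
    # rand(3) * C         # error if uncommented: shapes are not compatible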

18 of 25

Column vectors: x ∈ ℝⁿ (“x in ℝⁿ”) = n-component column vector of real (ℝ) numbers

or x = [x1, x2, …, xn] — note commas = column vector

basic operations on vectors:

• add x + y, subtract x − y

• multiply by scalars: αx (lower-case Greek = scalar)

later in 18.06: other kinds of “vectors” too! … matrices and functions can be “vectors”!

Dot (“inner”) products: xᵀy = x1y1 + x2y2 + ⋯ + xnyn

transpose: columns become rows — xᵀ = [x1 x2 … xn] = row vector (spaces, not commas!)
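
These operations in Julia (a hedged sketch with made-up 3-vectors; dot comes from the standard LinearAlgebra library):

    using LinearAlgebra

    x = [1.0, 2.0, 3.0]   # commas: a 3-component column vector
    y = [4.0, 5.0, 6.0]
    x + y                 # add vectors
    2.5 * x               # multiply by a scalar
    dot(x, y)             # dot ("inner") product = 32.0
    x'                    # transpose: a 1×3 row vector [1.0 2.0 3.0]
    x' * y                # row times column = the same dot product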

19 of 25

Linear operations on vectors

Linear operations Ax (≠ xA!):

take a vector x (in ℝⁿ) in and give a vector Ax out, satisfying linearity:

  • A(x + y) = (Ax) + (Ay)
  • A(2x) = 2(Ax) (or any scalar)

i.e. “distributive law”

A(αx+βy) = α(Ax) + β(Ay)

examples: the identity operation (Ax = x), scaling by α (Ax = αx), and “non-square” operations — e.g. 3 inputs and 2 outputs, where each output is a “linear combination” of the inputs (e.g. of x and y)
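
A hedged numerical check of the distributive law in Julia, using a made-up 2×3 (“3 inputs, 2 outputs”) matrix:

    A = [1 0 0; 0 1 0]                       # a made-up 2×3 "non-square" operation
    x = [1, 2, 3]; y = [4, 5, 6]
    α, β = 2, -1
    A * (α*x + β*y) == α*(A*x) + β*(A*y)     # true: linearity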

20 of 25

Linear or nonlinear? Examples

• x1 — linear

• x1/x2 — nonlinear!

• 2x1 + 1 — nonlinear! (= linear + constant: “affine”)
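
A small hedged check of those three examples in Julia (the test vectors are made up):

    f(x) = x[1]              # x1: linear
    g(x) = x[1] / x[2]       # x1/x2: nonlinear
    h(x) = 2x[1] + 1         # 2x1 + 1: affine (linear + constant), not linear
    x = [1.0, 2.0]; y = [3.0, 4.0]
    f(x + y) == f(x) + f(y)  # true
    g(x + y) == g(x) + g(y)  # false: 4/6 ≠ 1/2 + 3/4
    h(x + y) == h(x) + h(y)  # false: 9 ≠ 3 + 7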

21 of 25

Matrices: Representing linear ops

The “rows times columns” rule is just a way of writing down a linear operation

… not always the most natural way!

identity matrix I: 1s on the diagonal (blanks = zeros)

scaling by α: the matrix αI

“non-square” operations = rectangular matrices: e.g. 3 inputs and 2 outputs = a 2 x 3 matrix
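
The corresponding objects in Julia (a hedged sketch; I is the identity from the standard LinearAlgebra library, and the 2×3 entries are made up):

    using LinearAlgebra

    Matrix{Float64}(I, 3, 3)   # the 3×3 identity matrix
    α = 2.5
    (α * I) * [1.0, 2.0, 3.0]  # scaling by α: [2.5, 5.0, 7.5]
    A = [1 2 3; 4 5 6]         # a 2×3 matrix: 3 inputs, 2 outputs
    A * [1, 0, 0]              # [1, 4]: picks out column 1 of A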

22 of 25

Matrices: Representing linear ops

aij = row i, column j = output i, input j

( …or sometimes I’ll write Aij )

output i (yi) = “dot product” of row i of A with the column vector x
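
A hedged check of the indexing convention and the row-dot-product picture in Julia (made-up entries):

    using LinearAlgebra

    A = [1 2 3; 4 5 6]          # aij = row i, column j:  A[2, 3] == 6
    x = [7, 8, 9]
    y = A * x                   # [50, 122]
    y[1] == dot(A[1, :], x)     # true: output 1 = row 1 of A "dotted" with x
    y[2] == dot(A[2, :], x)     # true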

23 of 25

Matrix multiplication = Composition

A(Bx) = take input x, do operation B, then do operation A

= (AB)x where AB = operation that does B, then A

≠ (BA)x or B(Ax) … associative but not commutative

Matrix-multiplication “rows times columns” rule for AB is defined so that associativity works, but always remember that “AB” means “do B, then A” when acting on a vector

We will explore several ways to think about matrix multiplication AB besides mechanical “rows times columns”…

Our goal is not just to mindlessly “turn the crank.”
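
A quick Julia check of associativity vs. non-commutativity, with made-up 2×2 matrices:

    A = [1 1; 0 1]; B = [1 0; 2 1]
    x = [3, 4]
    A * (B * x) == (A * B) * x   # true: associativity ("do B, then A")
    A * B == B * A               # false: the order of composition matters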

24 of 25

Matrix-multiplication example

a “1x2 matrix” A (really, a row vector) times a 2x3 matrix B:

AB = (1x2)*(2x3) = a 1x3 matrix that takes 3 inputs to 1 output

here, AB = the sum of the 2 rows of B

… whereas BA doesn’t even make sense! A has 1 output, but B has 3 inputs!
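
A hedged Julia version of this shape argument, assuming A = [1 1] (which is what makes AB the sum of B’s rows) and made-up entries for B:

    A = [1 1]                # a 1x2 matrix (really, a row vector)
    B = [1 2 3; 4 5 6]       # a 2x3 matrix
    A * B                    # 1x3 result [5 7 9] = sum of the 2 rows of B
    # B * A                  # error if uncommented: B has 3 inputs, A has 1 output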

25 of 25

Adding & scaling matrices

What does “A + B” mean for two m × n matrices?

— it means adding the results: (A+B)x = Ax + Bx

— but adding entries of A and B gives the same thing: (A+B)ij = aij + bij

What does “αA” mean for a matrix and a scalar?

— it means scaling the results: (αA)x = α(Ax)

— but scaling entries of A gives the same thing: (αA)ij = α aij

… so you can add/subtract matrices & multiply by scalars … sort of like vectors
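
A final hedged Julia check, with made-up 2×2 matrices:

    A = [1 2; 3 4]; B = [5 6; 7 8]
    x = [1, -1]
    (A + B) * x == A*x + B*x    # true: adding matrices adds the results
    α = 3
    (α * A) * x == α * (A * x)  # true: scaling a matrix scales the results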