A practical example of linear equations in the “column picture” or “row picture”, bigger than the 2x2 and 3x3 made-up examples we use for hand calculations:
Radial Basis Function
Interpolation
Prof. Steven G. Johnson
MIT course 18.C06/6.C06, Fall 2024
Given (“training”) data at scattered points in multiple dimensions, how can one interpolate smoothly in between the points?
image: hec.usace.army.mil
Example:
Given measured temperatures at a few weather stations, how can you interpolate the temperatures T(x) to locations in between the stations?
Radial Basis Functions (RBFs): one of many methods
distance r from data point xk
Φ(r): “radial basis function” RBF
e.g. Gaussian / bell curve
put some kind of smooth “bump” function at the location of each data point:
& interpolated T(x) = linear combination of the RBFs with some coefficients c:
c1
c2
c3
c4
c5
T(x)
each data point
has a “localized”
effect
cartoon
1d plot
“spread”
distance r
from xk
(note: RBF Φ does not actually need to be localized!)
What linear combination (= what c’s)
of basis functions
matches the given data points?
Linear equation Ac=b!
Columns of A = basis function values!
b = data values to match (e.g. temperatures)!
RBF coefficients: Linear equation Ac = b!
must match measured data (xj, Tj):
for j = 1,2, …, n
n equations in n unknowns (ck) !
=
=
rhs b
unknowns c
n columns = RBFs centered at n data points
n × n matrix A
n rows = constraints to match n data values
distance r
from xk
RBF coefficients: Linear equation Ac = b!
must match measured data (xj, Tj):
for j = 1,2, …, n
n equations in n unknowns (ck) !
=
=
rhs b
unknowns c
n columns = basis functions
centered at n points
n × n matrix A = AT
n constraints
distance r
from xk
Temperature interpolation example:
image: hec.usace.army.mil
Given measured temperatures at a few weather stations, use RBFs to interpolate the temperatures T(x) to locations in between the stations.
(x = 2-component coordinates)
(n = 5 equations)
Temperature interpolation example:
Given measured fictitious temperatures at a few weather stations, use RBFs to interpolate the temperatures T(x) to locations in between the stations.
(x = 2-component coordinates)
(n = 5 equations)
Temperature interpolation with Gaussian RBFs:
“spread” ~ R
Gaussian basis function
Spread R ~ data separation:
Nice smooth interpolation
(note: extrapolating outside data
⟶ nonsense)
Temperature interpolation with too-small Gaussian RBFs:
“spread” ~ R
Gaussian basis function
Spread R << data separation:
isolated “islands”
What if spread is too large?
RBFs are nearly constant
… A has nearly parallel columns
… nearly singular?
… seems bad?
[“ill-conditioned” — future 18.C06 topic!]
(In practice RBFs are usually combined with some low-degree polynomial basis functions like {1, x, y, xy} to be better behaved.)
Meta-question: Validation and choice of hyper-parameters?
How do we choose “hyper-parameters” like R (or Φ itself)?
How do we check (“validate”) that the interpolation is “good”?
Radial Basis Functions 2.0:
An Example of Underdetermined Systems
Adding More Basis Functions? (“Augmenting” the RBFs.)
We saw that we could “guess wrong” on the basis functions, e.g. by making them too localized.
Maybe we can fix it by adding more basis functions?
Maybe try polynomials, for example: {1,x,y,x2,y2,xy}
new model (fitting) function:
As matrices:
RBFs + polynomials, for example: {1,x,y,x2,y2,xy}
Gives an n × (n+6) system of equations
Underdetermined system (“wide” matrix): # constraints n < # unknowns n+6
… i.e. columns not independent (not really a “basis”)
… i.e. infinitely many solutions d … which to pick?
Most common solution for RBF + polynomials:
For M = [A P] consisting of the square RBF part (A) and polynomial part (P), the solutions d = [c; p] consist of the RBF coefficients c and polynomial coefficients p.
Another common way to get a unique solution to the overdetermined problem is to require PTc = 0: make the RBF coefficients orthogonal to the polynomials at the xk.
This gives a square system:
Alternative: minimum-norm solution:
A “wide” (underdetermined) problem Md=b has infinitely many solutions (N(M) ≠ {0}).
There are many application-specific strategies to pick one, but a useful common choice is to pick the one that minimizes ‖d‖: the minimum-norm solution.
ℝⁿ
all solutions d to Md=b
(“affine subspace” = subspace + shift)
N(M)
minimum-norm solution d⋆
Minimum-norm solution (for full row rank):
d⋆ = Mᵀ(MMᵀ)⁻¹b ( = M \ b in Julia for “wide” M)
Here, it makes sense! Make coefficients as small as possible … avoid huge terms that happen to cancel at the training points!
Temperature interpolation example:
“spread” ~ R = 10mi
too-localized
Gaussian basis functions
add {1,x,y,x2,y2,xy},
& find
minimum-norm coefficients
looks sensible!
Geometry of minimum-norm solution to Ax=b:
ℝⁿ
all solutions x to Ax=b
(“affine subspace” = subspace + shift)
N(A)
minimum-norm solution
x⋆ ⟂ N(A) ⟺ x⋆ ∈ C(Aᵀ)
Why does this give our formula:
x⋆ = Aᵀ(AAᵀ)⁻¹b ?
( = A \ b in Julia for “wide” A)