1 of 37

Show Me A Function: More Than Meets The Eye

by

Chinedu Eleh

(Advisor: Dr. Hans Werner van Wyk)


2 of 37

Definition of Function

  • A function $f$ from a set $X$ to a set $Y$ assigns to each element $x$ of $X$ exactly one element $f(x)$ of $Y$

3 of 37

Using Simple Functions to Build Complex Ones

  • Functions we learn in precalculus, calculus, etc.:
    • polynomials
    • exponential
    • trigonometric
    • inverse
    • composite functions, etc.


4 of 37

Polynomial Functions


  • Spline Interpolation
  • Finite elements
  • ReLU activation function

5 of 37

Spline Basis

Let $\chi_A$ be the indicator function of the set $A$, i.e. $\chi_A(x) = 1$ if $x \in A$ and $0$ otherwise.
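A minimal NumPy sketch of the idea (the uniform knots here are an assumption): indicator functions give the piecewise-constant spline basis, and "hat" functions give the piecewise-linear one.

```python
import numpy as np

knots = np.linspace(0.0, 1.0, 6)        # illustrative uniform knots, spacing h
h = knots[1] - knots[0]

def indicator(x, a, b):
    # Indicator of the interval [a, b): 1 inside, 0 outside
    return ((x >= a) & (x < b)).astype(float)

def hat(x, i):
    # Piecewise-linear "hat" basis function centered at knots[i]
    return np.maximum(0.0, 1.0 - np.abs(x - knots[i]) / h)

x = np.linspace(0.0, 1.0, 5)
print(indicator(x, knots[1], knots[2]))  # piecewise-constant basis element
print(hat(x, 2))                         # piecewise-linear basis element
```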

6 of 37

Spline Interpolation


7 of 37

Finite Element Basis

  • The finite element method is built on a similar idea to the spline basis
  • The basis can consist of constant, linear, quadratic, or higher-order piecewise polynomials


8 of 37

Finite Element Approximation

  • The goal is to solve boundary value problems, say

$-\big(a(x)\,u'(x)\big)' = f(x) \ \text{ on } (0,1), \qquad u(0) = u(1) = 0$

  • A discretization of the form

$u_h(x) = \sum_{j=1}^{n} u_j\,\phi_j(x)$

is assumed, where $V_h$, spanned by $\{\phi_j\}_{j=1}^{n}$, is a finite-dimensional approximation of the unknown infinite-dimensional space.

Differential Form → Weak Form → Discretization → Linear System
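Spelled out for the model problem above, each stage reads (a sketch consistent with the later slides):

```latex
\begin{align*}
\text{Differential form:}\quad & -(a\,u')' = f \ \text{ on } (0,1),\quad u(0)=u(1)=0\\
\text{Weak form:}\quad & \int_0^1 a\,u'\,v'\,dx = \int_0^1 f\,v\,dx \quad \forall\, v \in H_0^1(0,1)\\
\text{Discretization:}\quad & u_h = \sum_{j=1}^{n} u_j\,\phi_j,\qquad v = \phi_i\\
\text{Linear system:}\quad & K\mathbf{u} = \mathbf{F},\quad K_{ij} = \int_0^1 a\,\phi_j'\,\phi_i'\,dx,\quad F_i = \int_0^1 f\,\phi_i\,dx
\end{align*}
```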

9 of 37

ReLU Activation

  • The ReLU activation function is defined as $\mathrm{ReLU}(x) = \max(0, x)$
  • It can create sufficient nonlinearity in neural network layers to learn virtually any mapping
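In NumPy, for reference:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> [0.  0.  0.  1.5]
```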

10 of 37

Neural Networks

  • Made up of compositions of affine functions, with activations creating nonlinearity where necessary
  • Can take any tensor input (CNN, RNN, VAE, etc.)
  • The hidden activation $\sigma$ is one of the sigmoid, tanh, ReLU, or linear functions
  • The output activation is mostly linear, sigmoid, or softmax, depending on whether the problem is regression, binary classification, or multiclass classification
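A minimal NumPy sketch of such a composition; the layer sizes (3 → 8 → 1) are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Tiny MLP: affine map -> ReLU -> affine map (linear regression head)
W1, b1 = rng.standard_normal((8, 3)), np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)

def network(x):
    h = relu(W1 @ x + b1)   # affine map, then elementwise nonlinearity
    return W2 @ h + b2      # linear output

print(network(np.array([0.5, -1.0, 2.0])))
```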

11 of 37

Trigonometric Functions

  • Widely applicable in
    • Fourier transform
    • Activation functions (sinc)
    • Anywhere periodicity is desired


12 of 37

Exponential Functions

  • Exponential growth and decay
  • Density
  • Kernels (SVM, RKHS, covariance)
  • Activation functions (sigmoid, softmax)
  • Wavelets


13 of 37

Inverse Functions

  • Activation functions (arctan)
  • Loss function (e.g. log in cross entropy, KL divergence)


14 of 37

Function Discovery

  • Three main ways of discovering new functions
    • Calculus of variations
    • Statistics
    • Differential equations
  • The calculus of variations dates back to Euler and Lagrange; statistical methods and differential equations have rich histories as well, yet all three remain part of the state of the art


15 of 37

Calculus of Variations

[Figure: a curve of fixed length $\ell$ joining points A and B, above the chord AB.]

Length of curve: $\ell = \int_a^b \sqrt{1 + y'(x)^2}\,dx$

Area to be maximized: $\int_a^b y(x)\,dx$

  • The Classical Isoperimetric Problem: determine a curve of fixed length $\ell$ that connects points A and B such that, when combined with the line segment AB, it encloses the largest possible area.

16 of 37

Euler-Lagrange Equations

  • The maximizer of the constrained optimization is an arc of the circle

$(x - c_1)^2 + (y - c_2)^2 = r^2,$

which is a solution to the Euler-Lagrange (differential) equation

$\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0$

  • The isoperimetric problem is solved by a function.
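For a sketch of how the circle appears: introduce a multiplier $\lambda$ and apply the Euler-Lagrange equation to the augmented integrand ($c_1, c_2$ below are integration constants):

```latex
F(y, y') = y + \lambda\sqrt{1 + y'^2}
\qquad\Longrightarrow\qquad
0 = \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}
  = 1 - \lambda\,\frac{d}{dx}\!\left(\frac{y'}{\sqrt{1 + y'^2}}\right)
```

Integrating once gives $y'/\sqrt{1 + y'^2} = (x - c_1)/\lambda$, and integrating again gives $(x - c_1)^2 + (y - c_2)^2 = \lambda^2$, a circle of radius $\lambda$.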

17 of 37

More on Calculus of Variations

  • The arclength problem
  • Brachistochrone problem
  • Fermat’s principle
  • Shape of a hanging rope


18 of 37

Statistical Methods

Find the equation of the line that passes through the points … and … (Slido only)

19 of 37

Question (Slido only)

As a mathematician, in one sentence, describe …

20 of 37

A Line Through Points?

  • Given a pencil, a ruler, and a pair of compasses
  • Draw many circles and measure
    • the circumference $C$
    • the radius $r$
  • What is the ratio $C/r$?
  • Before $\pi$ was discovered, nobody knew this ratio is constant ($C/r = 2\pi$)
  • However, given a circle, one could easily measure its radius and circumference (see the sketch below)

Abundance of Data
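This is exactly a line-fitting problem: plot measured circumference against radius, and the slope of the line recovers $2\pi$. A sketch with simulated noisy measurements:

```python
import numpy as np

rng = np.random.default_rng(42)
r = rng.uniform(1.0, 10.0, 50)                      # measured radii
C = 2 * np.pi * r + 0.1 * rng.standard_normal(50)   # noisy circumferences

slope = np.sum(r * C) / np.sum(r * r)   # least squares for a line through the origin
print(slope, 2 * np.pi)                 # slope ≈ 6.283...
```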

21 of 37

Yes, Using Tools from Linear Algebra

21

How to Solve

?

22 of 37

Moore-Penrose Inverse (Newton Method)

  • A dual formulation of the minimization problem is

$\hat{\beta} = \arg\min_{\beta} \|X\beta - y\|_2^2,$

the maximum likelihood estimate, where

$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix},$

and $X$ has a column of ones in its first dimension.

  • The Moore-Penrose pseudoinverse $X^{+} = (X^\top X)^{-1}X^\top$ satisfies $\hat{\beta} = X^{+} y$ as the minimizer of the optimization problem.
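A NumPy sketch on made-up points (np.linalg.pinv computes $X^{+}$ directly):

```python
import numpy as np

# Hypothetical data points (x_i, y_i) lying roughly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

X = np.column_stack([np.ones_like(x), x])   # design matrix with a column of ones
beta = np.linalg.pinv(X) @ y                # Moore-Penrose solution X^+ y
print(beta)                                 # approx [intercept, slope] = [1, 2]
```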

23 of 37

A Line Through Points


24 of 37

Moore-Penrose Inverse Sensitive to Outliers


25 of 37

Methods of Solving Least Squares Problems

Linear Least Squares:

  • Moore-Penrose Inverse
  • Newton Method

Nonlinear Least Squares:

  • Gradient Descent
  • Gauss-Newton Method
  • Levenberg-Marquardt method
  • Stochastic Gradient Descent
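As an illustration of the nonlinear case, a minimal sketch using SciPy's Levenberg-Marquardt solver on synthetic exponential-decay data (the model and data here are assumptions):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * x) + 0.05 * rng.standard_normal(x.size)  # noisy samples

def residuals(theta):
    a, b = theta
    return a * np.exp(-b * x) - y   # model minus data

# method="lm" selects SciPy's Levenberg-Marquardt implementation
fit = least_squares(residuals, x0=[1.0, 1.0], method="lm")
print(fit.x)  # roughly [2.5, 1.3]
```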


26 of 37

Remarks

  • Errors encountered in estimation are mostly parallax errors
  • Parallax errors can be reduced by statistical averaging, but they introduce uncertainty into the measurements
  • In heterogeneous media such as composites, geological media, gels, foams, and cell aggregates, these uncertainties could take any distribution and could, in fact, be undetermined but useful material properties
  • An accurate description of a measured value should also characterize the uncertainty in the obtained value

27 of 37

Differential Equations

  • The Euler-Lagrange equation is a differential equation
  • Rates are ubiquitous in day-to-day life
    • speed
    • acceleration
    • reaction rate
    • power
    • inflation rate
    • tax rate
    • unemployment rate
    • birth rate
    • interest rate
    • marginal
  • More rates arise from Newton’s laws and conservation laws in the natural and physical sciences


28 of 37

Functions From Differential Equations

  • Consider the simple elliptic equation

$-\big(a(x)\,u'(x)\big)' = f(x) \quad \text{on } (0,1), \qquad u(0) = u(1) = 0,$

where $a$ could be

  • Young’s modulus of a material
  • Absolute permeability of rocks

29 of 37

Data Driven Modeling

  • For any of these problems, $a$ is never known, but measurements of $u$ are easily, and in most cases cheaply, obtained
  • Finding $a$ from data is called data-driven modeling
  • In some communities, $a$ is discovered through inverse problems
  • In general, $a$ may be heterogeneous

30 of 37

Regression

  • Close your eyes to the physical law and fit

$a(x) \approx \sum_{k=1}^{m} c_k\,\phi_k(x),$

where the $\phi_k$’s are elements of any suitable basis known to the researcher, such as polynomials or trigonometric functions (a sketch follows this list)

  • Drawbacks:
    • inductive bias
    • a futile adventure if the solution lives outside the span of the $\phi_k$’s
    • prior knowledge of physical laws is not exploited
  • If the solution lives in a subspace of the span of the $\phi_k$’s, techniques such as PCA are used to handle collinearity and dimensionality reduction
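A sketch of such a fit: least squares against a monomial basis, with a hypothetical stand-in for the unknown coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
a_true = np.exp(np.sin(2 * np.pi * x))               # stand-in "unknown" coefficient
data = a_true + 0.05 * rng.standard_normal(x.size)   # noisy observations

# Columns of Phi are the chosen basis phi_k: here monomials 1, x, ..., x^5
Phi = np.vander(x, N=6, increasing=True)
c, *_ = np.linalg.lstsq(Phi, data, rcond=None)   # least-squares coefficients
a_hat = Phi @ c                                  # best fit within span{phi_k}
print(np.max(np.abs(a_hat - a_true)))            # error of the fit
```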

31 of 37

Weak Form (FEM)

  • Let $v \in H_0^1(0,1)$. Multiplying the differential form by $v$ and integrating by parts gives

$\int_0^1 a(x)\,u'(x)\,v'(x)\,dx = \int_0^1 f(x)\,v(x)\,dx \qquad \text{for all } v \in H_0^1(0,1)$

32 of 37

Finite Element Approximation - Revisit

  • Let $V_h$ be a finite-dimensional subspace of $H_0^1(0,1)$ in which we seek an approximate solution of the form $u_h(x) = \sum_{j=1}^{n} u_j\,\phi_j(x)$
  • Within the Galerkin framework, we assume $v = \phi_i$. So

$\sum_{j=1}^{n} u_j \int_0^1 a\,\phi_j'\,\phi_i'\,dx = \int_0^1 f\,\phi_i\,dx, \qquad i = 1,\dots,n,$

simplifying to $K\mathbf{u} = \mathbf{F}$, where $K_{ij} = \int_0^1 a\,\phi_j'\,\phi_i'\,dx$ and $F_i = \int_0^1 f\,\phi_i\,dx$
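A minimal sketch of the assembly and solve with P1 hat functions on 30 elements; the coefficient $a$ and load $f$ below are illustrative stand-ins, not necessarily those used in the experiments:

```python
import numpy as np

# P1 finite elements for -(a u')' = f on (0,1), u(0) = u(1) = 0
n_el = 30
x = np.linspace(0.0, 1.0, n_el + 1)    # mesh nodes
h = x[1] - x[0]
a = lambda s: 1.0 + s                  # hypothetical coefficient
f = lambda s: 1.0                      # hypothetical load

n = n_el - 1                           # number of interior nodes
K = np.zeros((n, n))                   # stiffness matrix
F = np.zeros(n)                        # load vector

for e in range(n_el):                  # element [x_e, x_{e+1}]
    m = 0.5 * (x[e] + x[e + 1])        # midpoint quadrature point
    k_loc = (a(m) / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    f_loc = f(m) * h / 2.0             # load split between the two nodes
    for i_loc, i in enumerate((e, e + 1)):   # global node numbers
        if 1 <= i <= n:                      # skip Dirichlet boundary nodes
            F[i - 1] += f_loc
            for j_loc, j in enumerate((e, e + 1)):
                if 1 <= j <= n:
                    K[i - 1, j - 1] += k_loc[i_loc, j_loc]

u = np.linalg.solve(K, F)              # interior values of u_h
print(u[:5])
```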

33 of 37

Numerical Experiments

  • With the coefficient $a$ and load $f$ as above, and 30 elements

34 of 37

Neural Networks - Experiments

  • We train a network of 3 fully connected layers on 1000 sampled points, with ReLU activation on the first two layers and an MSE loss
  • Adam optimizer, learning rate of 0.0001
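A sketch of this setup in PyTorch; the layer widths, target function, and epoch count are illustrative assumptions:

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.rand(1000, 1)              # 1000 sampled points
y = torch.sin(2 * torch.pi * x)      # hypothetical target function

model = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(),     # ReLU on the first two layers
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),                # linear output for regression
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for epoch in range(5000):            # small learning rate -> many epochs needed
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(loss.item())
```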

35 of 37

Convergence Requires Many Epochs


36 of 37

Summary

  • We presented a brief trajectory of functions in mathematics, from Euler-Lagrange to state-of-the-art machine learning models
  • Gave insight into where the “least of the leasts” are applied in day-to-day life
  • Showed how functions are discovered from data via statistics and differential equations
  • Made connections between statistics, differential equations, and the calculus of variations
  • Whether you are interested in pure or applied mathematics, you are stuck with functions
  • The next time you press a button to get a cup of coffee, I challenge you to think about the function behind the scenes: no functions, no automation

37 of 37

Thanks for your attention