Danbury AI

@AndrewJRibeiro | AndrewRib.com

Art has always been cherished as the most expressive and most human of productions. The idea that a computer, a logical machine, can create such quintessentially human objects strikes some as preposterous. As anyone who has engaged in the artistic process will tell you, much of art is driven by emotion, not logical rules. In this talk we will trace the connectionist history leading to convolutional networks and their application in style transfer. I hope the topics herein demonstrate that machine learning is a dramatic departure from rule-based computing and that it does mimic intelligent behavior.


Overview

  • Machine Learning Background
    • Origin of Neural Networks
    • Neural Network Basics
    • Convolutions
    • Convolutional Neural Networks ( CNN )
    • Very Deep Convolutional Networks ( VGG )
  • Style Transfer Predecessors
    • Non-Photorealistic Rendering
    • Texture Transfer
  • A Neural Algorithm of Artistic Style
    • Introduction
    • The Underlying Loss Function
    • Algorithm Overview
    • Methods
    • Results
  • Implementation
    • Direct TensorFlow implementation
    • Magenta: Multistyle Pastiche Generator


Origin of Neural Networks

The Connectionist Timeline

  • 1943: Threshold Logic ( McCulloch and Pitts )
  • 1954: Hebbian Networks ( Wesley A. Clark )
  • 1958: Perceptrons ( Frank Rosenblatt )
  • 1969: Perceptrons critique ( Minsky and Papert ) helps trigger the AI winter
  • 1974: Multi-Layer Perceptrons and Backpropagation ( Werbos )
  • 1990: Convolutional Neural Networks ( LeCun's first runaway success )
  • 1997: Long Short-Term Memory Networks ( Hochreiter & Schmidhuber )
  • 2014: Generative Adversarial Networks ( Goodfellow et al. )

“Either the universe is composable or God exists.”

- As I heard Yann LeCun paraphrase it

*An incomplete history


[Images: the Mark 1 Perceptron and Frank Rosenblatt; the biological neuron as inspiration for the artificial one; harbingers of the AI Winter]


Neural Network Basics

  • Multi-Layer Perceptrons are universal function approximators ( the universal approximation theorem )
  • Feed-forward: processing flows in one direction (input -> hidden layers -> output)
  • Inputs/Features: the selected properties of a problem that carry enough information to produce separable classes.
  • Hidden Layer: neurons that take a weighted sum of their inputs and produce an activation based on some threshold.
  • Output: interpreted in most cases as a classification label.
  • Objective Function: usually interpreted as a cost function, used to evaluate how well our network has learned from the data.
  • Learning: finding the edge weights of a network such that the activations produce outputs in line with observed cases, i.e. an optimization of the objective function.
    • Typically done with gradient descent ( a minimal sketch follows ).
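To make these pieces concrete, here is a minimal numpy sketch ( illustrative only, not from the talk ): a one-hidden-layer MLP trained by gradient descent on a toy binary classification task.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                        # inputs/features
y = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]  # XOR-like labels

W1, b1 = rng.normal(size=(2, 8)) * 0.5, np.zeros(8)  # hidden layer weights
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)  # output layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(2000):
    # Feed-forward: input -> hidden layer -> output
    H = np.tanh(X @ W1 + b1)
    p = sigmoid(H @ W2 + b2)
    # Objective function: mean cross-entropy cost
    cost = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Backpropagation: gradients of the cost w.r.t. each weight
    dz2 = (p - y) / len(X)
    dW2, db2 = H.T @ dz2, dz2.sum(0)
    dH = dz2 @ W2.T * (1 - H**2)          # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dH, dH.sum(0)
    # Gradient descent: step each weight against its gradient
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```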


Learning as an Optimization Problem

K classes - multi-class classification; essentially multinomial logistic regression.

  • Hypothesis function: a linear combination of the bias and the weights on the features, parameterized by Theta ( the feature weights ): $h_\theta(x) = \theta^\top x$, pushed through a softmax over the K classes.
  • Cost function with regularization, where $\lambda$ is the regularization coefficient ( reconstructed here in the standard form the slide's labels point to ):

$$J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} 1\{y_i = k\} \log \frac{e^{\theta_k^\top x_i}}{\sum_{j=1}^{K} e^{\theta_j^\top x_i}} + \frac{\lambda}{2} \sum_{k,d} \theta_{k,d}^2$$

  • Learning as an optimization problem: minimize $J(\theta)$ over $\theta$.
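A minimal numpy sketch of these equations ( an illustration, not the talk's code; the name `lam` is mine, and X is assumed to carry a leading bias column of ones ):

```python
import numpy as np

def hypothesis(Theta, X):
    # Hypothesis: linear combination of bias and feature weights,
    # pushed through a softmax over the K classes.
    z = X @ Theta                          # (n, K) class scores
    z -= z.max(axis=1, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def J(Theta, X, y, lam):
    # Cross-entropy cost plus L2 regularization; lam is the
    # regularization coefficient. y holds integer class labels.
    p = hypothesis(Theta, X)
    n = len(X)
    data_term = -np.log(p[np.arange(n), y]).mean()
    reg_term = (lam / (2 * n)) * np.sum(Theta[1:] ** 2)  # don't penalize the bias row
    return data_term + reg_term
```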


Convolutions

Key Question: Why do ConvNets give us better accuracy for visual object recognition than the standard MLP?

Key Point: Convolutions, in the form we are interested in for ConvNets, compute new values of a matrix based on surrounding values. This gives us a means of producing higher abstractions of local structure.

  • The convolutional layers of a ConvNet are often interpreted as learned “filters” or kernels.
  • The first layer of a ConvNet is usually associated with edge filters.

“In order to make decisions, or make sense of things, we must reduce the input feature space to a lower dimension.”

Imagine trying to perform image recognition with the atoms of the objects as the features.
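A minimal numpy sketch of the key point above ( illustrative, not from the talk ): each output value is a weighted sum over a neighborhood of the input. Strictly, this computes cross-correlation, the unflipped form of convolution that ConvNets conventionally use.

```python
import numpy as np

def conv2d(A, K):
    # Slide a kernel K over matrix A; each output entry is computed
    # from the surrounding values of the corresponding input region.
    kh, kw = K.shape
    out = np.zeros((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(A[i:i+kh, j:j+kw] * K)  # weighted neighborhood
    return out
```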


Sobel Operator

3×3 kernel convolutions that approximate the derivatives, i.e. the changes in x and y. Let A be the source image and Gx, Gy the approximations of the x and y derivatives.
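A sketch applying the textbook Sobel kernels with scipy ( the kernels below are the standard ones; the image is a random stand-in ):

```python
import numpy as np
from scipy.signal import convolve2d

# Standard Sobel kernels: Kx approximates the horizontal derivative,
# and its transpose approximates the vertical one.
Kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
Ky = Kx.T

A = np.random.rand(64, 64)            # stand-in for a grayscale image
Gx = convolve2d(A, Kx, mode="same")   # approximate d/dx
Gy = convolve2d(A, Ky, mode="same")   # approximate d/dy
G = np.sqrt(Gx**2 + Gy**2)            # gradient magnitude: large at edges
```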


Convolutional Neural Networks ( CNN )

  • Good for processing data that has a grid-like topology. ( Where local relationships matter! )
    • Time Series
    • Visual Objects
  • Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
  • Pooling
    • Summarizes the responses over a whole neighborhood.
    • The use of pooling can be viewed as adding an infinitely strong prior that the function the layer learns must be invariant to small translations.
    • Max pooling takes the max of a region and is commonly used for downsampling ( see the sketch below ).

A lot of stuff on this slide was ripped from the deep learning book: http://www.deeplearningbook.org/
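A minimal numpy sketch of 2×2 max pooling ( illustrative, not from the book or the talk ):

```python
import numpy as np

def max_pool_2x2(F):
    # Each output value summarizes (takes the max over) a 2x2 neighborhood,
    # downsampling the feature map by a factor of 2 in each dimension.
    h, w = F.shape[0] // 2 * 2, F.shape[1] // 2 * 2
    blocks = F[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

F = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(F))  # [[ 5.  7.] [13. 15.]] -- small shifts of F barely change this,
                        # which is the translation invariance the prior encodes
```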


Pooling and Feature Maps


Very Deep Convolutional Networks ( VGG )

  • VGG refers to a particular CNN configuration that performed outstandingly in the ImageNet ILSVRC-2014 competition.
  • ImageNet challenges
    • Object localization for 1000 categories.
    • Object detection for 200 fully labeled categories.
    • Object detection from video for 30 fully labeled categories.
    • Scene classification for 365 scene categories (Joint with MIT Places team) on Places2 Database http://places2.csail.mit.edu.
    • Scene parsing for 150 stuff and discrete object categories (Joint with MIT Places team).
  • Previous CNNs were nowhere near as deep.
  • In the style transfer algorithm introduced later, we use a VGG trained on object localization for 1,000 categories.
  • VGG is named after the Visual Geometry Group which produced the configuration.
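For reference, Keras ships a pretrained VGG19, so loading the network used later takes a few lines ( a sketch; the Keras weights come from ImageNet classification, and we drop the fully connected layers since style transfer only needs the convolutional features ):

```python
import tensorflow as tf

# Load the 19-layer VGG pretrained on ImageNet, without the
# fully connected classification head.
vgg = tf.keras.applications.VGG19(weights="imagenet", include_top=False)
vgg.trainable = False

for layer in vgg.layers:
    print(layer.name)  # block1_conv1 ... block5_conv4, plus the pooling layers
```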


STYLE VS. CONTENT

THE BIG IDEA

Figure out how to do this for different domains, and you will be the king/queen of ML.


Style Transfer Predecessors

  • There have been a few prior attempts, with varying success.
  • Non-Photorealistic Rendering
    • The first attempt at copying style. ( Not learned - hardcoded. )
    • Very limited: usually relegated to some form of shading or simple filters ( like Photoshop filters, which are quite linear ).
  • Texture Transfer
    • In 2001, the paper Image Quilting for Texture Synthesis and Transfer introduced the method of image quilting for texture transfer. The results, shown on the next slide, are impressive, but they come nowhere near the nonlinearity the neural model produces.
  • Previous approaches mainly relied on non-parametric techniques that directly manipulate the pixel representation of an image. In contrast, by using deep neural networks trained on object recognition, we carry out manipulations in feature spaces that explicitly represent the high-level content of an image.


Image Quilting


Image Quilting Vs. Neural Style Transfer

[Images: style, content, image quilting result, and neural style transfer result]

Note: I took a screenshot from the paper to get the style, content, and image quilting result images. They were probably scaled down in the paper, so we didn't get great results, but it's still illustrative.

content loss: 1.22706e+06

style loss: .659507

total loss: 1.89246e+06


A Neural Algorithm of Artistic Style

  • Published September 2015
  • As of December 5, 2016, it has around 120 citations in academic publications.
  • Uses a trained VGG.
  • Does dramatically better than previous attempts at style transfer.
  • Commercial products have already been created around it.


This is the composite loss function, consisting of the loss functions for style and content, which we minimise to get our wonderful style transfer result.
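Written out, the composite loss from the paper is:

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha\, \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta\, \mathcal{L}_{style}(\vec{a}, \vec{x})$$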

Where:

  • Alpha: is the content weight factor
  • Beta: is the style weight factor
  • A: image style to be transferred
  • P: image content to be stylized
  • X: the output image which is initially random.

I posit: creativity is optimization with competing constraints


Vocabulary and Concepts

  • Content Representation: higher layers in a ConvNet trained on object recognition capture the high-level content of the input image in terms of objects and their arrangement, but are abstracted from the actual pixel values.
  • To capture a representation of style, a feature space originally designed to capture texture information is used.
    • Built upon the filter responses in each layer of the network.
    • Computes feature correlations between different filter responses over the feature maps.
    • End result: a stationary, multi-scale representation of the input image which captures texture but not the global arrangement.
  • Style Representation: in order to gain an understanding of style, we compute correlations between different features in different layers of the CNN ( discussed later ).
    • Style features produce texturised versions of the input image that capture its general appearance in terms of colour and localised structures.

“Extracting correlations between neurons is a biologically plausible computation that is, for example, implemented by so-called complex cells in the primary visual system”


We can visualise the information at different processing stages in the CNN by reconstructing the input image from only the network's responses in a particular layer. We reconstruct the input image from layers 'conv1_1' (a), 'conv2_1' (b), 'conv3_1' (c), 'conv4_1' (d) and 'conv5_1' (e) of the original VGG network.


Algorithm Overview

  • Images are synthesised by finding an image that simultaneously matches the content representation of the photograph and the style representation of the respective piece of art.
  • While the global arrangement of the original photograph is preserved, the colours and local structures that compose the global scenery are provided by the artwork.
  • Style representation: a multi-scale representation that includes multiple layers of the neural network
  • A network trained to perform one of the core computational tasks of biological vision automatically learns image representations that allow the separation of image content from style.


Mixing Style and Content

  • Rows: matching the style representation on increasing subsets of the CNN layers.
  • The local image structures captured by the style representation increase in size and complexity when style features from higher layers of the network are included.
    • This is due to the increasing receptive field sizes and feature complexity along the network's processing hierarchy.
  • Columns: different relative weightings between the content and style reconstruction ( content weight / style weight ).


Methods

  • Uses the feature space provided by the 16 convolutional and 5 pooling layers of the 19-layer VGG network.
    • Does not use any of the fully connected layers.
    • Max pooling is replaced by average pooling, which improves gradient flow and produces slightly more appealing results.
  • Each layer in the network defines a non-linear filter bank which increases in complexity further down the network.
  • Now let’s jump into the equations.


Formulae at a Glance


Key Definitions

  • We take an input image p and feed it forward through the trained VGG. This encodes the image in each layer of the VGG by the filter responses to p.
  • A layer with $N_l$ distinct filters has $N_l$ feature maps, each of size $M_l$, where $M_l$ is the height times the width of the feature map.
  • The responses in a layer l are stored in a matrix $F^l \in \mathbb{R}^{N_l \times M_l}$, where $F^l_{ij}$ is the activation of the i-th filter at position j in layer l.
  • To visualise the image information that is encoded at different layers of the hierarchy ( Fig 1, content reconstructions ), we perform gradient descent on a white-noise image to find another image that matches the feature responses of the original image.
  • p : the original image.
  • x : the image that is generated.
  • $P^l$ : the feature representation of p in layer l.
  • $F^l$ : the feature representation of x in layer l.
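A sketch of how these responses can be extracted with Keras ( layer names follow Keras's VGG19 naming, an assumption about tooling rather than anything from the paper ):

```python
import tensorflow as tf

# Build a model that returns the filter responses of selected VGG layers.
vgg = tf.keras.applications.VGG19(weights="imagenet", include_top=False)
layer_names = ["block1_conv1", "block4_conv2"]       # example layers
outputs = [vgg.get_layer(n).output for n in layer_names]
extractor = tf.keras.Model(inputs=vgg.input, outputs=outputs)

p = tf.random.uniform((1, 224, 224, 3))              # stand-in for a real image
features = extractor(tf.keras.applications.vgg19.preprocess_input(p * 255.0))
for name, F in zip(layer_names, features):
    # Each response (1, H, W, N_l) flattens to the F^l matrix of shape (N_l, M_l),
    # where M_l = H * W.
    Fl = tf.transpose(tf.reshape(F, (-1, F.shape[-1])))
    print(name, Fl.shape)
```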


Content Representation

  • To measure how much of the content of p is preserved in x, we define a squared-error loss that compares, layer by layer, the activations the network produces for p with those it produces for x.
  • We take the derivative of this loss with respect to the activations in layer l, from which the gradient with respect to the image x can be computed via back-propagation.
  • Thus we can change the initially random image x until it generates the same response in a certain layer of the CNN as the original image p.
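The content loss and its derivative with respect to the activations in layer l, as given in the paper:

$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

$$\frac{\partial \mathcal{L}_{content}}{\partial F^l_{ij}} = \begin{cases} \left( F^l - P^l \right)_{ij} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$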


Style Representation

  • On top of the CNN responses in each layer of the network we build a style representation that computes the correlations between the different filter responses.
  • These feature correlations are given by the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$, where $G^l_{ij}$ is the inner product between the vectorised feature maps i and j in layer l.
  • To generate a texture that matches the style of a given image, we use gradient descent from a white noise image to find another image that matches the style representation of the original image.
    • Done by minimising the mean-squared distance between the entries of the Gram matrix ( G ) from the original image and the Gram matrix of the image to be generated.

a is the original ( style ) image and x is the image to be generated; $A^l$ and $G^l$ are their respective style representations in layer l.
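Written out, the Gram matrix and the per-layer style loss from the paper are:

$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$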


Style Representation ( 2 )

  • The style loss builds on the previous slide: each layer contributes a loss $E_l$, and the derivative of $E_l$ with respect to the activations in layer l has a closed form.
  • The gradients of $E_l$ with respect to the activations in lower layers of the network can be computed with back-propagation.
  • $w_l$ are the weighting factors of each layer's contribution to the total loss.

See the paper for a discussion of which values of the weighting factors $w_l$ they experimentally found to work well for each convolutional layer.
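The total style loss is the weighted sum of the per-layer losses, $\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l} w_l E_l$. A minimal numpy sketch of one layer's contribution ( illustrative, not the paper's code ):

```python
import numpy as np

def gram(F):
    # F: (N_l, M_l) matrix of vectorised filter responses -> (N_l, N_l) Gram matrix
    return F @ F.T

def layer_style_loss(F, A_feats):
    # F, A_feats: layer-l responses of the generated image and the style image.
    Nl, Ml = F.shape
    G, A = gram(F), gram(A_feats)
    return np.sum((G - A) ** 2) / (4.0 * Nl**2 * Ml**2)

# Total style loss: weighted sum over the chosen layers, e.g. w_l = 1/5 each:
# style_loss = sum(0.2 * layer_style_loss(F_l, A_l) for F_l, A_l in zip(Fs, As))
```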


Style Representation ( 3 )


Mixing Style and Content

  • To generate a mix of content and style, we jointly minimise the distance of a white-noise image from the content representation of the photograph in one layer of the network and from the style representation of the painting ( the image style source ) in a number of layers of the CNN.
  • The layer choices below were found experimentally.
  • Matched content representation layer: conv4_2.
  • Matched style representation layers: conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 ( with a weighting wl = 1/5 in those layers and wl = 0 in all other layers ).
    • Different configurations give different results.

Where:

  • Alpha: is the content weight factor
  • Beta: is the style weight factor
  • A: image style to be transferred
  • P: image content to be stylized
  • X: the output image which is initially random.


Implementations

  • Magenta, a Google Brain project, is heavily engaged in exploring how machine learning can be used in art.
    • They have an environment you can install which gives you access to all the tools you need to perform style transfer and even some musical things ( like learning musical style ).
  • Products which have arisen from this algorithm:
    • DeepArt.io
    • Adobe Stylit
  • See the Resources and Sources section at the end of this presentation for a list of various implementations and variations of the algorithm.
  • There's a nice implementation in TensorFlow which I used to create some assets for this presentation; a condensed sketch of the approach follows.
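A condensed TensorFlow sketch of the whole algorithm ( my own illustration, not the implementation referenced above; the layer names follow Keras's VGG19, and Adam with pixel clipping is a choice common in open-source ports, whereas the paper uses L-BFGS ):

```python
import tensorflow as tf

CONTENT_LAYER = "block4_conv2"
STYLE_LAYERS = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]

vgg = tf.keras.applications.VGG19(weights="imagenet", include_top=False)
vgg.trainable = False
extractor = tf.keras.Model(
    vgg.input, [vgg.get_layer(n).output for n in STYLE_LAYERS + [CONTENT_LAYER]])

def gram(F):
    # (1, H, W, C) feature map -> (C, C) Gram matrix, normalized by H*W
    # (a common normalization choice in ports).
    F = tf.reshape(F, (-1, F.shape[-1]))
    n = tf.cast(tf.shape(F)[0], tf.float32)
    return tf.matmul(F, F, transpose_a=True) / n

def responses(img):
    feats = extractor(tf.keras.applications.vgg19.preprocess_input(img * 255.0))
    return [gram(F) for F in feats[:-1]], feats[-1]

content_img = tf.random.uniform((1, 224, 224, 3))  # stand-ins; load real images here
style_img = tf.random.uniform((1, 224, 224, 3))
style_targets, _ = responses(style_img)
_, content_target = responses(content_img)

x = tf.Variable(tf.random.uniform((1, 224, 224, 3)))  # output image: initially noise
opt = tf.keras.optimizers.Adam(learning_rate=0.02)
alpha, beta = 1e-3, 1.0                                # illustrative weights

for step in range(1000):
    with tf.GradientTape() as tape:
        style_out, content_out = responses(x)
        content_loss = 0.5 * tf.reduce_sum((content_out - content_target) ** 2)
        style_loss = tf.add_n([tf.reduce_mean((g - t) ** 2)
                               for g, t in zip(style_out, style_targets)]) / len(STYLE_LAYERS)
        loss = alpha * content_loss + beta * style_loss
    grads = tape.gradient(loss, x)
    opt.apply_gradients([(grads, x)])
    x.assign(tf.clip_by_value(x, 0.0, 1.0))  # keep pixels in a valid range
```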


RESULTS

Look at the links


Discussion & Wrap Up

Questions for you:

  • Is anything presented here relevant to your work? Do you see yourself using any of the projects or methods discussed in your current or future projects?
  • Do you use deep learning? Which NN architecture are you most interested in: CNN, RNN, MLP, GAN?
  • What frameworks and systems do you use for machine learning?

Questions for me?

Thanks for coming!

By: Andrew Ribeiro of Knowledge-Exploration Systems

@kexpsocial

https://github.com/k-exp

WWW.KEXP.IO

Andrewnetwork@gmail.com

@AndrewJRibeiro

https://github.com/Andrewnetwork

AndrewRib.com


I REALLY WANT A TALK ON

Generative Adversarial Networks

Interested in giving one? Or co-authoring one with us?


Resources and Sources

Papers

  • VGG
    • https://www.youtube.com/watch?v=j1jIoHN3m0s&index=1&t=192s&list=FLMc_J9IiEHk1rFi-Sa2IFn
  • http://image-net.org/challenges/LSVRC/2016/


BEST Machine Learning Book:

http://www.deeplearningbook.org/


Unused Slides


Google Deep Dream


The Connectionist Ideas of Leibniz

TBA


Wavenet

  • Visual art isn't the only field being investigated by machine learning researchers.