1 of 45

A Brief Introduction to AI

2 of 45

Who am I?

Simon Bernhard

  • Teaching myself AI for the past 5+ years

  • Currently completing a Master's in Computer Science at Schaffhausen Institute of Technology

  • Working as a PLC developer at Bibliotheca

3 of 45

Some of my recent AI projects

  • Predicting the stock market
  • Style transfer
  • Image generation using GANs and Diffusion
  • Music production using symbolic AI

4 of 45

What can AI even do?

5 of 45

6 of 45

7 of 45

8 of 45

The history of AI

  • The pursuit of AI developed concurrently with Computer Science

  • Connectionism vs. Symbolic AI
    • Symbolic AI
      • Build AI systems based on rule-based manipulation of symbols
      • Heuristic searching, Expert systems
    • Connectionism
      • Build AI systems based on networks of simple interconnected nodes
      • Artificial neural nets

9 of 45

The history of AI

  • First AI winter 1967–1977
  • Rise of Expert Systems
    • Concurrently, research continued on Connectionist approaches
  • Second AI winter 1988–1993
  • 2012 onwards - Machine Learning Golden Age

10 of 45

Symbolic Artificial Intelligence

Did it fail?

It did lead to two AI winters, but NO.

It gave us amazingly useful things like:

  • Garbage collection
  • Dynamic typing
  • Higher-order functions
  • Recursion
  • Conditionals

11 of 45

AI besides Machine Learning

12 of 45

Support Vector Machines

  • Draws a hyperplane which divides the data
  • Chooses the dividing hyperplane which maximizes the margin: the distance to the nearest points
  • Can be used for both classification and regression
  • There are both linear kernels and non-linear kernels
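The decision rule of a trained linear SVM is just the sign of w·x + b. A minimal sketch, with hand-picked weights standing in for a trained model (a real SVM solver would learn w and b from data):

```python
def svm_predict(w, b, x):
    """Classify a point by which side of the hyperplane it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# The hyperplane x0 + x1 - 3 = 0 separates the two toy points below.
w, b = [1.0, 1.0], -3.0
print(svm_predict(w, b, [2.5, 2.5]))  # 1  (above the line)
print(svm_predict(w, b, [0.5, 0.5]))  # -1 (below the line)
```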

13 of 45

Clustering

  • Clustering algorithms group data based on certain criteria
  • Used to discover structure in unlabeled data
  • There are a lot of different algorithms and different implementations
    • Which implementation is best depends on what the data looks like and what the end goal is
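Centroid-based clustering is the easiest to sketch. A minimal 1-D K-means in plain Python, alternating the assignment and centroid-update steps (real implementations handle multiple dimensions and smarter initialization):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D K-means: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups, around 1 and around 10.
data = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
print(kmeans(data, 2))
```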

14 of 45

Clustering

Centroid-based Clustering (K-means)

Density-based Clustering

Distribution-based Clustering

Hierarchical Clustering

15 of 45

Decision trees

  • Decision trees progressively split the data into smaller and smaller groups until it can’t usefully be split any further

  • There are several criteria for choosing the splits:
    • Positive Correctness
    • Gini impurity
    • Information gain

  • Trees are easy to interpret (white box)
  • Trees can work on diverse types of data
  • Trees are also great candidates for ensemble learning
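Gini impurity, one of the split criteria above, is simple to compute: the probability of mislabeling a randomly drawn item if it were labeled according to the node's class distribution. A sketch:

```python
def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class probabilities."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0; a 50/50 split has the two-class maximum.
print(gini(["a", "a", "a", "a"]))  # 0.0
print(gini(["a", "a", "b", "b"]))  # 0.5
```

A tree builder would pick the split whose child nodes have the lowest weighted impurity.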

16 of 45

Ensemble learning (Bagging, Boosting)

  • Ensemble learning is essentially running multiple instances of an algorithm and taking the combined answer.
  • Bagging
    • Each instance trains on its own bootstrap sample: a random draw from the original dataset, with replacement, of the same size as the original
    • The answers of the instances are averaged (or majority-voted)
  • Boosting
    • Similar to Bagging, but instead of sampling randomly, later instances weight the data toward examples misclassified by earlier instances
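The two ingredients of bagging, bootstrap sampling and vote combining, can be sketched in a few lines. The three "models" here are simple thresholds standing in for trained tree instances:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Bagging's resampling step: draw len(data) items with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

def bagged_predict(models, x):
    """Combine an ensemble by majority vote over each model's prediction."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
print(bootstrap([1, 2, 3, 4], rng))  # same size, duplicates allowed

# Toy "models": thresholds standing in for trained instances.
models = [lambda x: x > 4, lambda x: x > 5, lambda x: x > 6]
print(bagged_predict(models, 5.5))  # True: two of three vote True
```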

17 of 45

Ensemble learning (Boosting)

18 of 45

Machine Learning

19 of 45

What is a Neural Net?

  • An input layer
  • A hidden layer
  • An output layer
  • Between each pair of layers is a matrix of weights
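The layers above amount to a forward pass: each layer is a weighted sum of the previous one, passed through an activation function. A sketch with arbitrary example weights (2 inputs, 2 hidden units, 1 output):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """One forward pass: input -> hidden -> output, with a weight
    matrix between each pair of layers."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(row, x)))
              for row in w_hidden]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)))

w_hidden = [[0.5, -0.2], [0.3, 0.8]]  # arbitrary example weights
w_out = [1.0, -1.0]
print(forward([1.0, 2.0], w_hidden, w_out))  # a value between 0 and 1
```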

20 of 45

Deep Neural Nets

  • A neural net with more hidden layers

21 of 45

How does a Neural Net learn?

Backpropagation!

  1. The neural net makes a prediction on the data
  2. A loss function converts the error in the prediction into a loss value
  3. This loss is sent to an optimizer, which calculates the changes to the weights needed to minimize the loss
  4. These changes are then propagated backwards along the gradients of the neural net, correcting the weights

  • The optimizer will not completely correct the loss, but only take a small step towards improving it
  • The end goal is to slowly move towards the optimal set of weights
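The "small step" idea is easiest to see with a single weight. For the toy loss L(w) = (w - 3)², the gradient is dL/dw = 2(w - 3), and repeated small steps converge toward the optimum:

```python
# Gradient descent on one weight: the optimizer takes small steps
# down the loss gradient rather than jumping straight to the answer.
def train(w=0.0, lr=0.1, steps=50):
    for _ in range(steps):
        grad = 2 * (w - 3)  # gradient of the loss L(w) = (w - 3)^2
        w -= lr * grad      # small optimizer step, not a full correction
    return w

print(train())  # converges close to the optimal weight, w = 3
```

Real optimizers (SGD, Adam, ...) do the same thing simultaneously for millions of weights, with the gradients supplied by backpropagation.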

22 of 45

What are the problems with DNNs?

  • Training is difficult
    • Lots of connections means lots of GPU memory and processing power is necessary
    • Lots of data is needed
    • Could overfit or underfit
    • Lots of weights may not be used properly
  • There are solutions to some of the problems
    • Tuning hyperparameters
    • Dropout
    • Different types of networks
    • Creating train, test, validation sets of the data

23 of 45

Convolutional Neural Nets (CNNs)

  • Very popular for networks dealing with images
  • A fully connected DNN would connect every pixel of an image to every unit of the next layer
    • This leads to a massive and cumbersome network
  • A CNN instead learns small convolutional filters that slide over the image
    • Smaller network
    • Easier to train
  • CNNs were among the first neural networks to outperform humans on certain vision tasks
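The sliding-filter idea in one dimension: instead of a weight per input, a small kernel is reused at every position. A sketch of a valid (no padding) 1-D convolution:

```python
def conv1d(signal, kernel):
    """Slide a small kernel over the input instead of connecting
    every input to every output (valid convolution, no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference kernel responds strongly where the signal jumps:
# an edge detector in miniature.
print(conv1d([0, 0, 0, 1, 1, 1], [-1, 1]))  # [0, 0, 1, 0, 0]
```

A CNN learns the kernel values during training; 2-D image convolutions work the same way with small patches instead of pairs.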

24 of 45

Convolutional Neural Nets (CNNs)

25 of 45

Convolutional Neural Nets (CNNs)

26 of 45

Recurrent Neural Nets (RNNs)

  • In sequential data, the sequence itself is often more important than the data at each point
  • RNNs address this by providing the network with a kind of memory
  • This produced excellent results for things like text processing and numeric regression
  • Training can be difficult because of vanishing gradients
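The "memory" is a hidden state that each step mixes with the current input. A scalar sketch with arbitrary example weights, showing that the final state depends on the order of the inputs, not just their values:

```python
import math

def rnn_step(x, h, w_in, w_rec):
    """One RNN step: the new hidden state mixes the current input
    with the previous hidden state -- the network's 'memory'."""
    return math.tanh(w_in * x + w_rec * h)

def run(sequence, w_in=0.5, w_rec=0.9):
    h = 0.0  # initial hidden state
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec)
    return h

# Same inputs in a different order give a different final state.
print(run([1.0, 0.0, 0.0]))
print(run([0.0, 0.0, 1.0]))
```

Repeated multiplication by the recurrent weight is also where the vanishing gradient comes from: with |w_rec| < 1, the influence of early inputs shrinks at every step.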

27 of 45

Long Short Term Memory Networks (LSTMs)

  • To address the shortcomings of RNNs, LSTMs were introduced
  • LSTMs use a series of gates to decide what to store
  • This allows for much longer memory
  • The downside is added complexity

28 of 45

Attention Networks

  • Attention networks aim to simulate how human attention works
  • The network looks at the entire input at once and focuses on areas of interest
  • The areas of interest are weighted more heavily by the network
  • Attention networks can be used for all sorts of applications, from images to understanding text
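The weighting is usually scaled dot-product attention: a query is scored against every key, the scores become weights via softmax, and the output is the weighted average of the values. A 1-D toy sketch:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention over scalar keys/values: the
    scores decide how much to 'focus' on each position."""
    scale = math.sqrt(1)  # key dimension is 1 in this toy example
    weights = softmax([query * k / scale for k in keys])
    return sum(w * v for w, v in zip(weights, values))

# The query matches the second key most strongly, so the output
# is pulled toward the second value (20.0).
print(attention(2.0, keys=[0.0, 2.0, -2.0], values=[10.0, 20.0, 30.0]))
```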

29 of 45

Attention Networks

30 of 45

Transformers

  • Transformers are a type of neural network architecture that can be made up of different kinds of layers
  • They are defined by the idea that there is an input (encoder) network and an output (decoder) network
  • The input network transforms the data into some form that the output network can use

31 of 45

32 of 45

Autoencoder

33 of 45

Generative Adversarial Networks (GANs)

  • A GAN is really two networks
    • A generator
    • A discriminator
  • The generator generates content
  • The discriminator judges whether the content is real or was generated by the generator
  • Can be difficult to train because you are training two networks and they have to be balanced

34 of 45

Diffusion Learning

  • The most recent AI hotshot
  • The network is trained to generate content from noise
  • To train, noise is gradually added to an image
  • A neural net is then trained to convert the noise back into the image
  • Once trained, we can feed it pure noise or an existing image
  • Most popular for generating images and upscaling images
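The training-side ("forward") process is just repeated noising. A sketch where the "image" is a short list of pixel values; a real diffusion model would then be trained to undo each of these steps:

```python
import random

def add_noise(image, steps, sigma=0.1, seed=0):
    """Forward diffusion: gradually corrupt an 'image' (a list of
    pixel values) with Gaussian noise, keeping every intermediate."""
    rng = random.Random(seed)
    noisy = list(image)
    trajectory = [list(noisy)]
    for _ in range(steps):
        noisy = [p + rng.gauss(0.0, sigma) for p in noisy]
        trajectory.append(list(noisy))
    return trajectory

clean = [0.0, 0.5, 1.0]
traj = add_noise(clean, steps=10)
print(traj[0])   # the original image
print(traj[-1])  # the heavily noised version the model learns to denoise
```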

35 of 45

Diffusion Learning

36 of 45

Diffusion Learning

37 of 45

How can you apply Machine Learning?

38 of 45

When does Machine Learning make sense?

  • Machine Learning is just a really complicated statistical algorithm
  • Does it make sense to solve the problem with statistics?
    • Image Recognition
      • Yes
    • Translation
      • Yes
    • Image Generation
      • Maybe
    • Music Generation
      • Maybe
    • Raising children
      • No
    • Organizing a kitchen
      • No

39 of 45

Get the data

  • This is the most important step
  • Most time consuming step
    • Collect data
    • Clean data
    • Preprocess data
    • Feature engineering
  • Mistakes here can be disastrous
    • e.g. leaking test data into the training set
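The most basic guard against leakage is a clean split: shuffle once, then keep the test portion strictly separate from everything used in training. A sketch:

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle, then split -- keeping the test set strictly separate
    is the first guard against leaking it into training."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

tr, te = train_test_split(list(range(10)))
print(tr, te)
assert not set(tr) & set(te)  # no overlap => no leakage
```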

40 of 45

Decide on an architecture

  • Many factors affect this
    • Training data
    • Desired output
    • Available training time
    • Available evaluation time
    • Available hardware resources

41 of 45

Training

  • Data is split into training, test, and often validation data sets
  • Data is preprocessed and split into batches
    • Batches make training smoother and help reduce GPU memory requirements
  • The network is trained on the training data
  • Training progress is checked against the validation data set
  • Finally, the test data is used to test the real-world usefulness of the network
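The batching step is simple chunking: fixed-size slices, with the last batch possibly smaller. A sketch:

```python
def batches(data, batch_size):
    """Split preprocessed data into fixed-size batches; the last batch
    may be smaller. Smaller batches need less GPU memory per step."""
    return [data[i:i + batch_size]
            for i in range(0, len(data), batch_size)]

print(batches(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```

In practice the data is also reshuffled between epochs so the network doesn't see the batches in the same order every time.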

42 of 45

Tune hyperparameters

  • Re-evaluate the architecture
  • Tune the hyperparameters
    • Grid search
    • Random search
    • Gradient based
    • Bayesian
    • etc.
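Grid search, the simplest of these, exhaustively tries every combination. A sketch where a made-up scoring function stands in for a full train-and-validate run:

```python
import itertools

def grid_search(param_grid, score_fn):
    """Try every hyperparameter combination, keep the best-scoring one.
    score_fn stands in for a full train + validate run."""
    best, best_score = None, float("-inf")
    for combo in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        score = score_fn(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score

# A made-up scoring function with its optimum at lr=0.1, layers=3.
def score(p):
    return -abs(p["lr"] - 0.1) - abs(p["layers"] - 3)

grid = {"lr": [0.01, 0.1, 1.0], "layers": [1, 2, 3]}
print(grid_search(grid, score))  # best combo: lr=0.1, layers=3
```

Random search and Bayesian methods replace the exhaustive loop with cheaper sampling strategies, which matters once the grid gets large.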

43 of 45

Pitfalls

  • Overfitting/Underfitting
    • The network is too specific or too general
    • Mitigations: more data, early stopping
  • Exploding/vanishing gradients
    • Often addressed by changing the activation function
  • Training data is not representative of the test data
  • Hardware
    • Long training times, not enough GPU memory

44 of 45

Transfer Learning - The shortcut to success

  • Training neural networks is really expensive
  • Fortunately, lots of companies release pre-trained networks for free
  • These networks can be retrained for your needs
    • Most often only some layers are trained
    • Saves lots of time and money

45 of 45

Where should you get started?

  • Google Colab
    • An online Python notebook with free GPUs
    • A great way to get started without much investment
  • 🤗 Hugging Face is a great company which is trying to make AI accessible to everyone
  • Lots of online resources