1 of 33

Machine learning landscape

Ian @ Volvo

2018-09-13

2 of 33

Textbooks

  1. The audience of this document is the technical reader without specific knowledge of machine learning
  2. Two books I have read

3 of 33

Machine Learning != AI

  • There is often confusion between ML and AI
  • ML learns from data: pattern recognition, time series, prediction

  • But it does not “learn” from scratch (Tic-Tac-Toe from a war scenario)
  • Material from “Algorithms for Data Mining and Machine Learning in BADA”: https://goo.gl/sXvwjm

4 of 33

Data Mining

Data mining is about finding patterns in data, typically so that one can explain some phenomenon.

Data mining is usually carried out by a person, in a specific situation, on a particular data set, with a set goal in mind.

Quite often, the data set is massive and complicated. Moreover, data mining procedures are either unsupervised (‘we don't know the answer yet’) or supervised (‘we know the answer’).

As an example, in BADA, data mining was performed to find common congestion bottlenecks from 11 years of traffic data. Data mining techniques include cluster analysis, classification and regression trees, and neural networks.

5 of 33

Machine learning

Machine Learning uses algorithms to build models of what is happening behind processes to predict future outcomes.

What distinguishes these algorithms is that predictions based on the model improve as the amount of data processed by the algorithm grows.

Machine learning involves the study of algorithms that can extract information automatically, typically without online human guidance.

Training of ML algorithms benefits greatly from big data. In BADA, we used machine learning to train a model to recognise which conditions lead to unexpected queue accumulation on road networks.

Common machine learning techniques include cluster analysis, classification and regression trees, and neural networks.

6 of 33

Classification of methods

                  Supervised Learning    Unsupervised Learning

Discrete data     Classification         Clustering

Continuous data   Regression             Dimensionality Reduction

7 of 33

Use case I : Accidents within Sweden

Clustering approach for showing vehicle safety in Sweden.

8 of 33

Use case II : Hazard warning location

Hazard warning: classification of streaming data

9 of 33

Use case III : prediction

Flow analysis using neural networks to perform regression.

10 of 33

Algorithms for data mining and machine learning

  1. Association rules
  2. Statistical methods
  3. Case-based methods
  4. Artificial neural networks
  5. Logical-based methods
  6. Heuristic search

11 of 33

Association rules

If weather = warm then slippery = normal

If slippery = normal and windy = false then driving = safe

If visibility = clear and driving = safe then weather = hot

If windy = false and driving = safe then visibility = clear and slippery = false

Association rules predict the value of an arbitrary attribute (or a combination of attributes).
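As a sketch (not from the original slides), a subset of the rules above can be applied by simple forward chaining over a dictionary of attribute values; the representation as condition/conclusion dictionaries is an illustrative choice:

```python
# Minimal forward-chaining sketch for some of the association rules
# above. Each rule: (conditions, conclusions), both dicts.
rules = [
    ({"weather": "warm"}, {"slippery": "normal"}),
    ({"slippery": "normal", "windy": False}, {"driving": "safe"}),
    ({"windy": False, "driving": "safe"}, {"visibility": "clear"}),
]

def infer(facts, rules):
    """Repeatedly fire rules whose conditions hold until nothing changes."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for cond, concl in rules:
            if all(facts.get(k) == v for k, v in cond.items()):
                for k, v in concl.items():
                    if facts.get(k) != v:
                        facts[k] = v
                        changed = True
    return facts

print(infer({"weather": "warm", "windy": False}, rules))
# warm weather derives slippery=normal, then driving=safe, then visibility=clear
```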

12 of 33

Statistical (prediction/regression)

Essentially, regression is a prediction procedure. As Table 1 shows, regression outputs continuous values.

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables.

The inputs are also known as predictors or features. So classification and regression are similar processes, but with different types of output.

For example, in a BADA context, predicting the average speed through Gothenburg central.
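The simplest regression is a straight line fitted by ordinary least squares. A minimal sketch, with made-up density/speed numbers (the real BADA data is not reproduced here):

```python
# Ordinary least squares with one predictor, via the closed-form
# solution for slope and intercept.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # slope, intercept

# Hypothetical: traffic density (veh/km) vs average speed (km/h).
density = [10, 20, 30, 40, 50]
speed = [70, 62, 55, 47, 40]
a, b = fit_line(density, speed)
print(f"predicted speed at density 35: {a * 35 + b:.1f} km/h")
```

The fitted line then predicts a continuous output (a speed) for any input density, which is exactly what distinguishes regression from classification.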

13 of 33

Dimensionality reduction

Dimensionality reduction has been widely applied in many scientific fields.

Reduction techniques are used to lower the amount of data from the original dataset and thus leave only the statistically relevant components in the processed data.

Dimension reduction also improves the performance of classification algorithms by removing noisy, irrelevant data. In the ML community, dimensionality reduction is known as feature extraction.

Principal Component Analysis (PCA) is ubiquitous in dimensionality reduction: it is computationally efficient and essentially parameterless.
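A minimal two-dimensional PCA sketch (illustrative, not from the slides): for 2-D data the leading principal component can be computed in closed form from the 2x2 covariance matrix, without any linear-algebra library. The point cloud below is made up.

```python
import math

# Find the leading principal component of a 2-D point cloud from the
# closed-form largest eigenvalue/eigenvector of its 2x2 covariance.
def first_component(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Corresponding (unnormalised) eigenvector, then normalise.
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Points spread mostly along the y = x direction.
pts = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
vx, vy = first_component(pts)
print(f"leading direction: ({vx:.2f}, {vy:.2f})")  # close to (0.71, 0.71)
```

Projecting each point onto this direction reduces two dimensions to one while keeping most of the variance, which is the whole idea of the technique.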

14 of 33

Case-based methods

Classification is the best-known case-based method.

It is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data.

The basic idea is that similar patterns belong to the same class. Case-based methods are easy to train, as one just has to save every pattern seen.

The disadvantages are that the model size grows with the number of examples seen, and that some notion of distance (a metric) is needed.
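The simplest case-based classifier is 1-nearest-neighbour: training is literally just storing the examples, and prediction returns the label of the closest stored pattern. A sketch with made-up sensor readings:

```python
# 1-nearest-neighbour classifier: "training" is storing the examples;
# prediction finds the closest one under a squared-Euclidean metric.
def predict(examples, x):
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(examples, key=lambda e: dist(e[0], x))[1]

# Hypothetical readings: (speed km/h, density veh/km) -> traffic state.
train = [((80, 10), "free"), ((75, 15), "free"),
         ((25, 60), "jam"), ((15, 70), "jam")]
print(predict(train, (20, 65)))  # -> jam
print(predict(train, (78, 12)))  # -> free
```

Note how both stated disadvantages show up directly: `train` grows with every example kept, and the whole method hinges on the choice of `dist`.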

15 of 33

Artificial neural network

Inspired by the neural structure of the brain, neural networks (NN) are units connected by weights.

Artificial refers to the neurons not being biological. Weights are adjusted to produce a mapping between inputs and outputs.

In neural networks, one often sees the term layers, where each layer represents and learns one feature. ‘Deep’ in deep learning typically refers to multiple layers, stacked so that the output of one layer is fed to the next one.

Typically, lower layers process simpler features, and higher layers more complex features.
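As an illustration (not from the slides), the smallest possible network is a single artificial neuron trained with the classic perceptron rule: the weights are adjusted until the mapping from inputs to outputs is correct, here for the logical AND function.

```python
# A single artificial neuron trained with the perceptron rule.
# Target: the logical AND function (linearly separable, so the
# perceptron is guaranteed to converge).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
bias = 0.0
lr = 0.1  # learning rate
for _ in range(50):  # epochs
    for (x1, x2), target in data:
        out = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
        err = target - out
        # Adjust weights in proportion to the error and the input.
        w[0] += lr * err * x1
        w[1] += lr * err * x2
        bias += lr * err
print([1 if w[0] * a + w[1] * b + bias > 0 else 0 for (a, b), _ in data])
# -> [0, 0, 0, 1]
```

Deep networks extend this idea by stacking many such units in layers and replacing the perceptron rule with gradient-based training.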

16 of 33

Convolutional neural networks

Sometimes known as ConvNets, a convolutional neural network represents the data as a map.

Some notion of distance is needed between the points on the map.

An example is shown in the two images below, where the top-left figure indicates the layers learned for an image of a vehicle. Among the learned features, the first layer indicates a 93% probability of the image being a car.

The other features in this case are noise, and probably not a car. Note that some noise is almost always present in real-world images.
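A full ConvNet is beyond the scope of this overview, but the core convolution operation can be sketched in one dimension (an illustrative example, not from the slides): a small kernel slides over the signal and produces a feature map that responds to a local pattern, here an edge.

```python
# 1-D convolution sketch: slide a kernel over a signal to build a
# feature map. The kernel [-1, 1] responds to upward steps (edges).
def convolve(signal, kernel):
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0, 0, 0, 5, 5, 5, 0, 0]   # a step up, then a step down
print(convolve(signal, [-1, 1]))    # -> [0, 0, 5, 0, 0, -5, 0]
```

In an image the same idea applies in two dimensions, which is why ConvNets need the data represented as a map with a notion of distance between neighbouring points.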

17 of 33

Logic-based approach

Inductive logic programming (ILP) is a branch of machine learning which uses logic programming as a representation.

Examples, background knowledge and hypotheses can all be represented in the language of choice. Prolog is a popular choice for coding ILP-suitable problems.

With an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

Essentially, logical expressions are constructed to characterise the input classes. The theory of ILP is based on proof theory and model theory for the first order predicate calculus.

18 of 33

Heuristic search

Search is a well-established area of research and deployment within computer science.

The idea is to search through a number of different models, or parameters in a model expression, to find something that matches the data and can be used during training of other machine learning models.

Many of the algorithms stem from optimisation; genetic algorithms, reinforcement learning, and simulated annealing are well known.

Typically one wants to find a maximum or minimum point in a set of data without actually knowing the true shape of the data. In some cases one would like to find more than one maximum or minimum, e.g. “the maximum value and the minimum cost”.
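A minimal simulated-annealing sketch (the function, step size, and cooling schedule below are illustrative choices, not from the slides): candidate moves are drawn at random, improvements are always accepted, and worsening moves are accepted with a probability that shrinks as the "temperature" cools, which lets the search escape local minima.

```python
import math
import random

# Simulated annealing: search for the minimum of a function without
# assuming anything about its shape.
def anneal(f, x0, steps=20000, temp0=5.0):
    random.seed(0)  # reproducible run
    x, best = x0, x0
    for i in range(steps):
        temp = temp0 * (1 - i / steps) + 1e-9  # linear cooling
        cand = x + random.uniform(-0.5, 0.5)
        d = f(cand) - f(x)
        # Accept improvements always; worse moves with probability
        # exp(-d / temp), shrinking as the temperature cools.
        if d < 0 or random.random() < math.exp(-d / temp):
            x = cand
            if f(x) < f(best):
                best = x
    return best

# A bumpy function: quadratic bowl plus oscillations, several minima.
f = lambda x: x * x + 3 * math.sin(5 * x)
x = anneal(f, x0=4.0)
print(round(x, 1))
```

Starting far from the optimum, the search still ends up near one of the deepest valleys, which a pure greedy descent from the same starting point would not guarantee.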

19 of 33

Data issues

20 of 33

Data representation

How data is represented is an important facet in any system. If the system is to be designed from the start, a concise, standard, secure representation is usually a good starting point.

Concise because millions of records must be processed; standardised so that tools in languages such as Python, Ruby, or C++ can read and write the data; and of course secure, meaning the data stream can be encrypted when sent over an open network and secured end-to-end otherwise.

Large organisations often go with non-standard solutions and formats, which makes it more difficult to use cheap, fast, open, and importantly constantly evolving solutions, e.g. from Apache.

21 of 33

Algorithmic transparency

  1. How do we know what the black box is doing?
  2. Who is responsible for the black box, ML?
    • Uber case
      • Software designer, driver, test engineer, pedestrian
  3. Simpler solutions are usually preferred
  4. Can we detect biases in the data?
  5. Can we retrace the solution?
  6. Can we reason from it?
    • Why did the ML do what it did?

22 of 33

ML example

23 of 33

Example

  1. Select data
  2. Build model
  3. Compile and build
  4. Run training data
  5. Run validation (hyperparameters)
  6. Test and validate
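The six steps above can be sketched end-to-end with a deliberately tiny stand-in model (ordinary least squares on made-up data; the real example used a neural network, but the workflow is the same):

```python
import random

# The six workflow steps on synthetic data: select, build, train,
# validate, test. A straight line stands in for the real model.
random.seed(1)

# 1. Select data: noisy samples of a known linear relation.
data = [(x, 2 * x + 1 + random.uniform(-0.5, 0.5)) for x in range(100)]
random.shuffle(data)
train, val, test = data[:60], data[60:80], data[80:]

# 2-3. Build the model: here just slope/intercept via least squares.
def fit(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    return a, my - a * mx

def mse(model, points):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)

# 4. Run training data.
model = fit(train)
# 5. Validation would guide hyperparameter choices; none to tune here.
print("validation MSE:", round(mse(model, val), 3))
# 6. Test on held-out data.
print("test MSE:", round(mse(model, test), 3))
```

The essential discipline is the three-way split: hyperparameters are chosen on the validation set, and the test set is touched only once, at the end.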

24 of 33

Problem setting

  1. We want to predict the traffic flow from one point to the next
  2. That is, predict the situation down the road and forward in time
  3. Flow can be speed, velocity, or density

  • 2750 sensors around Stockholm
  • Often more than one per section (fast and slow lanes)

25 of 33

MCS sensors

  • The sensors roughly encircle Stockholm
  • The E4N

  • Example to the right
  • 2750 sensors around Stockholm (MCS)
  • Often more than one per lane

26 of 33

Problem setting

  • To predict the traffic behaviour in space and time
    • Which means along the road and in the next time interval. In fact, traffic behaviour is TB = f(space, time, weather, friction, concerts, rage, ...)
  • Traffic behaviour could be flow, speed or density
  • Human decision making is not part of this work

Research Institutes of Sweden

27 of 33

Real traffic jams

Queues

  • Build quickly
  • Dissipate slowly
  • Extreme example in video
  • Occur frequently, however

28 of 33

Another illustration of queue accumulation

1. Queue buildup

A front vehicle slows down, causing those following to bunch up behind it, so the density increases.

2. Queue dissipation

Vehicles queue behind a traffic light; when it changes from red to green, the front ones move away and the density lowers.

29 of 33

Do we need 12 years of data?

  1. daily prediction needs info about hourly events: commuting and non-commuting hours
  2. weekly prediction needs info about daily events: Mon-Fri and weekends (Sat & Sun)
  3. monthly prediction needs info about weekly events: working days vs holidays
  4. yearly prediction needs info about monthly events: summer or winter

30 of 33

Ground truth

  • Do we actually have a queue?
  • People may have different opinions
    • “Not normally this slow”
    • “There are so many cars”
    • “Speed has not passed 30 km/h”
  • One method is to use cameras
  • This would allow
    • Human labelling (a queue)
    • Machine learning from the correct labels

31 of 33

Results

  • Prediction problem
  • We want to predict congestion from Stockholm’s data up to 30 minutes ahead
  • Automatically, and then compare to the real situation (can be done via cameras, e.g. trafiken.nu)

32 of 33

Future : transparency

  • Explainable model
    • interpretable machine learning
      • Boström et al.
    • If DL works, then go back to the domain and say “solution available”
    • Specifically DL -> Bayesian time series

[Flowchart: given a problem with domain knowledge, try deep learning. Does it work? Yes: done. No: train again. Does it work now? Yes: done. No: build a new model.]

33 of 33

Summary

Uncovering black box approaches (BDVA)

Data cleaning (and readiness) is important

Method choice is important too

Luckily the software situation is good and improving, with many packages available.

Some basic maths, coursework, and software experience are helpful

AND DOMAIN KNOWLEDGE!!!