1 of 33

Machine learning landscape

Ian @ Volvo

2018-09-13

2 of 33

Textbooks

  1. The audience of this document is the technical reader without specific knowledge of machine learning
  2. Two books I have read

3 of 33

Machine Learning != AI

  • There is often confusion between ML and AI
  • ML learns from data: pattern recognition, time series, prediction

  • But it does not “learn” from scratch (Tic-Tac-Toe from a war scenario)
  • Material from “Algorithms for Data Mining and Machine Learning in BADA”: https://goo.gl/sXvwjm

4 of 33

Data Mining

Data mining is about finding patterns in data, typically so that one can explain some phenomenon.

Data mining is usually carried out by a person, in a specific situation, on a particular data set, with a set goal in mind.

Quite often, the data set is massive and complicated. Moreover, data mining procedures are either unsupervised (‘we don't know the answer yet’) or supervised (‘we know the answer’).

As an example, in BADA, data mining was performed to find common congestion bottlenecks from 11 years of traffic data. Data mining techniques include cluster analysis, classification and regression trees, and neural networks.

5 of 33

Machine learning

Machine Learning uses algorithms to build models of what is happening behind processes to predict future outcomes.

What distinguishes these algorithms is that predictions based on the model improve as the amount of data processed by the algorithm grows.

Machine learning involves the study of algorithms that can extract information automatically, typically without online human guidance.

Training of ML algorithms benefits greatly from big data. In BADA, we used machine learning to train a model to recognise which conditions lead to unexpected queue accumulation on road networks.

Common machine learning techniques include cluster analysis, classification and regression trees, and neural networks.

6 of 33

Classification of methods

                  Supervised Learning    Unsupervised Learning

Discrete data     Classification         Clustering

Continuous data   Regression             Dimensionality Reduction

7 of 33

Use case I : Accidents within Sweden

Clustering approach for showing vehicle safety in Sweden.

8 of 33

Use case II : Hazard warning location

Hazard warning: classification of streaming data

9 of 33

Use case III : prediction

Flow analysis using neural networks to perform regression.

10 of 33

Algorithms for data mining and machine learning

  1. Association rules
  2. Statistical methods
  3. Case-based methods
  4. Artificial neural networks
  5. Logical-based methods
  6. Heuristic search

11 of 33

Association rules

If weather = warm then slippery = normal

If slippery = normal and windy = false then driving = safe

If visibility = clear and driving = safe then weather = hot

If windy = false and driving = safe then visibility = clear and slippery = false

Association rules predict the value of an arbitrary attribute (or a combination of attributes).
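As a sketch (not from the original slides), a subset of the rules above can be applied by simple forward chaining over a dictionary of attribute values; the representation as condition/conclusion dictionaries is an illustrative choice:

```python
# Minimal forward-chaining sketch for some of the association rules
# above. Each rule: (conditions, conclusions), both dicts.
rules = [
    ({"weather": "warm"}, {"slippery": "normal"}),
    ({"slippery": "normal", "windy": False}, {"driving": "safe"}),
    ({"windy": False, "driving": "safe"}, {"visibility": "clear"}),
]

def infer(facts, rules):
    """Repeatedly fire rules whose conditions hold until nothing changes."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for cond, concl in rules:
            if all(facts.get(k) == v for k, v in cond.items()):
                for k, v in concl.items():
                    if facts.get(k) != v:
                        facts[k] = v
                        changed = True
    return facts

print(infer({"weather": "warm", "windy": False}, rules))
# warm weather derives slippery=normal, then driving=safe, then visibility=clear
```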

12 of 33

Statistical (prediction/regression)

Essentially, regression is a prediction procedure. As Table 1 shows, regression outputs continuous values.

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables.

The inputs are also known as predictors or features. So classification and regression are similar processes, but with different types of output.

For example, in a BADA context, predicting the average speed through Gothenburg central.
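The simplest regression is a straight line fitted by ordinary least squares. A minimal sketch, with made-up density/speed numbers (the real BADA data is not reproduced here):

```python
# Ordinary least squares with one predictor, via the closed-form
# solution for slope and intercept.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # slope, intercept

# Hypothetical: traffic density (veh/km) vs average speed (km/h).
density = [10, 20, 30, 40, 50]
speed = [70, 62, 55, 47, 40]
a, b = fit_line(density, speed)
print(f"predicted speed at density 35: {a * 35 + b:.1f} km/h")
```

The fitted line then predicts a continuous output (a speed) for any input density, which is exactly what distinguishes regression from classification.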

13 of 33

Dimensionality reduction

Dimensionality reduction has been widely applied in many scientific fields.

Reduction techniques are used to lower the amount of data from the original dataset and thus leave only the statistically relevant components in the processed data.

Dimension reduction also improves the performance of classification algorithms by removing noisy, irrelevant data. In the ML community, dimensionality reduction is known as feature extraction.

Principal Component Analysis (PCA) is ubiquitous in dimensionality reduction: it is computationally efficient and essentially parameterless.
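A minimal two-dimensional PCA sketch (illustrative, not from the slides): for 2-D data the leading principal component can be computed in closed form from the 2x2 covariance matrix, without any linear-algebra library. The point cloud below is made up.

```python
import math

# Find the leading principal component of a 2-D point cloud from the
# closed-form largest eigenvalue/eigenvector of its 2x2 covariance.
def first_component(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Corresponding (unnormalised) eigenvector, then normalise.
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Points spread mostly along the y = x direction.
pts = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
vx, vy = first_component(pts)
print(f"leading direction: ({vx:.2f}, {vy:.2f})")  # close to (0.71, 0.71)
```

Projecting each point onto this direction reduces two dimensions to one while keeping most of the variance, which is the whole idea of the technique.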

14 of 33

Case-based methods

Classification is the best-known case-based method.

It is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data.

The basic idea is that similar patterns belong to the same class. Case-based methods are easy to train, as one just has to save every pattern seen.

The disadvantages are that the model size grows with the number of examples seen, and that some notion of distance (a metric) is needed.
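The simplest case-based classifier is 1-nearest-neighbour: training is literally just storing the examples, and prediction returns the label of the closest stored pattern. A sketch with made-up sensor readings:

```python
# 1-nearest-neighbour classifier: "training" is storing the examples;
# prediction finds the closest one under a squared-Euclidean metric.
def predict(examples, x):
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(examples, key=lambda e: dist(e[0], x))[1]

# Hypothetical readings: (speed km/h, density veh/km) -> traffic state.
train = [((80, 10), "free"), ((75, 15), "free"),
         ((25, 60), "jam"), ((15, 70), "jam")]
print(predict(train, (20, 65)))  # -> jam
print(predict(train, (78, 12)))  # -> free
```

Note how both stated disadvantages show up directly: `train` grows with every example kept, and the whole method hinges on the choice of `dist`.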

15 of 33

Artificial neural network

Inspired by the neural structure of the brain, neural networks (NN) are units connected by weights.

Artificial refers to the neurons not being biological. Weights are adjusted to produce a mapping between inputs and outputs.

In neural networks, one often sees the term layers, where each layer represents and learns one feature. ‘Deep’ in deep learning typically refers to multiple layers, stacked so that the output of one layer is fed to the next one.

Typically, lower layers process simpler features, and higher layers more complex features.
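As an illustration (not from the slides), the smallest possible network is a single artificial neuron trained with the classic perceptron rule: the weights are adjusted until the mapping from inputs to outputs is correct, here for the logical AND function.

```python
# A single artificial neuron trained with the perceptron rule.
# Target: the logical AND function (linearly separable, so the
# perceptron is guaranteed to converge).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
bias = 0.0
lr = 0.1  # learning rate
for _ in range(50):  # epochs
    for (x1, x2), target in data:
        out = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0
        err = target - out
        # Adjust weights in proportion to the error and the input.
        w[0] += lr * err * x1
        w[1] += lr * err * x2
        bias += lr * err
print([1 if w[0] * a + w[1] * b + bias > 0 else 0 for (a, b), _ in data])
# -> [0, 0, 0, 1]
```

Deep networks extend this idea by stacking many such units in layers and replacing the perceptron rule with gradient-based training.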

16 of 33

Convolutional neural networks

Sometimes known as ConvNets, a convolutional neural network represents the data as a map.

Some notion of distance is needed between the points on the map.

An example is shown in the two images below, where the top-left figure indicates the layers learned for an image of a vehicle. Among the learned features, the first layer indicates a 93% probability of the image being a car.

The other features in this case are noise, and probably not a car. Note that some noise is almost always present in real-world images.
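A full ConvNet is beyond the scope of this overview, but the core convolution operation can be sketched in one dimension (an illustrative example, not from the slides): a small kernel slides over the signal and produces a feature map that responds to a local pattern, here an edge.

```python
# 1-D convolution sketch: slide a kernel over a signal to build a
# feature map. The kernel [-1, 1] responds to upward steps (edges).
def convolve(signal, kernel):
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0, 0, 0, 5, 5, 5, 0, 0]   # a step up, then a step down
print(convolve(signal, [-1, 1]))    # -> [0, 0, 5, 0, 0, -5, 0]
```

In an image the same idea applies in two dimensions, which is why ConvNets need the data represented as a map with a notion of distance between neighbouring points.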

17 of 33

Logic-based approach

Inductive logic programming (ILP) is a branch of machine learning which uses logic programming as a representation.

Examples, background knowledge and hypotheses can all be represented in the language of choice. Prolog is a popular choice for coding ILP-suitable problems.

With an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

Essentially, logical expressions are constructed to characterise the input classes. The theory of ILP is based on proof theory and model theory for the first order predicate calculus.

18 of 33

Heuristic search

Search is a well-established area of research and deployment within computer science.

The idea is to search through a number of different models, or parameters in a model expression, to find something that matches the data and can be used during training of other machine learning models.

Many of the algorithms stem from optimisation; genetic algorithms, reinforcement learning, and simulated annealing are well known.

Typically one wants to find a maximum or minimum point in a set of data without actually knowing the true shape of the data. In some cases one would like to find more than one maximum or minimum, e.g. “the maximum value and the minimum cost”.
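A minimal simulated-annealing sketch (the function, step size, and cooling schedule below are illustrative choices, not from the slides): candidate moves are drawn at random, improvements are always accepted, and worsening moves are accepted with a probability that shrinks as the "temperature" cools, which lets the search escape local minima.

```python
import math
import random

# Simulated annealing: search for the minimum of a function without
# assuming anything about its shape.
def anneal(f, x0, steps=20000, temp0=5.0):
    random.seed(0)  # reproducible run
    x, best = x0, x0
    for i in range(steps):
        temp = temp0 * (1 - i / steps) + 1e-9  # linear cooling
        cand = x + random.uniform(-0.5, 0.5)
        d = f(cand) - f(x)
        # Accept improvements always; worse moves with probability
        # exp(-d / temp), shrinking as the temperature cools.
        if d < 0 or random.random() < math.exp(-d / temp):
            x = cand
            if f(x) < f(best):
                best = x
    return best

# A bumpy function: quadratic bowl plus oscillations, several minima.
f = lambda x: x * x + 3 * math.sin(5 * x)
x = anneal(f, x0=4.0)
print(round(x, 1))
```

Starting far from the optimum, the search still ends up near one of the deepest valleys, which a pure greedy descent from the same starting point would not guarantee.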

19 of 33

Data issues

20 of 33

Data representation

How data is represented is an important facet in any system. If the system is to be designed from the start, a concise, standard, secure representation is usually a good starting point.

Concise because millions of records must be processed; standardised so that tools in languages such as Python, Ruby, or C++ can read and write the data; and of course secure, meaning the data stream can be encrypted when sent over an open network and secured end-to-end otherwise.

Large organisations often go with non-standard solutions and formats, which makes it more difficult to use cheap, fast, open, and importantly constantly evolving solutions, e.g. from Apache.

21 of 33

Algorithmic transparency

  1. How do we know what the black box is doing?
  2. Who is responsible for the black box, ML?
    • Uber case
      • Software designer, driver, test engineer, pedestrian
  3. Simpler solutions are usually preferred
  4. Can we detect biases in the data?
  5. Can we retrace the solution?
  6. Can we reason from it?
    • Why did the ML do what it did?

22 of 33

ML example

23 of 33

Example

  1. Select data
  2. Build model
  3. Compile and build
  4. Run training data
  5. Run validation (hyperparameters)
  6. Test and validate
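The six steps above can be sketched end-to-end with a deliberately tiny stand-in model (ordinary least squares on made-up data; the real example used a neural network, but the workflow is the same):

```python
import random

# The six workflow steps on synthetic data: select, build, train,
# validate, test. A straight line stands in for the real model.
random.seed(1)

# 1. Select data: noisy samples of a known linear relation.
data = [(x, 2 * x + 1 + random.uniform(-0.5, 0.5)) for x in range(100)]
random.shuffle(data)
train, val, test = data[:60], data[60:80], data[80:]

# 2-3. Build the model: here just slope/intercept via least squares.
def fit(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    return a, my - a * mx

def mse(model, points):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)

# 4. Run training data.
model = fit(train)
# 5. Validation would guide hyperparameter choices; none to tune here.
print("validation MSE:", round(mse(model, val), 3))
# 6. Test on held-out data.
print("test MSE:", round(mse(model, test), 3))
```

The essential discipline is the three-way split: hyperparameters are chosen on the validation set, and the test set is touched only once, at the end.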

24 of 33

Problem setting

  1. We want to predict the traffic flow from one point to the next
  2. That is, predict the situation down the road and forward in time
  3. Flow can be speed, velocity, or density

  • 2750 sensors around Stockholm
  • Often more than one per section (fast and slow lanes)

25 of 33

MCS sensors

  • The sensors roughly encircle Stockholm
  • The E4N

  • Example to the right
  • 2750 sensors around Stockholm (MCS)
  • Often more than one per lane

26 of 33

Problem setting

  • To predict the traffic behaviour in space and time
    • Which means along the road and in the next time interval. In fact, traffic behaviour is TB = f(space, time, weather, friction, concerts, rage, ...)
  • Traffic behaviour could be flow, speed or density
  • Human decision making is not part of this work

Research Institutes of Sweden

27 of 33

Real traffic jams

Queues

  • Build quickly
  • Dissipate slowly
  • Extreme example in video
  • Occur frequently, however

28 of 33

Another illustration of queue accumulation

1. Queue buildup

A front vehicle slows down, causing those following to bunch up behind it, so the density increases.

2. Queue dissipation

Vehicles queue behind a traffic light; when it changes from red to green, the front ones move away and the density lowers.

29 of 33

Do we need 12 years of data?

  1. daily prediction needs info about hourly events: commuting and non-commuting hours
  2. weekly prediction needs info about daily events: Mon-Fri and weekends (Sat & Sun)
  3. monthly prediction needs info about weekly events: working days vs holidays
  4. yearly prediction needs info about monthly events: summer or winter

30 of 33

Ground truth

  • Do we actually have a queue?
  • People may have different opinions
    • “Not normally this slow”
    • “There are so many cars”
    • “Speed has not passed 30 km/h”
  • One method is to use cameras
  • This would allow
    • Human labelling (a queue)
    • Machine learning from the correct labels

31 of 33

Results

  • Prediction problem
  • We want to predict congestion from Stockholm’s data up to 30 minutes ahead
  • Automatically, and then compare to the real situation (can be done via cameras, e.g. trafiken.nu)

32 of 33

Future : transparency

  • Explainable model
    • interpretable machine learning
      • Boström et al.
    • If DL works, then go back to the domain and say “solution available”
    • Specifically DL -> Bayesian time series

[Flowchart: given a problem with domain knowledge, try deep learning. Does it work? Yes: done. No: train again. Does it work now? Yes: done. No: build a new model.]

33 of 33

Summary

Uncovering black box approaches (BDVA)

Data cleaning (and readiness) is important

Method choice is important too

Luckily the software situation is good and improving, with many packages available.

Some basic maths, coursework, and software experience are helpful

AND DOMAIN KNOWLEDGE!!!