Machine learning landscape
Ian @ Volvo
2018-09-13
Textbooks
Machine Learning != AI
Data Mining
Data mining is about finding patterns in data, typically so that one can explain some phenomenon.
Data mining is usually carried out by a person, in a specific situation, on a particular data set, with a set goal in mind.
Quite often, the data set is massive and complicated. Moreover, data mining procedures are either unsupervised ('we don't know the answer yet') or supervised ('we know the answer').
As an example, in BADA, data mining was performed to find common congestion bottlenecks from 11 years of traffic data. Data mining techniques include cluster analysis, classification and regression trees, and neural networks.
Machine learning
Machine learning uses algorithms to build models of the processes behind the data in order to predict future outcomes.
What distinguishes these algorithms is that predictions based on the model improve as the amount of data processed by the algorithm grows.
Machine learning involves the study of algorithms that can extract information automatically, typically without online human guidance.
Training of ML algorithms benefits greatly from big data. In BADA, we used machine learning to train a model that recognizes which conditions lead to unexpected queue accumulation on road networks.
Common machine learning techniques include cluster analysis, classification and regression trees, and neural networks.
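To make the "predictions improve with more data" point concrete, here is a minimal sketch (not from the original slides; the data is synthetic) that trains a simple regressor on growing samples and reports the held-out error:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Synthetic stand-in for traffic data: a noisy nonlinear signal.
X = rng.uniform(0, 100, size=(5000, 1))
y = np.sin(X[:, 0] / 8.0) + rng.normal(0, 0.1, size=5000)
X_test, y_test = X[4000:], y[4000:]        # held-out data

for n in (50, 500, 4000):                  # growing training sets
    model = KNeighborsRegressor(n_neighbors=5).fit(X[:n], y[:n])
    err = mean_absolute_error(y_test, model.predict(X_test))
    print(f"trained on {n:4d} samples -> test mean abs. error {err:.3f}")
```

The test error shrinks as the training set grows, which is exactly the behaviour described above.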
Classification of methods
|                 | Supervised Learning | Unsupervised Learning    |
| Discrete data   | Classification      | Clustering               |
| Continuous data | Regression          | Dimensionality Reduction |
Use case I : Accidents within Sweden
A clustering approach for showing vehicle safety in Sweden.
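As a hedged sketch of what such a clustering could look like (the coordinates below are made up; the real accident data is not reproduced here), k-means groups accident locations into regional clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical accident locations (lon, lat) scattered around three cities.
centres = np.array([[18.07, 59.33],   # Stockholm
                    [11.97, 57.71],   # Gothenburg
                    [13.00, 55.60]])  # Malmo
accidents = np.vstack([c + rng.normal(0, 0.3, size=(200, 2)) for c in centres])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(accidents)
print("cluster centres (lon, lat):")
print(km.cluster_centers_)
```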
Use case II : Hazard warning location
Hazard warning: classification of streaming data
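A minimal sketch of classifying streaming data, assuming (hypothetically) batches of sensor readings arriving over time; scikit-learn's `partial_fit` lets a linear classifier be updated incrementally as each batch arrives:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)
clf = SGDClassifier()                 # a linear model trained incrementally
classes = np.array([0, 1])            # 0 = no hazard, 1 = hazard

# Hypothetical stream: batches of (speed, deceleration) readings.
for _ in range(100):
    X = rng.normal(0, 1, size=(32, 2))
    y = (X[:, 1] > 0.5).astype(int)   # toy labelling rule for the sketch
    clf.partial_fit(X, y, classes=classes)

print(clf.predict(np.array([[0.0, 1.2], [0.0, -0.3]])))
```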
Use case III : prediction
Flow analysis using neural networks to perform regression.
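A hedged sketch of regression with a small neural network (synthetic stand-in data, not the BADA flow data):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
# Hypothetical stand-in for flow data: predict flow from density and speed.
X = rng.uniform(0, 1, size=(2000, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, size=2000)

nn = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
nn.fit(X[:1500], y[:1500])
print("held-out R^2:", nn.score(X[1500:], y[1500:]))
```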
Algorithms for data mining and machine learning
Association rules
If weather = warm, then slippery = normal.
If slippery = normal and windy = false, then driving = safe.
If visibility = clear and driving = safe, then weather = hot.
If windy = false and driving = safe, then visibility = clear and slippery = false.
An association rule predicts the value of an arbitrary attribute (or a combination of attributes).
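A minimal sketch (toy records, not from the slides) of how a candidate rule is scored: support is how often antecedent and consequent occur together, confidence is how often the consequent holds when the antecedent does:

```python
records = [
    {"weather": "warm", "windy": False, "slippery": "normal", "driving": "safe"},
    {"weather": "warm", "windy": True,  "slippery": "normal", "driving": "unsafe"},
    {"weather": "cold", "windy": False, "slippery": "icy",    "driving": "unsafe"},
    {"weather": "warm", "windy": False, "slippery": "normal", "driving": "safe"},
]

def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for 'if antecedent then consequent'."""
    matches_a = [r for r in records
                 if all(r[k] == v for k, v in antecedent.items())]
    matches_both = [r for r in matches_a
                    if all(r[k] == v for k, v in consequent.items())]
    support = len(matches_both) / len(records)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence

print(rule_stats(records,
                 {"slippery": "normal", "windy": False},
                 {"driving": "safe"}))   # -> (0.5, 1.0)
```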
Statistical (prediction/regression)
Essentially, regression is a prediction procedure. As the table of methods above shows, regression outputs continuous values.
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables.
The inputs are also known as predictors or features, so classification and regression are similar processes but with different types of output.
For example, in a BADA context, one might predict the average speed through Gothenburg central.
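A minimal regression sketch with made-up numbers (the actual Gothenburg data is not reproduced here); a deliberately simple linear fit of average speed against hour of day:

```python
import numpy as np

# Hypothetical observations: average speed (km/h) at a given hour of day.
hours  = np.array([6, 7, 8, 9, 10, 16, 17, 18])
speeds = np.array([55, 42, 31, 38, 47, 36, 29, 40])  # made-up values

slope, intercept = np.polyfit(hours, speeds, deg=1)   # least-squares line
predict = lambda h: slope * h + intercept
print(f"predicted speed at 08:30: {predict(8.5):.1f} km/h")
```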
Dimensionality reduction
Dimensionality reduction has been widely applied in many scientific fields.
Reduction techniques are used to lower the amount of data from the original dataset, leaving only the statistically relevant components in the processed data.
Dimensionality reduction also improves the performance of classification algorithms by removing noisy, irrelevant data. In the ML community, dimensionality reduction is also known as feature extraction.
Principal Component Analysis (PCA) is ubiquitous in dimensionality reduction; it is computationally efficient and essentially parameterless.
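A short PCA sketch on synthetic data that really varies along only two directions; the ten measured dimensions are reduced to two while retaining almost all the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Toy data: 10-dimensional measurements driven by 2 latent directions.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + rng.normal(0, 0.05, size=(500, 10))

pca = PCA(n_components=2).fit(X)
print("variance explained by 2 components:",
      pca.explained_variance_ratio_.sum())   # close to 1.0
X_reduced = pca.transform(X)                 # 500 x 2 instead of 500 x 10
```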
Case-based methods
Classification is the best known of the case-based methods.
It is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data.
The basic idea is that similar patterns belong to the same class. Case-based methods are easy to train, as one just has to save every pattern seen.
The disadvantages are that the model size grows with the number of examples seen, and that some notion of distance (a metric) between patterns is needed.
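A minimal case-based (nearest-neighbour) sketch on toy data; note that "training" amounts to storing the examples, and a Euclidean metric supplies the required notion of distance:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
# Two toy classes centred at (-1, -1) and (+1, +1).
X = np.vstack([rng.normal(-1, 0.5, size=(100, 2)),
               rng.normal(+1, 0.5, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

knn = KNeighborsClassifier(n_neighbors=5)   # Euclidean distance by default
knn.fit(X, y)                               # "training" = saving the patterns
print(knn.predict([[0.9, 1.1], [-0.8, -1.2]]))  # -> [1 0]
```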
Artificial neural network
Inspired by the neural structure of the brain, neural networks (NN) are units connected by weights.
Artificial refers to the neurons not being biological. Weights are adjusted to produce a mapping between inputs and outputs.
In neural networks one often sees the term layers, where each layer represents and learns one level of features. 'Deep' in deep learning typically refers to multiple layers arranged in sequence, so that the output of one layer is fed to the next.
Typically, lower layers process simpler features, and higher layers more complex features.
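A minimal sketch of the layered mapping described above, written as a plain numpy forward pass (the weights are random here; training would adjust them):

```python
import numpy as np

rng = np.random.default_rng(6)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input (3) -> hidden (4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden (4) -> output (1)

def forward(x):
    hidden = np.tanh(x @ W1 + b1)   # lower layer: simpler features
    return hidden @ W2 + b2         # higher layer: combines them

print(forward(np.array([0.2, -0.1, 0.5])))
# Training would adjust W1, b1, W2, b2 (e.g. by gradient descent) so that
# the mapping from inputs to outputs matches the data.
```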
Convolutional neural networks
Sometimes known as ConvNets, a convolutional neural network represents the data as a map.
Some notion of distance is needed between the points on the map.
An example is shown in the two images (figures omitted here): the top-left figure shows the features learned at each layer for a vehicle image, and the first learned feature indicates a 93% probability of the image being a car.
The other features in this case are noise, and probably not a car. Note that some noise is almost always present in real-world images.
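The core operation, sliding a small filter across an image-like map, can be sketched in a few lines; the kernel below is hand-picked for illustration, whereas a ConvNet would learn its kernels from data:

```python
import numpy as np
from scipy.signal import convolve2d

image = np.zeros((6, 6))
image[:, 3:] = 1.0                      # a vertical edge

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])         # responds to vertical edges

response = convolve2d(image, kernel, mode="valid")
print(response)   # non-zero around the edge, zero in flat regions
```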
Logic-based approach
Inductive logic programming (ILP) is a branch of machine learning which uses logic programming as a representation.
Examples, background knowledge, and hypotheses can all be represented in the language of choice. Prolog is a popular choice for encoding ILP-suitable problems.
With an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.
Essentially, logical expressions are constructed to characterise the input classes. The theory of ILP is based on proof theory and model theory for the first order predicate calculus.
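ILP systems themselves are out of scope here, but their acceptance criterion is easy to state in code. A toy sketch (in Python rather than Prolog, with made-up family facts) checks that a hypothesised rule entails all positive and no negative examples:

```python
# Background facts: parent(X, Y) as a set of tuples.
parent = {("anna", "bo"), ("bo", "cia"), ("anna", "dag")}

def grandparent_hypothesis(x, y):
    """Hypothesised rule: grandparent(X, Y) :- parent(X, Z), parent(Z, Y)."""
    entities = {p for pair in parent for p in pair}
    return any((x, z) in parent and (z, y) in parent for z in entities)

positives = [("anna", "cia")]
negatives = [("bo", "anna"), ("anna", "bo")]

accepted = (all(grandparent_hypothesis(*e) for e in positives)
            and not any(grandparent_hypothesis(*e) for e in negatives))
print("hypothesis accepted:", accepted)   # True
```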
Heuristic search
Search is a well-established area of research and deployment within computer science.
The idea is to search through a number of different models, or the parameters in a model expression, to find something that matches the data; the result can then be used when training other machine learning models.
Well-known algorithms, many stemming from optimisation, include genetic algorithms, reinforcement learning, and simulated annealing.
Typically one wants to find a maximum or minimum point in a set of data without actually knowing the true shape of the data. In some cases one would like to find more than one maximum or minimum, e.g. "the maximum value and the minimum cost".
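A minimal simulated-annealing sketch on a synthetic objective (not from the slides): occasionally accepting worse moves lets the search escape local minima while the "temperature" cools:

```python
import math
import random

def cost(x):
    return x * x + 3 * math.sin(5 * x)   # a bumpy 1-D function

random.seed(7)
x = random.uniform(-3, 3)
temperature = 2.0
for step in range(5000):
    candidate = x + random.gauss(0, 0.2)
    delta = cost(candidate) - cost(x)
    # Accept improvements always; accept worse moves with a probability
    # that shrinks as the temperature cools, to escape local minima.
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        x = candidate
    temperature *= 0.999

print(f"found x = {x:.3f}, cost = {cost(x):.3f}")
```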
Data issues
Data representation
How data is represented is an important facet in any system. If the system is to be designed from the start, a concise, standard, secure representation is usually a good starting point.
Concise because millions of records must be processed; standardised so that tools written in, for example, Python, Ruby, or C++ can read and write the data; and of course secure, meaning the data stream can be encrypted when sent over an open network and secured end-to-end otherwise.
Large organisations often go with non-standard solutions and formats, which makes it harder to use cheap, fast, open and, importantly, constantly evolving solutions, e.g. from Apache.
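As a small illustration of a concise, standard representation (the field names are hypothetical), a sensor record serialised as JSON can be read and written from virtually any language; encryption of the transport (e.g. TLS) would be layered on top:

```python
import json

# A single hypothetical sensor record in a standard, language-neutral format.
record = {"sensor_id": "E6-N-1042", "timestamp": "2018-09-13T07:31:00Z",
          "speed_kmh": 72.5, "flow_veh_per_h": 1840}

encoded = json.dumps(record)      # what would travel over the network
decoded = json.loads(encoded)     # any consumer can recover the record
print(decoded["speed_kmh"])
```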
Algorithmic transparency
ML example
Example
Problem setting
MCS sensors
Problem setting
Real traffic jams
Queues
Another illustration of queue accumulation:
1. Queue buildup
The front vehicle slows down, causing the following vehicles to bunch up behind it, so the density increases.
2. Queue dissipation
When a traffic light changes from red to green, the front vehicles move away and the density lowers.
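A toy follow-the-leader sketch (not from the slides) of queue buildup: when the lead vehicle brakes, the gaps behind it shrink and the local density rises:

```python
positions = [200.0, 160.0, 120.0, 80.0, 40.0]   # lead vehicle first
speeds    = [20.0] * 5                           # metres per second

for t in range(12):
    speeds[0] = 5.0 if t >= 3 else 20.0          # lead vehicle brakes at t = 3
    for i in range(1, len(positions)):
        gap = positions[i - 1] - positions[i]
        speeds[i] = min(speeds[i], gap / 2.0)    # followers keep a gap-dependent speed
    positions = [p + v for p, v in zip(positions, speeds)]
    density = len(positions) / (positions[0] - positions[-1])
    print(f"t={t:2d}: density = {density:.4f} vehicles/m")
```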
Do we need 12 years of data?
Ground truth
Results
Future: transparency
[Flowchart] Problem with domain knowledge → deep learning → works? Y: done; N: train again → works? Y: done; N: new model.
Summary
Uncovering black box approaches (BDVA)
Data cleaning (and readiness) is important
Method choice is important too
Luckily, the software situation is good and improving, with many packages available
Some basic maths, coursework, and software experience are helpful
AND DOMAIN KNOWLEDGE!!!