Correlation and Causality:
Case Studies
Prof. Shih-wei Liao
Reference:
Judea Pearl: The Book of Why
More References:
History of AI
Correlation ≠ Causality
X doesn’t cause Y
[Diagrams: a common cause Z with arrows to both X and Y — Z induces a correlation between X and Y even though X does not cause Y]
Causality ⇒ Correlation
Correlation ⇏ Causality
3 Variables: Heat, Ice cream, Sunglasses
[Diagram: Heat → Ice cream, Heat → Sunglasses — hot weather drives both ice-cream and sunglasses sales, so the two correlate without either causing the other]
“99% Correlation” ≠ Causality
The explanatory power of regression parameters: a matter of subjective judgment
[Chart: wine consumption vs. number of fund managers — oftentimes as high as 99% correlation]
Correlation ≠ Causality
Why Steve Jobs Doesn’t Poll:
Note: Decision science cannot rely on correlation alone.
Correlation ≠ Causality
Recap: Causality vs. Correlation:
Outline:
Domain Knowledge vs. Data-Driven
Challenges in Causality Science:
Outline:
Machine Learning vs. Causality Science:
Mathematical Foundation of Causality:
Causality uses Bayesian Networks:
For example:
The joint probability function is: P(G, S, R) = P(G | S, R) · P(S | R) · P(R)
Causality: Use Bayesian Networks
G = Grass wet (true/false), S = Sprinkler turned on (true/false), and R = Raining (true/false)
What is the probability that it is raining, given the grass is wet?
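This query can be answered by enumeration over the network's joint distribution. A minimal sketch in Python; the CPT numbers below are the commonly used textbook values for this sprinkler example, not taken from the slides:

```python
from itertools import product

# Conditional probability tables for the classic sprinkler network
# (illustrative numbers from the standard textbook example).
P_R = {True: 0.2, False: 0.8}                 # P(Rain)
P_S_given_R = {True: 0.01, False: 0.4}        # P(Sprinkler=T | Rain)
P_G_given_SR = {                              # P(GrassWet=T | Sprinkler, Rain)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.00,
}

def joint(g, s, r):
    """P(G=g, S=s, R=r) = P(R) * P(S|R) * P(G|S,R)."""
    p_s = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    p_g = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
    return P_R[r] * p_s * p_g

# P(R=T | G=T): sum out S in the numerator, (S, R) in the denominator.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(round(num / den, 4))   # ≈ 0.3577 with these CPTs
```

With these numbers the grass being wet raises the probability of rain from the prior 0.2 to about 0.36 — the sprinkler "explains away" part of the evidence.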
Challenges in Causality Science:
System Dynamics
The Three-Layer Causal Hierarchy
Structural Causal Models (SCM)
Extension: Multimodal Metadata Fusion Using Causal Strength
Outline:
Does Machine Learning use Correlation or Causality?
Does Deep Learning use Correlation or Causality?
Outline:
Challenges & Opportunities: Causality
Challenge: Where Correlation is Effective
Outline:
Case Study: Beer & Diaper
Case Study: Medical
Multi-Armed Bandit
Case Study: Bitcoin Tracing
Bitcoin Tracing by causality
Unsupervised Analyzer
This presentation uses a free template provided by FPPT.com
www.free-power-point-templates.com
Unsupervised Analyzer
Unsupervised Analyzer
Unsupervised Analyzer
7/52, 3/52
Supervised Analyzer
Retrieved from https://www.walletexplorer.com
Supervised Analyzer
Supervised Analyzer
Supervised Analyzer
Monitor
Conclusion
Backup
K-means Clustering
From: http://stanford.edu/~cpiech/cs221/img/kmeansViz.png
K-means Clustering
Input: A set of data points (x, y), and the number of clusters, K.
Output: The centers of the K clusters.
Algorithm:
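The K-means loop alternates an assignment step and an update step until the centers stop moving. A minimal Python sketch of Lloyd's iteration (the random initialization and the exact-equality convergence test are simplifications):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # init: k distinct data points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            j = min(range(k),
                    key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            clusters[j].append((x, y))
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            (sum(px for px, _ in cl) / len(cl), sum(py for _, py in cl) / len(cl))
            if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:           # no movement -> converged
            break
        centers = new_centers
    return centers

print(kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2))
```

On the four points above the loop settles on centers near (0, 0.5) and (10, 10.5), one per visible cluster.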
K-means Clustering using MapReduce
"centers": Initialize the K cluster centers.
Mapper: for each data point, emit (index of nearest center, point).
Reducer: for each center index, average the assigned points to produce the new center.
Repeat the above Mapper/Reducer steps, until convergence.
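One round of this Mapper/Reducer split can be sketched as plain Python (a sequential simulation; the grouping dictionary stands in for the shuffle phase of a real MapReduce runtime):

```python
def mapper(point, centers):
    """Emit (index of nearest center, point) for one input point."""
    x, y = point
    j = min(range(len(centers)),
            key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
    return (j, point)

def reducer(index, points):
    """Average all points assigned to one center -> the new center."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans_round(points, centers):
    """One map/shuffle/reduce round; repeat until centers stop moving."""
    groups = {}
    for p in points:                        # map phase
        j, p = mapper(p, centers)
        groups.setdefault(j, []).append(p)  # shuffle: group by out_key
    return [reducer(j, pts) for j, pts in sorted(groups.items())]  # reduce

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans_round(pts, [(0, 0), (10, 10)]))   # [(0.0, 0.5), (10.0, 10.5)]
```

The driver program re-broadcasts the new centers and reruns the round until they converge, exactly as the slide describes.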
MapReduce Programming Model
Programmer specifies two functions: map and reduce.
Main data structure: (key, value) pairs.
Mappers and Reducers
Mapper/Reducer Template:
map(in_key, in_value) → list(out_key, intermediate_value)
reduce(out_key, list(intermediate_value)) → list(out_value)
Mapper
Reducer
Mappers and Reducers: Observations
Example: Count Word Occurrences
Input: A set of words.
Output: Number of occurrences of each word.
Mapper/Reducer implementation:
Mapper/Reducer template:
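The template instantiated for word counting can be sketched as follows (a sequential simulation; a dictionary stands in for the shuffler's group-by-key):

```python
from collections import defaultdict

def mapper(word):
    """map(word) -> (word, 1): one intermediate pair per occurrence."""
    return (word, 1)

def reducer(word, counts):
    """reduce(word, [1, 1, ...]) -> total occurrences of that word."""
    return (word, sum(counts))

def word_count(words):
    groups = defaultdict(list)
    for w in words:                 # map phase + shuffle (group by out_key)
        k, v = mapper(w)
        groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in groups.items())   # reduce phase

print(word_count(["to", "be", "or", "not", "to", "be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```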
Supporting Parallelized and Distributed Processing
Mapper/Reducer Template:
map(in_key, in_value) → list(out_key, intermediate_value)
reduce(out_key, list(intermediate_value)) → list(out_value)
Mapper
Reducer
Parallel, Distributed Execution
Mapper
Reducer
Parallel, Distributed Execution
Combiner
Shuffler
Mapper
Reducer
Key components of MapReduce
Mapper: transforms each input record into intermediate (key, value) pairs.
Combiner (optional, runs on the same machine as the mapper): locally pre-aggregates the mapper's output to reduce network traffic.
Shuffler: groups intermediate values by key and routes each key to a reducer.
Reducer: merges all values for a key into the final output.
From Smart Explorer to Big Explorer
Big Data
Google Cloud
Smart Explorer: Machine learning-based data center optimization
See Google’s paper
Big Explorer: Machine learning-based Big Data 3S optimization
Tunables + Observables → Performance↑
Hardware
Prefetchers
+
CPI ?
Cache miss rate ?
Smart Explorer
Parameter Space Search Problem
E.g., with linear regression
t1 = a0 + a1*o1 + a2*o2 + a3*o3 + … + am*om
Machine Learning!
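The regression above (t1 = a0 + a1*o1 + … + am*om) can be fit by ordinary least squares. A stdlib-only sketch via the normal equations; the sample data at the bottom is invented for illustration:

```python
def fit_linear(obs, target):
    """Least-squares fit of t = a0 + a1*o1 + ... + am*om via the
    normal equations (X^T X) a = X^T t, solved by Gaussian
    elimination (pure stdlib; fine for small m)."""
    X = [[1.0] + list(row) for row in obs]      # prepend the intercept column
    n, m = len(X), len(X[0])
    # Build A = X^T X and b = X^T t.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(m)]
         for r in range(m)]
    b = [sum(X[i][r] * target[i] for i in range(n)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    a = [0.0] * m
    for r in range(m - 1, -1, -1):
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, m))) / A[r][r]
    return a

# Recover t = 1 + 2*o1 + 3*o2 from noiseless samples.
obs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
t = [1, 3, 4, 6, 14]
print([round(a, 6) for a in fit_linear(obs, t)])   # [1.0, 2.0, 3.0]
```

In the real system the o's are the measured observables and t is the performance metric being predicted; here linear regression is only one of the candidate models.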
Hardware Prefetcher
int a[1000000], b[1000000], c[1000000];
for (int i = 0; i < 1000000; i++) {
    if (flag) a[i] = b[i] + c[i];
}
10000, 10004, 10008, 10012, …
Prefetch 10028, 10032, 10036, ...
Stride prefetcher
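A toy model of the stride prefetcher's logic (this is not real hardware; the `depth` parameter is an assumed prefetch degree): once two consecutive address deltas agree, predict the next few addresses at that stride.

```python
def stride_predictions(addresses, depth=4):
    """Toy stride detector: if the last two address deltas match,
    predict the next `depth` addresses at that stride."""
    if len(addresses) < 3:
        return []
    s1 = addresses[-1] - addresses[-2]
    s2 = addresses[-2] - addresses[-3]
    if s1 != s2 or s1 == 0:
        return []          # no stable stride detected -> do not prefetch
    last = addresses[-1]
    return [last + s1 * i for i in range(1, depth + 1)]

print(stride_predictions([10000, 10004, 10008, 10012]))
# [10016, 10020, 10024, 10028]
```

A real stride prefetcher keeps a per-load table of (last address, stride, confidence) and issues the prefetches some distance ahead of the demand stream, as the slide's 10028, 10032, … example suggests.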
How to Apply Machine Learning (ML)
Applications
myprofile
smarty
Observables
Predicted Configuration
A Simplified Example
smarty: 3 Phases in Machine Learning
high CM and low BU → should prefetch!
Workflow of Model Construction
Select representative programs as the training dataset (programs 1, 2, 3, 4)
Workflow of Model Construction
Collect the observables off-line, under all possible configuration combinations
myprofile
Configuration Tuner
(CM1, BU1, Y), (CM1’, BU1’, N), (CM2, BU2, Y), (CM2’, BU2’, N),
(CM3, BU3, Y), (CM3’, BU3’, N), (CM4, BU4, Y), (CM4’, BU4’, N)
Workflow of Model Construction
Label the observables and feed them into the model for training
(CM1, BU1, Y), (CM1’, BU1’, N), (CM2, BU2, Y), (CM2’, BU2’, N),
(CM3, BU3, Y), (CM3’, BU3’, N), (CM4, BU4, Y), (CM4’, BU4’, N)
Workflow of Model Construction
Select machine learning algorithm and train the model
smarty
(CM1, BU1, Y)
(CM2, BU2, Y)
(CM3, BU3, N)
(CM4, BU4, N)
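As an illustrative stand-in for the `smarty` model — the slides do not name the learning algorithm, so the 1-nearest-neighbour classifier and the toy numbers below are assumptions; only the rule of thumb "high CM and low BU → should prefetch" comes from the slides:

```python
def predict_prefetch(train, cm, bu):
    """1-nearest-neighbour stand-in for the smarty model: given
    labelled (CM, BU, label) observations, classify a new program's
    (CM, BU) observables as prefetch-on ("Y") or prefetch-off ("N")."""
    nearest = min(train, key=lambda t: (t[0] - cm) ** 2 + (t[1] - bu) ** 2)
    return nearest[2]

# Toy labelled data following the slide's rule of thumb:
# high CM and low BU -> prefetch ("Y"), the reverse -> don't ("N").
train = [(0.9, 0.1, "Y"), (0.8, 0.2, "Y"), (0.2, 0.8, "N"), (0.1, 0.9, "N")]
print(predict_prefetch(train, 0.85, 0.15))   # "Y"
```

In the evaluation workflow, this is the step where program 5's observables (CM5, BU5) are fed to the trained model and the predicted label drives the configuration tuner.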
Workflow of Model Evaluation
A new program (program 5) for which we want good performance
Workflow of Model Evaluation
Collect the observables only once
myprofile
(CM5, BU5)
Workflow of Model Evaluation
Feed the collected observables into the model
smarty
(CM5, BU5, ?)
Workflow of Model Evaluation
Configuration Tuner
A guided configuration is generated for future runs
Y
(CM5, BU5)
In Short, Model Construction and Evaluation
Performance
Figure 3. Individual speedup for each program in the DCA benchmark.
From Smart Explorer to Big Explorer
Big Data
Google Cloud
Smart Explorer: Machine learning-based data center optimization
See Google’s paper
Big Explorer: Machine learning-based Big Data 3S optimization