1 of 32

Knowledge-guided Data Science

Overview and Case Studies

2 of 32

Outline

  • Motivation for Knowledge-guided Data Science
  • Knowledge Sources and Representations
  • Fusing Knowledge with Data: Case Studies

3 of 32

Data-driven Methods vs Knowledge-driven Methods

[Figure: a toy example. The training data contains examples (a, b, c) = (3, 4, 5), (6, 8, 10), …; a data-driven method fits a regression model f to these examples, whereas a knowledge-driven method applies a known relation directly (the values here are Pythagorean triples, suggesting c = √(a² + b²)).]

4 of 32

Knowledge-guided Data Science

  • Fusing data-driven methods with knowledge
  • Why?
    • Data-driven methods tend not to work well on small data.

    • Data-driven methods might not meet constraints such as those dictated by natural laws or imposed by regulatory or security guidelines.

    • Complex data-driven methods tend to have poor interpretability.

[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).

5 of 32

Sources of Knowledge

  • Natural sciences: universal laws of physics, bio-molecular descriptions of genetic sequences, material-forming production processes…

  • Social sciences: effects in social networks, the syntax and semantics of language…

  • Expert knowledge: the working experience of an expert…

  • World knowledge: common sense

[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).

6 of 32

Representations of Knowledge

  • Algebraic equations: mainly from the natural sciences; can also come from expert knowledge
  • Logic rules: mainly from world knowledge; can also come from the natural and social sciences
  • Simulation results: mainly from the natural sciences
  • Differential equations: mainly from the natural sciences
  • Knowledge graphs: mainly from world knowledge; can also come from the natural and social sciences
  • Probabilistic relations: mainly from expert knowledge; can also come from the natural sciences and world knowledge
  • Invariances: mainly from the natural sciences; can also come from world knowledge
  • Human feedback: mainly from expert knowledge; can also come from world knowledge

[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).

7 of 32

Fusing Knowledge with Data

[Figure: the machine-learning pipeline: preprocess training data → define model architecture, hyperparameters, etc. → execute cost function, learning algorithm, etc. → get trained model. Knowledge can be fused in at each stage:]

  • Enrich the dataset with prior knowledge

  • Use knowledge to guide the selection of model architecture, hyperparameters, etc.

  • Integrate knowledge into the cost function, etc.

  • Evaluate and refine a trained model with knowledge

[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).

9 of 32

Generating New Training Data

  • Case: smart grid monitoring
  • Challenge: Historical data cannot cover all possible scenarios
      • E.g. How would the electricity load change if we were to sever a cable for maintenance?
  • Solution: Use expert knowledge to simulate all possible scenarios and the causality of events in such scenarios.

[1] Thomas Hartmann, Assaad Moawad, François Fouquet, Grégory Nain, Jacques Klein, Yves Le Traon, Jean-Marc Jézéquel: Model-Driven Analytics: Connecting Data, Domain Knowledge, and Learning. CoRR abs/1704.01320 (2017)

10 of 32

Data Compression

  • Case: smart grid monitoring
  • Challenge: The massive raw data collected by sensors in the grid cannot be handled efficiently.
  • Solution: Guided by expert knowledge, compress the raw data with polynomial regression and store only the polynomial coefficients.
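This scheme can be sketched in a few lines of NumPy; the polynomial degree and the synthetic load curve below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def compress(window, degree):
    """Fit a polynomial to a window of raw sensor readings and keep
    only its coefficients instead of the raw samples."""
    t = np.arange(len(window))
    return np.polyfit(t, window, degree)   # degree + 1 numbers

def decompress(coeffs, n):
    """Approximately reconstruct n samples from the stored coefficients."""
    return np.polyval(coeffs, np.arange(n))

# 100 raw readings are stored as just 3 coefficients.
raw = 2.0 + 0.5 * np.arange(100.0) ** 2    # synthetic load curve
coeffs = compress(raw, degree=2)
approx = decompress(coeffs, len(raw))
```

The compression is lossy in general; it works well exactly because expert knowledge says the signals are close to polynomial.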

[1] Thomas Hartmann, Assaad Moawad, François Fouquet, Grégory Nain, Jacques Klein, Yves Le Traon, Jean-Marc Jézéquel: Model-Driven Analytics: Connecting Data, Domain Knowledge, and Learning. CoRR abs/1704.01320 (2017)

11 of 32

Dividing The Training Set

  • Case: finding faulty parts for truck engines
  • Challenge: high intra-class heterogeneity due to different engine models; some parts appear only on engines of specific series/types/models.
  • Solution: Use knowledge of the engine model hierarchy to divide the training set, and train an individual classifier for each engine series/type/model.
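A minimal sketch of the divide-and-train idea; the tiny nearest-centroid classifier, the toy data, and the series names are all illustrative stand-ins, not the paper's actual method:

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in classifier: predicts the class of the nearest centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def train_per_group(X, y, groups):
    """Domain knowledge (the engine series of each example) splits the
    training set; one classifier is fit per series."""
    return {g: NearestCentroid().fit(X[groups == g], y[groups == g])
            for g in np.unique(groups)}

def predict_routed(models, x, group):
    """Route an example to the classifier of its engine series."""
    return models[group].predict(x[None, :])[0]

# Toy data: one feature, a fault label, and the known series of each example.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = np.array([0, 1, 0, 1])
groups = np.array(["A", "A", "B", "B"])
models = train_per_group(X, y, groups)
```

Splitting first removes the intra-class heterogeneity that a single global classifier would have to absorb.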

[1] Vitali Hirsch, Peter Reimann, Bernhard Mitschang: Exploiting Domain Knowledge to address Multi-Class Imbalance and a Heterogeneous Feature Space in Classification Tasks for Manufacturing Data. Proc. VLDB Endow. 13(12): 3258-3271 (2020)

12 of 32

Knowledge as Model Input

  • Case: tunnel construction work progress identification
  • Knowledge: knowledge on the equipment used in each work type
  • Under a Bayesian model, use computer vision methods to identify the equipment, then inject the knowledge into the model as a probabilistic relation between work types and equipment.
    • E.g. Suppose we have identified equipment A. By domain knowledge we know A is usually used in work type B, thus the current work type is likely to be B.
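The probabilistic relation in the example can be sketched with Bayes' rule; all numbers below are made up for illustration:

```python
def posterior_work_type(equipment, prior, likelihood):
    """P(work type | observed equipment) by Bayes' rule."""
    unnorm = {w: prior[w] * likelihood[w].get(equipment, 0.0) for w in prior}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

# Illustrative knowledge: equipment "A" is mostly used in work type "B".
prior = {"B": 0.5, "C": 0.5}                      # P(work type)
likelihood = {"B": {"A": 0.9}, "C": {"A": 0.1}}   # P(equipment | work type)
post = posterior_work_type("A", prior, likelihood)
```

Observing equipment A shifts the posterior strongly toward work type B, exactly as the slide's example describes.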

[1] Renjie We, Yuji Fujita, Kenichi Soga. "Integrating domain knowledge with deep learning models: An interpretable AI system for automatic work progress identification of NATM tunnels." Tunnelling and Underground Space Technology 105 (2020): 103558.

13 of 32

Fusing domain-specific features with learned features

  • Case: cardiac arrhythmia detection
  • Domain-specific feature: RR-intervals of ECG signals
  • Concatenate the domain-specific feature with the output of a hidden layer in the deep neural network.
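The fusion step itself is just a concatenation; here is a framework-agnostic NumPy sketch with illustrative shapes (the paper does this inside a deep network):

```python
import numpy as np

def fuse(hidden, domain):
    """Concatenate a hidden-layer activation with handcrafted domain
    features (e.g. RR-intervals) before the final classification layer."""
    return np.concatenate([hidden, domain], axis=-1)

rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 64))   # learned representation per heartbeat
rr = rng.normal(size=(8, 4))        # 4 RR-interval features per heartbeat
fused = fuse(hidden, rr)            # fed to the output layer
```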

[1] Genshen Yan, Shen Liang, Yanchun Zhang, Fan Liu: Fusing Transformer Model with Temporal Features for ECG Heartbeat Classification. BIBM 2019: 898-905

14 of 32

Generating New Features

  • Case: fault diagnosis for wind turbines
  • Based on the directly observable features, generate new features by knowledge.
    • E.g. Inside the turbine reside two temperature sensors. Normally their readings should not differ much, so we can use the difference between the two readings as a new feature.
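A minimal sketch of this kind of knowledge-based feature generation; the field names are illustrative:

```python
def add_knowledge_feature(record):
    """Derive a new feature from domain knowledge: the two internal
    temperature sensors should normally agree, so their difference is a
    useful fault indicator."""
    out = dict(record)
    out["temp_diff"] = abs(record["temp_sensor_1"] - record["temp_sensor_2"])
    return out

sample = {"temp_sensor_1": 71.5, "temp_sensor_2": 64.0}
enriched = add_knowledge_feature(sample)   # temp_diff == 7.5
```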

[1] R. Lily Hu, Kevin Leahy, Ioannis C. Konstantakopoulos, David M. Auslander, Costas J. Spanos, Alice M. Agogino: Using Domain Knowledge Features for Wind Turbine Diagnostics. ICMLA 2016: 300-307

15 of 32

Generating Pseudo Labels

  • Case: self-supervision for bearing fault diagnosis
    • Self-supervision: instead of using labels for the actual learning task, use the unlabeled data itself to create pseudo supervised learning tasks with pseudo labels. Features learned via these tasks can later be transferred to the actual task. This is useful when actual labels are scarce.
  • Knowledge-based pseudo labels: use domain knowledge to select metrics that reflect the working conditions of the bearings, and use these metrics as pseudo labels to guide a deep network to learn features.
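A sketch of knowledge-based pseudo labelling; RMS and kurtosis are common bearing condition indicators used here for illustration, not necessarily the paper's exact metrics:

```python
import numpy as np

def pseudo_labels(windows):
    """Compute condition indicators from raw vibration windows and use
    them as regression targets (pseudo labels) for feature learning."""
    rms = np.sqrt((windows ** 2).mean(axis=1))
    centered = windows - windows.mean(axis=1, keepdims=True)
    kurt = (centered ** 4).mean(axis=1) / centered.var(axis=1) ** 2
    return np.stack([rms, kurt], axis=1)

rng = np.random.default_rng(0)
windows = rng.normal(size=(16, 1024))   # unlabeled vibration segments
targets = pseudo_labels(windows)        # pseudo labels, shape (16, 2)
```

A network trained to predict these targets learns condition-relevant features without any true fault labels.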

[1] Feng Jian, Yao Yu, Lu Senxiang, Liu Yue. Domain knowledge-based deep-broad learning framework for fault diagnosis[J], IEEE Transactions on Industrial Electronics, DOI: 10.1109/TIE.2020.2982085

16 of 32

Fusing Knowledge with Data

(Roadmap revisited; see slide 7.) The next case studies use knowledge to guide the selection of model architecture, hyperparameters, and other model parameters.

17 of 32

Guiding Model Selection

  • Case: Modelling power of water pumps and cooling tower fans in a chiller plant
  • Knowledge: Power is proportional to the cube of shaft speed.
  • Given this knowledge, we can simply construct a polynomial regression model rather than a sophisticated deep learning model.
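A sketch of the resulting model: with the cube law known, only a single coefficient needs to be fitted (the measurements below are synthetic):

```python
import numpy as np

# Knowledge: power ≈ k * speed**3, so fitting one coefficient suffices;
# no deep model is needed.
speed = np.array([10.0, 20.0, 30.0, 40.0])
power = 0.002 * speed ** 3                 # pretend measurements

k = np.linalg.lstsq(speed[:, None] ** 3, power, rcond=None)[0][0]
predicted = k * 25.0 ** 3                  # power at a new shaft speed
```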

[1] Hoang Dung Vu, Kok-Soon Chai, Bryan Keating, Nurislam Tursynbek, Boyan Xu, Kaige Yang, Xiaoyan Yang, Zhenjie Zhang: Data Driven Chiller Plant Energy Optimization with Domain Knowledge. CIKM 2017: 1309-1317

18 of 32

Guiding Model Design

  • Case: Image Classification
  • Knowledge: Knowledge encoded in a knowledge graph.
  • We can design a hybrid model: a convolutional neural network (CNN), well suited to computer vision, fused with a graph neural network (GNN) that exploits the knowledge embedded in the knowledge graph.

[1] Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta: The More You Know: Using Knowledge Graphs for Image Classification. CVPR 2017: 20-28

  • CNN: identifies cars, people and a stop sign in the image;

  • GNN: analyzes the knowledge graph to determine that the image shows a crosswalk.

19 of 32

Knowledge As The Main Model

  • Case: power demand forecasting for smelting process
  • Knowledge: Mechanism Model for Demand Forecasting
  • Use the mechanism model as the main model, while using a data-driven method to estimate some of its parameters.

[Figure: the mechanism model is the main model; its parameters are estimated by a hybrid of data- and knowledge-driven modules.]

[1] Jie Yang, Tianyou Chai, Chaomin Luo, Wen Yu: Intelligent Demand Forecasting of Smelting Process Using Data-Driven and Mechanism Model. IEEE Trans. Ind. Electron. 66(12): 9745-9755 (2019)

20 of 32

Knowledge as Part of The Model

  • Case: quality control for electrochemical micro-machining
  • Knowledge: four important intermediate variables that are linearly related to the input.
  • Embed these four variables as the first hidden layer of a neural network. The mappings between the input and this layer are linear. This layer is followed by standard data-driven non-linear layers.
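A sketch of such a physics-embedded network; all shapes and weights are illustrative, and whether the linear coefficients are fixed by knowledge or learned under a linearity constraint is left open here:

```python
import numpy as np

def forward(x, W_phys, W2, b2):
    """First layer: a purely linear map (no activation) to the four
    knowledge-given intermediate variables; ordinary nonlinear learned
    layers follow."""
    z = x @ W_phys              # the 4 intermediate variables
    return np.tanh(z @ W2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))             # batch of process inputs
W_phys = rng.normal(size=(3, 4))        # linear coefficients of the knowledge layer
h = forward(x, W_phys, rng.normal(size=(4, 8)), np.zeros(8))
```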

[1] Lu Yanfei, Manik Rajora, Pan Zou, Steven Y. Liang. Physics-embedded machine learning: case study with electrochemical micro-machining. Machines 5.1 (2017): 4.

[Figure: the four intermediate variables, with their linear coefficients, form the first hidden layer.]

21 of 32

Modifying The Internal Structure of The Model

  • Case: lake temperature modelling. A Physics-Guided Architecture (PGA) builds physical knowledge directly into the internal structure of the neural network [1].

[1] Arka Daw, R. Quinn Thomas, Cayelan C. Carey, Jordan S. Read, Alison P. Appling, Anuj Karpatne: Physics-Guided Architecture (PGA) of Neural Networks for Quantifying Uncertainty in Lake Temperature Modeling. SDM 2020: 532-540

22 of 32

Translating Knowledge Into The Model

  • Knowledge: Logic rules
  • Translate the logic rules into a neural network

[1] Artur S. d'Avila Garcez, Gerson Zaverucha: The Connectionist Inductive Learning and Logic Programming System. Appl. Intell. 11(1): 59-77 (1999)

23 of 32

Guiding Parameter Selection

  • Case: clinical decision making
  • Challenge: Need to estimate the conditional probability values of a Bayesian model, but not enough training data is available.
  • Solution: Estimate the probabilities via medical domain knowledge (meta-analysis of large amounts of medical literature)

[1] Barbaros Yet, Zane B. Perkins, Todd E. Rasmussen, Nigel R. M. Tai, D. William R. Marsh: Combining data and meta-analysis to build Bayesian networks for clinical decision support. J. Biomed. Informatics 52: 373-385 (2014)

24 of 32

Fusing Knowledge with Data

(Roadmap revisited; see slide 7.) The next case studies integrate knowledge into the cost function and the learning algorithm.

25 of 32

Knowledge as Penalty Terms in Loss Functions

  • Case: lake temperature modelling
  • Knowledge: the greater the depth, the higher the water density is.

  • Using this knowledge, we can add a penalty term to the loss function that measures how often this knowledge is violated by the current model.
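Such a penalty can be sketched in NumPy; the hinge-on-drops form and the weight lam are illustrative choices in the spirit of PGNN, not the paper's exact formula:

```python
import numpy as np

def physics_penalty(pred_density, depths):
    """Penalty for violating the knowledge that density must not decrease
    with depth: sum of the positive density drops between consecutive depths."""
    order = np.argsort(depths)
    drops = -np.diff(pred_density[order])    # > 0 where density falls
    return np.maximum(drops, 0.0).sum()

def total_loss(pred, target, pred_density, depths, lam=1.0):
    """Data loss plus the knowledge-based penalty."""
    return ((pred - target) ** 2).mean() + lam * physics_penalty(pred_density, depths)

depths = np.array([1.0, 2.0, 3.0])
ok = physics_penalty(np.array([1.00, 1.10, 1.20]), depths)   # 0.0
bad = physics_penalty(np.array([1.00, 0.80, 1.20]), depths)  # ≈ 0.2
```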

[1] Anuj Karpatne, William Watkins, Jordan S. Read, Vipin Kumar: Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. CoRR abs/1710.11431 (2017)

[Figure: the density and depth relations used as physical knowledge.]

26 of 32

Knowledge as Constraints

  • Case: chiller plant energy optimization
  • Knowledge: minimum and maximum allowed values for water flow, temperature, etc.
  • Use the knowledge as constraints on the cost function
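One simple way to sketch this, assuming a penalty-based treatment of the bounds (the names, values, and penalty weight are illustrative; a solver could instead enforce the bounds as hard constraints):

```python
def penalized_cost(cost, flow, flow_min, flow_max, lam=100.0):
    """Fold the domain bounds into the objective: a quadratic penalty is
    added whenever the candidate water flow leaves its allowed range."""
    violation = max(flow_min - flow, 0.0) + max(flow - flow_max, 0.0)
    return cost + lam * violation ** 2

inside = penalized_cost(10.0, flow=5.0, flow_min=2.0, flow_max=8.0)    # 10.0
outside = penalized_cost(10.0, flow=9.0, flow_min=2.0, flow_max=8.0)   # 110.0
```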

[1] Hoang Dung Vu, Kok-Soon Chai, Bryan Keating, Nurislam Tursynbek, Boyan Xu, Kaige Yang, Xiaoyan Yang, Zhenjie Zhang: Data Driven Chiller Plant Energy Optimization with Domain Knowledge. CIKM 2017: 1309-1317

27 of 32

Learning Algorithm Calibration

  • Case: discovery of time series motifs (frequently recurring patterns)
  • Challenge: The top-1 motif discovered may not be what domain experts expect.
  • Solution: let domain experts identify segments in time series where expected motifs are likely (or unlikely) to be found.

[1] Hoang Anh Dau, Eamonn J. Keogh: Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. KDD 2017: 125-134

28 of 32

Fusing Knowledge with Data

(Roadmap revisited; see slide 7.) The final case studies evaluate and refine a trained model with knowledge.

29 of 32

Model Evaluation

  • Case: lake temperature modelling
  • Knowledge: the greater the depth, the higher the water density is.

  • We can use the number of times this knowledge is violated in the testing phase as an evaluation metric.

[1] Anuj Karpatne, William Watkins, Jordan S. Read, Vipin Kumar: Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. CoRR abs/1710.11431 (2017)

[Figure: the density and depth relations used as physical knowledge.]

30 of 32

Model Enhancement

  • Case: clinical decision making
  • For a trained Bayesian model, select examples that it predicted incorrectly during cross-validation for expert review. Use the review to enhance the model.
    • If the expert agrees with the model's prediction (i.e., the recorded label is wrong), correct the label of this example in the dataset.
    • If the expert determines that an unavoidable external factor caused the false prediction, report this to the users so they better understand the limitations of the model.
    • If the expert determines that the model fails to account for certain hidden variables, add them to the model and retrain it.

[1] Barbaros Yet, Zane Perkins, Norman E. Fenton, Nigel Tai, William Marsh: Not just data: A method for improving prediction with knowledge. J. Biomed. Informatics 48: 28-37 (2014)

31 of 32

References

  1. Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).
  2. Thomas Hartmann, Assaad Moawad, François Fouquet, Grégory Nain, Jacques Klein, Yves Le Traon, Jean-Marc Jézéquel: Model-Driven Analytics: Connecting Data, Domain Knowledge, and Learning. CoRR abs/1704.01320 (2017).
  3. Vitali Hirsch, Peter Reimann, Bernhard Mitschang: Exploiting Domain Knowledge to address Multi-Class Imbalance and a Heterogeneous Feature Space in Classification Tasks for Manufacturing Data. Proc. VLDB Endow. 13(12): 3258-3271 (2020).
  4. Renjie We, Yuji Fujita, Kenichi Soga. "Integrating domain knowledge with deep learning models: An interpretable AI system for automatic work progress identification of NATM tunnels." Tunnelling and Underground Space Technology 105 (2020): 103558.
  5. Genshen Yan, Shen Liang, Yanchun Zhang, Fan Liu: Fusing Transformer Model with Temporal Features for ECG Heartbeat Classification. BIBM 2019: 898-905.
  6. R. Lily Hu, Kevin Leahy, Ioannis C. Konstantakopoulos, David M. Auslander, Costas J. Spanos, Alice M. Agogino: Using Domain Knowledge Features for Wind Turbine Diagnostics. ICMLA 2016: 300-307.
  7. Feng Jian, Yao Yu, Lu Senxiang, Liu Yue. Domain knowledge-based deep-broad learning framework for fault diagnosis[J], IEEE Transactions on Industrial Electronics, DOI: 10.1109/TIE.2020.2982085.
  8. Hoang Dung Vu, Kok-Soon Chai, Bryan Keating, Nurislam Tursynbek, Boyan Xu, Kaige Yang, Xiaoyan Yang, Zhenjie Zhang: Data Driven Chiller Plant Energy Optimization with Domain Knowledge. CIKM 2017: 1309-1317.

32 of 32

References

  1. Jie Yang, Tianyou Chai, Chaomin Luo, Wen Yu: Intelligent Demand Forecasting of Smelting Process Using Data-Driven and Mechanism Model. IEEE Trans. Ind. Electron. 66(12): 9745-9755 (2019).
  2. Lu Yanfei, Manik Rajora, Pan Zou, Steven Y. Liang. Physics-embedded machine learning: case study with electrochemical micro-machining. Machines 5.1 (2017): 4.
  3. Arka Daw, R. Quinn Thomas, Cayelan C. Carey, Jordan S. Read, Alison P. Appling, Anuj Karpatne: Physics-Guided Architecture (PGA) of Neural Networks for Quantifying Uncertainty in Lake Temperature Modeling. SDM 2020: 532-540.
  4. Artur S. d'Avila Garcez, Gerson Zaverucha: The Connectionist Inductive Learning and Logic Programming System. Appl. Intell. 11(1): 59-77 (1999).
  5. Barbaros Yet, Zane B. Perkins, Todd E. Rasmussen, Nigel R. M. Tai, D. William R. Marsh: Combining data and meta-analysis to build Bayesian networks for clinical decision support. J. Biomed. Informatics 52: 373-385 (2014).
  6. Anuj Karpatne, William Watkins, Jordan S. Read, Vipin Kumar: Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. CoRR abs/1710.11431 (2017).
  7. Barbaros Yet, Zane Perkins, Norman E. Fenton, Nigel Tai, William Marsh: Not just data: A method for improving prediction with knowledge. J. Biomed. Informatics 48: 28-37 (2014).
  8. Oscar Serradilla, Ekhi Zugasti, Carlos Cernuda, Andoitz Aranburu, Julian Ramirez de Okariz, Urko Zurutuza: Interpreting Remaining Useful Life estimations combining Explainable Artificial Intelligence and domain knowledge in industrial machinery. FUZZ-IEEE 2020: 1-8.