Knowledge-guided Data Science
Overview and Case Studies
Outline
Data-driven Methods vs Knowledge-driven Methods
[Figure: data-driven example. A regression model f is fitted to input pairs (a, b) such as (3, 4), (6, 8), … with outputs c such as 5, 10, ….]
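The contrast between the two approaches can be sketched in code. This assumes the pairs in the figure follow the Pythagorean relation c = sqrt(a^2 + b^2), which is an inference from (3, 4) → 5 and (6, 8) → 10 rather than something the slide states. The extra training pairs, the linear model, and the function names are illustrative choices only.

```python
import numpy as np

# Training pairs (a, b) -> c. The first two rows come from the slide; the
# remaining rows are hypothetical and follow the assumed Pythagorean relation.
X = np.array([[3.0, 4.0], [6.0, 8.0], [5.0, 12.0], [8.0, 15.0], [1.0, 1.0]])
y = np.hypot(X[:, 0], X[:, 1])

# Data-driven: fit a linear regression f(a, b) = w0 + w1*a + w2*b by least squares.
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def f_data(a, b):
    """Regression model learned purely from the examples."""
    return w[0] + w[1] * a + w[2] * b

def f_knowledge(a, b):
    """Knowledge-driven model: apply the known formula directly."""
    return np.hypot(a, b)

a_new, b_new = 20.0, 21.0  # an unseen input; the assumed true value of c is 29
print("data-driven prediction:     ", f_data(a_new, b_new))
print("knowledge-driven prediction:", f_knowledge(a_new, b_new))
```

The learned model only approximates the relation within the range of its training data, while the formula needs no data at all but must be known in advance.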
Knowledge-guided Data Science
[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).
Sources of Knowledge
[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).
Representations of Knowledge
[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).
Fusing Knowledge with Data
[Figure: machine learning pipeline. Preprocess the training data; define the model architecture, hyperparameters, etc.; execute the cost function, learning algorithm, etc.; get the trained model.]
[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).
Generating New Training Data
[1] Thomas Hartmann, Assaad Moawad, François Fouquet, Grégory Nain, Jacques Klein, Yves Le Traon, Jean-Marc Jézéquel: Model-Driven Analytics: Connecting Data, Domain Knowledge, and Learning. CoRR abs/1704.01320 (2017)
Data Compression
[1] Thomas Hartmann, Assaad Moawad, François Fouquet, Grégory Nain, Jacques Klein, Yves Le Traon, Jean-Marc Jézéquel: Model-Driven Analytics: Connecting Data, Domain Knowledge, and Learning. CoRR abs/1704.01320 (2017)
Dividing The Training Set
[1] Vitali Hirsch, Peter Reimann, Bernhard Mitschang: Exploiting Domain Knowledge to address Multi-Class Imbalance and a Heterogeneous Feature Space in Classification Tasks for Manufacturing Data. Proc. VLDB Endow. 13(12): 3258-3271 (2020)
Knowledge as Model Input
[1] Renjie Wu, Yuji Fujita, Kenichi Soga. "Integrating domain knowledge with deep learning models: An interpretable AI system for automatic work progress identification of NATM tunnels." Tunnelling and Underground Space Technology 105 (2020): 103558.
Fusing domain-specific features with learned features
[1] Genshen Yan, Shen Liang, Yanchun Zhang, Fan Liu: Fusing Transformer Model with Temporal Features for ECG Heartbeat Classification. BIBM 2019: 898-905
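A minimal sketch of the fusion idea, assuming it amounts to concatenating hand-crafted domain features with a learned embedding before classification. The RR-interval features, the random projection standing in for a learned encoder, and all function names are hypothetical. The cited paper uses a Transformer model, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_features(beat, W):
    """Stand-in for a learned encoder: a fixed linear projection followed by tanh."""
    return np.tanh(W @ beat)

def temporal_features(r_peaks, i):
    """Hand-crafted features for beat i: previous and next RR interval, in samples."""
    return np.array(
        [r_peaks[i] - r_peaks[i - 1], r_peaks[i + 1] - r_peaks[i]], dtype=float
    )

def fuse(beat, r_peaks, i, W):
    """Concatenate learned and domain-specific features into one vector."""
    return np.concatenate([learned_features(beat, W), temporal_features(r_peaks, i)])

beat = rng.standard_normal(64)            # hypothetical heartbeat segment
W = rng.standard_normal((8, 64))          # hypothetical encoder weights
r_peaks = np.array([100, 360, 610, 880])  # hypothetical R-peak positions

fused = fuse(beat, r_peaks, 1, W)
print(fused.shape)  # 8 learned + 2 temporal features
```

The fused vector would then be passed to a classifier, so the model sees both what it learned from the waveform and what domain experts consider informative.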
Generating New Features
[1] R. Lily Hu, Kevin Leahy, Ioannis C. Konstantakopoulos, David M. Auslander, Costas J. Spanos, Alice M. Agogino: Using Domain Knowledge Features for Wind Turbine Diagnostics. ICMLA 2016: 300-307
Generating Pseudo Labels
[1] Feng Jian, Yao Yu, Lu Senxiang, Liu Yue: Domain knowledge-based deep-broad learning framework for fault diagnosis. IEEE Transactions on Industrial Electronics, DOI: 10.1109/TIE.2020.2982085
Fusing Knowledge with Data
[Figure: machine learning pipeline. Preprocess the training data; define the model architecture, hyperparameters, etc.; execute the cost function, learning algorithm, etc.; get the trained model.]
[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).
Guiding Model Selection
[1] Hoang Dung Vu, Kok-Soon Chai, Bryan Keating, Nurislam Tursynbek, Boyan Xu, Kaige Yang, Xiaoyan Yang, Zhenjie Zhang: Data Driven Chiller Plant Energy Optimization with Domain Knowledge. CIKM 2017: 1309-1317
Guiding Model Design
[1] Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta: The More You Know: Using Knowledge Graphs for Image Classification. CVPR 2017: 20-28
Knowledge As The Main Model
A hybrid of data- and knowledge-driven parameter estimation modules
Mechanism model
[1] Jie Yang, Tianyou Chai, Chaomin Luo, Wen Yu: Intelligent Demand Forecasting of Smelting Process Using Data-Driven and Mechanism Model. IEEE Trans. Ind. Electron. 66(12): 9745-9755 (2019)
Knowledge as Part of The Model
[1] Lu Yanfei, Manik Rajora, Pan Zou, Steven Y. Liang. Physics-embedded machine learning: case study with electrochemical micro-machining. Machines 5.1 (2017): 4.
Intermediate variables
Linear coefficients
Modifying The Internal Structure of The Model
[1] Arka Daw, R. Quinn Thomas, Cayelan C. Carey, Jordan S. Read, Alison P. Appling, Anuj Karpatne: Physics-Guided Architecture (PGA) of Neural Networks for Quantifying Uncertainty in Lake Temperature Modeling. SDM 2020: 532-540
Translating Knowledge Into The Model
[1] Artur S. d'Avila Garcez, Gerson Zaverucha: The Connectionist Inductive Learning and Logic Programming System. Appl. Intell. 11(1): 59-77 (1999)
Guiding Parameter Selection
[1] Barbaros Yet, Zane B. Perkins, Todd E. Rasmussen, Nigel R. M. Tai, D. William R. Marsh: Combining data and meta-analysis to build Bayesian networks for clinical decision support. J. Biomed. Informatics 52: 373-385 (2014)
Fusing Knowledge with Data
[Figure: machine learning pipeline. Preprocess the training data; define the model architecture, hyperparameters, etc.; execute the cost function, learning algorithm, etc.; get the trained model.]
[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).
Knowledge as Penalty Terms in Loss Functions
[1] Anuj Karpatne, William Watkins, Jordan S. Read, Vipin Kumar: Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. CoRR abs/1710.11431 (2017)
Density relation
Depth relation
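The two relations can be combined into a penalty term, sketched below under stated assumptions. The density relation maps water temperature to density, and the depth relation says that denser water should lie deeper. The function names and the weight `lam` are hypothetical, and the sketch omits any other regularization terms a full training loss may carry.

```python
import numpy as np

def density(temp_c):
    """Density relation: water density (kg/m^3) from temperature (deg C)."""
    return 1000.0 * (
        1.0
        - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
        / (508929.2 * (temp_c + 68.12963))
    )

def physics_penalty(temp_pred):
    """Depth relation: penalize cases where shallower water is denser.

    temp_pred holds predicted temperatures ordered by increasing depth.
    """
    rho = density(np.asarray(temp_pred, dtype=float))
    violation = rho[:-1] - rho[1:]  # positive where density decreases with depth
    return float(np.mean(np.maximum(violation, 0.0)))

def penalized_loss(temp_pred, temp_obs, lam=1.0):
    """Mean squared error plus the weighted physics penalty (lam is hypothetical)."""
    err = np.asarray(temp_pred, dtype=float) - np.asarray(temp_obs, dtype=float)
    return float(np.mean(err ** 2)) + lam * physics_penalty(temp_pred)

consistent = [20.0, 15.0, 10.0, 6.0]    # cools with depth, so density increases
inconsistent = [20.0, 10.0, 15.0, 6.0]  # a warmer layer sits below a cooler one

print(physics_penalty(consistent))
print(physics_penalty(inconsistent))
```

Because the penalty depends only on the predictions, it can also be evaluated on unlabeled inputs, which is one reason to express knowledge this way.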
Knowledge as Constraints
[1] Hoang Dung Vu, Kok-Soon Chai, Bryan Keating, Nurislam Tursynbek, Boyan Xu, Kaige Yang, Xiaoyan Yang, Zhenjie Zhang: Data Driven Chiller Plant Energy Optimization with Domain Knowledge. CIKM 2017: 1309-1317
Learning Algorithm Calibration
[1] Hoang Anh Dau, Eamonn J. Keogh: Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. KDD 2017: 125-134
Fusing Knowledge with Data
[Figure: machine learning pipeline. Preprocess the training data; define the model architecture, hyperparameters, etc.; execute the cost function, learning algorithm, etc.; get the trained model.]
[1] Laura von Rueden, et al. "Informed Machine Learning--A Taxonomy and Survey of Integrating Knowledge into Learning Systems." arXiv preprint arXiv:1903.12394 (2019).
Model Evaluation
[1] Anuj Karpatne, William Watkins, Jordan S. Read, Vipin Kumar: Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. CoRR abs/1710.11431 (2017)
Density relation
Depth relation
Model Enhancement
[1] Barbaros Yet, Zane Perkins, Norman E. Fenton, Nigel Tai, William Marsh: Not just data: A method for improving prediction with knowledge. J. Biomed. Informatics 48: 28-37 (2014)
References