Introduction to Machine Learning
XIX Seminar on Software for Nuclear, Subnuclear and Applied Physics
June 6-10 2022
Topics
Virtual machine setup (you can launch it and let it run)
# run once
su
yum install rh-python38
exit
# run each time you want to start the python 3.8 environment
scl enable rh-python38 bash
# run once in the Python 3.8 environment to install the ML packages
pip install --user keras tensorflow numpy matplotlib opencv-python scikit-learn jupyter seaborn ipympl
# run a notebook
jupyter notebook
# then use Firefox to open the link
# in case of an error with the matplotlib backend
pip uninstall matplotlib
pip install --user matplotlib
Quick Poll
Introduction to these ML lectures
These lectures are based on the ML lectures of the Unipi course “Computing Methods in Experimental Physics and Data Analysis”. The slides are updated every year: this is an active research field and new ideas emerge all the time, so if you are interested you should learn to stay tuned (read, study, update your knowledge).
Suggested books (both have free online versions)
Machine learning
A possible definition (from Wikipedia):
Replace “programmers” with computer programs
Multiple applications, for example:
In experimental and applied physics
examples are everywhere…
…and your ideas! This is a growing field…
PRL paper: observation of Higgs → bb̄
Four different ML algorithms used for different tasks in this analysis
Computing in High Energy Physics Conference
Real-time alerts and automatic telescope pointing
Machine Learning basics
(or the “dictionary” for the next lectures)
Types of typical ML problems
Function approximation
[Figure: two example panels, a classification problem in the (x1, x2) feature plane and a regression problem y = f(x) plotted against x1]
We usually call “x” the inputs or features
We usually call “y” the output or target
Model and Hyper-parameters
Parameters
y(x) = ax + bx^2 + cx^3 + d   (a, b, c, d are the parameters)
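As a toy illustration (my example, not from the slides), the parameters of this cubic model can be recovered from data with NumPy's polyfit; a minimal sketch:

import numpy as np
x = np.linspace(-1, 1, 50)
y = 0.5*x + 1.0*x**2 - 2.0*x**3 + 0.3    # data generated with known a, b, c, d
coeffs = np.polyfit(x, y, 3)             # returns [c, b, a, d], highest power first
print(coeffs)                            # recovers the parameters of the model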
Objective function
The process is not very different from a typical Physics Lab 1 χ² fit… but the number of parameters can be several orders of magnitude larger (10³ to 10⁶).
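For concreteness, a minimal sketch (my example, not from the slides) of a very common objective function, the mean squared error, which plays the role of a χ² when per-point uncertainties are not used:

import numpy as np
def mse(y_true, y_pred):
    # mean squared error: the average squared residual over the dataset
    return np.mean((y_true - y_pred) ** 2)
print(mse(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])))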
Objective function: binary cross entropy
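In formula, BCE = -mean( y·log(p) + (1-y)·log(1-p) ) for true labels y ∈ {0, 1} and predicted probabilities p. A minimal NumPy sketch (my example):

import numpy as np
def binary_cross_entropy(y_true, p):
    p = np.clip(p, 1e-7, 1 - 1e-7)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))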
Learning / Training
Supervised learning
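A minimal supervised-learning sketch (my example, using scikit-learn's bundled iris dataset): the model is fit on inputs X together with their known labels y.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)              # features and known labels
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:5]))                      # predicted labels for new inputs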
Unsupervised learning
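A minimal unsupervised-learning sketch (my example): k-means clustering groups points using only the inputs X, with no labels at all.

import numpy as np
from sklearn.cluster import KMeans
X = np.random.rand(100, 2)                     # inputs only, no labels
km = KMeans(n_clusters=3, n_init=10).fit(X)
print(km.labels_[:10])                         # cluster assignments found by the model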
Supervised vs unsupervised
Supervised and unsupervised learning are not as different as one might imagine, in fact…
Reinforcement learning
Applies to “agents” acting in an “environment” that updates their state
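A minimal sketch of the agent-environment loop (my example; it assumes the OpenAI Gym package with its classic pre-0.26 API, and a random policy stands in for a real learning agent):

import gym
env = gym.make("CartPole-v1")
state = env.reset()
for _ in range(200):
    action = env.action_space.sample()            # a real agent would pick actions from a learned policy
    state, reward, done, info = env.step(action)  # the environment updates the state and returns a reward
    if done:
        state = env.reset()
env.close()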
Capacity and representational power
Capacity and representational power
Capacity and representational power
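To make the idea concrete, a small sketch (my example, not from the slides): fitting the same noisy data with polynomials of increasing degree shows the training error shrinking as the model capacity grows.

import numpy as np
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)
for degree in (1, 3, 9):                        # low to high capacity
    residuals = y - np.polyval(np.polyfit(x, y, degree), x)
    print(degree, np.mean(residuals**2))        # training error decreases with capacity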
Generalization
Regularization
In order to control the “generalization gap”
https://xgboost.readthedocs.io/en/latest/tutorials/model.html
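As one concrete example (mine, not from the slides; the linked XGBoost tutorial discusses regularization for trees): L2 (“ridge”) regularization in scikit-learn shrinks the fitted coefficients, with alpha setting the penalty strength.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] + rng.normal(0, 0.1, 50)                # only the first feature matters
print(LinearRegression().fit(X, y).coef_.round(2))
print(Ridge(alpha=10.0).fit(X, y).coef_.round(2))   # coefficients shrunk toward zero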
Hyperparameter (model) optimization
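A minimal sketch of a grid search over hyperparameters (my example, using scikit-learn's GridSearchCV with an SVM):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
X, y = load_iris(return_X_y=True)
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X, y)                                  # tries every combination with cross validation
print(grid.best_params_, grid.best_score_)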
K-fold cross validation
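A minimal sketch (my example): scikit-learn's cross_val_score splits the data into k folds, trains on k-1 of them and evaluates on the held-out fold, rotating over all folds.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)   # 5 folds
print(scores.mean(), scores.std())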
Inference
ROC
Confusion Matrix
Accuracy, Precision, Sensitivity, Specificity
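These quantities are all available in scikit-learn; a minimal sketch (my example) with toy labels and scores:

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, roc_auc_score)
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]                     # thresholded predictions
scores = [0.1, 0.6, 0.9, 0.8, 0.4, 0.2]         # raw classifier output
print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred))             # recall is also called sensitivity
print(roc_auc_score(y_true, scores))            # area under the ROC curve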
Examples of ML techniques
Linear regression
Supervised
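A minimal linear-regression sketch (my example) with scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1.0], [2.0], [3.0], [4.0]])      # one feature
y = np.array([2.1, 3.9, 6.2, 8.1])
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)                # fitted slope and offset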
Principal Component Analysis (aka PCA)
More complex dimensionality reduction (manifold learning): https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.10-Manifold-Learning.ipynb
Unsupervised
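A minimal PCA sketch (my example): project the 4 iris features onto the 2 directions of largest variance.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
X, _ = load_iris(return_X_y=True)               # labels are ignored: unsupervised
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)                       # shape (150, 2)
print(pca.explained_variance_ratio_)            # variance captured by each component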
Nearest neighbors
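A minimal k-nearest-neighbours sketch (my example): classify a point by a majority vote among its k closest training points.

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)   # "training" just stores the points
print(knn.predict(X[:3]))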
Decision trees
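A minimal sketch (my example): a decision tree learns a sequence of if/else cuts on the features.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree))                        # the learned cuts, printed as text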
Ensembles of trees
Bagging
Gradient boosting
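A minimal sketch (my example) comparing the two ensemble strategies in scikit-learn: a random forest (bagging) and gradient-boosted trees.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
X, y = load_iris(return_X_y=True)
for model in (RandomForestClassifier(n_estimators=100),       # bagging
              GradientBoostingClassifier(n_estimators=100)):  # boosting
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())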
Limitations of decision trees
[Figure: example in the (x1, x2) feature plane]
Many more ML techniques!
The scikit-learn library offers Python implementations of many ML techniques
Today's hands-on session
In the next lectures we will use Google Colab to run Python notebooks
The first exercise is taken from the Python Data Science Handbook by Jake VanderPlas, with some minor edits (the content is available on GitHub; the text is released under the CC-BY-NC-ND license and the code under the MIT license. If you find this content useful, please consider supporting the work by buying the book!)
Click here and “make a copy” to be able to edit: https://colab.research.google.com/drive/1Sqn5fuiB5-2EP6UKUmwqjQd_b3uUNu2r?usp=sharing
Or directly download with
wget "https://drive.google.com/uc?export=download&id=1Sqn5fuiB5-2EP6UKUmwqjQd_b3uUNu2r" -O sk.ipynb