Neural Networks
and Deep Learning
DATA 621
Cristiano Fanelli
08/29/2024 - Lecture 1
Outline
Introduction
Welcome everybody to DATA 621 - Neural Networks and Deep Learning!
Prof. Cristiano Fanelli
Our research group works at the nexus of data science and the physical sciences; more info at https://cristianofanelli.com
Bayesian uncertainty quantification in ML/DL, anomaly detection, particle identification, fast simulation, AI-assisted design, multi-objective optimization, autonomous experimental control, calibration/alignment
These Lectures
All material can be found at
https://cfteach.github.io/NNDL_DATA621
Lectures
Tutorials
Assignments
Supplemental Material
Relationship with Other Courses @ W&M/DS
DATA 201 - Programming for Data Science
DATA 301 - Applied Machine Learning
DATA 442 (DATA 621 for graduate students) - Neural Networks & Deep Learning
DATA 462 (DATA 622 for graduate students) - Generative AI
Grading
Final Project
For collaborative projects: please specify your individual contributions to the project, adhering to standard scientific work practices. The presentation can exceed 13 minutes*, but it should not last longer than the number of participants multiplied by 13 minutes. Theoretical background and clarity of presentation will be assessed individually.
Please note that you will receive questions from your peers, and evaluations will also be based on the clarity of your answers (everyone in a collaborative project is encouraged to answer questions).
Aspects that will influence grading
*13 minutes is tentative
If you want to discuss the NN & Deep Learning course or other topics, I am in ISC 1265 (office hours: Friday, 9:30-11:30)
Syllabus
More information can be found in the course Syllabus (link here)
What do we mean
by Deep Learning?
Taxonomy
Data Mining: The process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Big Data: A term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.
Predictive Analytics: The use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data.
Natural Language Processing (NLP): A branch of AI that helps computers understand, interpret and manipulate human language.
UNSUPERVISED
SUPERVISED
REINFORCEMENT
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.
Reinforcement learning is concerned with how intelligent agents ought to take actions in an environment in order to maximize a cumulative reward.
Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data. Unsupervised learning algorithms must first self-discover any naturally occurring patterns in that training data set.
R. S. Sutton, and A. G Barto (1998), Reinforcement learning: An introduction, Vol. 1 (MIT press Cambridge)
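The three paradigms above differ mainly in what supervision the algorithm receives. As a toy contrast between the first two (the two-blob data, the nearest-centroid rule, and the short k-means loop here are illustrative assumptions, not course material):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two well-separated Gaussian blobs (toy data)
a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(20, 2))
X = np.vstack([a, b])

# SUPERVISED: labels are given; learn class centroids from (input, label) pairs
y = np.array([0] * 20 + [1] * 20)
centroids = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
pred = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
print("supervised accuracy:", (pred == y).mean())

# UNSUPERVISED: no labels; a few k-means steps self-discover the two clusters
centers = X[[np.argmin(X[:, 0]), np.argmax(X[:, 0])]]  # spread-out init
for _ in range(5):
    assign = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
    centers = np.array([X[assign == k].mean(axis=0) for k in (0, 1)])
print("cluster centers found:", centers.round(1))
```

The supervised step uses the labels directly; the unsupervised step recovers essentially the same grouping from the data alone.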
ML
NIPS 2016: “If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning”
Y. LeCun, Turing Award 2018; VP and Chief AI Scientist, Facebook
DeepMind
Deep Q-learning playing Atari Breakout
Mnih et al., Nature 518, 7540 (2015); arXiv:1312.5602
The agent digs a hole through the barrier and keeps sending the ball there, taking advantage of multiple bounces
Deep Learning
Deep Learning
DL architectures contain many layers of neurons
“we stand at the height of some of the greatest accomplishments that happened in DL”
(2019)
Natural Language Processing [1]
Autopilot [2]
Meta-learning [3]
Video-to-video synthesis [4]
CF, INFN Machine Learning School, Camogli, 2019
(2024)
(today) AI/ML is ubiquitous
Smartphone assistance
Home automation
Entertainment
E-commerce
Autonomous vehicles
…and even dining
And many more applications (healthcare, finance, social media, cybersecurity, agriculture, etc.)
Foundation Models
Foundation models are large-scale machine learning models that are pre-trained on a wide range of data (e.g., Internet text), resulting in a model that can be adapted to a wide range of downstream tasks.
The Generative Pre-trained Transformer (GPT), developed by OpenAI, is a leading example of a foundation model. It uses a transformer architecture to understand and generate human-like text.
Foundation models like GPT have revolutionized AI, enabling more nuanced and context-aware applications. They also pose ethical challenges, such as potential bias in responses and the difficulty of controlling their output.
In the broader NP and HEP community:
1st Large Language Models in Physics Symposium (LIPS) in Hamburg (DESY campus) from Feb 21 – 23, 2024.
Why is ML ubiquitous?
A. Radovic, et al. "Machine learning at the energy and intensity frontiers of particle physics." Nature 560.7716 (2018): 41-48.
How did it all start?
A (non-exhaustive) list of milestones in chronological order to add more context
Many ideas behind neural networks are relatively old, but they have been revitalized and popularized in more recent years
(1958) Perceptron & Artificial Neurons
Biological neuron processing chemical and electrical signals; impulses are carried away from the cell body
Mark I Perceptron machine, connected to a camera with 20×20 photocells (a 400-pixel image) and used for image recognition
(1958) Perceptron & Artificial Neurons
Binary classification, linear decision boundary
Activation: step function
Training
Perceptron is a “simple version of deep learning”
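The ingredients above (binary labels, step activation, mistake-driven training) can be sketched with the classic Rosenblatt update rule; the AND-gate data and the learning rate are illustrative assumptions:

```python
import numpy as np

def step(z):
    # Step activation: 1 if z >= 0, else 0 (gives a linear decision boundary)
    return np.where(z >= 0, 1, 0)

def train_perceptron(X, y, lr=1.0, epochs=10):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - step(xi @ w + b)
            w = w + lr * error * xi   # weights change only on mistakes
            b = b + lr * error
    return w, b

# Linearly separable toy problem: the AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(step(X @ w + b))  # → [0 0 0 1]
```

Because the data are linearly separable, the perceptron convergence theorem guarantees the loop stops making updates after finitely many mistakes.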
Key Components of DL Architectures
Example: multi-layer perceptron
These fundamental components will be revisited during the course
Core elements:
Optimization methods:
Image by Alec Radford
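As a preview of the multi-layer perceptron example, a minimal forward pass can be sketched as below; the layer sizes, random weights, and ReLU/sigmoid choices are assumptions for illustration, not the course's reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized weights for a 4 -> 8 -> 8 -> 1 network
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)        # first hidden layer
    h2 = relu(h1 @ W2 + b2)       # second hidden layer
    return sigmoid(h2 @ W3 + b3)  # output squashed into [0, 1]

x = rng.normal(size=(5, 4))       # a batch of 5 four-dimensional inputs
out = forward(x)
print(out.shape)                  # (5, 1)
```

Training (choosing the weights via an optimization method such as gradient descent with backpropagation) is what the rest of the course builds toward.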
1960-1985 Backpropagation
G. Hinton: “I have never claimed that I invented backpropagation. David Rumelhart invented it independently long after people in other fields had invented it. It is true that when we first published we did not know the history so there were previous inventors that we failed to cite. What I have claimed is that I was the person to clearly demonstrate that backpropagation could learn interesting internal representations and that this is what made it popular.” (source1, source2)
G. Hinton: “My view is throw it all away and start again. The future depends on some graduate student who is deeply suspicious about everything I’ve said”. (source3)
1979-1982 Introduction of CNN and RNN
AI Winter ~ ’80s
Early Hype:
1970s Downturn:
Revival:
Takeaway:
(2009) ImageNet
AlexNet is a convolutional network architecture named after Alex Krizhevsky, who developed it under the supervision of Geoffrey Hinton. It won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 and used dropout for improved generalization.
Convolutional Neural Network
CNN “scans the image”
http://scs.ryerson.ca/~aharley/vis/conv/
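What “scanning the image” means can be sketched with a plain 2D convolution loop; the toy image and the edge-detecting kernel below are assumptions for illustration:

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every patch of the image ("scan"),
    # stride 1, no padding ("valid" convolution)
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2:] = 1.0                 # right half of the image is bright
kernel = np.array([[1.0, -1.0],
                   [1.0, -1.0]])   # responds only at vertical edges
response = conv2d(image, kernel)
print(response)                    # nonzero only along the edge column
```

The same small set of kernel weights is reused at every position, which is the weight sharing that makes CNNs efficient on images.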
(2014) Generative Adversarial Networks
GANs have raised ethical concerns in the area of deep fakes
2012-2016 More Recent Milestones
A Typical Problem
(Bias/Variance)
Fitting vs Predicting
[Figures: simple regression examples, panels (a) and (b), each showing training data and test data. Data with a linear dependence are fit with a linear model and with a polynomial of order 10; data generated from a polynomial of order 10 are fit with the same two models.]
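The fitting-vs-predicting gap can be reproduced in a few lines: a degree-10 polynomial matches the training points better than a line (low bias), but predicts held-out test data worse (high variance). The toy data generator, noise level, and sample sizes here are assumptions for this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    x = rng.uniform(-1.0, 1.0, n)
    y = 2.0 * x + rng.normal(0.0, 0.3, n)   # true dependence is linear
    return x, y

x_train, y_train = make_data(15)            # few, noisy training points
x_test, y_test = make_data(200)             # held-out test points

def train_test_mse(degree):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 10):
    tr, te = train_test_mse(degree)
    print(f"degree {degree:2d}: train MSE = {tr:.3f}, test MSE = {te:.3f}")
```

Since the degree-10 model nests the linear one, its training error can never be larger; the interesting question is always what happens on the test set.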
ML can be difficult (and there is way more to it than that ;)
Goals of DATA-621
Disclaimer: Uncertainty Quantification will often be mentioned but is not the main focus of this course. A good starting point for a more rigorous UQ treatment is the BRDS course
That’s It for Today!
Spares