Lecture 10: Deep Neural Networks
Applied Data Science Spring 2025
Amir Hesam Salavati
Hamed Shah-Mansouri
https://ghabehfarda.ir/
Last Session We Covered...
Intro to Neural Networks
Some History
Feedforward Architectures
Recurrent Neural Networks
From Shallow to Deep Neural Networks
Image: https://laptrinhx.com/future-prospects-of-deep-learning-in-medicine-687217971/
An Excellent Book to Read
Key Concepts
Image:https://medium.com/@buckhamduffymedia/understanding-the-relationship-between-artificial-intelligence-machine-learning-and-data-82cbaea388f5
"AI is akin to building a rocket ship. The rocket engine is the learning algorithms, but the fuel is the huge amounts of data we can feed to these algorithms."
Andrew Ng
Some Fascinating Applications
Machine Translation
Automatic Image Captioning
A photo of a woman sitting on a cloud in space, wearing an astronaut suit and a helmet. She is smiling and waving at the camera.
Some Fascinating Applications
Text Generation
A short paragraph on the power of ANNs:
Artificial neural networks (ANNs) are powerful tools used in many fields, including face recognition, language translation, and natural language processing. ANNs are modeled on the human brain and consist of a set of connected nodes that can process information and learn. ANNs can learn from data and capture complex patterns, which makes them useful for a wide range of tasks.
In recent years, ANNs have made remarkable progress and are now used in many of the products and services we use every day. For example, ANNs are used for face recognition in smartphones, text translation in translation apps, and word recognition in speech recognition software.
Image: https://arxiv.org/pdf/1912.04958.pdf
Image Generation using GANs
Automatically taking care of some of the cumbersome steps we had to perform manually
Image: https://quantdare.com/what-is-the-difference-between-deep-learning-and-machine-learning/
General Formulation
General Formulation
Gradient Descent
Image: https://miro.medium.com/max/1262/1*v0VYQkVnTfMF5ptEnvAGSA.jpeg
Gradient Descent
Goal: minimize the cost function C(w) with respect to the weights w
Steps: for each data point
1. Compute the gradient of the cost with respect to the weights, ∇C(w)
2. Update the weights: w ← w − η ∇C(w)
Repeat the above steps several times (until convergence or max_itr); a minimal code sketch follows below
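A minimal NumPy sketch of these steps on an illustrative linear model with a squared-error cost (the data, the model, and the learning rate below are assumptions for illustration, not from the slides):

```python
import numpy as np

# Illustrative setup: linear model y_hat = X @ w with mean squared-error cost.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # 100 data points, 3 features
y = X @ np.array([1.0, -2.0, 0.5])      # targets generated from known weights

w = np.zeros(3)     # initial weights
eta = 0.1           # learning rate
max_itr = 200       # maximum number of iterations

for _ in range(max_itr):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # 1. gradient of the cost w.r.t. the weights
    w = w - eta * grad                       # 2. step opposite the gradient direction
```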
Gradient Descent: Why Opposite the Gradient Direction?
Goal: minimize the cost function C(w)
Step 2: Update the weights: w ← w − η ∇C(w)
The gradient ∇C(w) points in the direction in which the cost increases fastest, so taking a small step in the opposite direction decreases the cost the most.
Image:https://virgool.io/@danialfarsy/%D8%A8%D8%B1%D8%B1%D8%B3%DB%8C-%D9%88-%D9%85%D9%82%D8%A7%DB%8C%D8%B3%D9%87-batch-gradient-descentmini-batch-gradient-descentstochastic-gradient-descent-n4yklzivliiw
Gradient Descent: Learning Rate
Goal: minimize the cost function C(w)
Step 2: Update the weights: w ← w − η ∇C(w), where the learning rate η controls how large a step we take at each update
Image:https://virgool.io/@danialfarsy/%D8%A8%D8%B1%D8%B1%D8%B3%DB%8C-%D9%88-%D9%85%D9%82%D8%A7%DB%8C%D8%B3%D9%87-batch-gradient-descentmini-batch-gradient-descentstochastic-gradient-descent-n4yklzivliiw
What Happens When the Learning Rate Is Too High or Too Low?
Gradient Descent: Properly Selecting Learning Rate
https://cs231n.github.io/neural-networks-3/
Gradient Descent: Batch Size
Step 2: Update the weights using a batch of data points at a time: full-batch gradient descent uses all points per update, stochastic gradient descent uses a single point, and mini-batch uses a small subset (see the sketch below)
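A sketch of how the batch size changes the update, reusing the illustrative linear model from before; the batch_size value is an assumption (1 recovers stochastic gradient descent, len(y) recovers full-batch):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w, eta, batch_size = np.zeros(3), 0.1, 16   # batch_size=1 -> SGD, =len(y) -> full batch

for _ in range(50):                          # epochs
    idx = rng.permutation(len(y))            # reshuffle the data each epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]    # indices of the current mini-batch
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w = w - eta * grad                   # update from this mini-batch only
```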
When Shall We Stop Training in Gradient Descent?
Gradient Descent: Stopping
Perform training: after each epoch, evaluate the error on a held-out validation set in addition to the training set
Stop when: the validation error stops decreasing (early stopping), even if the training error keeps going down
Image:https://researchgate.net/figure/Early-stopping-method_fig3_283697186
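A minimal sketch of early stopping; train_one_epoch and validation_loss are hypothetical callables standing in for one pass of gradient descent and for evaluating the held-out error, and patience is an assumed hyperparameter:

```python
def fit_with_early_stopping(model, train_one_epoch, validation_loss,
                            max_epochs=200, patience=10):
    """Stop once the validation loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        train_one_epoch(model)               # one pass of gradient descent over the data
        val_loss = validation_loss(model)    # error on the held-out validation set
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                            # validation error stopped improving
    return model
```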
Image: towardsdatascience.com/how-does-back-propagation-in-artificial-neural-networks-work-c7cad873ea7
Historical Challenges of Gradient Descent in Neural Nets
Some Notations
[Diagram: a four-layer feedforward network, Layer 1 through Layer 4, used to define the notation]
Some Math
Based on the notations in the previous slide, we can write down the output of the neurons in layer l as:
a^l = σ(W^l a^(l−1) + b^l)
W^l: the weight matrix from layer l−1 to layer l
a^(l−1): the output of layer l−1's neurons (in vector form)
b^l: the bias vector of layer l
σ(·): the vectorized activation function, i.e. the activation function applied element-wise
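A NumPy sketch of this per-layer computation, assuming a sigmoid activation; the layer sizes below are illustrative:

```python
import numpy as np

def sigma(z):
    """Vectorized activation: sigmoid applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_output(W, b, a_prev):
    """a^l = sigma(W^l a^(l-1) + b^l) for a single layer."""
    return sigma(W @ a_prev + b)

# Illustrative shapes: 4 neurons in layer l-1, 3 neurons in layer l.
rng = np.random.default_rng(0)
a_prev = rng.normal(size=4)          # output of layer l-1
W = rng.normal(size=(3, 4))          # weight matrix from layer l-1 to layer l
b = np.zeros(3)                      # biases of layer l
a = layer_output(W, b, a_prev)       # output of layer l
```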
Some Assumptions
Two necessary assumptions on the cost function:
1. It can be written as an average over the costs of individual training examples, i.e. C = (1/n) Σ_x C_x
2. It can be written as a function of the output layer only, i.e. C = C(a^L)
Backpropagation: Key Idea
The derivative of the cost function w.r.t. a weight at any given layer is proportional to:
the output (activation) of the previous layer, and
the error term δ of the current layer
Takeaways: each gradient depends only on quantities local to its layer and the layer before it, and the error terms can be computed layer by layer, moving backwards from the output
Backpropagation Algorithm in a Nutshell
[Diagram: the same four-layer feedforward network, Layer 1 through Layer 4]
1. Forward pass: for each layer, compute the weighted inputs and activations a^l = σ(W^l a^(l−1) + b^l)
2. Backward pass (backpropagation): for each layer, starting from the output, calculate the errors δ^l
3. Return the gradients and update the weights
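A compact NumPy sketch of one forward and backward pass for a fully connected network, assuming sigmoid activations and a squared-error cost on a single example (the function and variable names are illustrative):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1 - s)

def backprop(weights, biases, x, y):
    """Gradients of the squared-error cost for one example (x, y)."""
    # 1. Forward pass: store weighted inputs z and activations a for every layer.
    a, activations, zs = x, [x], []
    for W, b in zip(weights, biases):
        z = W @ a + b
        zs.append(z)
        a = sigma(z)
        activations.append(a)
    # 2. Backward pass: error of the output layer, then propagate backwards.
    delta = (activations[-1] - y) * sigma_prime(zs[-1])
    grads_W, grads_b = [np.outer(delta, activations[-2])], [delta]
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigma_prime(zs[-l])
        grads_W.insert(0, np.outer(delta, activations[-l - 1]))
        grads_b.insert(0, delta)
    # 3. Return the gradients; gradient descent uses them to update the weights.
    return grads_W, grads_b

# Illustrative 3-4-2 network applied to one random example.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
gW, gb = backprop(weights, biases, rng.normal(size=3), np.array([0.0, 1.0]))
```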
How Does Backpropagation Relate to Gradient Descent?
Backpropagation and Gradient Descent
Backpropagation is the procedure that efficiently computes the gradient of the cost with respect to every weight; gradient descent then uses those gradients to update the weights.
Image: https://miro.medium.com/max/1262/1*v0VYQkVnTfMF5ptEnvAGSA.jpeg
Techniques to Improve the Learning of Deep Neural Nets
Image:employee-performance.com/blog/how-effective-performance-management-can-increase-companys-success/
Importance of Activation Function
An activation function that does not saturate easily and whose gradient does not become very small is ideal
Activation Functions
Binary step
Pros: simple
Cons: always saturated (the gradient is zero almost everywhere)
Linear
Pros: never saturates
Cons: the gradient is always 1, so stacked layers still behave like a single linear layer
Images are from https://v7labs.com/blog/neural-networks-activation-functions
Better Activation Functions
Sigmoid / Tanh
Pros: nonlinear and simple
Cons: saturate quickly for large positive or negative inputs
ReLU (Rectified Linear Unit)
Pros: does not saturate easily (for positive inputs)
Cons: we don't exactly know why it works so well in practice!
Images are from https://v7labs.com/blog/neural-networks-activation-functions
Other Famous Activation Functions
Leaky ReLU
Parametric ReLU
Exponential Linear Unit (ELU)
Images are from https://v7labs.com/blog/neural-networks-activation-functions
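A sketch of these activation functions in NumPy; the default α values are common choices assumed here for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # saturates for large |z|

def relu(z):
    return np.maximum(0.0, z)                  # zero gradient for negative inputs

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)       # small fixed slope for negative inputs

def parametric_relu(z, alpha):
    return np.where(z > 0, z, alpha * z)       # like Leaky ReLU, but alpha is learned

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))   # smooth for negative inputs
```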
Importance of a Suitable Objective Function
Weight Initialization
Importance of Weight Initialization
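A minimal sketch of two common initialization schemes (Xavier/Glorot, often used with sigmoid or tanh layers, and He, often used with ReLU layers); the layer sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    """Glorot/Xavier: variance scaled by both fan-in and fan-out."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def he_init(fan_in, fan_out):
    """He: variance scaled by fan-in, suited to ReLU activations."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

W1 = he_init(fan_in=784, fan_out=128)   # e.g. first hidden layer of an MNIST-sized net
```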
Regularization and Overfitting
Dropout
How Does “Dropout” Reduce Overfitting?
Dropout
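A minimal sketch of (inverted) dropout applied to one layer's activations during training; the drop probability p and the example values are assumptions for illustration, and at test time the activations are used unchanged:

```python
import numpy as np

def dropout(a, p=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero each activation with probability p."""
    if not training or p == 0.0:
        return a                              # no dropout at test time
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(a.shape) >= p           # keep each neuron with probability 1 - p
    return a * mask / (1.0 - p)               # rescale so the expected value is unchanged

a = np.array([0.2, 1.5, -0.3, 0.8])
a_train = dropout(a, p=0.5, training=True)    # some entries zeroed, the rest scaled by 2
a_test = dropout(a, p=0.5, training=False)    # unchanged
```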
K-Fold Cross Validation
Image: https://towardsdatascience.com/cross-validation-k-fold-vs-monte-carlo-e54df2fc179b
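A minimal sketch of k-fold cross-validation with scikit-learn's KFold; the synthetic data and the logistic-regression model are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # illustrative binary labels

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))   # accuracy on the held-out fold

print(f"mean validation accuracy over 5 folds: {np.mean(scores):.3f}")
```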
Expanding the Dataset
Image:https://researchgate.net/publication/319413978/figure/fig2/AS:533727585333249@1504261980375/Data-augmentation-using-semantic-preserving-transformation-for-SBIR.png
Expanding the Dataset
Image:https://medium.com/secure-and-private-ai-writing-challenge/data-augmentation-increases-accuracy-of-your-model-but-how-aa1913468722
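A sketch of simple label-preserving transformations for expanding an image dataset, using only NumPy (random flips and small shifts); real pipelines usually rely on library transforms such as torchvision or Keras preprocessing, which are not shown here:

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and slightly shifted copy of a (H, W) image."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)                          # horizontal flip
    dy, dx = rng.integers(-2, 3, size=2)              # small random shift in pixels
    return np.roll(out, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(0)
image = rng.random((28, 28))                          # illustrative 28x28 image
augmented = [augment(image, rng) for _ in range(5)]   # five extra training examples
```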
Examples of Deep Neural Networks
Gradient Descent & Backpropagation
Performance Improvement Techniques
https://redbubble.com/i/sticker/data-scientist-deep-learning-joke-by-dataninja/53395629.EJUG5
ToDo List for Next Session