Neural Networks
and Deep Learning
DATA 442 & 621
Cristiano Fanelli
02/04/2025 - Lecture 4
Outline
References:
Raschka et al., Chapter 4
Missing Data
df.isnull()
df
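A minimal sketch of the check above, using a small illustrative DataFrame (this toy CSV is an assumption, not the dataset from the slides):

import io
import pandas as pd

# toy data with a few missing entries
csv_data = """A,B,C,D
1.0,2.0,3.0,4.0
5.0,6.0,,8.0
10.0,11.0,12.0,"""
df = pd.read_csv(io.StringIO(csv_data))

print(df.isnull())        # boolean mask: True where a value is NaN
print(df.isnull().sum())  # number of missing values per column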
Eliminating examples or features
# only drop rows where 'all' columns are NaN
# drop rows that have fewer than 4 non-NaN values
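A sketch of the pandas dropna() options the comments above refer to, applied to the illustrative df defined earlier:

df.dropna(axis=0)        # drop rows containing any NaN
df.dropna(axis=1)        # drop columns containing any NaN
df.dropna(how='all')     # only drop rows where all columns are NaN
df.dropna(thresh=4)      # keep only rows with at least 4 non-NaN values
df.dropna(subset=['C'])  # drop rows with NaN in a specific column, here 'C'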
Imputing
df
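One common way to impute instead of dropping data is scikit-learn's SimpleImputer; mean imputation is shown only as an example (median or most_frequent are often preferable), again on the illustrative df from above:

import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputed_data = imputer.fit_transform(df.values)   # NaNs replaced by column means

df_filled = df.fillna(df.mean())                  # equivalent pandas one-liner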
Categorical Data
*Depending on the context and applications
Ordinal Categorical Data
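A sketch of ordinal encoding with an explicit mapping; the T-shirt-size DataFrame below is an illustrative assumption, chosen because sizes have a natural order M < L < XL:

import pandas as pd

df_cat = pd.DataFrame([['green', 'M',  10.1],
                       ['red',   'L',  13.5],
                       ['blue',  'XL', 15.3]],
                      columns=['color', 'size', 'price'])

size_mapping = {'M': 1, 'L': 2, 'XL': 3}          # preserves the order of the categories
df_cat['size'] = df_cat['size'].map(size_mapping)

inv_size_mapping = {v: k for k, v in size_mapping.items()}  # to invert the encoding later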
Nominal Data: One-hot Encoding
# Initial Dataset
# Effect on the column to encode via one-hot encoding (OHE)
# Overall transformation of the initial dataset
Leave the other columns untouched
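A sketch of this transformation using scikit-learn's ColumnTransformer, so only the nominal column is one-hot encoded and the other columns pass through untouched (column indices refer to the illustrative df_cat from the ordinal example):

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = df_cat[['color', 'size', 'price']].values

c_transf = ColumnTransformer(
    [('onehot', OneHotEncoder(), [0])],   # one-hot encode only the 'color' column
    remainder='passthrough')              # leave 'size' and 'price' untouched

X_encoded = c_transf.fit_transform(X)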
One-hot Encoding
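An alternative route (not necessarily the one shown on this slide) is pandas get_dummies; drop_first=True removes one redundant dummy column to avoid perfect multicollinearity:

import pandas as pd

pd.get_dummies(df_cat[['price', 'color', 'size']], drop_first=True)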
Partitioning a Dataset
The stratify parameter in train_test_split ensures that the class proportions in the y labels are maintained in both the training and testing sets. This is important when dealing with imbalanced datasets where certain classes are underrepresented.
E.g., if 70% of your data belongs to class A and 30% to class B, a stratified split will maintain this 70/30 ratio in both the training and testing sets.
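A sketch of a stratified split; the breast-cancer dataset is only a stand-in (the slides do not specify a dataset):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # illustrative binary dataset

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,     # hold out 30% for testing
    random_state=0,    # reproducibility
    stratify=y)        # preserve the class proportions in both splits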
Some Recipes:
Scaling Features
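Two common scalers, continuing from the split sketch above; note that both are fit on the training set only and then applied to the test set:

from sklearn.preprocessing import MinMaxScaler, StandardScaler

mms = MinMaxScaler()                        # min-max normalization to [0, 1]
X_train_norm = mms.fit_transform(X_train)
X_test_norm  = mms.transform(X_test)

stdsc = StandardScaler()                    # standardization: zero mean, unit variance
X_train_std = stdsc.fit_transform(X_train)
X_test_std  = stdsc.transform(X_test)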
Selecting Meaningful Features
MSE is “spherical”
The larger the value of the regularization parameter λ, the faster the penalized loss grows, which leads to a narrower ball.
L1 provides sharp (diamond-shaped) contours and encourages sparsity
L1 inherently serves as a method for feature selection
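For reference, the two penalty terms added to the loss, in one common convention (λ is the regularization strength):

$$\text{L2: } \lambda\,\lVert \mathbf{w}\rVert_2^2 = \lambda \sum_{j} w_j^2 \qquad\qquad \text{L1: } \lambda\,\lVert \mathbf{w}\rVert_1 = \lambda \sum_{j} \lvert w_j\rvert$$

The diamond-shaped L1 contours make the optimum of the penalized loss tend to sit on the axes, i.e., some weights become exactly zero.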
Implementations in sklearn
liblinear (an open-source library for large-scale linear classification; it uses coordinate descent, among other methods)
One-vs-Rest (OvR): splits a multi-class classification problem into one binary classification problem per class.
During inference, the model calculates the probability for each class, and the class with the highest probability is selected.
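A sketch of an L1-regularized logistic regression with the liblinear solver, continuing from the scaling sketch; C is the inverse regularization strength (C = 1/λ):

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(penalty='l1', C=1.0, solver='liblinear')
lr.fit(X_train_std, y_train)

print('Test accuracy:', lr.score(X_test_std, y_test))
print(lr.coef_)   # zero entries correspond to features discarded by the L1 penalty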
Impact of Regularization on Weights
Using L1 regularization performs feature selection: weights of uninformative features are driven exactly to zero.
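A sketch of how such a weight-path plot can be produced: refit the L1-regularized model for a range of C values and record the weights (strong regularization, i.e., small C, drives more of them to zero):

import numpy as np
from sklearn.linear_model import LogisticRegression

weights, params = [], []
for c in np.arange(-4.0, 6.0):
    lr = LogisticRegression(penalty='l1', C=10.0**c, solver='liblinear')
    lr.fit(X_train_std, y_train)
    weights.append(lr.coef_[0])   # weight vector (binary problem assumed)
    params.append(10.0**c)

weights = np.array(weights)       # rows: values of C, columns: features
# plotting weights[:, j] versus params on a log-scaled x axis gives the usual figure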
Other methods for feature selection
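One example of such a method is sequential (backward) feature selection; a sketch with scikit-learn's SequentialFeatureSelector (the KNN estimator and the target of 5 features are arbitrary choices):

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(knn,
                                n_features_to_select=5,   # arbitrary target size
                                direction='backward',     # sequential backward selection
                                cv=5)
sfs.fit(X_train_std, y_train)
print(sfs.get_support())   # boolean mask of the selected features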
Feature Importance with Random Forest
Impurity Reduction (Classification): For classification, the importance of a feature is measured by the decrease in Gini impurity or entropy when a feature is used to split a node. The more a feature decreases the impurity, the more important it is.
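For reference, the standard impurity measures at a node t with class proportions p(i|t) over c classes:

$$I_G(t) = 1 - \sum_{i=1}^{c} p(i \mid t)^2 \qquad\qquad I_H(t) = -\sum_{i=1}^{c} p(i \mid t)\,\log_2 p(i \mid t)$$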
Variance Reduction (Regression): For regression, feature importance is assessed based on how much a feature decreases the variance of the split. A significant reduction in variance implies a higher importance of the feature.
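A sketch of extracting these impurity-based importances from a fitted random forest, continuing from the earlier split (the number of trees is an arbitrary choice):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=500, random_state=1)
forest.fit(X_train, y_train)                 # tree ensembles do not need feature scaling

importances = forest.feature_importances_    # impurity-based importances, summing to 1
indices = np.argsort(importances)[::-1]      # most important feature first
for rank, idx in enumerate(indices[:10], start=1):
    print(f"{rank:2d}) feature {idx:2d}  importance = {importances[idx]:.4f}")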
Intro to
Optimization Techniques
Methods potentially covered in this course
Many other methods/approaches - not covered.
Acquisition Functions
[Figure: a surrogate of f(x) with an acquisition function below it; the acquisition trades off exploitation (sampling near the best point found so far) against exploration (sampling where the model is most uncertain about f).]
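One widely used acquisition function is Expected Improvement (EI). In the maximization convention, with GP posterior mean μ(x), standard deviation σ(x), and best observed value f(x⁺), its standard closed form (see the Brochu et al. tutorial cited below) is:

$$\mathrm{EI}(x) = \big(\mu(x) - f(x^{+})\big)\,\Phi(Z) + \sigma(x)\,\phi(Z), \qquad Z = \frac{\mu(x) - f(x^{+})}{\sigma(x)},$$

with EI(x) = 0 when σ(x) = 0; Φ and φ are the standard normal CDF and PDF.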
[Figure: convergence plot, best objective value vs. number of calls N, comparing random search (RS) with Expected Improvement (EI).]
E. Brochu, V. M. Cora, and N. de Freitas, "A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning," arXiv:1012.2599 (2010).
See also Kriging, a geostatistical interpolation technique.
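A minimal Bayesian-optimization sketch using scikit-optimize's gp_minimize with the EI acquisition function; the library choice and the toy objective are assumptions for illustration:

from skopt import gp_minimize

def objective(x):
    # toy "expensive" black-box function of one variable
    return (x[0] - 2.0) ** 2

res = gp_minimize(
    objective,
    dimensions=[(-5.0, 5.0)],   # search space for x
    acq_func="EI",              # Expected Improvement acquisition
    n_calls=30,                 # evaluation budget
    random_state=0)

print(res.x, res.fun)           # best x found and its objective value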
Gaussian Processes in a nutshell
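A minimal GP regression sketch with scikit-learn (the RBF kernel and toy data are assumptions); the point is that the posterior returns both a mean prediction and an uncertainty:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

X_obs = np.array([[-2.0], [0.0], [1.5], [3.0]])   # observed inputs
y_obs = np.sin(X_obs).ravel()                     # observed targets

kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, normalize_y=True)
gp.fit(X_obs, y_obs)

X_query = np.linspace(-3.0, 4.0, 50).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)  # posterior mean and standard deviation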
What kind of problems are we talking about?
GA Optimization Strategies
MOO Pipelines: e.g., MOBO
Multi-Objective Bayesian Optimization
(e.g., Ax: an adaptive experimentation platform supported by Meta AI)
See 2nd AI4EIC workshop, https://indico.bnl.gov/e/AI4EIC
An extension of the Bayesian optimization (BO) framework we already discussed.
K-Fold Cross-Validation
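A sketch of stratified k-fold cross-validation; the pipeline below (scaling plus the L1 logistic regression used earlier) is an assumption, and putting the scaler inside the pipeline keeps each fold free of leakage:

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(penalty='l1', C=1.0, solver='liblinear'))

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(estimator=pipe, X=X_train, y=y_train, cv=kfold, n_jobs=-1)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")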
Learning Curves
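A sketch of computing a learning curve for the same pipeline (training vs. validation score as a function of training-set size):

import numpy as np
from sklearn.model_selection import learning_curve

train_sizes, train_scores, test_scores = learning_curve(
    estimator=pipe, X=X_train, y=y_train,
    train_sizes=np.linspace(0.1, 1.0, 10),   # 10%, 20%, ..., 100% of the training data
    cv=10, n_jobs=-1)

train_mean = train_scores.mean(axis=1)   # average over folds at each training size
test_mean = test_scores.mean(axis=1)     # plot both against train_sizes to get the curve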
Confusion Matrix
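A sketch of building the confusion matrix from test-set predictions of the same pipeline:

from sklearn.metrics import confusion_matrix

pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)

confmat = confusion_matrix(y_true=y_test, y_pred=y_pred)
print(confmat)   # rows: true classes, columns: predicted classes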
Classification Metrics
Matthews Correlation Coefficient
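A sketch of computing these metrics with scikit-learn (binary labels assumed, as in the stand-in dataset; multi-class problems need an averaging strategy such as average='macro'):

from sklearn.metrics import precision_score, recall_score, f1_score, matthews_corrcoef

print('Precision:', precision_score(y_true=y_test, y_pred=y_pred))
print('Recall:   ', recall_score(y_true=y_test, y_pred=y_pred))
print('F1 score: ', f1_score(y_true=y_test, y_pred=y_pred))
print('MCC:      ', matthews_corrcoef(y_true=y_test, y_pred=y_pred))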
Receiver Operating Characteristic
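A sketch of the ROC curve and its AUC from predicted probabilities (binary case assumed; the curve is traced by sweeping the decision threshold):

from sklearn.metrics import roc_curve, roc_auc_score

y_scores = pipe.predict_proba(X_test)[:, 1]    # probability of the positive class
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=y_scores)
print('ROC AUC:', roc_auc_score(y_true=y_test, y_score=y_scores))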
Spares