MACHINE LEARNING
Feature Engineering
& Selection
How to create new features and select the most important ones for models
Lecture Series • Data Science & ML • 2025
What is Feature Engineering?
Feature Engineering is the process of using domain knowledge to transform raw data into features that better represent the underlying patterns for machine learning models — directly impacting model accuracy and performance.
Transform
Convert raw data
into useful formats
Create
Build new features
from existing ones
Select
Keep only the most
informative features
Types of Features
Numerical
Categorical
Text
Temporal
Feature Creation Techniques
Polynomial Features
Create x², x³, or interaction terms like x₁ × x₂ to capture non-linear relationships
Binning / Discretization
Convert continuous values into groups: age → [child, teen, adult, senior]
Aggregation
Group-level stats: mean purchase per customer, max sessions per day
Ratio & Interaction
Combine columns: BMI = weight/height², profit_margin = profit/revenue
Python Example
from sklearn.preprocessing import PolynomialFeatures
import pandas as pd

# Polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Binning
df['age_group'] = pd.cut(
    df['age'],
    bins=[0, 18, 35, 60, 99],
    labels=['child', 'teen', 'adult', 'senior'])

# Ratio feature
df['bmi'] = df['weight'] / df['height'] ** 2
Encoding Categorical Features
One-Hot Encoding
USE: Nominal categories with no order
Color: [Red, Blue, Green] → 3 binary columns
Label Encoding
USE: Ordinal categories with natural order
Size: S=0, M=1, L=2, XL=3
Target Encoding
USE: High-cardinality categories
Replace category with mean of target variable
Frequency Encoding
USE: When count matters
Replace category with its frequency in dataset
Binary Encoding
USE: Large number of categories
Category → integer → binary digits → columns
Embedding (DL)
USE: Text, very high-cardinality features
Word2Vec, Entity Embeddings in neural nets
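A minimal sketch of three of the encodings above using pandas, on a hypothetical toy frame with `color` (nominal) and `size` (ordinal) columns invented for illustration:

```python
import pandas as pd

# Hypothetical toy data, invented for illustration
df = pd.DataFrame({
    'color': ['Red', 'Blue', 'Green', 'Blue'],
    'size':  ['S', 'L', 'M', 'XL'],
})

# One-hot: nominal category -> one binary column per value
onehot = pd.get_dummies(df['color'], prefix='color')

# Label/ordinal: map the natural order explicitly
size_order = {'S': 0, 'M': 1, 'L': 2, 'XL': 3}
df['size_encoded'] = df['size'].map(size_order)

# Frequency: replace each category with its count in the data
df['color_freq'] = df['color'].map(df['color'].value_counts())
```

Mapping the ordinal order explicitly (rather than letting an encoder assign labels alphabetically) keeps S < M < L < XL intact.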
Date & Time Feature Engineering
Extract from datetime:
Year, Month, Day, Hour, Minute
Day of week (Monday=0 ... Sunday=6)
Is weekend? Is holiday?
Week of year, Quarter
Time since last event (lag)
Rolling window aggregations
Python — Datetime Features
import pandas as pd

df['date'] = pd.to_datetime(df['date_column'])

# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour
df['weekday'] = df['date'].dt.weekday
df['is_weekend'] = (df['weekday'] >= 5).astype(int)
df['quarter'] = df['date'].dt.quarter

# Lag features
df['lag_7'] = df['sales'].shift(7)

# Rolling window
df['rolling_14'] = df['sales'].rolling(window=14).mean()
Feature Scaling & Normalization
Min-Max Scaling
x' = (x - min) / (max - min)
Range: [0, 1] | When distribution is not Gaussian
Z-Score Standardization
x' = (x - μ) / σ
Mean=0, Std=1 | When algorithm assumes normal distribution
Robust Scaling
x' = (x - median) / IQR
Handles outliers well | Data with significant outliers
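The three formulas above map directly onto scikit-learn's scalers. A small sketch on an invented column with one outlier, to show why Robust Scaling is the outlier-resistant choice:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

# Illustrative values (not from the slides); 100.0 is the outlier
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

minmax = MinMaxScaler().fit_transform(X)    # (x - min) / (max - min) -> [0, 1]
zscore = StandardScaler().fit_transform(X)  # (x - mean) / std -> mean 0, std 1
robust = RobustScaler().fit_transform(X)    # (x - median) / IQR -> outlier-resistant
```

With the outlier present, Min-Max squeezes the first four points near 0, while Robust Scaling keeps them spread out around the median.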
Feature Selection Methods
Filter Methods
Statistical tests to rank features independently of the model
✓ Fast, model-agnostic
✗ Ignores feature interactions
Wrapper Methods
Use a model to evaluate subsets of features iteratively
✓ Finds best subset
✗ Computationally expensive
Embedded Methods
Feature selection happens as part of model training
✓ Efficient & accurate
✗ Model-specific
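One common representative of each method family, sketched on synthetic data (the dataset and parameter choices here are illustrative assumptions, not from the slides):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 3 informative
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# Filter: rank features with an ANOVA F-test, keep the top 3
X_filter = SelectKBest(f_classif, k=3).fit_transform(X, y)

# Wrapper: recursive feature elimination around a model
rfe = RFE(LogisticRegression(max_iter=1000),
          n_features_to_select=3).fit(X, y)

# Embedded: an L1 penalty shrinks weak features to zero during training
lasso = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)
```

The trade-off from the slide shows up directly: `SelectKBest` scores each feature once, while `RFE` refits the model repeatedly as it drops features.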
Feature Importance & Evaluation
Drop Low-Importance Features
Features with importance < 0.01 often add noise
Use SHAP Values
Explains individual predictions and global importance
Permutation Importance
Model-agnostic: shuffle each feature and measure accuracy drop
Correlation Filter
Remove one of two features with correlation > 0.95
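Permutation importance and the correlation filter can both be sketched in a few lines (synthetic data and the 0.95 threshold from the slide; the model choice is an assumption):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Shuffle each feature and measure the resulting score drop
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)

# Correlation filter: flag one of each pair with |r| > 0.95
df = pd.DataFrame(X)
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
```

Taking only the upper triangle of the correlation matrix ensures each correlated pair is counted once, so only one member of the pair is dropped.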
Summary & Best Practices
Best Practices
Understand domain: talk to experts about meaningful features
Always scale features for distance-based models (KNN, SVM)
Use cross-validation when evaluating feature importance
Handle missing values before feature engineering
Start simple — add complexity only when needed
Common Pitfalls
Data leakage: don't use future data to create features
Too many features → overfitting (curse of dimensionality)
Fitting scalers on test data: fit on training data only, then apply those statistics to the test set
Key Takeaway: Feature Engineering often has more impact on model performance than algorithm choice!
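The leakage pitfalls above are easiest to avoid by putting preprocessing inside a Pipeline, so the scaler is fit on training data only. A minimal sketch (dataset and KNN model chosen for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler sees only training data at fit time; test data is
# transformed with the training statistics -- no leakage
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)
```

The same pipeline also works inside `cross_val_score`, where it refits the scaler on each training fold, keeping cross-validated importance estimates leakage-free.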