1 of 10

MACHINE LEARNING

Feature Engineering

& Selection

How to create new features and select the most important ones for models

Lecture Series • Data Science & ML • 2025

2 of 10

What is Feature Engineering?

Feature Engineering is the process of using domain knowledge to transform raw data into features that better represent the underlying patterns for machine learning models — directly impacting model accuracy and performance.

Transform

Convert raw data

into useful formats

Create

Build new features

from existing ones

Select

Keep only the most

informative features

3 of 10

Types of Features

Numerical

  • Continuous: age, salary, temperature
  • Discrete: count of items, number of rooms

Categorical

  • Nominal: color, city, product type
  • Ordinal: rating (1-5), education level

Text

  • TF-IDF, word embeddings
  • N-grams, sentiment scores

Temporal

  • Hour, day, month, weekday
  • Time since event, lag features

4 of 10

Feature Creation Techniques

Polynomial Features

Create x², x³, or interaction terms like x₁ × x₂ to capture non-linear relationships

Binning / Discretization

Convert continuous values into groups: age → [child, teen, adult, senior]

Aggregation

Group-level stats: mean purchase per customer, max sessions per day

Ratio & Interaction

Combine columns: BMI = weight/height², profit_margin = profit/revenue

Python Example

from sklearn.preprocessing import PolynomialFeatures
import pandas as pd

# Polynomial features: degree-2 terms and interactions
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Binning
df['age_group'] = pd.cut(
    df['age'],
    bins=[0, 18, 35, 60, 99],
    labels=['child', 'young', 'mid', 'senior'])

# Ratio feature
df['bmi'] = df['weight'] / df['height'] ** 2

5 of 10

Encoding Categorical Features

One-Hot Encoding

USE: Nominal categories with no order

Color: [Red, Blue, Green] → 3 binary columns

Label Encoding

USE: Ordinal categories with natural order

Size: S=0, M=1, L=2, XL=3

Target Encoding

USE: High-cardinality categories

Replace category with mean of target variable

Frequency Encoding

USE: When count matters

Replace category with its frequency in dataset

Binary Encoding

USE: Large number of categories

Category → integer → binary digits → columns

Embedding (DL)

USE: Text, very high-cardinality features

Word2Vec, Entity Embeddings in neural nets
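A minimal sketch of three of the encodings above (one-hot, ordinal/label, and frequency encoding) using plain pandas; the toy DataFrame and column names are illustrative, not from the lecture:

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['Red', 'Blue', 'Green', 'Blue'],   # nominal: no order
    'size':  ['S', 'L', 'M', 'XL'],              # ordinal: S < M < L < XL
})

# One-hot encoding: one binary column per nominal category
onehot = pd.get_dummies(df['color'], prefix='color')

# Ordinal (label) encoding with an explicit order
size_order = {'S': 0, 'M': 1, 'L': 2, 'XL': 3}
df['size_encoded'] = df['size'].map(size_order)

# Frequency encoding: replace category with its share of the dataset
freq = df['color'].value_counts(normalize=True)
df['color_freq'] = df['color'].map(freq)
```

For ordinal columns, an explicit mapping like `size_order` is safer than a generic label encoder, which assigns integers alphabetically and can scramble the natural order.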

6 of 10

Date & Time Feature Engineering

Extract from datetime:

Year, Month, Day, Hour, Minute

Day of week (Monday=0 ... Sunday=6)

Is weekend? Is holiday?

Week of year, Quarter

Time since last event (lag)

Rolling window aggregations

Python — Datetime Features

import pandas as pd

df['date'] = pd.to_datetime(df['date_column'])

# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour
df['weekday'] = df['date'].dt.weekday
df['is_weekend'] = (df['weekday'] >= 5).astype(int)
df['quarter'] = df['date'].dt.quarter

# Lag feature: value from 7 rows earlier
df['lag_7'] = df['sales'].shift(7)

# Rolling window: 14-row moving average
df['rolling_14'] = df['sales'].rolling(window=14).mean()

7 of 10

Feature Scaling & Normalization

Min-Max Scaling

x' = (x - min) / (max - min)

Range: [0, 1] | When distribution is not Gaussian

Z-Score Standardization

x' = (x - μ) / σ

Mean=0, Std=1 | When algorithm assumes normal distribution

Robust Scaling

x' = (x - median) / IQR

Handles outliers well | Data with significant outliers
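The three formulas above can be checked directly on a toy column with one outlier (the array below is illustrative), showing why robust scaling is less sensitive to extreme values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # toy column with an outlier

# Min-Max scaling: (x - min) / (max - min)  -> range [0, 1]
minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: (x - mean) / sigma  -> mean 0, std 1
zscore = (x - x.mean()) / x.std()

# Robust scaling: (x - median) / IQR  -> outlier barely shifts the center
q1, q3 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q3 - q1)
```

Note how the single outlier compresses the first four min-max values toward 0, while the robust-scaled values stay spread out around the median.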

8 of 10

Feature Selection Methods

Filter Methods

Statistical tests to rank features independently of the model

  • Correlation coefficient
  • Chi-Square test
  • ANOVA F-test
  • Mutual Information

✓ Fast, model-agnostic

✗ Ignores feature interactions

Wrapper Methods

Use a model to evaluate subsets of features iteratively

  • Recursive Feature Elimination (RFE)
  • Forward selection
  • Backward elimination
  • Stepwise selection

✓ Finds best subset

✗ Computationally expensive

Embedded Methods

Feature selection happens as part of model training

  • L1/Lasso Regularization
  • Tree-based importance
  • ElasticNet

✓ Efficient & accurate

✗ Model-specific
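A minimal sketch contrasting a filter method (ANOVA F-test via `SelectKBest`) with a wrapper method (RFE) in scikit-learn; the breast-cancer dataset and k=10 are illustrative choices, not from the lecture:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter method: rank features by ANOVA F-score, keep the top 10
filt = SelectKBest(score_func=f_classif, k=10)
X_filtered = filt.fit_transform(X, y)

# Wrapper method: recursive feature elimination around a model,
# dropping the weakest feature each round until 10 remain
model = LogisticRegression(max_iter=5000)
rfe = RFE(estimator=model, n_features_to_select=10)
X_wrapped = rfe.fit_transform(X, y)
```

The filter runs in one pass over the data; RFE refits the model once per eliminated feature, which is where the extra cost (and the ability to account for the model) comes from.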

9 of 10

Feature Importance & Evaluation

Drop Low-Importance Features

Features with importance < 0.01 often add noise

Use SHAP Values

Explains individual predictions and global importance

Permutation Importance

Model-agnostic: shuffle each feature and measure accuracy drop

Correlation Filter

Remove one of two features with correlation > 0.95
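The permutation-importance and correlation-filter steps above can be sketched as follows; the random forest, dataset, and 0.95 threshold are assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Permutation importance: shuffle each feature on held-out data
# and measure how much the score drops
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
perm = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)

# Correlation filter: drop one of each pair with |corr| > 0.95
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
X_reduced = X.drop(columns=to_drop)
```

Using only the upper triangle of the correlation matrix ensures each highly correlated pair is counted once, so exactly one member of the pair is dropped.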

10 of 10

Summary & Best Practices

Best Practices

Understand domain: talk to experts about meaningful features

Always scale features for distance-based models (KNN, SVM)

Use cross-validation when evaluating feature importance

Handle missing values before feature engineering

Start simple — add complexity only when needed

Common Pitfalls

Data leakage: don't use future data to create features

Too many features → overfitting (curse of dimensionality)

Fitting scalers on the test set — compute scaling statistics on training data only, then apply them to test data
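A minimal leakage-safe scaling sketch (random data for illustration): the scaler is fit on the training split only, and the learned statistics are reused on the test split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.RandomState(0).normal(loc=5.0, scale=2.0, size=(100, 3))
X_train, X_test = train_test_split(X, random_state=0)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # fit ONLY on training data
X_test_s = scaler.transform(X_test)        # reuse training mean/std
```

Calling `fit_transform` on the test set instead would leak test-set statistics into preprocessing and make evaluation optimistic.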

Key Takeaway: Feature Engineering often has more impact on model performance than algorithm choice!