PRML Group Project
Team Members
This is our PRML course project on stroke prediction. Stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths. The given dataset can be used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. We applied several traditional ML techniques to make predictions, evaluated them with various parameters, and recorded our observations to judge which techniques were effective and which were not.
Introduction
KNN (K-Nearest Neighbors)
Core Idea:
Predicts a data point’s label by majority vote from its k nearest neighbors in the feature space.
Strength:
Simple to implement and works well with small, clean datasets.
Limitation:
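Prediction slows down on large datasets, because every query is compared against all training points, and distances become less informative in high-dimensional feature spaces.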
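The majority-vote idea can be illustrated with a minimal sketch in plain Python. The toy rows below are made up for illustration and are not taken from the project dataset; the function name `knn_predict` and the choice of Euclidean distance are assumptions as well.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(row, query), label)
                 for row, label in zip(train_X, train_y)]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy rows: [age, average glucose level] -> stroke (1) or not (0).
train_X = [[25, 85], [30, 90], [35, 95], [70, 200], [75, 210], [80, 220]]
train_y = [0, 0, 0, 1, 1, 1]

young_label = knn_predict(train_X, train_y, [28, 88])
old_label = knn_predict(train_X, train_y, [72, 205])
```

Every call scans the whole training set, which is why prediction cost grows with dataset size. In practice the features would usually be scaled first so that no single column dominates the distance.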
Bayesian Learning (using the Naive Bayes assumption)
Core Idea:
Applies Bayes’ Theorem with strong (naive) independence assumptions between features.
Strength:
Very fast and performs well with high-dimensional data.
Limitation:
Assumes feature independence, which is rarely true in practice.
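As a rough sketch of how the independence assumption is used, the snippet below implements a small Gaussian Naive Bayes classifier in plain Python. The Gaussian likelihood, the function names, and the toy rows are all assumptions made for illustration; the project itself may use a different variant.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature means, and per-feature variances of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A tiny constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        # Naive assumption: per-feature log-likelihoods are simply added.
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical toy rows: [age, average glucose level] -> stroke (1) or not (0).
X = [[25, 85], [30, 90], [35, 95], [70, 200], [75, 210], [80, 220]]
y = [0, 0, 0, 1, 1, 1]
nb_model = fit_gaussian_nb(X, y)
nb_low = predict_gaussian_nb(nb_model, [28, 88])
nb_high = predict_gaussian_nb(nb_model, [78, 215])
```

Training only needs one pass to collect means and variances, which is where the speed comes from.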
Logistic Regression
Core Idea:
Models the probability of a binary outcome using a logistic function.
Strength:
Easy to interpret and implement.
Limitation:
Limited to decision boundaries that are linear in the input features, unless non-linear terms are engineered by hand.
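The logistic model can be sketched in a few lines of plain Python, fitted here with stochastic gradient descent. The learning rate, epoch count, function names, and the single standardised toy feature are assumptions chosen for illustration, not the project's settings.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, row):
    """P(y = 1 | row): a linear score passed through the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights by stochastic gradient descent on the log-loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for row, target in zip(X, y):
            # Gradient of the log-loss with respect to the linear score.
            error = predict_proba(weights, bias, row) - target
            weights = [w - lr * error * x for w, x in zip(weights, row)]
            bias -= lr * error
    return weights, bias

# Hypothetical toy data: one standardised feature (for example, age).
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(X, y)
p_low = predict_proba(weights, bias, [-1.8])
p_high = predict_proba(weights, bias, [1.8])
```

The sign and size of each learned weight show how a feature moves the predicted probability, which is what makes the model easy to interpret.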
Decision Tree
Core Idea:
Splits data recursively based on features to form a tree-like structure for decision-making.
Strength:
Highly interpretable and works for both classification and regression.
Limitation:
Prone to overfitting without proper pruning.
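The recursive splitting can be sketched as follows. This is a simplified illustration that assumes Gini impurity as the split criterion and a `max_depth` cap as the only guard against overfitting; the helper names and toy rows are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) pair that lowers weighted Gini most, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(X, y, depth=0, max_depth=2):
    """Split recursively; stop at max_depth or when no split reduces impurity."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

# Hypothetical toy rows: [age, average glucose level] -> stroke (1) or not (0).
X = [[25, 85], [30, 90], [35, 95], [70, 200], [75, 210], [80, 220]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
```

Each internal node reads as a plain rule such as "feature 0 <= 35", which is the source of the interpretability. Without a depth cap or pruning, the recursion keeps splitting until the training data is fit exactly.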
MLP (Multi-layer Perceptron)
Core Idea:
A type of neural network with multiple layers that learn complex patterns.
Strength:
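Can approximate any continuous function to arbitrary accuracy given enough hidden neurons, although learning that function in practice also requires sufficient data.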
Limitation:
Typically requires large amounts of data and computational power to train well.
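To show why stacking layers helps, here is a sketch of the forward pass of a tiny MLP with one hidden layer. The weights are set by hand rather than learned, purely for illustration, so that the network computes XOR, a pattern that no single linear boundary can separate. The function names and weight values are assumptions.

```python
def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)

def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: input -> ReLU hidden layer -> linear output."""
    hidden = [relu(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_weights, hidden_biases)]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hand-picked (not trained) weights, chosen so the network reproduces XOR.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [1.0, -2.0]
OUTPUT_B = 0.0

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
xor_outputs = [mlp_forward(x, HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)
               for x in inputs]
```

In a real model these weights would be learned by backpropagation, which is where the data and compute requirements come from.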
LDA (Linear Discriminant Analysis)
Core Idea:
Projects data onto a lower-dimensional space maximizing class separability.
Strength:
Reduces dimensionality while preserving class-discriminative information.
Limitation:
Assumes normally distributed classes with equal covariance.
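For two classes, the projection can be sketched with Fisher's discriminant, where the direction is w = Sw^(-1) (m1 - m0) and Sw is the pooled within-class scatter. The snippet below is restricted to two features so that the 2x2 inverse can be written out by hand. The function names and toy points are assumptions for illustration.

```python
def mean_vector(rows):
    """Column-wise mean of a list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter_matrix(rows, mean):
    """2x2 scatter matrix: sum of outer products of the centred rows."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for row in rows:
        d = [row[0] - mean[0], row[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    """Direction w = Sw^(-1) (m1 - m0) that best separates two 2-D classes."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(rows, w):
    """Project each 2-D row onto the direction w, giving one number per row."""
    return [row[0] * w[0] + row[1] * w[1] for row in rows]

# Hypothetical toy points for two classes.
class0 = [[1, 2], [2, 3], [3, 3], [2, 1]]
class1 = [[6, 5], [7, 8], [8, 6], [7, 7]]
w = fisher_direction(class0, class1)
proj0, proj1 = project(class0, w), project(class1, w)
```

After projection each sample is a single number, and on this toy data the two classes occupy separate ranges along that axis.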
PCA (Principal Component Analysis)
Core Idea:
Transforms features into uncorrelated principal components capturing maximum variance.
Strength:
Reduces dimensionality and can filter out noise carried by low-variance components.
Limitation:
Loses interpretability of original features.
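The variance-maximising idea can be sketched by finding the first principal component of a small dataset. The snippet below uses power iteration on the covariance matrix, which is one simple way to get the leading eigenvector; the function name, iteration count, and toy points are assumptions for illustration.

```python
import math

def first_principal_component(X, iterations=100):
    """Return (unit direction, variance along it, covariance matrix)."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    centred = [[v - m for v, m in zip(row, means)] for row in X]
    # Sample covariance matrix of the centred data.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance, cov

# Hypothetical toy points lying close to the line y = x.
X = [[1, 1], [2, 2.1], [3, 2.9], [4, 4.2], [5, 4.8]]
pc, pc_variance, cov = first_principal_component(X)
total_variance = cov[0][0] + cov[1][1]
explained_ratio = pc_variance / total_variance
```

Here one component captures almost all of the variance, so the second could be dropped with little loss. The component mixes both original features, which illustrates the loss of interpretability.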
Thank you