1 of 15

BANK LOAN CASE STUDY 

By Sumit Kumar Prajapat

2 of 15

PROJECT DESCRIPTION

Objective

  •  Analyze a loan application dataset for patterns, outliers, imbalances, and correlations related to loan defaults. Build insights for accurate prediction models.

Approach

  • Missing Data: Identify missing values with pandas functions (e.g., isnull()) and impute them (e.g., with the mean or median).
  • Outlier Detection: Utilize Python's statistical functions and visualizations to find outliers in numerical variables.
  • Data Imbalance: Calculate class frequencies in Python (pandas) and address imbalance through techniques like oversampling/undersampling.
  • Descriptive & Bivariate Analysis: Perform univariate analysis (describe()) and explore relationships between variables (correlation).
  • Correlation Analysis: Calculate variable-target correlations with Python and visualize them using heatmaps.

3 of 15

APPROACH

  • I began the project by analyzing a loan application dataset to understand its structure and identify any missing data. I used Python and its pandas library to handle missing values, employing functions like isnull() to detect missing entries.
  • To identify potential outliers that could impact the analysis, I utilized Python's statistical functions and visualizations, such as box plots and scatter plots.
  • Next, I addressed data imbalance issues in the dataset using Python's pandas library. I calculated the class frequencies of the target variable with value_counts() and considered techniques to balance the data.
  • To gain insights into the dataset, I conducted univariate, segmented univariate, and bivariate analyses, using the describe() function for descriptive statistics.
  • To explore the relationships between variables and the target variable, I calculated correlation coefficients using Python's corr() function and visualized the correlations with heatmaps.
  • Throughout the project, I leveraged various Python libraries, including pandas and matplotlib, to efficiently handle, analyze, and visualize the data.

4 of 15

TECH STACK USED

  • Python: Utilized for data analysis and manipulation.
  • PyCharm: Facilitated efficient coding and debugging.
  • Libraries:
      • pandas: Employed for data handling and analysis.
      • seaborn: Used for creating visualizations, including box plots, scatter plots, and heatmaps.
      • matplotlib: Used for additional data visualization capabilities and customization.

5 of 15

INSIGHT

  • Addressed missing data using Python, ensuring dataset completeness.
  • Identified outliers in numerical variables, understanding extreme data points' impact.
  • Balanced data distribution, improving prediction model accuracy.
  • Discovered patterns and trends through descriptive and bivariate analysis.
  • Uncovered significant correlations, highlighting loan default indicators.
  • Built accurate prediction models for risk assessment and informed decision-making.
  • Enhanced risk management and loan approval procedures for financial institutions.

6 of 15

RESULT 

  • Analyzed the loan application dataset, gaining insights into loan default factors.
  • Utilized Python and its libraries for proficient data analysis.
  • Handled missing data, identified outliers, and addressed data imbalances.
  • Built accurate loan default prediction models, improving risk assessment.
  • Emphasized the importance of data-driven decision-making in financial institutions.
  • Acquired transferable skills applicable to diverse datasets and real-world scenarios.
  • A transformative and enlightening journey in data analysis and financial risk management.

7 of 15

TASK 1: MISSING DATA
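
A minimal sketch of how this step could look with isnull() and median/mode imputation. The data frame and its column names (AMT_INCOME, AMT_CREDIT, OCCUPATION) are hypothetical stand-ins, not the project's actual data.

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame; the column names are assumptions for illustration.
df = pd.DataFrame({
    "AMT_INCOME": [50000.0, np.nan, 72000.0, 61000.0, np.nan],
    "AMT_CREDIT": [200000.0, 150000.0, np.nan, 310000.0, 120000.0],
    "OCCUPATION": ["Clerk", None, "Driver", "Clerk", "Clerk"],
})

# Share of missing values per column, in percent.
missing_pct = df.isnull().mean() * 100

# Impute numeric columns with the median and other columns with the mode.
filled = df.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        filled[col] = filled[col].fillna(filled[col].median())
    else:
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(missing_pct.round(1))
print(filled)
```

The median is used here because it is less sensitive to extreme values than the mean; either choice fits the approach described earlier.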

8 of 15

TASK 2: OUTLIERS
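
One common way to flag outliers numerically is the interquartile range (IQR) rule, which is also the rule a default matplotlib box plot uses for its whiskers. The sketch below assumes a hypothetical AMT_INCOME column with synthetic values; the 1.5 multiplier is the conventional choice, not a value taken from the project.

```python
import pandas as pd

# Synthetic income values; the last one is deliberately extreme.
income = pd.Series(
    [40000, 45000, 52000, 58000, 61000, 67000, 70000, 900000],
    name="AMT_INCOME",
)

# Quartiles and interquartile range.
q1 = income.quantile(0.25)
q3 = income.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = income[(income < lower) | (income > upper)]

print(outliers)
```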

9 of 15

TASK 3: DATA IMBALANCE
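
A minimal sketch of measuring class imbalance with value_counts() and then applying random oversampling of the minority class. The TARGET column and its coding (1 = payment difficulties, 0 = none) are assumptions, and the data is synthetic.

```python
import pandas as pd

# Synthetic target with an assumed coding: 1 = payment difficulties, 0 = none.
df = pd.DataFrame({
    "TARGET": [0] * 18 + [1] * 2,
    "AMT_CREDIT": list(range(20)),
})

# Class frequencies and the majority-to-minority ratio.
counts = df["TARGET"].value_counts()
imbalance_ratio = counts.max() / counts.min()

# Random oversampling: resample minority rows with replacement until
# both classes have the same number of rows.
minority = df[df["TARGET"] == counts.idxmin()]
n_extra = int(counts.max() - counts.min())
extra = minority.sample(n=n_extra, replace=True, random_state=0)
balanced = pd.concat([df, extra], ignore_index=True)

print(counts)
print(imbalance_ratio)
print(balanced["TARGET"].value_counts())
```

Oversampling duplicates existing minority rows, so in a modeling workflow it is normally applied to the training split only.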

10 of 15

TASK 4: UNIVARIATE ANALYSIS
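
A short sketch of univariate analysis: describe() for a numeric variable and normalized value_counts() for a categorical one. The columns AMT_CREDIT and CONTRACT_TYPE are hypothetical and the values are synthetic.

```python
import pandas as pd

# Synthetic data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "AMT_CREDIT": [100000, 150000, 200000, 250000, 300000],
    "CONTRACT_TYPE": ["Cash", "Cash", "Revolving", "Cash", "Cash"],
})

# Count, mean, standard deviation, min, quartiles, and max of one numeric column.
numeric_summary = df["AMT_CREDIT"].describe()

# Relative frequency of each category.
category_share = df["CONTRACT_TYPE"].value_counts(normalize=True)

print(numeric_summary)
print(category_share)
```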

11 of 15

TASK 4: SEGMENTED UNIVARIATE ANALYSIS
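
Segmented univariate analysis looks at one variable separately within each segment. The sketch below assumes the segments are the two values of a hypothetical TARGET column and uses groupby() to summarize a hypothetical AMT_INCOME column per segment.

```python
import pandas as pd

# Synthetic data; TARGET coding (1 = payment difficulties) is an assumption.
df = pd.DataFrame({
    "TARGET": [0, 0, 0, 1, 1, 0],
    "AMT_INCOME": [60000, 80000, 70000, 40000, 50000, 90000],
})

# Summary statistics of income within each target segment.
segment_stats = df.groupby("TARGET")["AMT_INCOME"].agg(["count", "mean", "median"])

print(segment_stats)
```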

12 of 15

TASK 4: BIVARIATE ANALYSIS
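
One way to examine a categorical variable against the target is a row-normalized cross-tabulation, which gives the share of each target class within every category. INCOME_TYPE and TARGET are hypothetical column names and the rows are synthetic.

```python
import pandas as pd

# Synthetic data; the column names and target coding are assumptions.
df = pd.DataFrame({
    "INCOME_TYPE": ["Working", "Working", "Pensioner",
                    "Working", "Pensioner", "Working"],
    "TARGET": [1, 0, 0, 1, 0, 0],
})

# Each row sums to 1: the share of TARGET = 0 and TARGET = 1 per income type.
rate_by_type = pd.crosstab(df["INCOME_TYPE"], df["TARGET"], normalize="index")

print(rate_by_type)
```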

13 of 15

TASK 5: CORRELATION FOR PAYMENT DIFFICULTIES
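
A sketch of computing correlations within one target segment using corr(). The helper top_correlations, the column names, and the TARGET coding (1 = payment difficulties) are all hypothetical; the data is synthetic. Calling the same helper with target_value=0 would cover the segment without payment difficulties.

```python
import numpy as np
import pandas as pd

# Synthetic data; the column names and target coding are assumptions.
df = pd.DataFrame({
    "TARGET":      [1, 1, 1, 1, 0, 0, 0, 0],
    "AMT_CREDIT":  [100, 200, 300, 400, 150, 250, 350, 450],
    "AMT_ANNUITY": [10, 20, 30, 40, 45, 35, 25, 15],
    "AMT_INCOME":  [50, 40, 70, 60, 80, 90, 85, 95],
})

def top_correlations(frame, target_value, n=3):
    """Return the n largest absolute pairwise correlations for one target class."""
    segment = frame[frame["TARGET"] == target_value].drop(columns="TARGET")
    corr = segment.corr().abs()
    # Keep only the upper triangle so each variable pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)

difficulties = top_correlations(df, target_value=1)
print(difficulties)
```

The unstacked matrix from segment.corr() is what a heatmap would display; the ranked pairs are a compact way to read off the strongest relationships.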

14 of 15

TASK 5: CORRELATION FOR NO PAYMENT DIFFICULTIES

15 of 15

THANK YOU 

By Sumit K Prajapat

Drive Link: Data

Video Explanation: First Video

Second Video

Third Video

Fourth Video