1 of 33

STUDENT PERFORMANCE AND BEHAVIOR ANALYSIS.

Ruth Tabitha, DS33A.

2 of 33

Get to know me.

Skills:

Data Analysis and Data Visualization with Python, SQL, Power BI, Tableau

Work:

SMILE Helpdesk – UNDP

Immunization Data Analyst - Ministry of Health

Training:

Data Analyst & Data Science – Dibimbing.id

Education:

Public Health (S.KM) – UI

Epidemiology (M.Epid) – UI

Hi! I’m Ruth Tabitha.

3 of 33

Outline.

DATA UNDERSTANDING

DATA PREPARATION

  • Cleaning
  • Manipulating

EXPLORATORY DATA ANALYSIS

  • Descriptive
  • Analytics

4 of 33

Data Understanding.

5 of 33

General Information.

  • This dataset contains real data from 5,000 students collected by a private learning provider.
  • It includes important attributes (such as age, gender, department, grades, scores, attendance, study habits, stress level, parental background, and extracurricular activities) that allow us to:

        • Explore patterns and trends in student performance.
        • Analyze correlations between academic success and factors such as study hours, sleep, stress, attendance, and participation in extracurricular activities.
        • Identify insights about what influences student outcomes, including demographic and social factors.

  • Essentially, this dataset is designed to help examine how various factors relate to academic performance, using real school data.

6 of 33

Objective.

To find the key factors that influence students’ academic performance.

This research is conducted to provide insights for lecturers, the rector, and other university stakeholders on how to enhance student outcomes and overall educational quality.

7 of 33

Dataset Column Description.

Column                        Description
Student ID                    Unique identifier for each student.
Name                          Student’s first and last name.
Gender                        Male and Female.
Age                           The age of the student.
Department                    Student's department (e.g., CS, Engineering, Business).
Attendance (%)                Attendance percentage (0-100%).
Score                         Midterm, Final, Assignment, Quizzes, Project and Total Score (out of 100).
Participation_Score           Score based on class participation (0-10).
Grade                         Letter grade (A, B, C, D, F).
Study_Hours_per_Week          Average study hours per week.
Extracurricular_Activities    Whether the student participates in extracurriculars (Yes/No).
Internet_Access_at_Home       Does the student have access to the internet at home? (Yes/No).
Parent_Education_Level        Highest education level of parents (None, High School, Bachelor's, Master's, PhD).
Family_Income_Level           Low, Medium, High.
Stress_Level                  Self-reported stress level (1: Low, 10: High).
Sleep_Hours_per_Night         Average hours of sleep per night.

8 of 33

Data Preparation.

9 of 33

Data Cleaning and Preprocessing.

Check Missing Values.

There are 1,025 missing values in Parent Education Level. Because these entries correspond to 'None', which is assumed to mean 'No Education', they are replaced with that value rather than dropped.

Handle the Missing Values.

Check Duplicates.

No duplicate records found.
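The checks on this slide can be sketched in pandas as follows. This is a minimal illustration, not the code used in the project: the four-row DataFrame is made up, and the column names are assumed to follow the dataset description.

```python
import numpy as np
import pandas as pd

# Tiny made-up stand-in for the real 5,000-row dataset.
df = pd.DataFrame({
    "Student_ID": ["S1", "S2", "S3", "S4"],
    "Parent_Education_Level": ["High School", np.nan, "PhD", np.nan],
    "Total_Score": [70.5, 68.2, 74.1, 66.9],
})

# Check missing values: number of missing entries per column.
missing_before = df.isna().sum()

# Handle missing values: treat missing parent education as "No Education"
# (the same assumption the slide makes).
df["Parent_Education_Level"] = df["Parent_Education_Level"].fillna("No Education")
missing_after = df.isna().sum()

# Check duplicates: number of rows that repeat an earlier row exactly.
n_duplicates = int(df.duplicated().sum())

print(missing_before)
print(missing_after)
print("Duplicate rows:", n_duplicates)
```

Filling instead of dropping keeps every student in the analysis, which matters when about one in five rows is affected.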

10 of 33

Results.

Check Outliers.

With IQR Method.

Data Cleaning and Preprocessing.

The total_score is concentrated between 66 and 76, with four outliers: one at the minimum score (50.6) and three at the higher end, ranging from 92 to 95. However, since they make up only 0.08% of the data (4 of 5,000 rows) and do not affect the overall distribution, they are left unhandled.

Visualization.
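As a rough sketch of the IQR method named above: a value counts as an outlier when it falls more than 1.5 × IQR below the first quartile or above the third quartile. The scores below are made up for illustration, and the variable names are assumptions rather than the project's own code.

```python
import pandas as pd

# Made-up scores standing in for the total score column.
scores = pd.Series(
    [50.6, 66.0, 68.5, 70.0, 71.2, 72.4, 74.0, 76.0, 95.0],
    name="Total_Score",
)

# Quartiles and interquartile range.
q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1

# Standard 1.5 * IQR fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]
outlier_share = len(outliers) / len(scores) * 100

print(f"Fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print(outliers.tolist())
print(f"Share of outliers: {outlier_share:.2f}%")
```

On the real data, the same share calculation gives 4 / 5,000 × 100 = 0.08%, the figure quoted on this slide.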

11 of 33

Results.

Adding New Column.

Data Manipulating.

Analyze Range.

Analyze the range and criteria used to assign Grade labels (e.g., 'A', 'B', 'C') based on Total Score.
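One way to inspect the grade criteria is to look at the minimum and maximum total score observed for each letter grade. The sketch below uses made-up rows and assumed column names (Grade, Total_Score); the actual cut-offs must be read from the real data.

```python
import pandas as pd

# Made-up rows; the real cut-offs come from the actual dataset.
df = pd.DataFrame({
    "Grade": ["A", "A", "B", "B", "C", "C"],
    "Total_Score": [91.0, 85.5, 79.9, 72.0, 69.5, 61.0],
})

# Observed score range and number of students for each letter grade.
grade_ranges = df.groupby("Grade")["Total_Score"].agg(["min", "max", "count"])

print(grade_ranges)
```

If the ranges of neighbouring grades do not overlap, the grade is a direct function of the total score, and the boundaries can be read off the min and max columns.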

12 of 33

Descriptive Statistics.

Every column contains the full 5,000 rows. Based on the table above, the data is assumed to be approximately normally distributed, since the mean and median values are close, which indicates a roughly symmetric distribution.
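A minimal sketch of how such a summary table can be produced and how the mean can be compared with the median. The data and column names are illustrative assumptions. Note that a small mean-median gap points to symmetry, which is consistent with a normal distribution but does not prove it, so the skewness is shown as an extra check.

```python
import pandas as pd

# Made-up numeric columns standing in for the real dataset.
df = pd.DataFrame({
    "Total_Score": [66.0, 68.5, 70.0, 71.2, 72.4, 74.0, 76.0],
    "Study_Hours_per_Week": [10.0, 12.5, 15.0, 17.5, 20.0, 22.5, 25.0],
})

# count, mean, std, min, quartiles and max, one row per column.
summary = df.describe().T

# Add the median and its distance from the mean.
summary["median"] = df.median()
summary["mean_minus_median"] = summary["mean"] - summary["median"]

# Skewness close to 0 also suggests a symmetric distribution.
summary["skew"] = df.skew()

print(summary[["count", "mean", "median", "mean_minus_median", "skew"]])
```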

13 of 33

Exploratory Data Analysis.

14 of 33

Descriptive Analysis.

Delve deep into the overall data.

Let’s understand what is happening.

15 of 33

Research Questions.

1. How are students distributed across each variable?

2. Which academic department has the highest-performing students, and why?

16 of 33

Distribution Between Variables.

Gender.

Age.

17 of 33

Distribution Between Variables.

Extracurricular Act.

Internet.

18 of 33

Distribution Between Variables.

Family Income.

Parent’s Education.

19 of 33

Distribution Between Variables.

Students’ Performance.

20 of 33

Take a deeper look: Student Performance Across Departments.

Total Score Distribution by Dep.

There is no significant difference between students’ scores across departments, and the distribution of grade ranges and student performance is very similar.

Looking more closely, a Computer Science student has the highest score (95), but this is not much different from the top scores in other departments.

Student’s Performance Distribution by Dep.

21 of 33

Quantitative Analysis.

Delve deeper into the specific variables.

Let’s see if there is any relationship between the variables.

22 of 33

Research Questions.

1. Which numerical attribute has the most influence on the student’s score?

2. Are there any gender differences in academic achievement?

3. Do students who engage in extracurricular activities perform better or worse academically?

4. Does access to the internet at home impact students’ academic outcomes?

5. How do parental education levels influence students' academic performance?

6. Does family income level have an effect on academic performance?

23 of 33

Focus on numerical attributes: How’s the correlation?

Measure with a correlation matrix.

The attributes most correlated with the total score are the project score, followed by the final score, midterm score, and assignment score, then quizzes and participation. However, none of these relationships is strong, because the coefficients fall short of the values close to 1 that would indicate a strong linear relationship.

There is no evident relationship with the other variables. In fact, these correlations mainly reflect how much each assessment component contributes to the total score.
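A small sketch of the correlation-matrix step, assuming Pearson correlation (the pandas default). The three assessment columns, their names, and the equal-weight total are made up for illustration and are not the dataset's actual weighting.

```python
import pandas as pd

# Made-up assessment scores; the total here is a plain average of the three.
df = pd.DataFrame({
    "Midterm_Score": [60.0, 70.0, 80.0, 90.0, 50.0],
    "Final_Score": [65.0, 75.0, 70.0, 95.0, 55.0],
    "Projects_Score": [70.0, 60.0, 90.0, 85.0, 65.0],
})
df["Total_Score"] = df.mean(axis=1)

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of each attribute with the total score, strongest first.
with_total = (
    corr["Total_Score"]
    .drop("Total_Score")
    .sort_values(ascending=False)
)

print(with_total)
```

Because the total is built from the assessment scores, some correlation between them is expected by construction, which is the point made on this slide.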

24 of 33

Male or Female: Which performed better?

Measure with ANOVA test.

Since the p-value is above the alpha level (0.05), we can conclude that there is no statistically significant difference in mean Total Scores between Male and Female students. This can be seen from the similar ranges and medians of the boxplots.

However, the highest scores were achieved by Female students rather than Male students.

ANOVA F-statistic: 0.0056

ANOVA P-value: 0.9405
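The one-way ANOVA used on this and the following slides can be sketched with scipy.stats.f_oneway. The helper name anova_by_group, the column names, and the eight data rows are assumptions for illustration, so the printed numbers will not match the values reported here.

```python
import pandas as pd
from scipy import stats


def anova_by_group(df, group_col, value_col="Total_Score"):
    """One-way ANOVA of value_col across the categories of group_col."""
    groups = [g[value_col].to_numpy() for _, g in df.groupby(group_col)]
    return stats.f_oneway(*groups)


# Made-up scores for two groups with nearly identical means.
df = pd.DataFrame({
    "Gender": ["Male", "Female"] * 4,
    "Total_Score": [70.1, 69.8, 72.4, 71.9, 68.0, 68.7, 74.2, 73.5],
})

f_stat, p_value = anova_by_group(df, "Gender")
alpha = 0.05

print(f"ANOVA F-statistic: {f_stat:.4f}")
print(f"ANOVA P-value: {p_value:.4f}")
print("Significant difference" if p_value < alpha else "No significant difference")
```

The same helper applies to any categorical column, for example anova_by_group(df, "Family_Income_Level") on the full dataset.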

25 of 33

Extracurricular Activities: Do they impact students’ scores?

Measure with ANOVA test.

Since the p-value is above the alpha level (0.05), we can conclude that there is no statistically significant difference in mean Total Scores between students with and without extracurricular activities. This can be seen from the similar ranges and medians of the boxplots.

However, based on the boxplots, the highest scores were achieved by students with extracurricular activities.

ANOVA F-statistic: 0.0638

ANOVA P-value: 0.8006

26 of 33

Internet Access at Home: Does it impact students’ scores?

Measure with t-test.

Since the p-value is greater than the alpha level (0.05), we conclude that there is no statistically significant difference in mean Total Scores between students with and without internet access at home. This can be seen from the similar ranges and medians of the boxplots.

However, based on the boxplots, the highest scores were achieved by students with internet access.

Independent Samples t-test:

T-statistic: -0.5512

P-value: 0.5815
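A hedged sketch of the independent samples t-test using scipy.stats.ttest_ind, which assumes equal variances by default. The data and column names are made up. The sign of the t-statistic depends only on which group is passed first. With two groups, a one-way ANOVA gives the same p-value, and its F-statistic equals the square of t.

```python
import pandas as pd
from scipy import stats

# Made-up scores for students with and without internet access at home.
df = pd.DataFrame({
    "Internet_Access_at_Home": ["Yes", "No"] * 4,
    "Total_Score": [70.1, 69.8, 72.4, 71.9, 68.0, 68.7, 74.2, 73.5],
})

with_net = df.loc[df["Internet_Access_at_Home"] == "Yes", "Total_Score"]
without_net = df.loc[df["Internet_Access_at_Home"] == "No", "Total_Score"]

# Independent samples t-test (equal variances assumed by default).
t_stat, p_value = stats.ttest_ind(without_net, with_net)

# Cross-check: for two groups, the ANOVA F-statistic equals t squared.
f_stat, f_p_value = stats.f_oneway(without_net, with_net)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

If the two groups have clearly different variances, passing equal_var=False to ttest_ind runs Welch's t-test instead.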

27 of 33

Parental Education: Does it impact students’ scores?

Measure with ANOVA test.

ANOVA F-statistic: 0.7638

ANOVA P-value: 0.5487

Since the p-value is greater than the alpha level (0.05), we conclude that there is no statistically significant difference in mean Total Scores among the Parental Education groups.

Students whose parents have higher education are not guaranteed to score higher than those whose parents have lower or no education. This can be seen from the similar ranges and medians of the boxplots.

Interestingly, the highest scores appear among students whose parents completed only high school.

28 of 33

Family Income: Does it impact students’ scores?

Measure with ANOVA test.

ANOVA F-statistic: 0.3296

ANOVA P-value: 0.7192

Since the p-value is greater than the alpha level (0.05), we conclude that there is no statistically significant difference in mean Total Scores among the Family Income Level groups.

Students with a higher family income level are not guaranteed to score higher than those with a low family income level. This can be seen from the similar ranges and medians of the boxplots.

Interestingly, the highest scores appear among students with a low family income level.

29 of 33

Study Hours: Do they differ between departments?

Measure with ANOVA test.

ANOVA F-statistic: 1.2245

ANOVA P-value: 0.2291

Since the p-value is greater than the alpha level (0.05), we conclude that there is no statistically significant difference in mean Study Hours among departments.

On average, students in every department put in a similar amount of study time.

30 of 33

Quick Conclusions.

Let’s consider the results and what information can be obtained.

The data shows little variation, so no significant relationship with, or influence on, students' total scores was found. This is evident from how similarly the data points are distributed across groups.

All students have the same opportunity to achieve high scores, regardless of demographics or other factors.

31 of 33

Recommendations.

Based on the findings, several recommendations are provided below.

Explore Variables.

There may be other variables with a greater influence, such as teacher quality and school facilities.

Semester.

Since study hours (effort), mental condition (stress), and hours of sleep do not have an effect, the variable "Semester" should be considered, as each higher semester tends to bring increased difficulty.

Parents’ Education.

Since parents' education level does not have an impact, further research could be done on parenting patterns using qualitative methods to see how study habits are developed at home.

Attendance Rate.

Since attendance level has no significant effect, further research could explore the campus system more deeply: whether attendance is mandatory, and whether the final exam or projects are prioritized in determining grades.

Extracurricular Act.

Since extracurricular participation does not have much impact, further research could examine in more depth the types of extracurricular activities students join, whether related to religion, sports, or research.

Generally.

More data should be collected, for example from multiple cohorts and more departments.

32 of 33

The Dashboard.

Let’s see the overall information.

33 of 33

A good education is a foundation for a better future. – Elizabeth Warren.

THANK YOU.