1 of 8

What Drives a Country’s Data Performance?

Using World Bank Statistical Performance Indicators (2023)

Mahya Tazike | Statistics for Data Science | Spring 2026

2 of 8

Audience and Objective

Audience

International development policymakers (World Bank, UN)

Objective

Identify which data capabilities are most strongly associated with overall statistical performance.

Source: World Bank Statistical Performance Indicators (SPI)

3 of 8

Data Overview

Countries: 187

Year Focus: 2023

Score Range: 0–100

Outcome: overall score

Sub-Dimensions: 5

The 5 Sub-Dimensions Measured:

Data Use: How data is used for decision-making and policy

Data Services: Availability and accessibility of data to users

Data Products: Quality and range of statistical outputs (reports, indicators)

Data Sources: Data collection systems (surveys, administrative data)

Data Infrastructure: Systems and tools supporting data storage and management

Source: World Bank Statistical Performance Indicators (SPI)

4 of 8

Distribution of Overall Statistical Performance (2023)

Key insight: Scores range widely (28 to 95). Analysis shows high-income countries score ~25 points higher than low-income countries.

Why histogram? Shows the distribution of overall scores across countries.

What it shows: Most countries fall in the mid-to-high range (60–90), with noticeable variation across countries.
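
As a rough illustration of what the histogram does, the sketch below groups scores into 10-point bins and prints a text bar per bin. The bin width, the helper name `bin_scores`, and the toy scores are all assumptions made for illustration; they are not the actual SPI values or the settings used for the chart.

```python
from collections import Counter

BIN_WIDTH = 10  # assumed bin width; the deck does not state the one it used


def bin_scores(scores, width=BIN_WIDTH):
    """Count scores per [lower, lower + width) bin, keyed by each bin's lower edge."""
    counts = Counter(int(score // width) * width for score in scores)
    return dict(sorted(counts.items()))


# Hypothetical toy scores for illustration only, NOT actual SPI values.
toy_scores = [28.0, 45.5, 61.0, 64.2, 72.9, 78.0, 83.3, 88.1, 95.0]
histogram = bin_scores(toy_scores)

# One text bar per bin, lowest bin first.
for lower, count in histogram.items():
    print(f"[{lower:3d}, {lower + BIN_WIDTH:3d}) {'#' * count}")
```

With real data, the same counts would show where the bulk of countries sit and how long the lower tail is.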

5 of 8

Overall Statistical Performance by Income Group (2023)

Key insight: High-income countries score about 25 points higher than low-income countries on average, indicating a clear income-based gap in statistical performance.

Why boxplot? Compares the distribution of scores across different income groups.

What it shows:

  • Median scores increase with income level
  • High-income countries have consistently higher scores
  • Lower-income groups show more variability
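
The quantities a boxplot draws for each group can be sketched as a five-number summary. The example below is a minimal sketch with made-up scores (not SPI data); the group labels, the helper name `five_number_summary`, and the choice of the "inclusive" quartile method are assumptions, and plotting libraries may compute quartiles slightly differently.

```python
from statistics import quantiles

# Hypothetical toy scores for illustration only, NOT actual SPI values.
scores_by_group = {
    "Low income": [40.0, 48.0, 55.0, 62.0, 70.0],
    "High income": [72.0, 78.0, 82.0, 86.0, 90.0],
}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot displays."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


summaries = {group: five_number_summary(vals) for group, vals in scores_by_group.items()}

for group, (low, q1, med, q3, high) in summaries.items():
    # The interquartile range (Q3 - Q1) is the height of the box.
    print(f"{group}: min={low} Q1={q1} median={med} Q3={q3} max={high} IQR={q3 - q1}")
```

Comparing medians across groups corresponds to the first bullet above, and comparing the interquartile ranges corresponds to the variability bullet.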

6 of 8

Relationship Between Data Use and Overall Performance (2023)

Key insight: There is a strong positive relationship between data use and overall statistical performance.

Why scatterplot? Shows the relationship between two numeric variables.

What it shows:

  • As data use increases, overall performance increases
  • The relationship appears roughly linear
  • Some variation exists, but the upward trend is clear
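
One common way to put a number on the "roughly linear, upward" pattern in a scatterplot is the Pearson correlation coefficient. The sketch below computes it from scratch; the helper name `pearson_r` and the toy values are assumptions for illustration and are not taken from the SPI data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by both spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy values for illustration only, NOT actual SPI values.
data_use = [30.0, 50.0, 60.0, 80.0, 100.0]
overall = [45.0, 58.0, 66.0, 79.0, 92.0]

r = pearson_r(data_use, overall)
print(f"Pearson r = {r:.3f}")
```

Values near +1 indicate a tight upward linear pattern, values near 0 indicate no linear pattern, and values near -1 indicate a downward one.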

7 of 8

Analysis: Hypothesis Test and Regression

Part A: Hypothesis Test

Question: Do high-income and low-income countries have significantly different overall scores?

H₀: No difference between groups

H₁: High-income countries score higher

Method: Two-sample t-test

Result: p-value = 2.76e-10 → highly significant difference

Mean high-income: 81.2 | Mean low-income: 56.4
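
The reported means differ by 81.2 − 56.4 = 24.8 points, which is the "~25 point" gap quoted elsewhere in the deck. The sketch below shows how a two-sample t statistic is built from two groups of scores. It assumes the Welch (unequal-variance) form, since the deck does not say which variant was used, and it runs on made-up toy scores, so it will not reproduce the reported p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical toy scores for illustration only, NOT actual SPI values.
high_income = [72.0, 78.0, 82.0, 86.0, 90.0]
low_income = [40.0, 48.0, 55.0, 62.0, 70.0]

t_stat, df = welch_t(high_income, low_income)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide. One option, if SciPy is available, is `scipy.stats.ttest_ind` with `equal_var=False` and `alternative="greater"` to match the one-sided H₁ above.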

Part B: Regression Model

Question: Which sub-dimension is most strongly associated with overall score?

Step 1 (primary): overall_score ~ data_use_score

Step 2 (extended): overall_score ~ data_use_score + data_products_score

Interpretation: A 1-point increase in data use score is associated with a 0.75-point increase in overall score.

Model 1 R-squared = 0.735 (73.5% of variation explained)

Model 2 R-squared = 0.812 (81.2% with data products added)

Why regression? Variables are numeric, and we want to measure the strength of association with overall score.
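
The two model steps above can be sketched with ordinary least squares in plain Python. This is a minimal illustration under stated assumptions: the deck does not say which software was used, the helper names (`fit_one`, `fit_two`, `r_squared`) are hypothetical, and the toy values are made up, so the coefficients and R-squared values printed here are not the deck's results.

```python
def mean(values):
    return sum(values) / len(values)


def r_squared(y, fitted):
    """Share of the variation in y explained by the fitted values."""
    y_bar = mean(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    sst = sum((obs - y_bar) ** 2 for obs in y)
    return 1 - sse / sst


def fit_one(x, y):
    """OLS for y ~ x. Returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_two(x1, x2, y):
    """OLS for y ~ x1 + x2 via the centered 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Hypothetical toy values for illustration only, NOT actual SPI values.
data_use = [30.0, 50.0, 60.0, 80.0, 100.0]
data_products = [50.0, 55.0, 75.0, 70.0, 90.0]
overall = [45.0, 58.0, 66.0, 79.0, 92.0]

# Step 1: overall_score ~ data_use_score
a0, a1 = fit_one(data_use, overall)
r2_one = r_squared(overall, [a0 + a1 * u for u in data_use])

# Step 2: overall_score ~ data_use_score + data_products_score
b0, b1, b2 = fit_two(data_use, data_products, overall)
r2_two = r_squared(overall, [b0 + b1 * u + b2 * p for u, p in zip(data_use, data_products)])

print(f"Model 1: slope={a1:.3f}, R-squared={r2_one:.3f}")
print(f"Model 2: b1={b1:.3f}, b2={b2:.3f}, R-squared={r2_two:.3f}")
```

Two points help when reading the reported numbers. For a one-predictor model, R-squared is the squared correlation, so 0.735 corresponds to a correlation of about 0.857 between data use and overall score. Adding a predictor can never lower R-squared in OLS, so the increase from 0.735 to 0.812 (0.077) shows some added explanatory value but is not by itself evidence that data products matter; an adjusted R-squared or a coefficient test would be the usual check.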

8 of 8

Key Findings and Recommendations

01

Income gap is significant

High-income countries score ~25 points higher than low-income countries on average.

02

Data use is the strongest association

Countries with higher data use scores tend to have higher overall performance.

03

Data products also matter

Including data products adds explanatory value to the model.

Recommendation:

International organizations should prioritize data use capacity and data products when supporting lower-income countries. These dimensions show the strongest association with overall statistical performance.

Limitations:

Cross-sectional analysis (2023 only). Relationships are associative, not causal.