What Drives a Country’s Data Performance?
Using World Bank Statistical Performance Indicators (2023)
Mahya Tazike | Statistics for Data Science | Spring 2026
Audience and Objective
Audience
International development policymakers (World Bank, UN)
Objective
Identify which data capabilities are most strongly associated with overall statistical performance.
Source: World Bank Statistical Performance Indicators (SPI)
Data Overview
Countries: 187
Year Focus: 2023
Score Range: 0-100
Outcome: overall score
The 5 Sub-Dimensions Measured:
Data Use: How data is used for decision-making and policy
Data Services: Availability and accessibility of data to users
Data Products: Quality and range of statistical outputs (reports, indicators)
Data Sources: Data collection systems (surveys, administrative data)
Data Infrastructure: Systems and tools supporting data storage and management
Sub-Dimensions: 5
Distribution of Overall Statistical Performance (2023)
Key insight: Scores range widely, from 28 to 95. High-income countries score about 25 points higher than low-income countries on average.
Why histogram? It shows the distribution of overall scores across countries.
What it shows: Most countries fall in the mid-to-high range (60–90), with noticeable variation between them.
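The binning behind a histogram like this one can be sketched in a few lines. This is a minimal illustration, assuming Python; the `bin_scores` helper, the bin width of 10, and the example scores are all hypothetical and are not the SPI data.

```python
from collections import Counter

BIN_WIDTH = 10  # assumed bin width; the slide does not state one

def bin_scores(scores, width=BIN_WIDTH):
    """Count scores per bin, keyed by each bin's lower edge (60 means [60, 70))."""
    counts = Counter()
    for s in scores:
        # Clamp a perfect 100 into the top bin so the bins cover 0 to 100.
        lower = min(int(s // width) * width, 100 - width)
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical overall scores, NOT the actual SPI values.
example_scores = [28.0, 45.5, 61.2, 66.8, 72.4, 78.9, 83.1, 88.7, 95.0]
histogram = bin_scores(example_scores)

# Print a simple text histogram, one row per non-empty bin.
for lower, count in histogram.items():
    print(f"{lower:>3}-{lower + BIN_WIDTH:<3} {'#' * count}")
```

With real data, the same counts would feed a plotting library; the text output here only keeps the sketch dependency-free.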
Overall Statistical Performance by Income Group (2023)
Key insight: High-income countries score about 25 points higher than low-income countries on average, indicating a clear income-based gap in statistical performance.
Why boxplot? It compares the distribution of scores across income groups.
What it shows:
Relationship Between Data Use and Overall Performance (2023)
Key insight: There is a strong positive relationship between data use and overall statistical performance.
Why scatterplot? It shows the relationship between two numeric variables.
What it shows:
Analysis: Hypothesis Test and Regression
Part A: Hypothesis Test
Question: Do high-income and low-income countries have significantly different overall scores?
H₀: No difference in mean overall score between the two groups
H₁: High-income countries have a higher mean overall score
Method: Two-sample t-test
Result: p-value = 2.76e-10 → highly significant difference
Mean high-income: 81.2 | Mean low-income: 56.4
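A two-sample comparison of this kind can be sketched as follows. The slide does not say whether the test assumed equal variances, so this sketch assumes Welch's unequal-variance version; the `welch_t` helper and both score lists are hypothetical and are not the SPI data.

```python
import math
from statistics import mean, variance

def welch_t(high, low):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(high), len(low)
    # Squared standard error of each group mean (sample variance / n).
    v1, v2 = variance(high) / n1, variance(low) / n2
    t = (mean(high) - mean(low)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical overall scores, NOT the actual SPI values.
high_income = [85.0, 78.0, 90.0, 74.0, 82.0, 79.0]
low_income = [55.0, 62.0, 48.0, 60.0, 57.0, 53.0]

t_stat, df = welch_t(high_income, low_income)
print(f"mean difference = {mean(high_income) - mean(low_income):.1f}")
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The sketch stops at the t statistic and degrees of freedom because the standard library has no t distribution. If SciPy is available, `scipy.stats.ttest_ind(high_income, low_income, equal_var=False, alternative="greater")` is one way to obtain the one-sided p-value that matches H₁.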
Part B: Regression Model
Question: Which sub-dimension is most strongly associated with overall score?
Step 1 (primary): overall_score ~ data_use_score
Step 2 (extended): overall_score ~ data_use_score + data_products_score
Interpretation: A 1-point increase in data use score is associated with a 0.75-point increase in overall score.
Model 1 R-squared = 0.735 (73.5% of variation explained)
Model 2 R-squared = 0.812 (81.2% with data products added)
Why regression? Variables are numeric, and we want to measure the strength of association with overall score.
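The two-step comparison above (one predictor, then two) can be sketched with ordinary least squares. This is a minimal illustration, assuming Python and plain OLS on centered variables; the helper names `fit_one` and `fit_two` and all three score lists are hypothetical and are not the SPI data, so the printed numbers will not match the slide.

```python
from statistics import mean

def centered(values):
    """Subtract the mean so the intercept drops out of the normal equations."""
    m = mean(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_one(x, y):
    """OLS for y ~ x. Returns (slope, R-squared)."""
    xc, yc = centered(x), centered(y)
    slope = dot(xc, yc) / dot(xc, xc)
    return slope, slope * dot(xc, yc) / dot(yc, yc)

def fit_two(x1, x2, y):
    """OLS for y ~ x1 + x2 via the 2x2 normal equations. Returns (b1, b2, R-squared)."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2  # zero if the predictors are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2, (b1 * s1y + b2 * s2y) / dot(yc, yc)

# Hypothetical sub-dimension and overall scores, NOT the actual SPI values.
data_use = [40.0, 55.0, 60.0, 70.0, 80.0, 90.0, 100.0]
data_products = [50.0, 45.0, 65.0, 60.0, 75.0, 70.0, 85.0]
overall = [45.0, 52.0, 61.0, 66.0, 74.0, 80.0, 91.0]

slope, r2_model1 = fit_one(data_use, overall)
_, _, r2_model2 = fit_two(data_use, data_products, overall)
print(f"Model 1: slope = {slope:.2f}, R-squared = {r2_model1:.3f}")
print(f"Model 2: R-squared = {r2_model2:.3f}")
```

Because Model 1 is nested inside Model 2, R-squared can only stay the same or rise when the second predictor is added, so the size of the increase matters more than its direction.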
Key Findings and Recommendations
01 Income gap is significant
High-income countries score ~25 points higher than low-income countries on average.
02 Data use shows the strongest association
Countries with higher data use scores tend to have higher overall performance.
03 Data products also matter
Adding data products to the model raises R-squared from 0.735 to 0.812.
Recommendation:
International organizations should prioritize data use capacity and data products when supporting lower-income countries. These dimensions show the strongest association with overall statistical performance.
Limitations:
Cross-sectional analysis (2023 only). Relationships are associative, not causal.