1 of 10


CityScale Hack Team Members:

Thomas Marconi, Andrew Mahorner, & Schmidt Joseph

Scale Your Needs

2 of 10

Motivation & Problem Statement

We have data. What do we do with it?

Problem Case 2 (City and Tract-level Affordability Indexes)

  • What’s the problem?
    • Disparate, complex dashboarding tools leave users who are not data literate without actionable insights
    • Policymakers, real estate investors, and nonprofits have no easy way to view census-tract data and interact with it to experiment with possible future scenarios
  • Why did we choose this?
    • Democratizing data insights lets key stakeholders such as policymakers understand their city’s needs
    • Data-driven solutions are key to making a positive impact in the lives of real people

3 of 10

Solution Summary

A one-stop shop for policymakers to understand their city’s at-risk areas

  • Purpose: Generates a PCA-based Housing Affordability & Risk Index across tracts in 25 major U.S. cities
  • Features:
    • Interactive heatmap dashboard
    • Integrated filters for targeted analysis
    • Chatbot for natural language data analysis
  • Impact & Use Cases:
    • Policymakers: Direct funding, rental aid, and development near transit
    • Real Estate Planners: Identify locations for highest social and economic impact
    • Nonprofits: Focus outreach where communities need it most

4 of 10

5 of 10

Data Description and APIs

  • Description/key stats summarizing data
    • How much (depth & width)?
      • Used a curated feature set
          • Reduced from 178 total features to ~40
  • Data pipelines/transformations description
    • Structured the data into a SQLite database
      • Retrieve a feature for a tract
      • Retrieve all features for tract
      • Retrieve all tracts for a city
    • Added tract geometry from U.S. Census Bureau (USCB) shapefiles
    • Assumptions:
      • Data is coherent
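
The three retrieval patterns listed above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not CityScale's actual schema: the table name tract_features, its columns, and the function names are all hypothetical, and a long (one row per tract and feature) layout is assumed.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) long-format feature table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tract_features (
               tract_id TEXT NOT NULL,
               city     TEXT NOT NULL,
               feature  TEXT NOT NULL,
               value    REAL,
               PRIMARY KEY (tract_id, feature)
           )"""
    )
    return conn

def get_feature(conn, tract_id, feature):
    """Retrieve a single feature for a tract, or None if it is missing."""
    row = conn.execute(
        "SELECT value FROM tract_features WHERE tract_id = ? AND feature = ?",
        (tract_id, feature),
    ).fetchone()
    return row[0] if row else None

def get_all_features(conn, tract_id):
    """Retrieve all features for a tract as a {feature: value} dict."""
    rows = conn.execute(
        "SELECT feature, value FROM tract_features WHERE tract_id = ?",
        (tract_id,),
    ).fetchall()
    return dict(rows)

def get_tracts_for_city(conn, city):
    """Retrieve all tract IDs that belong to a city, in sorted order."""
    rows = conn.execute(
        "SELECT DISTINCT tract_id FROM tract_features WHERE city = ? ORDER BY tract_id",
        (city,),
    )
    return [r[0] for r in rows]
```

Parameterized queries (the ? placeholders) keep the lookups safe if tract IDs or city names ever come from user input, such as the dashboard filters.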

CityScale’s Architecture

6 of 10

Method Description

Primary Technique: Unsupervised Dimensionality Reduction (PCA)

  • Transforms 20+ housing metrics into a single affordability risk score (0-100)
    • Why PCA?
      • No labeled training data needed
      • Handles correlated features naturally
  • Steps to generate the PCA-based index:
    • 1. Fit PCA on the entire data set
    • 2. Label the resulting principal components (PCs)
    • 3. Standardize (z-score)
    • 4. Create a composite score
    • 5. Redistribute onto a 0-100 scale
  • Tech Stack:
    • scikit-learn, pandas/NumPy, Next.js, FastAPI, Leaflet, SQLite
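
The five steps above could look roughly like the sketch below, using scikit-learn and NumPy. Several details are assumptions rather than the team's confirmed method: the input features are z-scored before PCA so that no single metric dominates, "labeling" is reduced to a hypothetical pc_signs list that orients each component so that higher means more risk, the composite weights PCs by explained variance, and the final redistribution is a simple min-max rescale.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def affordability_risk_index(features, pc_signs, n_components=3):
    """Return a 0-100 risk score per tract.

    features : (n_tracts, n_features) array of housing metrics
    pc_signs : +1 or -1 per component (hypothetical stand-in for labeling)
    """
    # Assumption: scale inputs first so large-valued metrics do not dominate.
    scaled = StandardScaler().fit_transform(features)

    # Step 1: fit PCA on the entire data set.
    pca = PCA(n_components=n_components)
    pcs = pca.fit_transform(scaled)

    # Step 2: apply the labels by orienting each PC so higher = more risk.
    pcs = pcs * np.asarray(pc_signs, dtype=float)

    # Step 3: standardize each PC (z-score).
    z = (pcs - pcs.mean(axis=0)) / pcs.std(axis=0)

    # Step 4: composite score, weighted by share of explained variance.
    weights = pca.explained_variance_ratio_
    weights = weights / weights.sum()
    composite = z @ weights

    # Step 5: redistribute onto 0-100 (assumed here to be min-max rescaling).
    lo, hi = composite.min(), composite.max()
    return 100.0 * (composite - lo) / (hi - lo)

# Usage with synthetic data: 50 tracts, 6 metrics.
rng = np.random.default_rng(0)
demo_scores = affordability_risk_index(rng.normal(size=(50, 6)), pc_signs=[1, -1, 1])
```

A percentile-rank rescale would be a reasonable alternative for step 5 if an even spread of scores across tracts matters more than preserving relative distances.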


7 of 10

Performance Results

High-risk tracts align with eviction rates

    • Validation:
      • Compared CityScale’s tract-level risk scores with eviction rates from Princeton’s Eviction Lab
    • Method:
      • Classified results into Low, Moderate, and High risk buckets and analyzed via a confusion matrix
    • Findings:
      • As real-world risk (rows) increases, predicted scores (columns) trend higher, showing strong directional alignment
    • Note:
      • Model outputs are more conservative, incorporating broader socioeconomic factors beyond eviction rates
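
The bucketing and confusion-matrix comparison described above can be sketched with NumPy. The cut points are an assumption: the sketch splits each series into terciles, which may differ from the thresholds the team actually used, and the function names are hypothetical.

```python
import numpy as np

LABELS = ["Low", "Moderate", "High"]

def to_buckets(values):
    """Map each value to a tercile bucket: 0 = Low, 1 = Moderate, 2 = High."""
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, [1 / 3, 2 / 3])  # assumed thresholds
    return np.digitize(values, cuts)

def confusion(actual, predicted, n_classes=3):
    """Rows = actual (eviction-rate) bucket, columns = predicted (risk-score) bucket."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (actual, predicted), 1)  # count each (row, column) pair
    return matrix

def compare(eviction_rates, risk_scores):
    """Bucket both series independently and cross-tabulate them."""
    return confusion(to_buckets(eviction_rates), to_buckets(risk_scores))
```

Counts concentrated on or near the diagonal indicate that the two rankings agree. Mass below the diagonal means the model assigned a lower bucket than the eviction data did, which is the pattern a more conservative model would produce.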


Confusion Matrix

8 of 10

Our Challenges

  • Lack of Domain Knowledge
    • Reliance on mentors and SMEs

  • Feature Set Too Complex
    • Trimmed the feature set down to build a model that predicted outcomes accurately

  • Missing tract coordinates/shapes

  • Early Chatbot hallucinations

9 of 10

Demo

10 of 10

Implications

  • Bridging the Problem → Solution
    • Problems:
      • Disparate data dashboards
      • No easy way to understand our data and drive policy
      • What do our people need?
    • Solution:
      • Interactive heatmap-based dashboard with filters and a natural-language analytics layer
  • Implications
    • A one-stop-shop view of what is going on in a city, for data-literate users and those who are not
    • Data-driven decisions that help real people
  • Future Steps
    • Fine-tune the housing index model with the help of PhD-level SMEs
    • Improve the chatbot to refine its responses and surface more valuable insights about the data
    • Deploy to cloud infrastructure to allow for widespread use
      • Scale to users of all backgrounds, for example a college graduate looking for the best areas to live to advance their career goals
    • Partner with city staff, survey residents in selected tracts, and visit neighborhood sites to gather more raw findings and refine the data set
    • Tailor an app version for homeowners and renters to see which areas of their city are affordable