1 of 19

Spatial data: Map and location information as another data point on the road to Diversity, Equity, and Inclusion (DEI).

Datag Summer 2023

Gordon Douglas

NERIC Testing and Data Reporting Team

2 of 19

Why use spatial data?

  • It adds another dimension to the existing data that you already have (or can easily get). You can add a “where” component to your data that you may not otherwise be considering when finding patterns or correlations
  • Most of the population are visual learners, and map data is inherently visual. It presents data in a way that is natural for most people to look at and draw information from
  • There is an abundance of available spatial datasets to use in your data analyses
  • It offers additional opportunities to compare non-education datasets with your educational data visually, through lenses you may not have considered.

3 of 19

GIS Data basics

  • GIS (Geographic Information System) data is all over the place, in use by public as well as commercial entities (educational and otherwise) for a multitude of data analysis purposes. We can take advantage of the large number of public GIS datasets available for our uses.
  • Data is typically provided as a CSV of latitude/longitude points for individual locations, as shapefiles, or as a geodatabase. Shapefiles and geodatabases can contain more than just lat/lon: lines, areas (polygons), and other attributes about the data
  • Data can be stored and used as provided, or converted between formats, including conversion into a number of standard database types (SQL, NoSQL, cloud, etc.)
  • GIS files carry a coordinate reference system (NAD83, WGS84, etc.), which is roughly equivalent to the choice of Mercator, Cartesian, etc. coordinate systems on paper maps. Within GIS data it is referred to by its SRID (spatial reference ID). Make sure all your datasets use the same SRID (we use SRID 4326, WGS84 latitude/longitude, as it’s the most common default), converting with a GIS tool such as ogr2ogr if necessary
  • GIS software (including visualization software) typically consists of a base map (in our case the US) with various levels of zoom that is loaded in as tiles. We then add our data on top of this as layers, lines (“edges”), and points to display the information we’re looking for. Various formatting can also be applied to emphasize pieces of data
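To illustrate why mixed SRIDs cause trouble, here is a minimal sketch (not part of the original slides) that converts one point from SRID 4326 (WGS84 degrees) to SRID 3857 (Web Mercator metres, the system most tiled base maps use). The function name and the sample coordinates for Albany are illustrative assumptions; in practice a tool such as ogr2ogr would do the conversion.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by Web Mercator


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert SRID 4326 lon/lat (degrees) to SRID 3857 x/y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Approximate coordinates for Albany, NY (illustrative only).
albany_lon, albany_lat = -73.7562, 42.6526
x, y = wgs84_to_web_mercator(albany_lon, albany_lat)
print(f"SRID 4326: ({albany_lon}, {albany_lat})")
print(f"SRID 3857: ({x:.1f}, {y:.1f})")
```

The same location is a pair of small decimal degrees in one system and a pair of seven-digit metre values in the other, so a layer stored in one SRID will not line up with a layer stored in the other until they are converted.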

4 of 19

What spatial data is out there? (Public Resources) and what do we do with it?

  • What datasets are available for analyzing education data spatially?
    • State, County, City/Town, School District boundaries – Civil Boundaries from NYS GIS Clearinghouse
    • Zip Code boundaries (zip code tabulation areas) – from US Census Bureau / USPS
    • Catchment areas? Possibly available through busing software already, or drawn yourself using QGIS/GRASS
    • Other comparative Census datasets (area demographics, density, household income, crime, broadband, taxes, etc.) – available from various sources such as counties, municipalities, and other government entities.
      • Data Warehouse data you are already reporting, or other SIS data you have
      • Crime statistics from NY Dept of Criminal Justice, US Dept of Justice
      • Health statistics from NY Dept of Health, US HHS and CDC
      • Employment statistics from NY Dept of Labor, US Bureau of Labor Statistics
      • Demographic, Housing, Economic, Social, Health, Infrastructure statistics (state, county, municipality levels) from US Census Bureau and NCES American Community Survey, USGS, other entities

5 of 19

Geocoding for individual addresses

    • Individual geocoded address data – make sure any service that isn’t run locally is Ed Law 2-d compliant
      • SEDREF Reports for Institutions (school buildings, other public buildings): locations and coordinates are already included in the full listing
      • Options for geocoding student addresses – make sure any 3rd party is Ed Law 2-d compliant
        • Does your transportation software already do this? Can you export it?
        • Some SISs have some built-in functionality
        • Commercial (LOTS of providers) – make sure they are Ed Law 2-d compliant
        • US Census Bureau geocoding (free for batches up to 10,000), using hashes to keep it anonymous?
        • NYS SAM (Street Address Mapping) geocoding (M-F 7am-3:30pm; can be used in ArcGIS and QGIS, or download the dataset)
        • Open source geocoding server (PostGIS, Nominatim, Pelias, Gisgraphy)
        • Geocoders for databases you may already use (Oracle, MS SQL, etc.) – make sure the data provider is Ed Law 2-d compliant, as above
        • Excel Power Map geocoding (uses Bing as provider)
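The “hashes to keep it anonymous” idea for batch geocoding could look something like the sketch below. It is an illustration under assumptions, not NERIC's actual process: the secret key, the field names, the sample record, and the five-column layout (ID, street, city, state, ZIP) are all hypothetical and should be checked against the batch format of whichever service you use.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical secret, kept locally and never sent with the batch.
SECRET_KEY = b"replace-with-a-locally-stored-secret"


def opaque_id(student_id):
    """Derive a stable, non-reversible key from a student ID (keyed hash)."""
    digest = hmac.new(SECRET_KEY, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_batch(records):
    """Return (csv_text, lookup): the file to send and the local key map."""
    lookup = {}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for rec in records:
        key = opaque_id(rec["student_id"])
        lookup[key] = rec["student_id"]  # stays on your own systems
        writer.writerow(
            [key, rec["street"], rec["city"], rec["state"], rec["zip"]]
        )
    return buffer.getvalue(), lookup


# Made-up sample record for illustration.
sample_records = [
    {"student_id": "S1001", "street": "30 Main St", "city": "Albany",
     "state": "NY", "zip": "12207"},
]
csv_text, lookup = build_batch(sample_records)
print(csv_text)
```

When results come back keyed by the opaque ID, `lookup` joins them to the real student IDs locally. The street address itself still leaves your network, so this only removes the student identifier from the batch.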

6 of 19

Running your own geocoding server – the route NERIC takes

  • PostgreSQL with PostGIS performs the geocoding on as many records as you need, as often as needed.
  • Downloads the TIGER/Line files (drawn from the MAF/TIGER database; MAF = Master Address File) from US Census Bureau for whatever states you define to geocode against (done using PostGIS scripts). The dataset does need updating when the Census Bureau releases updates, to pick up new roads, etc.
  • Performance/Speed depends on hardware and data cleanliness (ours averages 45 seconds / 1k records)
    • For most LEAs this won’t matter as the address set is small. NERIC has ~225k addresses per year.
    • Caching already geocoded addresses from a previous run improves performance, since you don’t need to geocode the same address again. De-duplicate this cached set for additional benefits. (Re-run the full set of addresses when updating data tables, to pick up improvements.)
    • We would possibly like to move from address matching to location key mapping in the future
  • NERIC does additional geofencing using district and/or zip code boundaries to find addresses outside of the district/zip code. We buffer the edges of these boundaries to account for any edge inaccuracies. You can also use a distance radius from a building within the district, for entities within boundaries (non-pubs)
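The buffered geofence check described above can be sketched in plain Python as follows. This is a simplified illustration, not NERIC's implementation (PostGIS provides equivalent spatial functions): it assumes the boundary is a single simple polygon and that coordinates are in a projected system with units such as metres, so that plain planar distance is meaningful. All function names are hypothetical.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of (x, y))?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(x, y, x1, y1, x2, y2):
    """Shortest planar distance from (x, y) to the segment (x1,y1)-(x2,y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp to the segment's endpoints
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def within_buffered_boundary(x, y, polygon, buffer_dist):
    """True if the point is inside the boundary or within buffer_dist of it."""
    if point_in_polygon(x, y, polygon):
        return True
    n = len(polygon)
    return any(
        distance_to_segment(x, y, *polygon[i], *polygon[(i + 1) % n])
        <= buffer_dist
        for i in range(n)
    )


# Made-up 1 km square "district" with a 50 m buffer.
district = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
for point in [(500, 500), (1020, 500), (1200, 500)]:
    print(point, within_buffered_boundary(*point, district, 50))
```

An address geocoded 20 m outside the line is still treated as in-district, while one 200 m outside is flagged for review.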

7 of 19

PostGIS geocoding process

  • Geocoded addresses come with an accuracy rating, making it easy to find addresses that are not where expected
  • Accuracy is high, with most mismatches due to malformed addresses (e.g. “0 Unknown St.”)
    • Standardizing addresses as best you can improves the process (e.g. “Main St.” vs “main street”). If you use an online standardizer, make sure it’s Ed Law 2-d compliant. PostGIS has a built-in standardizer (normalize_address)
    • If the exact address is not found, interpolation attempts to provide coordinates. E.g. if “30 Main St” and “38 Main St” exist but “32 Main St” doesn’t, the geocoder will place it between them at an approximate distance and on the same side of the street.
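The interpolation idea can be shown with a small sketch. This is a simplified linear model for illustration only, not the algorithm PostGIS actually runs; the function name and coordinates are made up, and even/odd parity is used here as a stand-in for “same side of the street”.

```python
def interpolate_address(house_number, known_low, known_high):
    """Estimate (lon, lat) for a house number between two known addresses.

    known_low and known_high are (house_number, lon, lat) tuples on the
    same street segment.
    """
    n_low, lon_low, lat_low = known_low
    n_high, lon_high, lat_high = known_high
    if n_low >= n_high or not n_low <= house_number <= n_high:
        raise ValueError("house number is outside the known range")
    if house_number % 2 != n_low % 2:
        raise ValueError("house number is on the other side of the street")
    # Fraction of the way from the low address to the high address.
    frac = (house_number - n_low) / (n_high - n_low)
    lon = lon_low + frac * (lon_high - lon_low)
    lat = lat_low + frac * (lat_high - lat_low)
    return lon, lat


# "30 Main St" and "38 Main St" are known; "32 Main St" is not.
main_30 = (30, -73.0000, 42.0000)
main_38 = (38, -73.0008, 42.0004)
print(interpolate_address(32, main_30, main_38))
```

Number 32 is a quarter of the way from 30 to 38, so the estimate lands a quarter of the way along the segment between the two known points.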

8 of 19

How can we use Geospatial Data to help us with inequity? Are there correlations between metrics you track (or could) and location?

  • Standard metrics you might want to map – Demographics/Program Services (subgroups), Enrollments (exit codes, retentions), Exam Proficiency (or refusals), HS Outcome (Grad Rate), Attendance, Discipline, Course Failures, etc.; both longitudinally and comparatively
  • ESSA data: district/building/subgroup standings or performance indices physically across your district and others (since data is public)
  • Tracking of homeless/migrant populations as they change addresses, or out-of-district students (homeschool, tuition pay, etc.)
  • Student mobility and transience - population movement of students into/out of/within district
  • Virtual Learning/DL students – where do they attend from and where do yours attend?
  • Digital Equity/SEL data (Connectivity/Device access), income disparity

We want to look at the data comparatively, longitudinally, and at various aggregation levels

9 of 19

How can we use Geospatial Data to help us with inequity? Are there correlations between metrics you track (or could) and location? (Cont.)

  • Average time or distance to School vs attendance
  • Proximity to community resources (schools, churches, emergency services, public transportation, etc.) or to private/charter schools
  • Proximity to hazards (persistently dangerous areas, or dangerous natural/man-made structures, natural disaster data)
  • Post-secondary data (where are they going? What type of institutions, and what degrees are they earning?)
  • HR and DW Staff Data analysis (more inexperienced teachers in one building vs another, etc.)
  • Business office data such as tax data, budget data, etc. broken out by building, zip code, etc.
  • Health data such as pandemic data, immunizations, obesity, and rates of asthma, diabetes, etc.
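Distance and proximity measures like those listed above start from a point-to-point distance. A minimal sketch, assuming geocoded latitude/longitude in SRID 4326 and straight-line (not road) distance; the function names, radius, and coordinates are illustrative assumptions:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def addresses_within(addresses, center_lat, center_lon, radius_miles):
    """Return the (lat, lon) addresses within radius_miles of the centre."""
    return [
        (lat, lon)
        for lat, lon in addresses
        if haversine_miles(lat, lon, center_lat, center_lon) <= radius_miles
    ]


# Made-up school location and two made-up geocoded addresses.
school = (42.6526, -73.7562)
addresses = [(42.6600, -73.7600), (42.8000, -73.9000)]
print(addresses_within(addresses, school[0], school[1], 1.5))
```

The per-student distance from `haversine_miles` can be stored as a new column and compared against attendance or any other metric; travel time would need a routing service instead.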


10 of 19

What are some pieces you might want to see from your data or ours?

11 of 19

Tools for visualizing the data

  • NERIC uses Tableau
  • Other LEAs use Power BI, Cognos, or other systems
  • Excel (2013 or higher) using Power Map
  • Google Looker Studio (formerly Data Studio) – geo/map charts
  • Open Source options
    • GeoDa
    • Metabase open source edition
  • Depending on the tool, you may have a lot of formatting flexibility to visualize the data by lines, points, borders, shapes, colors (with dithering or shading), proximity, or text. Most tools support multiple layers of visualization in addition to the above. Tooltips and legends can help highlight additional details
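Many mapping and visualization tools can read GeoJSON, so one way to hand geocoded points to a tool is to write them out in that format. A minimal sketch with made-up building data and hypothetical field names; check which spatial file formats your own tool accepts:

```python
import json


def to_geojson(points):
    """Build a GeoJSON FeatureCollection from dicts with 'lon' and 'lat'."""
    features = []
    for p in points:
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [p["lon"], p["lat"]]},
            # Every other field becomes an attribute usable for tooltips.
            "properties": {
                k: v for k, v in p.items() if k not in ("lon", "lat")
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up building-level data for illustration.
buildings = [
    {"name": "Example Elementary", "lon": -73.7562, "lat": 42.6526,
     "enrollment": 410},
]
collection = to_geojson(buildings)
print(json.dumps(collection, indent=2))
```

Note the longitude-first coordinate order: swapping it is a common reason points end up plotted in the wrong part of the world.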

Make sure products used are Ed Law 2-d compliant

12 of 19

Data.nysed.gov publicly released data

Comparison here is by district boundary and exam proficiency rate (all exams), but could easily be filtered to specific exams or to any other data.nysed.gov database metric (enrollment, grad rate, percent ELL/Homeless/SWD/ED, etc.). Can focus down to RIC, BOCES, and district levels

13 of 19

Exam Proficiency – Overall vs. ED vs. SWD

14 of 19

Internet Access

15 of 19

Homeless within 1.5 miles of School

16 of 19

Movement in/out of district

17 of 19

Economically Disadvantaged vs Income

18 of 19

SWD vs Income

19 of 19

Thank you for attending!

gordon.douglas@neric.org