1 of 10

Devika Kakkar

Harvard University, CGA

kakkar@fas.harvard.edu

Measurement of Partisan Segregation of 180 million U.S. voters using advanced GIS Data Science

2 of 10

Project Overview

Objective: Measure partisan segregation at the individual level for 180 million U.S. voters

This work is partially sponsored by NSF Award #1841403 and OmniSci

3 of 10

Challenges: Big geospatial data processing

K-Nearest Neighbor search

Traditional method:

  • Create a pairwise distance matrix and sort on distance
  • Buffer method

Challenges:

  • A dataset of 180 million records implies trillions of distance calculations
  • Traditional methods are slow and inefficient

Partisan exposure calculations

Traditional method:

  • Execute using a scripting language, e.g., Python or R

Challenges:

  • Partisan weights must be calculated from 180 billion distances
  • Scripting methods are slow and resource-intensive
  • I/O speed is very slow
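To make the cost of the traditional KNN method concrete, here is a minimal Python sketch of the "pairwise distances, then sort" approach. The haversine distance, the function names and the four sample coordinates are assumptions for illustration, not the project's code or data.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def brute_force_knn(points, k):
    """For each point, measure the distance to every other point, sort, keep the k nearest."""
    result = []
    for i, p in enumerate(points):
        dists = [(haversine_km(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()  # sorting the full row is the expensive step
        result.append([j for _, j in dists[:k]])
    return result

# Hypothetical sample: three points around Cambridge, MA and one in Los Angeles.
voters = [(42.3736, -71.1097), (42.3770, -71.1167),
          (42.3601, -71.0942), (34.0522, -118.2437)]
neighbors = brute_force_knn(voters, k=2)

# Distance evaluations grow as n * (n - 1), which is what makes this
# approach impractical at the scale of 180 million records.
pairwise = len(voters) * (len(voters) - 1)
```

Every point is compared with every other point, so both the work and the memory grow quadratically with the number of records.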

[Images: Nature Human Behaviour]

4 of 10

Solution: Available Computing Resources

5 of 10

Solution: Two-Layered Approach for KNN Calculations

1. Geohash-based Spatial Clustering

  • Coordinates are expressed as an alphanumeric string
  • The longer the shared prefix, the closer the points
  • Closer points are clustered together on disk
  • Provides fast and efficient access to data

2. R-tree-based Index Search

  • Pure spatial-index-based search
  • Faster, cheaper and more efficient
  • Searches up and down the tree of bounding boxes
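The geohash idea can be sketched in a few lines of Python. This is the standard public geohash encoding, written out for illustration; the function names and sample coordinates are assumptions, and the project itself presumably relied on database functions rather than code like this.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=8):
    """Encode a (lat, lon) pair in degrees as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

def shared_prefix_len(a, b):
    """Number of leading characters two geohashes have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Hypothetical sample points: two in Cambridge, MA and one in Los Angeles.
near_a = geohash_encode(42.3736, -71.1097)
near_b = geohash_encode(42.3770, -71.1167)
far = geohash_encode(34.0522, -118.2437)

# Sorting by geohash places records with a common prefix next to each other,
# which is the property that spatial clustering on disk relies on.
ordered = sorted([far, near_a, near_b])
```

One caveat: two points on opposite sides of a cell boundary can be close together yet share only a short prefix, which is one reason a second, index-based search layer is useful.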

6 of 10

Solution: Accelerated GPU based processing of partisan exposure

180 million voters × 1,000 neighbors = 180 billion relations
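As a rough illustration of what a per-voter exposure calculation might look like, the Python sketch below computes a distance-weighted share of Democratic neighbors. The inverse-distance weighting, the function name and the sample values are assumptions: this deck does not state the weighting formula, and the real pipeline runs on a GPU database rather than in Python.

```python
def partisan_exposure(neighbors, min_distance_km=1e-6):
    """Distance-weighted share of Democratic neighbors for one voter.

    neighbors: list of (distance_km, is_democrat) tuples for the k nearest neighbors.
    Assumption: each neighbor is weighted by 1 / distance; the deck does not
    specify the actual formula.
    """
    weights = [1.0 / max(d, min_distance_km) for d, _ in neighbors]
    total = sum(weights)
    dem = sum(w for w, (_, is_dem) in zip(weights, neighbors) if is_dem)
    return dem / total

# Hypothetical voter with three neighbors (the project uses 1,000 per voter).
sample = [(0.1, True), (0.2, False), (0.4, False)]
exposure = partisan_exposure(sample)

# Unweighted share for comparison: one Democrat out of three neighbors.
unweighted = sum(1 for _, is_dem in sample if is_dem) / len(sample)
```

In this sample the closest neighbor is a Democrat, so the weighted exposure comes out higher than the unweighted one-in-three share.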

7 of 10

Solution: Novelty of our approach

KNN calculations @ 200,000 distances/sec

Partisan weights @ 800M distances/sec

Accelerated GPU processing

Big geospatial data using Data Science

Extremely fast I/O on big data

Cost and Time Effective
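A back-of-the-envelope check of the quoted rates, in Python. It assumes the rates are sustained aggregate throughput over all 180 billion voter-neighbor distances, which the slide does not state, so the results are indicative only.

```python
RELATIONS = 180_000_000 * 1_000   # voters x neighbors per voter
KNN_RATE = 200_000                # distances per second, as quoted for KNN
WEIGHT_RATE = 800_000_000         # distances per second, as quoted for partisan weights

knn_seconds = RELATIONS / KNN_RATE        # 900,000 s
knn_days = knn_seconds / 86_400           # roughly 10.4 days
weight_seconds = RELATIONS / WEIGHT_RATE  # 225 s, under 4 minutes
```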

8 of 10

Results: Partisan exposure of individual US voters

[Images: Nature Human Behaviour]

9 of 10

Results: Publications and news coverage

10 of 10

References

[1] Brown J. & Enos R., The measurement of partisan sorting for 180 million voters, Nature Human Behaviour, 2021 https://www.nature.com/articles/s41562-021-01066-z.epdf

[2] Badger E., Quealy K. & Katz J., A Close-Up Picture of Partisan Segregation, Among 180 Million Voters, The New York Times, 2021 https://www.nytimes.com/interactive/2021/03/17/upshot/partisan-segregation-maps.html

[3] Kakkar D., Lewis B., Singh R., OmniSci Virtual Summit, 2020 https://www.youtube.com/watch?v=3DlOeWqDMSs

[4] Kakkar D., Lewis B., Scaling geospatial processes on Harvard’s high-performance cluster, Harvard DataFest, 2020 https://drive.google.com/file/d/1FEnh-okCNLuthtyQtoBldyid7D6Sb-F_/view?usp=sharing

[5] Introduction to Cluster Computing on FASRC: https://www.rc.fas.harvard.edu/wp-content/uploads/2019/12/Intro-to-Cannon.pdf

[6] About PostGIS: https://postgis.net/

[7] OmniSci Overview: https://docs.omnisci.com/latest/4_distributed.html