1 of 19

Redivis: A Scalable Web Platform for Business Research

Alex Storer, Stanford GSB – Director of Data, Analytics & Research Computing

Ian Mathews, Redivis – CEO

Erin Delaney, Redivis – Head of Design

PEARC 2024

2 of 19

Doesn’t a Business School Just Do MBAs?

Research is a key facet of Stanford’s Graduate School of Business

Academic Departments

  • Accounting
  • Economics
  • Finance
  • Marketing
  • Operations, Information & Technology
  • Organizational Behavior
  • Political Economy



4 of 19

What distinguishes Business Data?

Licensed Data

Diverse Data

Private Data

“To examine rent control’s effects on tenant migration and neighborhood choices, we make use of new panel data which provide address-level migration decisions and housing characteristics for the majority of adults living in San Francisco in the early 1990s.”

Diamond, Rebecca, Tim McQuade, and Franklin Qian. 2019. "The Effects of Rent Control Expansion on Tenants, Landlords, and Inequality: Evidence from San Francisco." American Economic Review, 109 (9): 3365–94.

DOI: 10.1257/aer.20181289


5 of 19

A recurring problem

“To examine rent control’s effects on tenant migration and neighborhood choices, we make use of new panel data which provide address-level migration decisions and housing characteristics for the majority of adults living in San Francisco in the early 1990s.”

  • A multi-terabyte database
  • Needs to be queried and merged with other large databases
  • Data Administrators need to tightly manage access


6 of 19

What about HPC?

My experience: This is hard!

  • Distributed databases on Slurm aren’t easy to configure
  • Reading big data in parallel can be rough on I/O
  • Permissions can be very confusing for users and administrators alike
  • “Read means copy”

  • A multi-terabyte database
  • Needs to be queried and merged with other large databases
  • Data Administrators need to tightly manage access


7 of 19

What about the Cloud?

My experience: This is hard!

  • Unexpected costs (e.g., someone queries your database from us-east-1 instead of us-west-2)
  • Not designed for researchers (tricky UI, unclear reproducibility)
  • Permissions and logging are very complicated

  • A multi-terabyte database
  • Needs to be queried and merged with other large databases
  • Data Administrators need to tightly manage access


8 of 19

What would we like?

  • High performance querying
  • Friendly user interface (particularly for less technical users)
  • Friendly interface for administrators
  • Privacy-first approach to data storage

  • A multi-terabyte database
  • Needs to be queried and merged with other large databases
  • Data Administrators need to tightly manage access


9 of 19

What would we like?

  • High performance querying
  • Friendly user interface (particularly for less technical users)
  • Friendly interface for administrators
  • Privacy-first approach to data storage

These are key features of Redivis!


10 of 19

Introducing Redivis

  • A data platform for academic research, built in partnership with Stanford since 2016
  • An independent, worker-owned company that specifically targets academia
  • An application layer on GCP (Google Cloud Platform) that provides a collaborative analysis platform for large, diverse data while respecting dataset privacy


11 of 19

Redivis by Example – Data Discovery


12 of 19

Redivis by Example – Dataset Access

URL traffic for a representative panel of users, dozens of TB of data

  • Licensed by GSB, with strict data usage requirements
  • Only named users can view the data; extract downloads are approved on a case-by-case basis
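
The two rules above can be read as separate checks: viewing depends on being a named user, while each extract needs its own approval. The sketch below models that distinction in Python. The class, field, user, and extract names are all hypothetical; this is an illustration of the idea, not Redivis's actual permission model.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetPolicy:
    """Hypothetical policy: view by named user, export by per-request approval."""
    named_users: set = field(default_factory=set)
    # Each approval covers one (user, extract_id) pair, never a blanket grant.
    approved_exports: set = field(default_factory=set)

    def can_view(self, user: str) -> bool:
        return user in self.named_users

    def approve_export(self, user: str, extract_id: str) -> None:
        # An administrator can only approve extracts for users who may view.
        if not self.can_view(user):
            raise PermissionError(f"{user} is not a named user")
        self.approved_exports.add((user, extract_id))

    def can_export(self, user: str, extract_id: str) -> bool:
        return self.can_view(user) and (user, extract_id) in self.approved_exports


# Hypothetical usage: one named user with one approved extract.
policy = DatasetPolicy(named_users={"alice"})
policy.approve_export("alice", "extract-001")
```

Keeping export approval keyed on a specific extract means that permission to view never implies permission to copy, which is the opposite of the "read means copy" situation described for HPC.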


13 of 19

Redivis by Example – Table Exploration

Easily scroll through a single 2.5 TB table

- Quick Summary Statistics

- SQL Queries on this table

- Integrated Data Dictionary

- 11.5 billion website visits from 2020


14 of 19

Redivis by Example – Data Analysis

Find the visits to CDC.gov in 2020 and investigate the relationship to Household Income

- Beginner Friendly Interface

- Compiles to Standard SQL

- 27 seconds to execute!
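
As a rough illustration of the kind of standard SQL such an analysis might compile to, the sketch below joins a visits table to a household table and counts CDC.gov visits in 2020 by income bracket. The table names, column names, and rows are hypothetical toy data, and Python's built-in `sqlite3` stands in for the platform's query engine; this is not the SQL that Redivis actually generates.

```python
import sqlite3

# Hypothetical schema and toy rows; the real panel's tables are not shown in the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (user_id INTEGER, domain TEXT, visit_date TEXT);
CREATE TABLE households (user_id INTEGER, income_bracket TEXT);
INSERT INTO visits VALUES
  (1, 'cdc.gov', '2020-03-14'),
  (1, 'cdc.gov', '2020-04-02'),
  (2, 'cdc.gov', '2020-03-20'),
  (2, 'example.com', '2020-03-21'),
  (3, 'cdc.gov', '2019-12-30');
INSERT INTO households VALUES
  (1, 'under_50k'),
  (2, '50k_to_100k'),
  (3, 'over_100k');
""")

# Filter to CDC.gov visits in 2020, join to household income, and aggregate.
QUERY = """
SELECT h.income_bracket, COUNT(*) AS cdc_visits
FROM visits AS v
JOIN households AS h ON h.user_id = v.user_id
WHERE v.domain = 'cdc.gov'
  AND v.visit_date >= '2020-01-01'
  AND v.visit_date < '2021-01-01'
GROUP BY h.income_bracket
ORDER BY h.income_bracket
"""

cdc_visits_by_income = conn.execute(QUERY).fetchall()
print(cdc_visits_by_income)
```

The filter, join, and aggregate steps map onto the kind of building blocks a point-and-click interface can expose to beginners while still producing SQL that a more technical collaborator can read.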


15 of 19

Redivis by Example – SQL Queries

Write SQL Queries Directly

- Within a project

- Using the API


16 of 19

Redivis by Example – Notebook Interface

Notebook Interface

- Pick your VM size

- Easy integration with Queries

- Same network controls as source data

- Project members can collaborate

- Code history is saved automatically


17 of 19

Redivis by Example – Project Interface

Shareable projects show:

- included data and tables

- transforms and notebooks in a directed graph

https://darc.stanford.edu/pearc


18 of 19

Redivis by Example – Administrator View

Easy-to-read view of who did what on Redivis


19 of 19

Conclusion

8 years of iterative development → 10-minute presentation 😤

Stanford GSB Data, Analytics, and Research Computing: https://gsbresearchhub.stanford.edu/

Key Redivis Features – More at redivis.com

  • High performance querying
  • Friendly user interface (particularly for less technical users)
  • Friendly interface for administrators
  • Privacy-first approach to data storage

Alex Storer – astorer@stanford.edu

Ian Mathews – ian@redivis.com

Erin Delaney – erin@redivis.com
