1 of 38

Measuring the Impact of Remote Knowledge Work Across the United States

Alan Kwan (HKU)

Ben Matthies (University of Notre Dame)

April 2023

2 of 38

Overview

  • What we do:
    • Construct a firm-level economy-wide measure of remote work using data on employee internet activity at each firm
      • Firm-month panel of remote work share for hundreds of thousands of firms in the U.S.

    • We join to confidential tax filing data to deliver first firm-level estimates of the relationship between remote work and output

    • Present stylized facts about the determinants and consequences of remote work
      • Determinants: Relationship to pandemic risk, monitoring cost proxies
      • Consequences: Firm financials, micro estimates of worker behavior from the internet-activity dataset, flow of workers from in-person to remote work firms

3 of 38

Literature positioning

  • Lots of work on whether remote work "works"
    • Erik Brynjolfsson et al. (2022), Cevat Giray Aksoy et al. (2022), Peter Lambert et al. (2022), Carl Benedikt Frey and Giorgio Presidente (2022)
    • Natalia Emanuel and Harrington (2022), Dalton, Dey and Loewenstein (2022, BLS)

  • Methodologies of how to measure remote work
    • Prithwiraj (Raj) Choudhury et al. (2022), Ruobing Han et al. (2022), Aruna Ranganathan and Aayan Das (2022), Piotr Lewandowski et al. (2022)

  • Some work on housing demand and consequences of remote work
    • John Mondragon and Johannes Wieland (2022), Matthew J. Delventhal and Andrii Parkhomenko (2022), Gupta, Mittal and Van Nieuwerburgh (2022)

4 of 38

Literature positioning

  • Lots of work on whether remote work "works"
    • Erik Brynjolfsson et al. (2022), Cevat Giray Aksoy et al. (2022), Peter Lambert et al. (2022), Carl Benedikt Frey and Giorgio Presidente (2022)
    • Natalia Emanuel and Harrington (2022), Dalton, Dey and Loewenstein (2022, BLS)
    • First firm-level study with comprehensive administrative data!

  • Methodologies of how to measure remote work
    • Prithwiraj (Raj) Choudhury et al. (2022), Ruobing Han et al. (2022), Aruna Ranganathan and Aayan Das (2022), Piotr Lewandowski et al. (2022)
    • While not a drop-in replacement, we are the first to propose a real-time longitudinal measure of remote work

  • Some work on housing demand and consequences of remote work
    • John Mondragon and Johannes Wieland (2022), Matthew J. Delventhal and Andrii Parkhomenko (2022), Gupta, Mittal and Van Nieuwerburgh (2022)
    • For now, nothing to say here! But we might be able to merge in real estate-related outcome variables.

5 of 38

Roadmap

  • Introduction
  • Data
  • Tests
  • Conclusion

6 of 38

Data sources

  • Our data partner operates a cooperative of thousands of media publishers
    • Collects data via cookies (longitudinally tracking users across IPs)
    • 1 billion events per day, which we posit can be used to create a remote work measure

  • Combine with confidential U.S. government filings to estimate firm output
    • Not yet ready for dissemination, but should be very soon

7 of 38

Internet activity mapped to firms

  • Our data partner operates a cooperative of thousands of media publishers
    • Collects data via cookies (longitudinally tracking users across IPs)
    • 1 billion events per day
    • Many household names in media as well as less well-known sites

[Diagram: visitors browse Publishers A, B, and C; the data vendor maps each visitor to a firm (visitor → firm) and each piece of content to a topic (content → topic).]

| User | Domain | IP address | Website | Time | Loc. | User agent |
| --- | --- | --- | --- | --- | --- | --- |
| User AK | hku.hk | 147.8.113.116 | | Oct 10, 2022, 1:50pm | Pokfulam, HK | “google chrome PC” |
| User AK | hku.hk | 147.8.117.176 | | Oct 10, 2022, 2:30pm | Pokfulam, HK | “google chrome PC” |
| User B | hku.hk | 147.8.117.176 | https://acme.xyz/whyismyadvisersoamean | May 5, 2022, 11:50pm | Pokfulam, HK | “safari iPhone” |
| User AK | hku.hk | 75.8.112.184 | https://medium.com/howtofakeyouknowaboutml | May 5, 2022, 2:50pm | Mid-levels, HK | “chrome android” |
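
As a rough illustration of what "longitudinally tracking users across IPs" makes possible, the sketch below groups the example rows by cookie-identified user and by IP address. The field names (`user`, `ip`, `loc`) and the two helper functions are hypothetical and chosen for this example; they are not the data vendor's schema.

```python
from collections import defaultdict

# Rows mirror the example table above. Field names are illustrative only.
events = [
    {"user": "AK", "domain": "hku.hk", "ip": "147.8.113.116", "loc": "Pokfulam, HK"},
    {"user": "AK", "domain": "hku.hk", "ip": "147.8.117.176", "loc": "Pokfulam, HK"},
    {"user": "B", "domain": "hku.hk", "ip": "147.8.117.176", "loc": "Pokfulam, HK"},
    {"user": "AK", "domain": "hku.hk", "ip": "75.8.112.184", "loc": "Mid-levels, HK"},
]


def ips_by_user(rows):
    """Collect the set of IP addresses each cookie-identified user appears on."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["user"]].add(row["ip"])
    return dict(seen)


def users_by_ip(rows):
    """Collect the set of users observed behind each IP address."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["ip"]].add(row["user"])
    return dict(seen)


print(ips_by_user(events))
print(users_by_ip(events))
```

In this toy data, user AK shows up on two campus addresses and one off-campus address, while one campus address is shared by two users. Patterns of this kind are what the IP classification step later exploits.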

8 of 38

Scale of content is pretty large

9 of 38

Scale

Wide range of types of sites

Both obscure and well-known publishers

Inferred topic distribution is wide

10 of 38

Remote work measure – based on pre-pandemic info

  • The goal is to classify IP addresses as {vpn, residential, business, mobile}
    • More than 760 million IP addresses in total
  • Step #1: Manual classification
    • Rule sets based on our understanding of the internet
      • e.g. vpn2fa.hku.hk is probably a VPN
    • Coverage of about 60% of internet activity

  • Step #2: Pick up the rest with a model
  • Lots of data sources
    • Reverse DNS data from Rapid7 and Censys
    • Luminati residential proxy data (“RpaaS: residential proxies as a service”)
    • IAB classification data from an ensemble of Webshrinker, BuiltWith, and DMOZ
    • MaxMind connection type database
    • Shodan VPN list

11 of 38

Manual rule set (30%+ of IPs, 60% of traffic)

  • Residential IP: in the same block as a known residential proxy
    • Worked with Luminati to ping their blocks; any IP in the same /24 block (a.b.c.x, i.e. sharing the first three octets) is likely residential

  • Mobile IP
    • At least 200 mobile phones, the majority of device types are mobile phones, and either reverse DNS shows registration to a mobile company or MaxMind infers a mobile connection type

  • Business IP
    • The company has classified this as a business IP consistently throughout 2019 in bi-weekly exercises

  • VPN
    • Reverse DNS name indicates a VPN (e.g. vpn2fa.hku.hk)
    • On the Shodan VPN list
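
The rule set above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, argument names, the `mobile` dictionary keys, and the order in which rules are tried are all assumptions made for the example. Only the 200-phone threshold, the /24 matching, and the "vpn in the reverse DNS name" rule come from the slide.

```python
import ipaddress

MIN_PHONES = 200  # threshold stated on the slide


def block24(ip):
    """Return the /24 network that contains an IPv4 address."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def classify_ip(ip, rdns="", *, vpn_list=frozenset(), mobile=None,
                business_ips=frozenset(), proxy_blocks=frozenset()):
    """Apply the manual rules in a fixed (assumed) order.

    Returns 'vpn', 'mobile', 'business', 'residential', or None when no
    rule fires and the IP would be left to the model.
    """
    # VPN: reverse DNS name mentions "vpn", or the IP is on a VPN list.
    if "vpn" in rdns.lower() or ip in vpn_list:
        return "vpn"
    # Mobile: enough phones, mostly phones, plus a carrier or MaxMind signal.
    if mobile is not None:
        enough_phones = mobile["phones"] >= MIN_PHONES
        mostly_mobile = mobile["mobile_share"] > 0.5
        carrier_signal = mobile["carrier_rdns"] or mobile["maxmind_mobile"]
        if enough_phones and mostly_mobile and carrier_signal:
            return "mobile"
    # Business: consistently labeled as business by the data partner.
    if ip in business_ips:
        return "business"
    # Residential: shares a /24 block with a known residential proxy.
    if block24(ip) in proxy_blocks:
        return "residential"
    return None


# Hypothetical inputs for illustration.
known_proxy_blocks = {block24("75.8.112.10")}
print(classify_ip("147.8.1.1", "vpn2fa.hku.hk"))
print(classify_ip("75.8.112.184", proxy_blocks=known_proxy_blocks))
```

Returning `None` for unmatched addresses keeps the rule-based labels separate from anything a downstream model would predict.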

12 of 38

Machine learning model

  • Labeled set: ~10k cleanly labeled IP addresses
    • Fortune 1000 known office IPs + some others
    • Residential blocks for Comcast
    • Some mobile networks (where there are at least 200 mobile phones daily)
    • VPNs: the reverse DNS name contains “VPN”
    • Residential proxy networks (consumer VPNs)
  • Feature sets
    • Geographic dispersion: the number of unique locations (a mobile network, and then a VPN, will be geographically dispersed)
    • Scale of content: how much business-relevant content is read
    • Composition of content: how much social content is read; how much adult content is read
    • Temporal patterns
    • % domination by a top user / top company

  • Standard random forest model
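
To make the feature list concrete, here is a minimal sketch of how per-IP features of this kind might be computed from raw events. The event fields (`user`, `loc`, `topic`, `hour`), the topic labels, and the 9-to-17 working-hours window are hypothetical choices for this example; the deck does not specify the exact definitions.

```python
from collections import Counter


def ip_features(events):
    """Summarize the events seen on one IP address into model features."""
    if not events:
        raise ValueError("need at least one event")
    n = len(events)
    topics = Counter(e["topic"] for e in events)
    users = Counter(e["user"] for e in events)
    return {
        # Geographic dispersion: number of distinct locations observed.
        "n_locations": len({e["loc"] for e in events}),
        # Scale and composition of content.
        "business_share": topics["business"] / n,
        "social_share": topics["social"] / n,
        "adult_share": topics["adult"] / n,
        # Temporal pattern: share of events in an assumed 9:00-17:00 window.
        "work_hours_share": sum(9 <= e["hour"] < 17 for e in events) / n,
        # Domination by the single most active user.
        "top_user_share": users.most_common(1)[0][1] / n,
    }


sample = [
    {"user": "a", "loc": "X", "topic": "business", "hour": 10},
    {"user": "a", "loc": "X", "topic": "business", "hour": 11},
    {"user": "a", "loc": "Y", "topic": "social", "hour": 22},
    {"user": "b", "loc": "X", "topic": "business", "hour": 9},
]
print(ip_features(sample))
```

Feature dictionaries like this, paired with the labeled set, could then be passed to any standard random forest implementation (for instance scikit-learn's `RandomForestClassifier`).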

13 of 38

The classified IP addresses “make sense”

14 of 38

Machine learning metrics

15 of 38

Remote work measure

| Firm | Month | Remote work share |
| --- | --- | --- |
| Tech Firm 1 | June 2019 | 0.05 |
| Tech Firm 1 | December 2021 | 0.85 |
| BioTech Firm 300,000 | February 2019 | 0.10 |
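
A firm-month panel like the one above could be assembled from classified events along the following lines. This is only a sketch under a strong assumption: remote share is taken to be the fraction of a firm's events in a month that come from residential IPs. How the authors treat VPN and mobile traffic is not stated here, so `REMOTE_TYPES` is a placeholder to adjust.

```python
from collections import defaultdict

# Assumption for illustration: only residential IPs count as remote activity.
REMOTE_TYPES = {"residential"}


def remote_share_panel(events):
    """Map (firm, month) to the share of events coming from remote IP types.

    `events` is an iterable of (firm, month, ip_type) tuples.
    """
    totals = defaultdict(int)
    remote = defaultdict(int)
    for firm, month, ip_type in events:
        key = (firm, month)
        totals[key] += 1
        if ip_type in REMOTE_TYPES:
            remote[key] += 1
    return {key: remote[key] / totals[key] for key in totals}


# Toy data: 1 of 20 events is residential in June 2019, 17 of 20 in December 2021.
toy = (
    [("Tech Firm 1", "2019-06", "residential")]
    + [("Tech Firm 1", "2019-06", "business")] * 19
    + [("Tech Firm 1", "2021-12", "residential")] * 17
    + [("Tech Firm 1", "2021-12", "business")] * 3
)
print(remote_share_panel(toy))
```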

16 of 38

2020 remote share: across all IPs in the United States

17 of 38

By type of IP (normalized to same scale)

18 of 38

County-level regression vs SafeGraph

 

 

19 of 38

By state and industry

20 of 38

Roadmap

  • Introduction
  • Data
  • Tests
  • Conclusion

21 of 38

What drives firm decisions to work remotely?

  • Costs
    • Coordination: certain types of work benefit from discussion and collaboration
    • Moral hazard: distance reduces monitoring (e.g. Dessein, Galeotti, and Santos (2016); Gumpert, Steimer, and Antoni (2021))
    • Infrastructure: IT systems for remote work (communication, collaboration, monitoring); managerial hiring

  • Benefits
    • Commute distance: firms with longer commute distances benefit more from remote work
    • Labor market: remote work firms may attract and retain stronger talent given strong worker preferences
    • Reduced distraction/malaise

* In some cases remote work is not possible (fast food, service); this type of variation should be absorbed in industry FE

22 of 38

Baseline determinants of remote work

23 of 38

IT drives remote work decisions sensibly

24 of 38

More managers → can afford remote work cost

25 of 38

Two-stage least squares regression
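
For readers less familiar with the estimator, the following is a generic, self-contained sketch of two-stage least squares with one endogenous regressor and one instrument. It does not reproduce the specification on this slide: the instrument `z`, the confounder `u`, and all numbers are synthetic and invented for the example.

```python
def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted x."""
    a1, b1 = ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    return ols(x_hat, y)


# Synthetic data: u confounds x and y but is uncorrelated with z by construction.
z = [0, 1, 2, 3, 4, 5]        # instrument (hypothetical)
u = [1, -1, 0, 0, -1, 1]      # unobserved confounder
x = [zi + ui for zi, ui in zip(z, u)]               # endogenous regressor
y = [1 + 2 * xi + 3 * ui for xi, ui in zip(x, u)]   # true effect of x is 2

_, beta_ols = ols(x, y)
_, beta_iv = two_stage_least_squares(z, x, y)
print(beta_ols, beta_iv)
```

In this construction the naive OLS slope overstates the effect because `u` moves both `x` and `y`, while the two-stage estimate recovers the true coefficient.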

26 of 38

Outcome variables

27 of 38

Summary of results – corporate financials

  • Cannot disclose any actual numbers
  • Overall, the effects of remote work are positive for firm output
  • Sales/assets, sales/wages, etc.
  • Net profit/assets, net profit/wages
  • Sensible variation across tradeables/non-tradeables

28 of 38

Reading as a measure of work?

Elusive evidence of the afternoon siesta

29 of 38

Reading as a measure of work?

30 of 38

Firm behavior

  • How does remote work impact firm investment into:
    • Information Technologies? (increase monitoring and collaboration)
    • Managerial layers? (increase monitoring)

  • Measures
    • IT Index: count of the number of unique technologies at the firm
      • Constructed using data from the Aberdeen Computer Intelligence Database (the underlying data come from the Dun & Bradstreet database, which underpins the National Establishment Time Series used in academic studies)
      • Infrastructure, general data, internal data, communication, security
    • Change in percent of job postings related to a managerial role (from 2019 to 2020)
      • Job postings data from Burning Glass Technologies (near-universe of job postings)
  • Results: after an exogenous increase in remote work,
    • Firms invest in IT (particularly related to communications and monitoring)
    • Firms tilt hiring activity towards managerial roles
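
The two measures on this slide could be computed roughly as below. This is a sketch under stated assumptions: technology names are deduplicated case-insensitively, each job posting carries a boolean `managerial` flag, and the "change in percent" is taken as a simple difference in shares between the two years. None of these details are confirmed by the deck.

```python
def it_index(technologies):
    """Count the unique technologies observed at a firm (case-insensitive)."""
    return len({t.strip().lower() for t in technologies})


def managerial_share(postings):
    """Share of a firm's job postings flagged as managerial roles."""
    return sum(1 for p in postings if p["managerial"]) / len(postings)


def managerial_share_change(postings_2019, postings_2020):
    """Change in the managerial posting share from 2019 to 2020."""
    return managerial_share(postings_2020) - managerial_share(postings_2019)


# Hypothetical firm-level inputs.
tech = ["Slack", "slack", "Zoom", "VPN gateway"]
p2019 = [{"managerial": True}] + [{"managerial": False}] * 3
p2020 = [{"managerial": True}] * 2 + [{"managerial": False}] * 2

print(it_index(tech))
print(managerial_share_change(p2019, p2020))
```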

31 of 38

Investments in labor

32 of 38

Labor effects

  • Does remote work provide an advantage in the labor market?

  • Measures
    • Map every profile on LinkedIn to firms. For each person, compare where they worked just prior to COVID to where they worked post 2020

  • Results
    • Workers flow from low remote work firms to high remote work firms
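
A minimal sketch of the flow comparison described above: each worker contributes a (pre-COVID employer, post-2020 employer) pair, and firms are bucketed by remote share. The 0.5 cutoff, the firm names, and the function names are hypothetical; the deck does not say how "low" and "high" remote work firms are defined.

```python
from collections import Counter


def flow_counts(moves, remote_share, threshold=0.5):
    """Count job moves between low and high remote work firms.

    `moves` is an iterable of (pre_firm, post_firm) pairs, one per worker.
    Workers who stay at the same firm are ignored.
    """
    def bucket(firm):
        return "high" if remote_share[firm] >= threshold else "low"

    flows = Counter()
    for pre, post in moves:
        if pre != post:
            flows[(bucket(pre), bucket(post))] += 1
    return flows


def net_flow_to_remote(flows):
    """Moves from low to high remote firms minus moves in the other direction."""
    return flows[("low", "high")] - flows[("high", "low")]


# Hypothetical data for illustration.
shares = {"A": 0.1, "B": 0.9, "C": 0.2}
moves = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "C"), ("A", "C")]
flows = flow_counts(moves, shares)
print(flows, net_flow_to_remote(flows))
```

A positive net flow in data like this would correspond to the pattern reported on the slide.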

33 of 38

Labor preference

34 of 38

Labor preference

35 of 38

Labor preference

36 of 38

Conclusion / what’s next

  • Hopefully convinced that …
    • This is a real remote work measure
    • And there are material effects of remote work on organizational resources (IT, managers) and firm decisions
    • Plausibly exogenous increases in remote work are associated with increases in firm output (although not yet something we can call “productivity”)
  • Next steps:
    • Estimate production function parameters explicitly
    • Study the impact of remote work on firm investment and innovation [expenditures], and on compensation and labor flows across firms, using (anonymized) individual-level tax data

37 of 38

Thank you

38 of 38

Commute distance
