1 of 22

A Crash Course

in Data Analysis

July 2024

Confidential • All rights reserved © 2023 Stellar Development Foundation

2 of 22

Overview

  1. Accessing Hubble
  2. Hot Tips for Querying
  3. How to Dumpster Dive

3 of 22

Accessing Hubble

4 of 22

Accessing Hubble - Logging in

Logging in to GCP to use BigQuery is easy!

  • Check out our Hubble developer docs
  • Specifically
    • The Connecting section
    • The End to End Analysis Example
    • The Data Dictionaries
      • In the developer docs
      • Within the BigQuery tables schema

4

5 of 22

Accessing Hubble - Logging in

You should see something like this

5

6 of 22

Accessing Hubble - Adding crypto-stellar

  • You can star the `crypto-stellar` GCP project to add our public dataset to your UI

6

7 of 22

Accessing Hubble - UI Navigation

  • You can use the search bar to find tables or use the arrow buttons to expand/collapse datasets to list tables
  • The “SCHEMA” tab will show the available columns in the table
  • The “DETAILS” tab will show the partitioned/clustered columns for the table

7

8 of 22

Hot Tips for Querying

9 of 22

Hot Tips for Querying

The general structure of a query consists of:

  • The column(s) of data to return
  • The table(s) to get data from
  • How the data should be filtered

9

Data to return

Tables to use

Filters to apply

10 of 22

Hot Tips for Querying

Don’t use “SELECT *”

  • Only SELECT the columns of data you need
  • This makes the query run faster and process less data
    • Before you run a query you can see the amount of data that will be processed by the query

  • SELECT * processes 23x the amount of data compared to only selecting the subset of columns actually needed

10

11 of 22

Hot Tips for Querying

Use the “PREVIEW” tab instead of “SELECT *” to preview the data

11

12 of 22

Hot Tips for Querying

  • Most tables are partitioned and have clustering on certain columns
    • At a minimum, try to filter by the partitioned and/or clustered columns
  • Note: Some tables are partitioned by different columns (e.g. closed_at or batch_run_date)
  • Note: LIMIT does NOT change the amount of data that will be processed. It only limits the number of rows returned in the results

Add filters to reduce the amount of data queried/returned

12

13 of 22

Hot Tips for Querying

Add filters to joined tables as well

13

14 of 22

Hot Tips for Querying

Try not to do unnecessary joins or complex joins when not needed

  • Unnecessary JOINs increases the amount of data that needs to be processed for no reason
    • For example, you can use enriched_history_operations instead of manually joining operations, transactions, and ledgers
  • Note: You will probably never need to do a CROSS JOIN (cartesian join)

14

15 of 22

Hot Tips for Querying - tl;dr

There are some general practices that everyone should follow

  • Don’t use “SELECT *”
    • Only SELECT the columns you want
  • Use the table “PREVIEW” for a free quick view of table data
  • Add filters to reduce the amount of data queried/returned
    • Filter by datetime with closed_at or batch_run_date
  • Filtering can be applied to joined tables
  • Try not to do unnecessary joins or complex joins when not needed
  • Most queries should complete in seconds/low minutes
  • Take note of the estimated amount of data a query will process
    • MB good; TB bad
  • Ask for help in ⁠🔹|hubble if needed

15

16 of 22

How To Dumpster Dive

17 of 22

Tips for Data Analysis

  1. Be clear about your objective
  2. Learn your business domain
  3. Learn the Stellar Network protocol
  4. Be skeptical!

17

18 of 22

History vs Ledger State Tables

18

History

  • Transaction, event based
  • What is the average fee charged for a Soroban transaction?
  • What is the total daily transactional volume for <asset>?
  • What is the average trade size for a given asset pair?

State

  • Ledger Entry State, given point in time
  • Contains full history of the ledger
  • What was my account balance on Jan 1, 2023?
  • How many unclaimed claimable balances are on the network?
  • What is the market cap for <asset>?

19 of 22

Examples

20 of 22

Have any of my contract data entries expired?

20

21 of 22

Visualizing Your Data

22 of 22

Questions?