1 of 25

GeoParquet for OGC TC Huntsville

Chris Holmes

Planet

2 of 25

Started in 2013 as part of the Hadoop ecosystem

  • Good for storing big data of any kind
  • Saves on cloud storage space
  • Increased data throughput and performance
  • All major cloud data products support it

3 of 25

4 of 25

Most Cloud products support GEO

5 of 25

Most Cloud products support GEO

But in slightly different ways, and not interoperable.

6 of 25

Parquet for Geospatial

  • Though we’re using ‘GeoParquet’ to get started, the hope is this geospatial encoding becomes standard for any Parquet file
  • Spatial should not be special
    • We shouldn’t need our own special formats; we should be able to teach any existing system to speak geospatial, or we’ll always be a niche.
  • Cloud data warehouses are central to this trend, and really exciting, since they all handle geospatial very well.

7 of 25

Why Parquet is interesting for geo

  • Size: As small as a zipped shapefile, by default.
    • Save money on cloud storage, faster download
  • Speed: Read and write faster
  • Cloud Data Warehouse Interoperability: BigQuery, Snowflake, etc. likely won’t ever support Shapefile or GeoPackage, but they do support Parquet
  • Complex Data Model
    • Parquet can handle all sorts of nested data structures, to model real world entities
    • But GeoParquet requires a top-level ‘default’ geometry, so any ‘normal’ (flat) GIS can handle it in some way
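To make the ‘default geometry’ point concrete, here is a minimal Python sketch of how a flat GIS might read a nested record: nested fields are flattened into dotted column names while the top-level geometry column stays where it is. The `flatten_record` helper, the dotted naming convention, and the sample building record are all hypothetical, not part of the GeoParquet spec.

```python
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names (hypothetical convention)."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested structures, e.g. address -> address.street
            flat.update(flatten_record(value, prefix=f"{name}."))
        else:
            # Scalars and binary values (like a WKB geometry) are kept as-is
            flat[name] = value
    return flat


# Hypothetical building record: top-level geometry, nested attributes
building = {
    "geometry": b"<wkb>",  # placeholder standing in for real WKB bytes
    "id": 7,
    "address": {"street": "Main St", "number": 12},
}
print(flatten_record(building))
# {'geometry': b'<wkb>', 'id': 7, 'address.street': 'Main St', 'address.number': 12}
```

Because the geometry never moves, a flat client only needs to know that one column name to draw every feature.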

8 of 25

A community driven initiative

Under the umbrella of the Open Geospatial Consortium

github.com/opengeospatial/geoparquet

9 of 25

GeoParquet as a standard storage layer for geo

[Diagram: GeoParquet (e.g. polygons.parquet) in cloud storage as the common layer between enterprise data, providers' data, and computing engines & libraries; example query: SELECT data FROM myTable JOIN providerTable]

10 of 25

What is it internally? Not a lot

Inside Parquet you can describe columns in a very extensible way, through metadata. We use that to specify the geospatial information for each geometry column.
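For a sense of how little is involved, here is a stdlib-only Python sketch of the two pieces: geometries stored as WKB bytes in an ordinary binary column, and a JSON document stored under the `geo` key of the Parquet file's key/value metadata. The field names follow our reading of the 1.0.0-beta.1 spec (`version`, `primary_column`, `columns`, `encoding`, `geometry_types`); check them against geoparquet.org before relying on them. The helper names are hypothetical.

```python
import json
import struct


def point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # Byte-order flag (1 = little endian), geometry type (1 = Point), then x, y as float64
    return struct.pack("<BIdd", 1, 1, x, y)


def geo_metadata(column: str, geometry_types: list[str]) -> bytes:
    """Build the JSON value for the file-level 'geo' metadata key."""
    meta = {
        "version": "1.0.0-beta.1",
        "primary_column": column,
        "columns": {
            column: {"encoding": "WKB", "geometry_types": geometry_types},
        },
    }
    return json.dumps(meta).encode("utf-8")


wkb = point_wkb(-86.59, 34.73)  # approximately Huntsville, AL (lon, lat)
print(len(wkb))  # 21
print(geo_metadata("geometry", ["Point"]).decode())
```

With pyarrow, the WKB values would go into a binary column and the metadata onto the schema, for example via `table.replace_schema_metadata({b"geo": geo_metadata("geometry", ["Point"])})` before `pyarrow.parquet.write_table`. No `crs` is given here, which, as we read the spec, means longitude/latitude (OGC:CRS84).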

11 of 25

Status

  • Currently version 1.0.0-beta.1. Likely going to rc.1 soon, then 1.0.0 once a full test engine is implemented
  • Website up at geoparquet.org
  • Implementations in GDAL/OGR, GeoPandas, Apache Sedona, QGIS, R, Go, Julia, .NET…
  • Planetary Computer provides its STAC index as GeoParquet
    • Plus Building Footprints & Climate Normals
  • Focus for 1.0 is solely on interoperability, with little spatial optimization

12 of 25

13 of 25

Beyond 1.0 (where things get exciting)

  • Native columnar geometry format, from GeoArrow.
    • Enables spatial optimizations by leveraging Parquet's native indexing. youtube.com/watch?v=uNQrwMMn1jk
  • Spatial Partitioning: Distribute a dataset in chunks of 2 GB or less along logical boundaries.
    • Treat it as one dataset, or focus on just one part. Enables distributed compute.
  • Web Potential:
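A minimal sketch of what spatial partitioning could look like, assuming a fixed lon/lat grid as the ‘logical boundary’ and a row cap standing in for the 2 GB size limit (a real writer would measure bytes). `partition_points`, the grid size, and the partition naming are hypothetical.

```python
from collections import defaultdict
from math import floor


def partition_points(
    points: list[tuple[float, float]],
    cell_deg: float = 10.0,
    max_rows: int = 1_000_000,
) -> dict[str, list[int]]:
    """Group point row indices by grid cell, then split each cell into capped chunks."""
    cells: dict[tuple[int, int], list[int]] = defaultdict(list)
    for row, (lon, lat) in enumerate(points):
        # Each point falls into exactly one cell_deg x cell_deg grid cell
        cells[(floor(lon / cell_deg), floor(lat / cell_deg))].append(row)

    partitions: dict[str, list[int]] = {}
    for cx, cy in sorted(cells):
        rows = cells[(cx, cy)]
        # Split oversized cells so no partition exceeds max_rows
        for part, start in enumerate(range(0, len(rows), max_rows)):
            partitions[f"cell_{cx}_{cy}_part{part}"] = rows[start:start + max_rows]
    return partitions


pts = [(1.0, 1.0), (2.0, 2.0), (15.0, 1.0), (3.0, 3.0)]
print(partition_points(pts, cell_deg=10.0, max_rows=2))
# {'cell_0_0_part0': [0, 1], 'cell_0_0_part1': [3], 'cell_1_0_part0': [2]}
```

Each partition would become its own file, so a reader can open the whole set as one dataset or only the cells it needs.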

14 of 25

How You Can Help!

  • Try it out, use it in workflows, give feedback, contribute
  • Support it in your software
  • Provide some data as GeoParquet
    • Be counted as a ‘provider’ in our 1.0 criteria
  • Sponsor Radiant Earth / cloudnativegeo.org, join the next sprint

15 of 25

Thank You!

16 of 25

What is Cloud Native Geospatial?

17 of 25

Geospatial Standards & Software built for the cloud from the ground up

18 of 25

19 of 25

Technical Principles

  • Read-oriented
  • Index-accelerated partial read over HTTP
  • Open specifications, software, and data
  • Multiscale metadata
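As one illustration of index-accelerated partial reads over HTTP: a Parquet file ends with its metadata (which records where each row group and column chunk lives), followed by a 4-byte little-endian length and the magic bytes `PAR1`. A client can fetch the last 8 bytes with a suffix range request (`Range: bytes=-8`), then fetch just the metadata, then only the column chunks it needs. The sketch below covers the first two steps; the function names are hypothetical and the HTTP calls themselves are left out.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def footer_length(tail: bytes) -> int:
    """Parse the metadata length from the last 8 bytes of a Parquet file."""
    if len(tail) < 8 or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    # 4-byte little-endian unsigned length sits just before the trailing magic
    (length,) = struct.unpack("<I", tail[-8:-4])
    return length


def metadata_range(file_size: int, meta_len: int) -> str:
    """HTTP Range header value covering only the footer metadata block."""
    start = file_size - 8 - meta_len
    end = file_size - 9  # inclusive: the byte just before the length field
    return f"bytes={start}-{end}"


# Simulated response to "Range: bytes=-8" for a 1,000-byte file with 100 bytes of metadata
tail = struct.pack("<I", 100) + PARQUET_MAGIC
print(metadata_range(1000, footer_length(tail)))  # bytes=892-991
```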

20 of 25

21 of 25

  • Stream GeoTIFFs for visualization and analysis without having to download them
  • Maxar was an early adopter, with support in GBDX and by converting Ikonos data
  • Recently accepted as full OGC standard
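Streaming works because a tiled GeoTIFF records each tile's byte offset and size in its header (the `TileOffsets` and `TileByteCounts` tags), with tiles numbered row by row. A sketch of the client-side arithmetic, assuming a single-resolution, chunky (PlanarConfiguration = 1) image whose tag arrays have already been read; the function names are hypothetical.

```python
from math import ceil


def tiles_for_window(
    image_width: int, tile_width: int, tile_height: int,
    col0: int, row0: int, col1: int, row1: int,
) -> list[int]:
    """Row-major tile indices covering the pixel window [col0, col1) x [row0, row1)."""
    tiles_across = ceil(image_width / tile_width)
    indices = []
    for ty in range(row0 // tile_height, (row1 - 1) // tile_height + 1):
        for tx in range(col0 // tile_width, (col1 - 1) // tile_width + 1):
            indices.append(ty * tiles_across + tx)
    return indices


def tile_ranges(indices: list[int], tile_offsets: list[int], tile_byte_counts: list[int]) -> list[str]:
    """HTTP Range values for each tile, using offsets/sizes from the TIFF header."""
    return [
        f"bytes={tile_offsets[i]}-{tile_offsets[i] + tile_byte_counts[i] - 1}"
        for i in indices
    ]


# A 1024-pixel-wide image with 256x256 tiles: a 100x10 window straddles tiles 0 and 1
print(tiles_for_window(1024, 256, 256, 200, 0, 300, 10))  # [0, 1]
print(tile_ranges([0, 1], [500, 9000], [8500, 7000]))  # ['bytes=500-8999', 'bytes=9000-15999']
```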

22 of 25

  • Stream point clouds for visualization and analysis.
  • Still a .laz file, so it works with legacy tools, just like COGs, which are still GeoTIFFs
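Assuming this slide describes COPC, streaming relies on organizing points in an octree whose nodes are addressed by a (level, x, y, z) key; a viewer requests coarse levels first and deeper nodes only where it zooms in. A sketch of computing the key of the node containing a point, assuming a cubic root bounding box; `node_key` and `children` are hypothetical helpers, and real readers get each node's byte range from the hierarchy stored in the file.

```python
def node_key(
    point: tuple[float, float, float],
    cube_min: tuple[float, float, float],
    cube_size: float,
    level: int,
) -> tuple[int, int, int, int]:
    """(level, x, y, z) key of the octree node containing `point` at `level`."""
    cells = 2 ** level  # nodes per axis at this level
    node_size = cube_size / cells
    coords = []
    for p, lo in zip(point, cube_min):
        i = int((p - lo) // node_size)
        # Clamp so points on the far edge of the cube stay inside the last node
        coords.append(min(max(i, 0), cells - 1))
    return (level, coords[0], coords[1], coords[2])


def children(key: tuple[int, int, int, int]) -> list[tuple[int, int, int, int]]:
    """The eight child keys one level deeper."""
    d, x, y, z = key
    return [
        (d + 1, 2 * x + dx, 2 * y + dy, 2 * z + dz)
        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
    ]


print(node_key((75.0, 25.0, 10.0), (0.0, 0.0, 0.0), 100.0, 1))  # (1, 1, 0, 0)
```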

23 of 25

  • Cloud-native format for multi-dimensional data cubes
    • Think HDF5 or NetCDF
  • Not backwards compatible the way COG/COPC are, but newer NetCDF versions are using it
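In a chunked data-cube format, a reader maps an array slice to the chunks that hold it and fetches only those objects. A minimal sketch, assuming keys made by joining per-dimension chunk indices with `.` (the default naming in Zarr version 2); the function names are hypothetical.

```python
from itertools import product


def chunk_key(index: tuple[int, ...], chunk_shape: tuple[int, ...], sep: str = ".") -> str:
    """Key of the chunk that holds a single element."""
    if len(index) != len(chunk_shape):
        raise ValueError("index and chunk_shape must have the same rank")
    return sep.join(str(i // c) for i, c in zip(index, chunk_shape))


def chunk_keys_for_slice(
    starts: tuple[int, ...],
    stops: tuple[int, ...],
    chunk_shape: tuple[int, ...],
    sep: str = ".",
) -> list[str]:
    """Keys of every chunk overlapping the half-open slice [starts, stops)."""
    per_dim = [range(s // c, (e - 1) // c + 1) for s, e, c in zip(starts, stops, chunk_shape)]
    return [sep.join(map(str, combo)) for combo in product(*per_dim)]


# (time, lat, lon) cube chunked as 1 x 100 x 100: one time step over a 150 x 50 window
print(chunk_keys_for_slice((5, 0, 0), (6, 150, 50), (1, 100, 100)))  # ['5.0.0', '5.1.0']
```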

24 of 25

  • Simple JSON format that adds metadata to any cloud-native format
  • STAC API is almost at 1.0; it enables search of STAC records using OGC API - Features
  • Funded and convened by Radiant Earth
  • Maxar was an early supporter: sponsor of the first sprint and early implementations
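A minimal sketch of a STAC Item pointing at a cloud-native asset, plus a GET query against a STAC API's `/search` endpoint using the `bbox`, `datetime`, and `limit` parameters. The item ID, coordinates, asset href, and API root are hypothetical placeholders; check the STAC spec for the full set of required and optional fields.

```python
from urllib.parse import urlencode


def make_item(item_id: str, lon: float, lat: float, datetime_utc: str, href: str) -> dict:
    """A minimal point-footprint STAC Item with one asset."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "bbox": [lon, lat, lon, lat],
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            }
        },
    }


def search_url(api_root: str, bbox: list[float], datetime_range: str, limit: int = 10) -> str:
    """GET /search URL filtering by bounding box and time interval."""
    query = urlencode({
        "bbox": ",".join(str(v) for v in bbox),
        "datetime": datetime_range,
        "limit": limit,
    })
    return f"{api_root.rstrip('/')}/search?{query}"


item = make_item("example-1", -86.59, 34.73, "2023-01-15T00:00:00Z",
                 "https://example.com/data/example-1.tif")
print(search_url("https://example.com/stac", [-87, 34, -86, 35],
                 "2023-01-01T00:00:00Z/2023-02-01T00:00:00Z"))
```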

25 of 25