1 of 56

PostGIS �Performance Tips

paul.ramsey@crunchydata.com

2 of 56

3 of 56

MongoDB is WeB ScALe!

4 of 56

Mandatory dig at NoSQL databases and the sad, sad, Sad people who use them

5 of 56

😀�PostGIS is great! So fast!

😡�PostGIS is unacceptably slow!

😐�…

The hero’s journey…

6 of 56

How do I make PostGIS faster?!?

7 of 56

It Depends

8 of 56

The $250,000 feature flag!

Brought to you by…

ALTER SESSION � SET SPATIAL_VECTOR_ACCELERATION = true

9 of 56

The dream…!

The reality…

ALTER SESSION � SET postgis_faster = true

yum update geos

10 of 56

GEOS 3.10

GEOS GOES FASTER

crunchydata.com

11 of 56

GEOS GOES FASTER

crunchydata.com

12 of 56

It Depends

13 of 56

What “It Depends” On

  • Data size
    • How many records in that table?
  • Data heterogeneity
    • How variable is geometry extent?
  • Data complexity
    • How many vertices in those geometries?
  • Query complexity
    • How many clauses in that SQL?

crunchydata.com

14 of 56

Data Size

Foreach Record(A)� Foreach Record(B)

TestJoinCondition()

TestJoinCondition()

runs 10,000 times

crunchydata.com

15 of 56

Data Size

Foreach Record(A)� Foreach Record(B)

TestJoinCondition()

TestJoinCondition()

runs 100,000,000 times

crunchydata.com

16 of 56

Performance Principle #1

Remember Spatial Indexes

Build them! �Use them! �EXPLAIN ANALYZE!

crunchydata.com

17 of 56

What Indexes? (Advanced)

  • GIST Spatial Indexes (R-tree)
    • CREATE INDEX foo_geom_x � ON foo USING GIST (geom);
  • For spatial joins
  • For queries with good selectivity (smaller result sets)
  • Watch out for highly heterogeneous data �(we will come back to this)

crunchydata.com

18 of 56

What Indexes? (Advanced)

  • BRIN Spatial Indexes? (Block range)
    • CREATE INDEX foo_geom_x � ON foo USING BRIN (geom);
  • For data stored with high spatial autocorrelation
    • Usually ordered or sorted points
  • For queries with poor selectivity (larger result sets)
    • 100s of thousands of results

crunchydata.com

19 of 56

What Indexes? (Advanced)

Good spatial autocorrelation

Poor spatial autocorrelation

crunchydata.com

20 of 56

What Indexes? (Advanced)

  • BRIN Spatial Indexes? (Block range)
    • CREATE INDEX foo_geom_x � ON foo USING BRIN (geom);
  • Presort using a GeoHash to add spatial autocorrelation
  • Or just stick to on-disk temporal autocorrelation that is a side effect of real-time data capture

crunchydata.com

21 of 56

What Indexes? (Advanced)

  • SP-GIST Spatial Indexes? (Quad-tree)
    • CREATE INDEX foo_geom_x � ON foo USING SPGIST (geom);
  • For more homogeneous data?
    • All objects of similar size
  • For more static data?
    • SPGIST is newer less-used code than GIST
  • Measure before you commit

crunchydata.com

22 of 56

Spatial Indexes (RTree)

gist_geometry_ops_2d builds…

  • Dynamically tuned to the input data, unlike grid-based index systems.
  • Default opclass is 2D, there is also an ND opclass.
  • For every input geometry, the index stores… a box.

Dun, dun, duuuun!!

crunchydata.com

23 of 56

24 of 56

25 of 56

26 of 56

27 of 56

SELECT pts.geom

FROM pts p

JOIN countries c

ON ST_Intersects(p.geom, c.geom)� WHERE c.name = 'Canada'

28 of 56

SELECT pts.geom

FROM pts p

JOIN countries c

ON ST_Intersects(p.geom, c.geom)� WHERE c.name = 'Canada'

29 of 56

SELECT pts.geom

FROM pts p

JOIN countries c

ON ST_Intersects(p.geom, c.geom)� WHERE c.name = 'Canada'

30 of 56

Performance Principle #2

Strive for Homogeneity

Shapes of similar extent

Filters of similar selectivity�EXPLAIN ANALYZE!

crunchydata.com

31 of 56

WITH subdivided AS (SELECT ST_Subdivide(geom) AS geom� FROM countries� WHERE c.name = 'Canada'�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

32 of 56

WITH subdivided AS (SELECT ST_Subdivide(geom) AS geom� FROM countries� WHERE c.name = 'Canada'�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

33 of 56

WITH subdivided AS (SELECT ST_Subdivide(geom) AS geom� FROM countries� WHERE c.name = 'Canada'�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

34 of 56

WITH subdivided AS (SELECT ST_Subdivide(geom) AS geom� FROM countries�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

35 of 56

Performance Principle #3

Enough Detail and No More

Use simplification

Use approximations�Beware of the TOAST

crunchydata.com

36 of 56

WITH subdivided AS (SELECT ST_Subdivide(geom) AS geom� FROM countries� WHERE c.name = 'Canada'�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

37 of 56

TOAST

  • The Oversize Attribute Storage Technique
  • PostgreSQL has a fixed 8KB page size
  • Once a geometry has more than 200 vertices, the odds that it is in an oversized tuple get high

crunchydata.com

38 of 56

TOAST

  • Oversized tuples are chopped into pieces and placed in a side table
  • The original tuple is replaced with a pointer to the pieces
  • Recovery involves gathering all the pieces and appending them in the correct order

crunchydata.com

39 of 56

5128 vertices�915.378 Ha

4211 vertices �915.380 Ha

ST_Simplify()

40 of 56

964 vertices

225 vertices

ST_Simplify()

15 �vertices

ST_Simplify()

915 Ha

927 Ha

951 Ha

41 of 56

ST_ConcaveHull(polygon)

  • Concave hull is usually used on points
  • Used against a polygon, the result is a polygon guaranteed to contain the input
  • Perfect for simplifying query polygons to be slightly over-determined

crunchydata.com

42 of 56

Match Data Detail to Resolution

  • “My 0,0,0 tile takes forever to render!”
  • What screen resolution are you going to fill?
  • How many meters per pixel?
  • Rework your data to match that resolution.

How big is a pixel?

crunchydata.com

43 of 56

Dilate and Erode

Too detailed!

crunchydata.com

44 of 56

Dilate and Erode

Dilate! ST_Buffer(g, 100)

crunchydata.com

45 of 56

Dilate and Erode

Erode! ST_Buffer(g, -100)

crunchydata.com

46 of 56

Performance Principle #4

Tame Query Complexity

Use JOIN syntax

Don’t get CTE happy�EXPLAIN! Test in parts

crunchydata.com

47 of 56

Code smell #1 - “I Hate Joins”

SELECT p.objectid, p.name�FROM parcels p

WHERE p.zone IN (

SELECT zoneid

FROM zoning

WHERE zonename LIKE 'R%'

)

SELECT p.objectid, p.name�FROM parcels p

JOIN zoning z � ON p.zone = z.zoneid

WHERE z.zonename LIKE 'R%'

Not like this!

Like this!

crunchydata.com

48 of 56

Code smell #2 - “I Hate Joins”

SELECT p.objectid AS f_id,

ST_Area(ST_Intersection(p.geom, w.geom)) / � ST_Area(p.geom) � AS wetland_pct

FROM dbo.parcels p, dbo.wetland w

WHERE ST_Intersects(p.geom, w.geom)

ORDER BY p.objectid

Use JOIN syntax

crunchydata.com

49 of 56

Code smell #3 - “CTE all the things”

WITH p AS (

SELECT objectid, geom

FROM dbo.parcels

)

SELECT p.objectid AS f_id,

ST_Area(ST_Intersection(p.geom, w.geom)) AS wetland_area

FROM p, dbo.wetland w

WHERE ST_Intersects(p.geom, w.geom)

ORDER BY b.objectid

Avoid�optimization�boundaries

Because they (can) wreck indexing

crunchydata.com

50 of 56

Code smell #3 - “CTE all the things”

WITH p AS (

SELECT objectid, geom

FROM dbo.parcels

)

SELECT p.objectid AS f_id,

ST_Area(ST_Intersection(p.geom, w.geom)) AS wetland_area

FROM p, dbo.wetland w

WHERE ST_Intersects(p.geom, w.geom)

ORDER BY b.objectid

Avoid�optimization�boundaries

Because they (can) wreck indexing

crunchydata.com

51 of 56

Code smell #4 - “Summarize it all”

WITH bounds AS MATERIALIZED (

SELECT ST_TileEnvelope(0,0,0) AS merc,

ST_Transform(ST_TileEnvelope(0,0,0), 4326) AS ll

),

mvtgeom AS (

SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.merc) AS geom

FROM gps_points t� CROSS JOIN bounds

WHERE ST_Intersects(t.geom, bounds.ll)

)

SELECT ST_AsMVT(mvtgeom.*) FROM mvtgeom

the �whole �world

In addition to “tile the world”, “union the world” is also popular.

crunchydata.com

52 of 56

Debugging Principles

  • It’s not because there’s too much SQL

“Why is my five-page SQL query running slowly?”

crunchydata.com

53 of 56

Debugging Principles

  • It’s not because there’s too much SQL
  • Use EXPLAIN ANALYZE to find the parts of the query taking long time
    • Extract those and run alone
  • Use EXPLAIN ANALYZE to find bad estimates
    • Where does estimate and run-time diverge?

“Why is my five-page SQL query running slowly?”

crunchydata.com

54 of 56

Debugging Principles

  • Note where in the EXPLAIN ANALYZE indexes are being used
    • Does that match your expectation?
  • Note where they are not being used.
    • Does that match your expectation?
  • It’s not because there’s too much SQL
  • Use EXPLAIN ANALYZE to find the parts of the query taking long time
    • Extract those and run alone
  • Use EXPLAIN ANALYZE to find bad estimates
    • Where does estimate and run-time diverge?

“Why is my five-page SQL query running slowly?”

crunchydata.com

55 of 56

😀�PostGIS is great! So fast!

😡�PostGIS is unacceptably slow!

😐�…

The hero’s journey…

56 of 56

Questions?

paul.ramsey@crunchydata.com