1 of 56

PostGIS �Performance Tips

paul.ramsey@crunchydata.com

2 of 56

3 of 56

MongoDB is WeB ScALe!

4 of 56

Mandatory dig at NoSQL databases and the sad, sad, Sad people who use them

5 of 56

😀�PostGIS is great! So fast!

😡�PostGIS is unacceptably slow!

😐�…

The hero’s journey…

6 of 56

How do I make PostGIS faster?!?

7 of 56

It Depends

8 of 56

The $250,000 feature flag!

Brought to you by…

ALTER SESSION � SET SPATIAL_VECTOR_ACCELERATION = true

9 of 56

The dream…!

The reality…

ALTER SESSION � SET postgis_faster = true

yum update geos

10 of 56

GEOS 3.10

GEOS GOES FASTER

crunchydata.com

11 of 56

GEOS GOES FASTER

crunchydata.com

12 of 56

It Depends

13 of 56

What “It Depends” On

Data size

How many records in that table?

Data heterogeneity

How variable is geometry extent?

Data complexity

How many vertices in those geometries?

Query complexity

How many clauses in that SQL?

crunchydata.com

14 of 56

Data Size

Foreach Record(A)� Foreach Record(B)

TestJoinCondition()

runs 10,000 times

crunchydata.com

15 of 56

Data Size

Foreach Record(A)� Foreach Record(B)

TestJoinCondition()

runs 100,000,000 times

crunchydata.com

16 of 56

Performance Principle #1

Remember Spatial Indexes

Build them! �Use them! �EXPLAIN ANALYZE!

crunchydata.com

17 of 56

What Indexes? (Advanced)

GIST Spatial Indexes (R-tree)

CREATE INDEX foo_geom_x � ON foo USING GIST (geom);

For spatial joins
For queries with good selectivity (smaller result sets)
Watch out for highly heterogeneous data �(we will come back to this)

crunchydata.com

18 of 56

What Indexes? (Advanced)

BRIN Spatial Indexes? (Block range)

CREATE INDEX foo_geom_x � ON foo USING BRIN (geom);

For data stored with high spatial autocorrelation

Usually ordered or sorted points

For queries with poor selectivity (larger result sets)

100s of thousands of results

crunchydata.com

19 of 56

What Indexes? (Advanced)

Good spatial autocorrelation

Poor spatial autocorrelation

crunchydata.com

20 of 56

What Indexes? (Advanced)

BRIN Spatial Indexes? (Block range)

CREATE INDEX foo_geom_x � ON foo USING BRIN (geom);

Presort using a GeoHash to add spatial autocorrelation
Or just stick to on-disk temporal autocorrelation that is a side effect of real-time data capture

crunchydata.com

21 of 56

What Indexes? (Advanced)

SP-GIST Spatial Indexes? (Quad-tree)

CREATE INDEX foo_geom_x � ON foo USING SPGIST (geom);

For more homogeneous data?

All objects of similar size

For more static data?

SPGIST is newer less-used code than GIST

Measure before you commit

crunchydata.com

22 of 56

Spatial Indexes (RTree)

gist_geometry_ops_2d builds…

Dynamically tuned to the input data, unlike grid-based index systems.
Default opclass is 2D, there is also an ND opclass.
For every input geometry, the index stores… a box.

Dun, dun, duuuun!!

crunchydata.com

23 of 56

24 of 56

25 of 56

26 of 56

27 of 56

SELECT pts.geom

FROM pts p

JOIN countries c

ON ST_Intersects(p.geom, c.geom)� WHERE c.name = 'Canada'

28 of 56

SELECT pts.geom

FROM pts p

JOIN countries c

ON ST_Intersects(p.geom, c.geom)� WHERE c.name = 'Canada'

29 of 56

SELECT pts.geom

FROM pts p

JOIN countries c

ON ST_Intersects(p.geom, c.geom)� WHERE c.name = 'Canada'

30 of 56

Performance Principle #2

Strive for Homogeneity

Shapes of similar extent

Filters of similar selectivity�EXPLAIN ANALYZE!

crunchydata.com

31 of 56

WITH subdivided AS (� SELECT ST_Subdivide(geom) AS geom� FROM countries� WHERE c.name = 'Canada'�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

32 of 56

WITH subdivided AS (� SELECT ST_Subdivide(geom) AS geom� FROM countries� WHERE c.name = 'Canada'�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

33 of 56

WITH subdivided AS (� SELECT ST_Subdivide(geom) AS geom� FROM countries� WHERE c.name = 'Canada'�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

34 of 56

WITH subdivided AS (� SELECT ST_Subdivide(geom) AS geom� FROM countries�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

35 of 56

Performance Principle #3

Enough Detail and No More

Use simplification

Use approximations�Beware of the TOAST

crunchydata.com

36 of 56

WITH subdivided AS (� SELECT ST_Subdivide(geom) AS geom� FROM countries� WHERE c.name = 'Canada'�)

SELECT pts.geom

FROM pts p

JOIN subdivided s� ON ST_Intersects(p.geom, s.geom)

37 of 56

TOAST

The Oversize Attribute Storage Technique
PostgreSQL has a fixed 8KB page size
Once a geometry has more than 200 vertices, the odds that it is in an oversized tuple get high

crunchydata.com

38 of 56

TOAST

Oversized tuples are chopped into pieces and placed in a side table
The original tuple is replaced with a pointer to the pieces
Recovery involves gathering all the pieces and appending them in the correct order

crunchydata.com

Oversize values get chopped up and moved into a side table. That way each piece is still below the page size, so the database can happily store very large objects “in a table” and the user is none the wiser about what is going on behind the scenes.

Except… what is going on behind the scenes is quite computationally intensive. Retrieval of a TOASTed object involves finding all the parts, decompressing them, and then appending them into the final in-memory object. It’s transparent, it’s clever, but it’s not computationally cheap.

The TOAST system and large object handling in general in PostgreSQL is predicated on the idea that large things are relatively rare. When you have a table that is all large things, which assumption can cause bad plans, and every retrieval of any object will have a stiff penalty. For some data, I’ve measured TOAST retrieval as bad as 5 to 10 times slower than direct retrieval.

39 of 56

5128 vertices�915.378 Ha

4211 vertices �915.380 Ha

ST_Simplify()

40 of 56

964 vertices

225 vertices

ST_Simplify()

15 �vertices

ST_Simplify()

915 Ha

927 Ha

951 Ha

41 of 56

ST_ConcaveHull(polygon)

Concave hull is usually used on points
Used against a polygon, the result is a polygon guaranteed to contain the input
Perfect for simplifying query polygons to be slightly over-determined

crunchydata.com

42 of 56

Match Data Detail to Resolution

“My 0,0,0 tile takes forever to render!”
What screen resolution are you going to fill?
How many meters per pixel?
Rework your data to match that resolution.

How big is a pixel?

crunchydata.com

43 of 56

Dilate and Erode

Too detailed!

crunchydata.com

44 of 56

Dilate and Erode

Dilate! ST_Buffer(g, 100)

crunchydata.com

45 of 56

Dilate and Erode

Erode! ST_Buffer(g, -100)

crunchydata.com

46 of 56

Performance Principle #4

Tame Query Complexity

Use JOIN syntax

Don’t get CTE happy�EXPLAIN! Test in parts

crunchydata.com

47 of 56

Code smell #1 - “I Hate Joins”

SELECT p.objectid, p.name�FROM parcels p

WHERE p.zone IN (

SELECT zoneid

FROM zoning

WHERE zonename LIKE 'R%'

)

SELECT p.objectid, p.name�FROM parcels p

JOIN zoning z � ON p.zone = z.zoneid

WHERE z.zonename LIKE 'R%'

Not like this!

Like this!

crunchydata.com

48 of 56

Code smell #2 - “I Hate Joins”

SELECT p.objectid AS f_id,

ST_Area(ST_Intersection(p.geom, w.geom)) / � ST_Area(p.geom) � AS wetland_pct

FROM dbo.parcels p, dbo.wetland w

WHERE ST_Intersects(p.geom, w.geom)

ORDER BY p.objectid

Use JOIN syntax

crunchydata.com

49 of 56

Code smell #3 - “CTE all the things”

WITH p AS (

SELECT objectid, geom

FROM dbo.parcels

)

SELECT p.objectid AS f_id,

ST_Area(ST_Intersection(p.geom, w.geom)) AS wetland_area

FROM p, dbo.wetland w

WHERE ST_Intersects(p.geom, w.geom)

ORDER BY b.objectid

Avoid�optimization�boundaries

Because they (can) wreck indexing

crunchydata.com

50 of 56

Code smell #3 - “CTE all the things”

WITH p AS (

SELECT objectid, geom

FROM dbo.parcels

)

SELECT p.objectid AS f_id,

ST_Area(ST_Intersection(p.geom, w.geom)) AS wetland_area

FROM p, dbo.wetland w

WHERE ST_Intersects(p.geom, w.geom)

ORDER BY b.objectid

Avoid�optimization�boundaries

Because they (can) wreck indexing

crunchydata.com

51 of 56

Code smell #4 - “Summarize it all”

WITH bounds AS MATERIALIZED (

SELECT ST_TileEnvelope(0,0,0) AS merc,

ST_Transform(ST_TileEnvelope(0,0,0), 4326) AS ll

),

mvtgeom AS (

SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.merc) AS geom

FROM gps_points t� CROSS JOIN bounds

WHERE ST_Intersects(t.geom, bounds.ll)

)

SELECT ST_AsMVT(mvtgeom.*) FROM mvtgeom

the �whole �world

In addition to “tile the world”, “union the world” is also popular.

crunchydata.com

As I mentioned before, and with vector tiles in particular, there’s a bit of magical thinking that goes on, and it’s hard to avoid, because at some level, a vector tile map is going to want to display the full extent of a table, which implies ripping the whole contents of the table into a tile. But don’t be surprised if it takes a while, you are transcoding a whole table, after all.

Consider putting a cache layer in front of your tile server so you only pay the price rarely, and also consider simplifying your inputs to match your output scale.

A tile-generating function or view that switches data sources based on zoom level is a simple way to accomplish this.

This is also where tricks like the dilate/erode can come in handy, to turn raw, high detail data into processed, low detail data.

52 of 56

Debugging Principles

It’s not because there’s too much SQL

“Why is my five-page SQL query running slowly?”

crunchydata.com

53 of 56

Debugging Principles

It’s not because there’s too much SQL
Use EXPLAIN ANALYZE to find the parts of the query taking long time

Extract those and run alone

Use EXPLAIN ANALYZE to find bad estimates

Where does estimate and run-time diverge?

“Why is my five-page SQL query running slowly?”

crunchydata.com

54 of 56

Debugging Principles

Note where in the EXPLAIN ANALYZE indexes are being used

Does that match your expectation?

Note where they are not being used.

Does that match your expectation?

It’s not because there’s too much SQL
Use EXPLAIN ANALYZE to find the parts of the query taking long time

Extract those and run alone

Use EXPLAIN ANALYZE to find bad estimates

Where does estimate and run-time diverge?

“Why is my five-page SQL query running slowly?”

crunchydata.com

55 of 56

😀�PostGIS is great! So fast!

😡�PostGIS is unacceptably slow!

😐�…

The hero’s journey…

56 of 56

Questions?

paul.ramsey@crunchydata.com