1 of 19

Building a Robust Open Source Geospatial Database Solution using PostgreSQL and its Ecosystem

2 of 19

What is PostgreSQL?

PostgreSQL is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

�It can be used to store any kind of data given the huge list of extensions and can be hosted on all the major platforms or you can deploy on your own tailored to your specific use case.

3 of 19

postGIS

PostGIS extends the capabilities of the PostgreSQL relational database by adding support storing, indexing and querying geographic data.

PostGIS features include:

  • Spatial Data Storage: Store different types of spatial data such as points, lines, polygons, and multi-geometries, in both 2D and 3D data.
  • Spatial Indexing: Quickly search and retrieve spatial data based on its location.
  • Spatial Functions: A wide range of spatial functions that allow you to filter and analyze spatial data, measuring distances and areas, intersecting geometries, buffering, and more.
  • Geometry Processing: Tools for processing and manipulating geometry data, such as simplification, conversion, and generalization.
  • Raster Data Support: Storage and processing of raster data, such as elevation data and weather data.
  • Geocoding and Reverse Geocoding: Functions for geocoding and reverse geocoding.
  • Integration: Access and work with PostGIS using third party tools such as QGIS, GeoServer, MapServer, ArcGIS, Tableau.

4 of 19

And there are more…

  • postgis_topology - Extension that enables the storage and management of topological relationships between spatial features.
  • postgis_tiger_geocoder - Extension that provides tools for geocoding, enabling the conversion of addresses into geographic coordinates using data from the U.S. Census Bureau's TIGER/Line dataset.
  • postgis_raster - Extension that adds support for storing, analyzing, and processing raster data, such as satellite imagery or other gridded datasets, within a spatial database environment.
  • address_standardizer - Extension that helps standardize and normalize address data. It assists in parsing, cleaning, and formatting addresses to a consistent structure, making it easier to work with geocoding and address-based applications.

5 of 19

And more…

  • pgrouting - Extension for routing and network analysis. It adds geospatial routing functionality to the database, allowing users to perform various routing algorithms and network analysis operations on spatial data.
  • mobilityDB - For managing and analyzing spatiotemporal data, specifically designed for mobility-related applications. It extends the database to handle temporal and spatiotemporal data types, enabling efficient storage and querying of information related to moving objects and temporal evolution.
  • ogr_fdw - Foreign data wrapper that enables the database to access and interact with various geospatial data formats supported by the OGR library. It allows PostgreSQL to seamlessly connect to external geospatial data sources, such as shapefiles or other spatial formats, treating them as if they were native PostgreSQL tables.

6 of 19

And more…

  • h3-pg - Extension that integrates H3, a spatial indexing system developed by Uber. H3 is designed for hexagonal spatial indexing, allowing efficient indexing and querying of geospatial data based on hexagons. The extension enables the use of H3 functionalities within PostgreSQL databases, facilitating operations such as geospatial indexing and analysis using hexagonal grids.
  • pointcloud_postgis - The pointcloud_postgis extension adds functions that allow you to use PostgreSQL Pointcloud with PostGIS, converting PcPoint and PcPatch to Geometry and doing spatial filtering on point cloud data.

And many more to come…

7 of 19

Moving data in and out of postgres

While the extensions make it way easy to store the date in postgres but how do we get it out and firstly how to send data to it. The obvious approach is to create your wrappers in forms of API to interact with postgres and for common use case we have many such tools,wrappers and more.

  • shp2pgsql - A command-line tool for importing shapefiles into PostgreSQL databases with PostGIS extension, converting spatial vector data into the appropriate database format.
  • pgsql2shp - A command-line tool for exporting data from PostgreSQL databases, particularly spatial data stored with PostGIS, to shapefile format.
  • ogr2ogr - A versatile command-line tool from the GDAL library that supports the conversion of various geospatial formats, including vector and raster data.
  • raster2pgsql - A tool for importing raster data, such as satellite imagery or gridded datasets, into a PostgreSQL database with PostGIS Raster extension.

8 of 19

And more…

  • pgtopo_export - The pgtopo_export script connects to the named database and exports data of the named topology and associated layers to a file with a custom format.
  • pgtopo_import - The pgtopo_import data loader reads an export file written by pgtopo_export and outputs an SQL script reconstructing the topology therein contained.
  • Osm2pgsql - Osm2pgsql imports OpenStreetMap (OSM) data into a PostgreSQL/PostGIS database. It is an essential part of many rendering toolchains, the Nominatim geocoder and other applications processing OSM data.

9 of 19

CLI tools are okay but we need more…

  • QGIS - QGIS is an open-source GIS software that allows users to create, edit, visualize, and analyze geospatial data.
  • GeoNode - GeoNode is an open-source platform that facilitates the creation, sharing, and management of geospatial data and maps. It is designed to be a web-based platform for developing geospatial information systems (GIS).
  • GeoServer - GeoServer is an open-source server software that allows users to share and publish geospatial data over the internet using industry-standard protocols. It supports the Web Feature Service (WFS), Web Map Service (WMS), and other OGC standards.
  • GeoDjango - GeoDjango is a geographic extension for Django, a high-level Python web framework. It adds support for spatial data types and geospatial querying, allowing developers to build location-aware web applications.

10 of 19

And more….

  • pygeoapi - pygeoapi is a Python server implementation of the OGC API suite of standards. The project emerged as part of the next generation OGC API efforts in 2018 and provides the capability for organizations to deploy a RESTful OGC API endpoint using OpenAPI, GeoJSON, and HTML.
  • tipg - tipg is a Python package that helps create lightweight OGC Features and Tiles API with a PostGIS Database backend. The API has been designed for OGC Features and OGC Tiles specifications.
  • PostgREST - PostgREST is a standalone web server that turns your PostgreSQL database directly into a RESTful API. The structural constraints and permissions in the database determine the API endpoints and operations.

11 of 19

Now we have the data and the means to extract it, let’s make it more accessible.

Postgres owing to it’s open source nature has a variety of options when it comes to deployment, let’s have a look at few options.

  • AWS RDS - Amazon RDS is a managed database service that simplifies the process of setting up, operating, and scaling a relational database, including PostgreSQL Key Features include Automated backups,High availability with Multi-AZ deployments,Automatic software patching, Monitoring and metrics through Amazon CloudWatch.
  • AWS Aurora - Aurora is a fully managed, MySQL and PostgreSQL-compatible relational database engine provided by AWS. While it's compatible with PostgreSQL, it has some unique features and performance improvements. Key Features include High performance and scalability, Multi-AZ deployments for high availability, Automated backups and replication.
  • Azure Postgres - Microsoft Azure offers a managed PostgreSQL service that provides high availability, security, and scalability. Key Features include Automated backups and point-in-time recovery, Scaling options to adapt to changing workloads, Integration with Azure Active Directory for authentication

12 of 19

And more…

TimescaleDB - TimescaleDB is an open-source time-series database built on PostgreSQL. It extends PostgreSQL to handle time-series data efficiently.Key Features include Time-series data optimizations, Continuous aggregates for improved query performance, Scalability for handling large volumes of time-series data.

  • EDB (EnterpriseDB) - EnterpriseDB is a company that provides products and services based on the open-source PostgreSQL database. Their offerings include enhanced and enterprise-ready versions of PostgreSQL. Key Features include EDB Postgres Advanced Server includes additional enterprise features, Tools for migration from other databases to PostgreSQL
  • Percona - Percona provides software and services for open-source databases, including PostgreSQL. They offer support, consulting, and management tools for PostgreSQL deployments. Key Features include Percona Distribution for PostgreSQL includes performance and management tools, Support for high-performance and scalability
  • CrunchyData - Crunchy Data specializes in providing solutions for open-source databases, including PostgreSQL. They offer tools and services to enhance PostgreSQL deployments. Key Features include, Crunchy PostgreSQL for Kubernetes for containerized deployments, Backup and disaster recovery solutions

13 of 19

And last but not the least…

Bare metal servers - Run your own single or multi tenant postgres cluster with any configuration you like, no license fee and plenty of documentation to achieve any performance you want to hit on.

Did someone say `docker`????

14 of 19

The Container Angle

When it comes to new age cloud practises, we can be leaning towards to docker or containers in general to run our applications and why not run postgres on it as well and there are few ways we can do just that.

  • Crunchy PostgreSQL Operator - Crunchy Data's PostgreSQL Operator is designed to manage PostgreSQL clusters on Kubernetes. It provides features for high availability, backup, and scaling.
  • Zalando PostgreSQL Operator - Zalando has developed a PostgreSQL Operator for managing PostgreSQL clusters on Kubernetes. It includes features for automated backups, scaling, and configuration management.

15 of 19

And more…

  • KubeDB - KubeDB is a multi-cloud database-as-a-service platform for Kubernetes. It supports PostgreSQL and provides a PostgreSQL operator for managing PostgreSQL clusters.
  • Stolon - Stolon is a cloud-native PostgreSQL manager for PostgreSQL high availability. While not a traditional operator, Stolon is often used with Kubernetes to manage PostgreSQL clusters.
  • PostgreSQL Operator by Zalando: Run highly-available PostgreSQL clusters on Kubernetes (K8s) powered by Patroni. It is configured only through Postgres manifests (CRDs) to ease integration into automated CI/CD pipelines with no access to Kubernetes API directly, promoting infrastructure as code vs manual operations.

16 of 19

What more can we expect?

Given the large community on postgres many have tried to bridge the gap and find the missing pieces in postgres and the following are some of the result. �

  • Supabase - Supabase is an open-source alternative to Firebase. It combines a PostgreSQL database with real-time capabilities and a set of tools for building scalable web applications. It provides a user-friendly interface for database management and supports features like authentication, serverless functions, and real-time subscriptions.
  • Yugabyte - YugabyteDB is an open-source, distributed SQL database designed for high availability and scalability. It is compatible with PostgreSQL and supports distributed transactions. Yugabyte is suitable for cloud-native and multi-cloud deployments, providing resilience and consistent performance across multiple nodes.

17 of 19

And more…

  • TimescaleDB - TimescaleDB is an open-source time-series database that builds on top of PostgreSQL. It is designed to efficiently handle time-series data, making it suitable for applications such as IoT, monitoring, and analytics. TimescaleDB retains the relational model of PostgreSQL while providing optimizations for time-series workloads.
  • Citus - Citus is an extension for PostgreSQL that transforms it into a distributed database system. It enables horizontal scaling of PostgreSQL across multiple nodes, allowing for improved performance and parallel processing. Citus is particularly well-suited for applications with large datasets and complex queries.

18 of 19

Some use cases

With it’s wide variety of options there are some interesting solutions out there that you can build by joining different pieces together.

  • Spatial - Temporal Database using Postgis + timesaleDB + uber h3
  • REST APIs out of the box using PostgREST
  • OGC APIs using pygeoapi or tipg

And more…

19 of 19

Thank You

You can reach me out on:�� https://twitter.com/jsonsingh

https://www.linkedin.com/in/jsonsingh/

Or check more about me at: https://jsonsingh.com