1 of 41

When a database does too much: Reflecting on PostgreSQL’s hidden capabilities

Kowshiq Kattamuri

Advanced App Engineer, Accenture

2 of 41

Author | YouTuber | Tutor

    • Working in the Big Data domain with a focus on data engineering and analytics
    • Exploring intersections of AI, ML, and Quantum Computing for next-gen problem solving
    • Founder of Kowshiq’s Community – a 13K+ member platform sharing job opportunities, hackathons & events for students and freshers
    • A recent B.Tech graduate from NIT Calicut, carrying forward a strong foundation in technology

Kowshiq Kattamuri

Advanced App Engineer, Accenture

3 of 41

4 of 41

Built fully on PostgreSQL

Core of product architecture

Initially monolithic, then sharded across instances

Early scale with PostgreSQL

Millions of users supported

Still the primary database today

5 of 41

The Polyglot Problem

Complexity and Cost

PostgreSQL for core data

Redis or Memcached for caching

RabbitMQ or Kafka for job queues

Elasticsearch for search

6 of 41

01 Queue Systems

02 Key-Value Storage

03 Full-Text Search

04 Real-Time Features

7 of 41

8 of 41

Kafka

Topics, partitions, replication factor, retention policies, consumer groups, maybe schema registry

Redis

Lists/BRPOP or streams/XADD/XREAD, stream groups, persistence config (AOF/RDB), eviction policy

RabbitMQ

Exchanges, queues, bindings, dead-letter queues, delayed-retry plugin

Conventional way

Options to consider when you are processing tens of thousands of events/sec

9 of 41

Postgres as Queue
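The table definition shown on the slide isn't captured in this export; a minimal job-queue table for this pattern might look like the following (the `jobs` table and all column names are illustrative, not from the slides):

```sql
-- Hypothetical job-queue table; names are illustrative
CREATE TABLE jobs (
    id         bigserial   PRIMARY KEY,
    payload    jsonb       NOT NULL,
    status     text        NOT NULL DEFAULT 'pending',  -- pending | processing | done | failed
    attempts   int         NOT NULL DEFAULT 0,
    run_at     timestamptz NOT NULL DEFAULT now(),      -- when the job becomes eligible
    created_at timestamptz NOT NULL DEFAULT now()
);
```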

10 of 41

Postgres as Queue

Creating Indexes
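A partial index is the usual trick here: workers only ever scan runnable jobs, so index just those rows. A sketch, assuming a `jobs` table with `status` and `run_at` columns:

```sql
-- Partial index: covers only the rows workers actually poll,
-- so it stays small no matter how many jobs are done or failed
CREATE INDEX jobs_pending_idx
    ON jobs (run_at)
    WHERE status = 'pending';
```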

11 of 41

Postgres as Queue

Enqueue + wake workers (atomic + instant)
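Because NOTIFY is transactional, the row insert and the wake-up signal commit (or roll back) together, so workers never see a job that wasn't durably written. A sketch, with the channel name `job_queue` as my own placeholder:

```sql
BEGIN;
INSERT INTO jobs (payload)
VALUES ('{"type": "send_email", "user_id": 42}');
NOTIFY job_queue;   -- delivered to listeners only when this COMMITs
COMMIT;
```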

12 of 41

Postgres as Queue

Worker: claim with SKIP LOCKED (no double work), then process
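This is the core of the pattern: `FOR UPDATE SKIP LOCKED` lets many workers pull from the same table without ever grabbing the same row, and without blocking on each other. A sketch against the hypothetical `jobs` table:

```sql
-- Claim exactly one runnable job; concurrent workers skip
-- rows that are already locked instead of waiting on them
UPDATE jobs
SET    status   = 'processing',
       attempts = attempts + 1
WHERE  id = (
    SELECT id
    FROM   jobs
    WHERE  status = 'pending'
      AND  run_at <= now()
    ORDER  BY run_at
    LIMIT  1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
```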

13 of 41

Postgres as Queue

Finish job or retry with backoff
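Completion and retry are plain UPDATEs, and exponential backoff falls out of the `attempts` counter. A sketch (`$1` stands for the claimed job's id):

```sql
-- Success:
UPDATE jobs SET status = 'done' WHERE id = $1;

-- Failure: re-queue with exponential backoff (1, 2, 4, 8... minutes)
UPDATE jobs
SET    status = 'pending',
       run_at = now() + interval '1 minute' * power(2, attempts)
WHERE  id = $1;
```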

14 of 41

The big process vs the Postgres way

A poor man’s perspective

    • New service to run, secure, back up
    • Complex constructs (topics, DLQs, retries)
    • Separate client libs & idempotency store
    • Extra monitoring, cost, on-call

    • One database to run & back up
    • One transaction: data + job
    • Simple SQL for claim/retry/backoff
    • Built-in LISTEN/NOTIFY for wakeups

15 of 41

16 of 41

Amazon DynamoDB

Plan partition keys and sort keys, manage provisioned vs. on-demand capacity, handle global/local secondary indexes.

Redis

Set up the server, manage persistence (RDB or AOF), configure clustering/replication for fault tolerance

Memcached

Configure the daemon, decide on memory allocation, and ensure consistent hashing

Conventional way

Options to consider when ultra-low latency and high throughput at massive scale are critical.

17 of 41

18 of 41

Postgres as Cache

Creating a lightning-fast cache table
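An UNLOGGED table is the usual choice here: it skips WAL writes, trading crash durability (acceptable for a cache) for speed. A sketch with illustrative names:

```sql
-- UNLOGGED: no WAL, much faster writes; contents are lost
-- after a crash, which is fine for a cache
CREATE UNLOGGED TABLE cache (
    key        text        PRIMARY KEY,
    value      jsonb       NOT NULL,
    expires_at timestamptz NOT NULL
);
```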

19 of 41

Postgres as Cache

Insert and fetch with SQL
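SET becomes an upsert and GET a primary-key lookup; the key and value below are my own examples:

```sql
-- SET: repeated writes to the same key simply overwrite
INSERT INTO cache (key, value, expires_at)
VALUES ('user:42', '{"name": "Ada"}', now() + interval '5 minutes')
ON CONFLICT (key) DO UPDATE
    SET value      = EXCLUDED.value,
        expires_at = EXCLUDED.expires_at;

-- GET
SELECT value FROM cache WHERE key = 'user:42';
```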

20 of 41

Postgres as Cache

Emulating TTL
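Postgres has no native TTL, so expiry is emulated in two halves: reads filter out stale rows, and a periodic job (cron, or the pg_cron extension) physically deletes them:

```sql
-- Reads treat expired rows as cache misses
SELECT value
FROM   cache
WHERE  key = 'user:42'
  AND  expires_at > now();

-- Periodic sweep reclaims the space
DELETE FROM cache WHERE expires_at <= now();
```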

21 of 41

The big process vs the Postgres way

A poor man’s perspective

    • New service to run, secure, back up
    • Very fast, but adds infra complexity
    • Limited querying
    • Separate persistence & durability strategies

    • One database to run & back up
    • Key + value stored in standard tables or JSONB
    • ACID transactions out-of-the-box
    • Rich querying (indexes, filters, joins)

22 of 41

23 of 41

Apache Solr

Dedicated search server with configs, schema tuning, and monitoring overhead.

Algolia

Managed service but adds cost, vendor lock-in, and data sync maintenance.

Elasticsearch

Requires setting up separate clusters, managing nodes, and syncing data from DB.

Conventional way

Options to consider when ultra-fast, web-scale, search-tuned infrastructure is needed.

24 of 41

Postgres for full text search

Searching for “JPMorgan Chase” with a partial query like “jp”
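The naive version is a substring match. It finds the row, but the leading wildcard defeats a plain b-tree index, so the query sequentially scans the table (the `companies` table and `name` column are illustrative):

```sql
SELECT name
FROM   companies
WHERE  name ILIKE '%jp%';   -- matches 'JPMorgan Chase', but seq-scans
```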

25 of 41

Postgres for full text search

For efficiency during lookups
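The standard fix is the pg_trgm extension: a trigram GIN index turns those `ILIKE '%...%'` lookups into index scans:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX companies_name_trgm_idx
    ON companies USING gin (name gin_trgm_ops);
```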

26 of 41

Postgres for full text search

Optimizing Further
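This export doesn't capture exactly which optimization the slide showed; one natural next step with pg_trgm is fuzzy matching ranked by trigram similarity:

```sql
-- % is pg_trgm's "similar enough" operator; similarity() ranks hits
SELECT name,
       similarity(name, 'jp morgan') AS score
FROM   companies
WHERE  name % 'jp morgan'
ORDER  BY score DESC
LIMIT  10;
```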

27 of 41

Postgres for full text search

Full text search
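Proper full-text search switches from substrings to tsvector/tsquery, which brings stemming and language awareness for free:

```sql
SELECT name
FROM   companies
WHERE  to_tsvector('english', name)
       @@ plainto_tsquery('english', 'chase');
```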

28 of 41

Postgres for full text search

Optimized full text search
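The optimized form precomputes the tsvector as a stored generated column (PostgreSQL 12+), indexes it, and ranks matches; the column and index names below are my own:

```sql
ALTER TABLE companies
    ADD COLUMN search_vec tsvector
    GENERATED ALWAYS AS (to_tsvector('english', name)) STORED;

CREATE INDEX companies_search_idx
    ON companies USING gin (search_vec);

SELECT name, ts_rank(search_vec, q) AS rank
FROM   companies,
       plainto_tsquery('english', 'chase') AS q
WHERE  search_vec @@ q
ORDER  BY rank DESC;
```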

29 of 41

The big process vs the Postgres way

A poor man’s perspective

    • Extra infra: clusters/servers to deploy and maintain
    • Data pipelines needed to keep DB and search index in sync
    • Eventual consistency issues between DB and search system
    • Higher cost (compute, storage, licensing/usage fees)

    • All inside the same database
    • Native full-text support (stemming, ranking, multiple languages)
    • Strong consistency with structured + unstructured search together
    • ACID guarantees preserved

30 of 41

Real time features

31 of 41

Pusher

Integrate SDKs, manage channels/events, and pay recurring SaaS fees for scale.

Apache Kafka

Set up clusters, brokers, topics, and a separate pipeline just to stream events.

Firebase Realtime Database

Spin up a separate cloud service, sync app with it, and depend on vendor infra.

Conventional way

Options to consider when live updates need to be handled outside the database.

32 of 41

Postgres for real time updates

Setting up PostgreSQL Triggers
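The usual shape is an AFTER INSERT trigger that publishes each new row as JSON via `pg_notify`; the `messages` table and `new_message` channel below are placeholders:

```sql
CREATE OR REPLACE FUNCTION notify_new_message() RETURNS trigger AS $$
BEGIN
    -- Publish the freshly committed row as JSON on a channel
    PERFORM pg_notify('new_message', row_to_json(NEW)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER messages_notify
    AFTER INSERT ON messages
    FOR EACH ROW
    EXECUTE FUNCTION notify_new_message();
```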

33 of 41

Listening for notifications
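Any connected session, whether psql or an application using its driver's notification API, subscribes with LISTEN:

```sql
LISTEN new_message;
-- From now on, every committed INSERT into messages delivers an
-- asynchronous notification carrying the row's JSON to this session.
```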

34 of 41

The big process vs the Postgres way

A poor man’s perspective

    • Stand up external infra.
    • Sync DB writes with the service
    • Risk of data drift
    • Complex to scale and monitor
    • Higher costs (infra + licensing)

    • Real-time publish/subscribe built into the DB
    • Events tied directly to committed transactions
    • No external pipelines or services to maintain
    • Strong consistency, instant push to clients

35 of 41

Development Complexity

    • Different client libraries and connection patterns
    • Coordinating deployments across multiple services

Operational Overhead

    • Different services to monitor, update, and debug
    • Different scaling patterns and failure modes
    • Multiple configurations to maintain
    • Separate backup and disaster recovery procedures
    • Inconsistent data between systems
    • Different performance tuning approaches

36 of 41

Let’s do some Math

    • Redis: $20/month
    • Message queue: $25/month
    • Search service: $50/month
    • Monitoring for 3 services: $30/month
    • Grand total: $125/month

37 of 41

The Single Database That Scales

Single Postgres cluster handling millions of bookings

Entire DevOps platform on Postgres

38 of 41

The magic is in Postgres's architecture

Proven options for horizontal scaling

Read replicas for query scaling

Partitioning for large tables

Connection pooling for concurrency

Logical replication for distributed setups

39 of 41

Here are some of the best articles and posts on PostgreSQL that I found in my search.

40 of 41

Postgres might be too good for its own good.

41 of 41

Thank you all for being such a fantastic audience!!!

My Linktree QR code