floyd -> jimmy switchover

Well in advance

  1. Announce downtime on blog.
  2. Announce on Twitter/X/Y/Z.

A couple of hours in advance

  1. Stop any cron jobs that access the database.
  2. Set a banner message on MB.

T=0

It will be helpful to have four shells open in advance to jimmy, hendrix, floyd, and aretha (ideally within a screen session on each server).
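A minimal sketch of opening those four sessions. It assumes each server name resolves as an SSH hostname and that you log in as root; the session name `switchover` is purely illustrative. The function only prints the commands, so each one can be pasted into its own local terminal.

```shell
#!/bin/sh
# Print one ssh command per server. Each command attaches to (or creates)
# a named screen session on that server.
# Assumptions: hostnames match the server names; root SSH login is allowed.
print_session_commands() {
    session_name="${1:-switchover}"
    for host in jimmy hendrix floyd aretha; do
        printf 'ssh -t root@%s screen -D -R %s\n' "$host" "$session_name"
    done
}
```

Calling `print_session_commands` with no argument prints four lines, one per server. `screen -D -R` reattaches to an existing session of that name if there is one, which helps if the SSH connection drops mid-switchover.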

It is assumed that all host commands are run as root from within /root/docker-server-configs.
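That assumption can be checked before running anything. The following is a hedged sketch, not part of the original procedure; the function name `check_preflight` is hypothetical, and it takes the uid and directory as arguments so it can be tried without root.

```shell
#!/bin/sh
# Guard for the assumption above: commands are run as root from
# /root/docker-server-configs. Returns non-zero if either condition fails.
check_preflight() {
    uid="$1"
    cwd="$2"
    if [ "$uid" != "0" ]; then
        echo "must be run as root (uid is $uid)" >&2
        return 1
    fi
    if [ "$cwd" != "/root/docker-server-configs" ]; then
        echo "must be run from /root/docker-server-configs (cwd is $cwd)" >&2
        return 1
    fi
    return 0
}
```

On a host, this would typically be called as `check_preflight "$(id -u)" "$PWD" || exit 1` before the first step.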

  1. Bring all services down on the gateway.
  2. Enable Consul maintenance mode on the primary postgres service on floyd:
     1. ./scripts/set_service_maintenance.sh enable postgres-floyd 65400
  3. Stop the haproxy/pgbouncer containers on jimmy and hendrix:
     1. docker stop pgbouncer-master pgbouncer-slave pgbouncer-any haproxy-postgres
  4. Stop the barman "receive-wal" process on aretha:
     1. docker exec barman sv down cron
     2. docker exec barman sudo -u barman barman receive-wal --stop floyd
  5. Perform the switchover on jimmy, inside a screen session:
     1. docker exec -it postgres-jimmy /bin/bash
        1. sudo -u postgres repmgr -f /etc/repmgr.conf --force-rewind --siblings-follow --dry-run standby switchover
        2. sudo -u postgres repmgr -f /etc/repmgr.conf --force-rewind --siblings-follow standby switchover
        3. sed -i 's/floyd/jimmy/g' /etc/repmgr.conf
     2. Note: It will take 10 seconds for haproxy's health checks to detect the new master. This is controlled by recover_check_interval in /etc/pg_cluster/config.json. Wait.
  6. Update repmgr.conf on hendrix to point to the new master:
     1. docker exec postgres-hendrix sed -i 's/floyd/jimmy/g' /etc/repmgr.conf
  7. On aretha, update the barman configuration and restart its cron services:
     1. docker exec -it barman /bin/bash
        1. mv /etc/barman.d/jimmy.conf.backup /etc/barman.d/jimmy.conf
        2. mv /etc/barman.d/floyd.conf /etc/barman.d/floyd.conf.backup
        3. sudo -u barman barman cron
        4. sv up cron
  8. Stop the postgres instance on floyd (unregistering it from repmgr first):
     1. docker exec postgres-floyd sudo -u postgres repmgr -f /etc/repmgr.conf standby unregister
     2. docker exec postgres-floyd sudo -u postgres stop_postgres.sh
     3. docker stop postgres-floyd
  9. Restart the haproxy/pgbouncer containers on jimmy and hendrix:
     1. docker start pgbouncer-master pgbouncer-slave pgbouncer-any haproxy-postgres
  10. Check that the master and slave pgbouncer services point to their respective roles on jimmy and hendrix:
     1. psql -h localhost -p 65436 -U postgres -d template1 -c 'SELECT pg_is_in_recovery();' (should return 'f')
     2. psql -h localhost -p 65437 -U postgres -d template1 -c 'SELECT pg_is_in_recovery();' (should return 't')
  11. Bring all services back up on the gateway.
  12. Restart any cron jobs that were previously stopped.
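
The role check on ports 65436 and 65437 can be wrapped so that a wrong answer fails loudly instead of relying on reading 'f' or 't' by eye. This is a sketch under assumptions: the function names are hypothetical, psql is assumed to connect without prompting for a password, and the default retry count of 15 is an arbitrary choice that exceeds the 10-second health-check interval noted in the checklist.

```shell
#!/bin/sh
# Ports are taken from the checklist: 65436 = master pgbouncer, 65437 = slave.
# query_recovery PORT: print 't' or 'f'. -A and -t strip alignment and headers.
query_recovery() {
    psql -h localhost -p "$1" -U postgres -d template1 -At \
        -c 'SELECT pg_is_in_recovery();'
}

# check_role PORT EXPECTED: succeed only if the server behind PORT
# reports EXPECTED ('f' for a master, 't' for a standby).
check_role() {
    port="$1"
    expected="$2"
    actual="$(query_recovery "$port")" || return 2
    if [ "$actual" = "$expected" ]; then
        echo "port $port: ok (pg_is_in_recovery = $actual)"
    else
        echo "port $port: expected $expected, got $actual" >&2
        return 1
    fi
}

# wait_for_role PORT EXPECTED [TRIES]: retry check_role once per second,
# giving up after TRIES attempts (default 15, an assumed value).
wait_for_role() {
    tries="${3:-15}"
    while [ "$tries" -gt 0 ]; do
        check_role "$1" "$2" 2>/dev/null && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

After the haproxy/pgbouncer containers are restarted, `wait_for_role 65436 f && wait_for_role 65437 t` would confirm both roles on jimmy and on hendrix before services on the gateway are brought back up.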