floyd -> jimmy switchover
Well in advance
- Announce downtime on blog.
- Announce on Twitter/X/Y/Z.
A couple of hours in advance
- Stop any cron jobs that access the database.
- Set a banner message on MB.
T=0
It will be helpful to have four shells open in advance to jimmy, hendrix, floyd, and aretha (ideally within a screen session on each server).
It is assumed that all host commands are run as root from within /root/docker-server-configs.
- Bring all services down on the gateway.
- Enable Consul maintenance mode on the primary postgres service on floyd:
- ./scripts/set_service_maintenance.sh enable postgres-floyd 65400
- Stop the haproxy/pgbouncer containers on jimmy and hendrix:
- docker stop pgbouncer-master pgbouncer-slave pgbouncer-any haproxy-postgres
- Stop the barman "receive-wal" process on aretha:
- docker exec barman sv down cron
- docker exec barman sudo -u barman barman receive-wal --stop floyd
- Perform the switchover on jimmy, inside a screen session:
- docker exec -it postgres-jimmy /bin/bash
- sudo -u postgres repmgr -f /etc/repmgr.conf --force-rewind --siblings-follow --dry-run standby switchover
- sudo -u postgres repmgr -f /etc/repmgr.conf --force-rewind --siblings-follow standby switchover
- sed -i 's/floyd/jimmy/g' /etc/repmgr.conf
- Note: haproxy's health checks take 10 seconds to detect the new master (the interval is set by recover_check_interval in /etc/pg_cluster/config.json). Wait for that interval to pass before continuing.
- Update repmgr.conf on hendrix to point to the new master:
- docker exec postgres-hendrix sed -i 's/floyd/jimmy/g' /etc/repmgr.conf
- On aretha, update the barman configuration and restart its cron services:
- docker exec -it barman /bin/bash
- mv /etc/barman.d/jimmy.conf.backup /etc/barman.d/jimmy.conf
- mv /etc/barman.d/floyd.conf /etc/barman.d/floyd.conf.backup
- sudo -u barman barman cron
- sv up cron
- Stop the postgres instance on floyd (unregistering it from repmgr first):
- docker exec postgres-floyd sudo -u postgres repmgr -f /etc/repmgr.conf standby unregister
- docker exec postgres-floyd sudo -u postgres stop_postgres.sh
- docker stop postgres-floyd
- Restart the haproxy/pgbouncer containers on jimmy and hendrix:
- docker start pgbouncer-master pgbouncer-slave pgbouncer-any haproxy-postgres
- Check that the master and slave pgbouncer services point to their respective roles on jimmy and hendrix:
- psql -h localhost -p 65436 -U postgres -d template1 -c 'SELECT pg_is_in_recovery();' (should return 'f')
- psql -h localhost -p 65437 -U postgres -d template1 -c 'SELECT pg_is_in_recovery();' (should return 't')
- Bring all services back up on the gateway.
- Restart any cron jobs that were previously stopped.
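
The two pgbouncer role checks in the steps above can be wrapped in a small helper so the result is compared automatically rather than read by eye. This is only a sketch: the function names query_recovery and check_role are hypothetical and not part of docker-server-configs, and it assumes psql is available wherever the manual checks are run. The -A and -t flags make psql print the bare 't' or 'f' value.

```shell
#!/bin/sh
# Sketch only: query_recovery and check_role are hypothetical helper names.

# query_recovery PORT
# Prints the bare result of pg_is_in_recovery() ('t' or 'f') for the
# server reached through the given local port.
query_recovery() {
    psql -h localhost -p "$1" -U postgres -d template1 -At \
        -c 'SELECT pg_is_in_recovery();'
}

# check_role PORT EXPECTED
# Returns 0 if the server behind PORT reports EXPECTED,
# 1 on a role mismatch, and 2 if the query itself fails.
check_role() {
    port=$1
    expected=$2
    actual=$(query_recovery "$port") || {
        echo "port $port: query failed" >&2
        return 2
    }
    if [ "$actual" = "$expected" ]; then
        echo "port $port: ok (pg_is_in_recovery = $actual)"
    else
        echo "port $port: expected $expected, got $actual" >&2
        return 1
    fi
}

# Usage on jimmy and hendrix, with the ports from the steps above:
#   check_role 65436 f && check_role 65437 t
```

A non-zero exit from either call means the pgbouncer service on that port is not pointing at the expected role, so the gateway services should stay down until it is resolved.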