A Complete Patroni Setup on RHEL/Rocky/Oracle Linux 9

Patroni is an open-source solution that brings high availability and automatic failover to PostgreSQL. In this setup it uses etcd to store cluster state and to elect a new Leader from among the replicas when the current Leader becomes unavailable, while HAProxy load-balances client connections across the nodes.

Patroni Cluster Example

To set up the Patroni cluster, I’ll assume you have already completed the infrastructure prerequisites below.

  • Four hosts (physical or virtual) with RHEL/Rocky/Oracle Linux 9 installed:
pgnode1
pgnode2
pgnode3
master
  • OS packages are updated.
  • All hosts can reach each other over the network. If firewalld is enabled (the EL9 default), open the cluster ports as sketched below.
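Here is a minimal firewalld sketch; the port list matches the services configured later in this article (etcd on 2379/2380, the Patroni REST API on 8008, PostgreSQL on 5432, and HAProxy on 5000/6000/7000).

# On pgnode1, pgnode2, and pgnode3
firewall-cmd --permanent --add-port={2379,2380,8008,5432}/tcp
firewall-cmd --reload

# On the master node (HAProxy read-write, read-only, and stats ports)
firewall-cmd --permanent --add-port={5000,6000,7000}/tcp
firewall-cmd --reload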

Let’s begin.

1. Set hostnames using the commands below on the respective nodes, then reboot.

nmcli general hostname pgnode1   # on pgnode1
nmcli general hostname pgnode2   # on pgnode2
nmcli general hostname pgnode3   # on pgnode3
nmcli general hostname master    # on the master node

2. Add IPs for all four hosts to /etc/hosts on all nodes.

[root@pgnode1 ~]# vi /etc/hosts
192.168.0.140 master
192.168.0.141 pgnode1
192.168.0.142 pgnode2
192.168.0.143 pgnode3
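With the entries in place, a quick loop run from each node confirms that every host resolves and responds (a minimal check):

for h in master pgnode1 pgnode2 pgnode3; do
    ping -c1 -W1 "$h" >/dev/null && echo "$h reachable" || echo "$h UNREACHABLE"
done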

3. Set up etcd on pgnode1, pgnode2, and pgnode3.

Download the etcd binaries to /tmp and then copy them to a bin location. Check https://github.com/etcd-io/etcd/releases for the latest release.

  • Copy and run the script below.
ETCD_VER=v3.4.34
#choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version

On a successful download, you should see output similar to this.

etcd Version: 3.4.34
Git SHA: c123b3ea3
Go Version: go1.22.7
Go OS/Arch: linux/amd64
etcdctl version: 3.4.34
API version: 3.4
  • Navigate to /tmp/etcd-download-test/ and move the binaries to the bin location.
[root@pgnode1 ~]# cd /tmp/etcd-download-test/
[root@pgnode1 etcd-download-test]# mv etcd* /usr/local/bin
  • Verify etcd commands after moving binaries to bin.
[root@pgnode1 ~]# etcd --version
etcd Version: 3.4.34
Git SHA: c123b3ea3
Go Version: go1.22.7
Go OS/Arch: linux/amd64

[root@pgnode1 ~]# etcdctl version
etcdctl version: 3.4.34
API version: 3.4

Configure etcd system service

  • Create the data and configuration directories
[root@pgnode1 ~]# mkdir -p /var/lib/etcd/
[root@pgnode1 ~]# mkdir /etc/etcd
  • Create etcd system user
[root@pgnode1 ~]# groupadd --system etcd
[root@pgnode1 ~]# useradd -s /sbin/nologin --system -g etcd etcd
  • Change ownership and permissions
[root@pgnode1 ~]# chown -R etcd:etcd /var/lib/etcd/
[root@pgnode1 ~]# chmod 0775 /var/lib/etcd/
  • Create the systemd service file for the etcd daemon on pgnode1, pgnode2, and pgnode3.
---- pgnode1 ----
[root@pgnode1 ~]# vi /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=pgnode1
Environment=ETCD_LISTEN_PEER_URLS="http://192.168.0.141:2380,http://127.0.0.1:7001"
Environment=ETCD_LISTEN_CLIENT_URLS="http://192.168.0.141:2379,http://127.0.0.1:2379"
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.0.141:2380"
Environment=ETCD_INITIAL_CLUSTER="pgnode1=http://192.168.0.141:2380,pgnode2=http://192.168.0.142:2380,pgnode3=http://192.168.0.143:2380"
Environment=ETCD_ADVERTISE_CLIENT_URLS="http://192.168.0.141:2379"
Environment=ETCD_INITIAL_CLUSTER_TOKEN="etcdcluster"
Environment=ETCD_INITIAL_CLUSTER_STATE="new"
ExecStart=/usr/local/bin/etcd \
--enable-v2=true
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
---- pgnode2 ----
[root@pgnode2 ~]# vi /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=pgnode2
Environment=ETCD_LISTEN_PEER_URLS="http://192.168.0.142:2380,http://127.0.0.1:7001"
Environment=ETCD_LISTEN_CLIENT_URLS="http://192.168.0.142:2379,http://127.0.0.1:2379"
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.0.142:2380"
Environment=ETCD_INITIAL_CLUSTER="pgnode1=http://192.168.0.141:2380,pgnode2=http://192.168.0.142:2380,pgnode3=http://192.168.0.143:2380"
Environment=ETCD_ADVERTISE_CLIENT_URLS="http://192.168.0.142:2379"
Environment=ETCD_INITIAL_CLUSTER_TOKEN="etcdcluster"
Environment=ETCD_INITIAL_CLUSTER_STATE="new"
ExecStart=/usr/local/bin/etcd \
--enable-v2=true
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
---- pgnode3 ----
[root@pgnode3 ~]# vi /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
User=etcd
Type=notify
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=pgnode3
Environment=ETCD_LISTEN_PEER_URLS="http://192.168.0.143:2380,http://127.0.0.1:7001"
Environment=ETCD_LISTEN_CLIENT_URLS="http://192.168.0.143:2379,http://127.0.0.1:2379"
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.0.143:2380"
Environment=ETCD_INITIAL_CLUSTER="pgnode1=http://192.168.0.141:2380,pgnode2=http://192.168.0.142:2380,pgnode3=http://192.168.0.143:2380"
Environment=ETCD_ADVERTISE_CLIENT_URLS="http://192.168.0.143:2379"
Environment=ETCD_INITIAL_CLUSTER_TOKEN="etcdcluster"
Environment=ETCD_INITIAL_CLUSTER_STATE="new"
ExecStart=/usr/local/bin/etcd \
--enable-v2=true
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
  • If SELinux is running in enforcing mode, run the following command on all three nodes after creating the service file, so the binary gets the correct SELinux context.
[root@pgnode1 ~]# restorecon -Rv /usr/local/bin/etcd
Relabeled /usr/local/bin/etcd from unconfined_u:object_r:user_tmp_t:s0 to unconfined_u:object_r:bin_t:s0

[root@pgnode2 ~]# restorecon -Rv /usr/local/bin/etcd
Relabeled /usr/local/bin/etcd from unconfined_u:object_r:user_tmp_t:s0 to unconfined_u:object_r:bin_t:s0

[root@pgnode3 ~]# restorecon -Rv /usr/local/bin/etcd
Relabeled /usr/local/bin/etcd from unconfined_u:object_r:user_tmp_t:s0 to unconfined_u:object_r:bin_t:s0
  • Reload systemd, then start the etcd service on all three nodes and check the status. Start the nodes within a short window of each other; a new cluster needs a quorum of members online before it reports ready.
[root@pgnode1 ~]# systemctl daemon-reload
[root@pgnode1 ~]# systemctl start etcd
[root@pgnode1 ~]# systemctl status etcd
● etcd.service - etcd key-value store
Loaded: loaded (/etc/systemd/system/etcd.service; disabled; preset: disabled)
Active: active (running) since Sat 2024-09-14 20:18:45 IST; 1s ago
Docs: https://github.com/etcd-io/etcd
Main PID: 2298 (etcd)
Tasks: 9 (limit: 21742)
Memory: 12.0M
CPU: 127ms
CGroup: /system.slice/etcd.service
└─2298 /usr/local/bin/etcd --enable-v2=true
Sep 14 20:18:45 pgnode1 etcd[2298]: 626689aa3d485e48 initialized peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
Sep 14 20:18:45 pgnode1 etcd[2298]: raft2024/09/14 20:18:45 INFO: raft.node: 626689aa3d485e48 elected leader 49177db2cb1639a8 at term 282
Sep 14 20:18:45 pgnode1 etcd[2298]: established a TCP streaming connection with peer 8924f471f7a40300 (stream MsgApp v2 writer)
Sep 14 20:18:45 pgnode1 etcd[2298]: established a TCP streaming connection with peer 8924f471f7a40300 (stream Message writer)
Sep 14 20:18:45 pgnode1 etcd[2298]: published {Name:pgnode1 ClientURLs:[http://192.168.0.141:2379]} to cluster f94ea6552f9d580a
Sep 14 20:18:45 pgnode1 etcd[2298]: ready to serve client requests
Sep 14 20:18:45 pgnode1 etcd[2298]: ready to serve client requests
Sep 14 20:18:45 pgnode1 systemd[1]: Started etcd key-value store.
Sep 14 20:18:45 pgnode1 etcd[2298]: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
Sep 14 20:18:45 pgnode1 etcd[2298]: serving insecure client requests on 192.168.0.141:2379, this is strongly discouraged!
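Before moving on, confirm that all three members actually formed a cluster. A minimal check with etcdctl, using the client URLs configured above:

[root@pgnode1 ~]# etcdctl --endpoints=http://192.168.0.141:2379,http://192.168.0.142:2379,http://192.168.0.143:2379 endpoint health
[root@pgnode1 ~]# etcdctl --endpoints=http://192.168.0.141:2379 member list

Every endpoint should report healthy, and member list should show pgnode1, pgnode2, and pgnode3.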

4. Download and install PostgreSQL Server 16 on pgnode1, pgnode2, and pgnode3. Select the version and platform from https://www.postgresql.org/download/linux/redhat/

# Install the repository RPM:
dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# Disable the built-in PostgreSQL module:
dnf -qy module disable postgresql
# Install PostgreSQL:
dnf install -y postgresql16-server postgresql16-contrib
  • After installing the PostgreSQL server packages, symlink the postgres binaries into /usr/sbin so Patroni can find them.
[root@pgnode1 ~]# ln -s /usr/pgsql-16/bin/* /usr/sbin/
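A quick sanity check that the server binaries resolve (version output will vary with the minor release):

[root@pgnode1 ~]# /usr/pgsql-16/bin/postgres --version
[root@pgnode1 ~]# which postgres    # should resolve via the /usr/sbin symlink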

5. Install Patroni on pgnode1, pgnode2, and pgnode3. EL9 ships Python 3.9, so use the generic get-pip.py bootstrap script.

curl https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py
python3 /tmp/get-pip.py
pip install psycopg2-binary
pip install 'patroni[etcd,consul]'
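pip installs the Patroni entry points under /usr/local/bin; a quick check that they are on PATH before continuing:

patroni --version
which patronictl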

6. Configure Patroni on pgnode1, pgnode2, and pgnode3.

  • Create a configuration directory and navigate to it. Generate a self-signed SSL certificate, then set permissions and ownership.
---- pgnode1 ----
[root@pgnode1 ~]# mkdir -p /usr/patroni/conf
[root@pgnode1 ~]# cd /usr/patroni/conf

[root@pgnode1 conf]# openssl genrsa -out server.key 2048
[root@pgnode1 conf]# openssl req -new -x509 -days 3650 -key server.key -out server.crt -subj "/C=IN/ST=Maharashtra/L=Mumbai/O=HuzefaPatel/OU=IT/CN=pgnode1"

[root@pgnode1 conf]# chmod 400 server.*
[root@pgnode1 conf]# chown postgres:postgres server.*
---- pgnode2 ----
[root@pgnode2 ~]# mkdir -p /usr/patroni/conf
[root@pgnode2 ~]# cd /usr/patroni/conf

[root@pgnode2 conf]# openssl genrsa -out server.key 2048
[root@pgnode2 conf]# openssl req -new -x509 -days 3650 -key server.key -out server.crt -subj "/C=IN/ST=Maharashtra/L=Mumbai/O=HuzefaPatel/OU=IT/CN=pgnode2"

[root@pgnode2 conf]# chmod 400 server.*
[root@pgnode2 conf]# chown postgres:postgres server.*
---- pgnode3 ----
[root@pgnode3 ~]# mkdir -p /usr/patroni/conf
[root@pgnode3 ~]# cd /usr/patroni/conf

[root@pgnode3 conf]# openssl genrsa -out server.key 2048
[root@pgnode3 conf]# openssl req -new -x509 -days 3650 -key server.key -out server.crt -subj "/C=IN/ST=Maharashtra/L=Mumbai/O=HuzefaPatel/OU=IT/CN=pgnode3"

[root@pgnode3 conf]# chmod 400 server.*
[root@pgnode3 conf]# chown postgres:postgres server.*
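Optionally, inspect a generated certificate before wiring it into Patroni; the subject CN should match the node name and the dates should span the ten-year validity we requested:

[root@pgnode1 conf]# openssl x509 -in server.crt -noout -subject -dates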
  • Create the YAML configuration file for Patroni on pgnode1, pgnode2, and pgnode3. YAML is indentation-sensitive, so keep the structure exactly as shown.
---- pgnode1 ----
[root@pgnode1 conf]# vi /usr/patroni/conf/patroni.yml
scope: postgres
namespace: HuzefasPatroni
name: pgnode1
restapi:
  listen: 192.168.0.141:8008
  connect_address: 192.168.0.141:8008
etcd:
  hosts: 192.168.0.141:2379,192.168.0.142:2379,192.168.0.143:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    maximum_lag_on_syncnode: 15000000
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        shared_buffers: 2GB
        work_mem: 16MB
        maintenance_work_mem: 1GB
        max_worker_processes: 16
        wal_buffers: 64MB
        max_wal_size: 2GB
        min_wal_size: 1GB
        effective_cache_size: 64GB
        fsync: on
        checkpoint_completion_target: 0.9
        log_rotation_size: 100MB
        listen_addresses: "*"
        max_connections: 2000
        temp_buffers: 4MB
        ssl: true
        ssl_cert_file: /usr/patroni/conf/server.crt
        ssl_key_file: /usr/patroni/conf/server.key
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.0.141/32 md5
    - host replication replicator 192.168.0.142/32 md5
    - host replication replicator 192.168.0.143/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 192.168.0.141:5432
  connect_address: 192.168.0.141:5432
  data_dir: /var/lib/pgsql/16/data
  bin_dir: /usr/pgsql-16/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: pgrewind
      password: pgrewind
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: true
---- pgnode2 ----
[root@pgnode2 conf]# vi /usr/patroni/conf/patroni.yml
scope: postgres
namespace: HuzefasPatroni
name: pgnode2
restapi:
  listen: 192.168.0.142:8008
  connect_address: 192.168.0.142:8008
etcd:
  hosts: 192.168.0.141:2379,192.168.0.142:2379,192.168.0.143:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    maximum_lag_on_syncnode: 15000000
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        shared_buffers: 2GB
        work_mem: 16MB
        maintenance_work_mem: 1GB
        max_worker_processes: 16
        wal_buffers: 64MB
        max_wal_size: 2GB
        min_wal_size: 1GB
        effective_cache_size: 64GB
        fsync: on
        checkpoint_completion_target: 0.9
        log_rotation_size: 100MB
        listen_addresses: "*"
        max_connections: 2000
        temp_buffers: 4MB
        ssl: true
        ssl_cert_file: /usr/patroni/conf/server.crt
        ssl_key_file: /usr/patroni/conf/server.key
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.0.141/32 md5
    - host replication replicator 192.168.0.142/32 md5
    - host replication replicator 192.168.0.143/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 192.168.0.142:5432
  connect_address: 192.168.0.142:5432
  data_dir: /var/lib/pgsql/16/data
  bin_dir: /usr/pgsql-16/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: pgrewind
      password: pgrewind
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: true
---- pgnode3 ----
[root@pgnode3 conf]# vi /usr/patroni/conf/patroni.yml
scope: postgres
namespace: HuzefasPatroni
name: pgnode3
restapi:
  listen: 192.168.0.143:8008
  connect_address: 192.168.0.143:8008
etcd:
  hosts: 192.168.0.141:2379,192.168.0.142:2379,192.168.0.143:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    maximum_lag_on_syncnode: 15000000
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        shared_buffers: 2GB
        work_mem: 16MB
        maintenance_work_mem: 1GB
        max_worker_processes: 16
        wal_buffers: 64MB
        max_wal_size: 2GB
        min_wal_size: 1GB
        effective_cache_size: 64GB
        fsync: on
        checkpoint_completion_target: 0.9
        log_rotation_size: 100MB
        listen_addresses: "*"
        max_connections: 2000
        temp_buffers: 4MB
        ssl: true
        ssl_cert_file: /usr/patroni/conf/server.crt
        ssl_key_file: /usr/patroni/conf/server.key
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.0.141/32 md5
    - host replication replicator 192.168.0.142/32 md5
    - host replication replicator 192.168.0.143/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 192.168.0.143:5432
  connect_address: 192.168.0.143:5432
  data_dir: /var/lib/pgsql/16/data
  bin_dir: /usr/pgsql-16/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: pgrewind
      password: pgrewind
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: true
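Before creating the service file, recent Patroni releases can validate the configuration in place; it prints warnings or errors if the YAML structure or values are off, and nothing on success:

[root@pgnode1 conf]# patroni --validate-config /usr/patroni/conf/patroni.yml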
  • Create the Patroni service file on pgnode1, pgnode2, and pgnode3.
[root@pgnode1 conf]# vi /usr/lib/systemd/system/patroni.service
[Unit]
Description=patroni
Documentation=https://patroni.readthedocs.io/en/latest/index.html
After=syslog.target network.target etcd.service
Wants=network-online.target
[Service]
Type=simple
User=postgres
Group=postgres
PermissionsStartOnly=true
ExecStart=/usr/local/bin/patroni /usr/patroni/conf/patroni.yml
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
KillMode=process
KillSignal=SIGINT
Restart=on-abnormal
RestartSec=30s
TimeoutSec=0
[Install]
WantedBy=multi-user.target

7. Start the Patroni service on pgnode1, pgnode2 and pgnode3.

[root@pgnode1 conf]# systemctl daemon-reload
[root@pgnode1 conf]# systemctl start patroni
[root@pgnode1 conf]# systemctl status patroni
 ● patroni.service - patroni
      Loaded: loaded (/usr/lib/systemd/system/patroni.service; disabled; preset: disabled)
      Active: active (running) since Sat 2024-09-14 21:11:03 IST; 8s ago
        Docs: https://patroni.readthedocs.io/en/latest/index.html
    Main PID: 3343 (patroni)
       Tasks: 13 (limit: 21742)
      Memory: 484.0M
         CPU: 2.968s
      CGroup: /system.slice/patroni.service
              ├─3343 /usr/bin/python3 /usr/local/bin/patroni /usr/patroni/conf/patroni.yml
              ├─3362 /usr/pgsql-16/bin/postgres -D /var/lib/pgsql/16/data --config-file=/var/lib/pgsql/16/data/postgresql.conf --listen_addresses=192.168.0.141 --port=5432 --cluster_name=postgres --wal_level=r>
              ├─3364 "postgres: postgres: logger "
              ├─3365 "postgres: postgres: checkpointer "
              ├─3366 "postgres: postgres: background writer "
              ├─3368 "postgres: postgres: walwriter "
              ├─3369 "postgres: postgres: autovacuum launcher "
              ├─3370 "postgres: postgres: logical replication launcher "
              └─3377 "postgres: postgres: postgres postgres 192.168.0.141(36374) idle"
 Sep 14 21:11:08 pgnode1 patroni[3343]: 2024-09-14 21:11:08,425 INFO: postmaster pid=3362
 Sep 14 21:11:08 pgnode1 patroni[3363]: 192.168.0.141:5432 - no response
 Sep 14 21:11:08 pgnode1 patroni[3362]: 2024-09-14 21:11:08.556 IST [3362] LOG:  redirecting log output to logging collector process
 Sep 14 21:11:08 pgnode1 patroni[3362]: 2024-09-14 21:11:08.556 IST [3362] HINT:  Future log output will appear in directory "log".
 Sep 14 21:11:09 pgnode1 patroni[3371]: 192.168.0.141:5432 - accepting connections
 Sep 14 21:11:09 pgnode1 patroni[3375]: 192.168.0.141:5432 - accepting connections
 Sep 14 21:11:09 pgnode1 patroni[3343]: 2024-09-14 21:11:09,613 INFO: establishing a new patroni heartbeat connection to postgres
 Sep 14 21:11:09 pgnode1 patroni[3343]: 2024-09-14 21:11:09,659 INFO: running post_bootstrap
 Sep 14 21:11:09 pgnode1 patroni[3343]: 2024-09-14 21:11:09,731 WARNING: Could not activate Linux watchdog device: Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'
 Sep 14 21:11:09 pgnode1 patroni[3343]: 2024-09-14 21:11:09,764 INFO: initialized a new cluster
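Once Patroni is running, its REST API on port 8008 is what HAProxy will use for health checks later. A quick probe, using the addresses configured above:

[root@pgnode1 ~]# curl -s http://192.168.0.141:8008/patroni | python3 -m json.tool

GET /patroni always returns the node’s status as JSON; GET / returns HTTP 200 only on the Leader and GET /replica returns 200 only on replicas, which is exactly what the HAProxy checks below rely on.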

8. Enable the patroni and etcd services on pgnode1, pgnode2, and pgnode3 so they start on boot.

[root@pgnode1 conf]# systemctl enable etcd
Created symlink /etc/systemd/system/multi-user.target.wants/etcd.service → /etc/systemd/system/etcd.service.
[root@pgnode1 conf]# systemctl enable patroni
Created symlink /etc/systemd/system/multi-user.target.wants/patroni.service → /usr/lib/systemd/system/patroni.service.

9. Configure HAProxy on the master node.

  • Install the HAProxy package
[root@master ~]# yum -y install haproxy
  • Create the systemd service file for HAProxy. (The haproxy package already ships this unit at /usr/lib/systemd/system/haproxy.service; creating a copy at /etc/systemd/system/haproxy.service overrides it.)
[root@master ~]# vi /etc/systemd/system/haproxy.service
[Unit]
Description=HAProxy Load Balancer
After=network-online.target
Wants=network-online.target
[Service]
EnvironmentFile=-/etc/sysconfig/haproxy
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" "CFGDIR=/etc/haproxy/conf.d"
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -f $CFGDIR -c -q $OPTIONS
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -f $CFGDIR -p $PIDFILE $OPTIONS
ExecReload=/usr/sbin/haproxy -f $CONFIG -f $CFGDIR -c -q $OPTIONS
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
SuccessExitStatus=143
Type=notify
[Install]
WantedBy=multi-user.target
  • Edit the HAProxy configuration file /etc/haproxy/haproxy.cfg, remove its existing content, and replace it with the following.
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 6000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats
defaults
    mode tcp
    log global
    retries 3
    timeout queue 1m
    timeout connect 10s
    timeout client 31m
    timeout server 31m
    timeout check 10s
    maxconn 3000
listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /
listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server patroni1 192.168.0.141:5432 maxconn 2000 check port 8008
    server patroni2 192.168.0.142:5432 maxconn 2000 check port 8008
    server patroni3 192.168.0.143:5432 maxconn 2000 check port 8008
listen postgres-readonly
    bind *:6000
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server patroni1 192.168.0.141:5432 maxconn 2000 check port 8008
    server patroni2 192.168.0.142:5432 maxconn 2000 check port 8008
    server patroni3 192.168.0.143:5432 maxconn 2000 check port 8008
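Before starting the service, validate the configuration. If SELinux is enforcing, also allow HAProxy to connect out on arbitrary ports (the stock EL9 boolean):

[root@master ~]# haproxy -c -f /etc/haproxy/haproxy.cfg
Configuration file is valid
[root@master ~]# setsebool -P haproxy_connect_any 1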
  • Start the HAProxy service
[root@master ~]# systemctl start haproxy
[root@master ~]# systemctl status haproxy
 ● haproxy.service - HAProxy Load Balancer
      Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; preset: disabled)
      Active: active (running) since Sat 2024-09-14 21:35:33 IST; 8s ago
     Process: 1863 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -f $CFGDIR -c -q $OPTIONS (code=exited, status=0/SUCCESS)
    Main PID: 1865 (haproxy)
       Tasks: 5 (limit: 21742)
      Memory: 5.1M
         CPU: 49ms
  • Enable the HAProxy service on the master node.
[root@master ~]# systemctl enable haproxy
Created symlink /etc/systemd/system/multi-user.target.wants/haproxy.service → /usr/lib/systemd/system/haproxy.service.

10. The Patroni cluster setup is now complete. Let’s try connecting to it.

[root@pgnode1 ~]# export PGPASSWORD=postgres
[root@pgnode1 ~]# psql -h master -d postgres -U postgres -p 5000
psql (16.4)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
postgres=#

We’re able to connect through the master (HAProxy) node. Per our HAProxy configuration, port 5000 is for read-write connections and port 6000 is for read-only connections: a connection on port 5000 goes to the primary (Leader) node, while a connection on port 6000 goes to one of the standbys (Replicas). Let’s test this.

[root@pgnode1 ~]# psql -h master -d postgres -U postgres -p 5000 -c "select pg_is_in_recovery(),inet_server_addr();"
  pg_is_in_recovery | inet_server_addr
 -------------------+------------------
  f                 | 192.168.0.141
(1 row)

When we connected on port 5000, the request went to 192.168.0.141, which is the Leader node.

[root@pgnode1 ~]# psql -h master -d postgres -U postgres -p 6000 -c "select pg_is_in_recovery(),inet_server_addr();"
  pg_is_in_recovery | inet_server_addr
 -------------------+------------------
  t                 | 192.168.0.142
 (1 row)
 [root@pgnode1 ~]# psql -h master -d postgres -U postgres -p 6000 -c "select pg_is_in_recovery(),inet_server_addr();"
  pg_is_in_recovery | inet_server_addr
 -------------------+------------------
  t                 | 192.168.0.143
 (1 row)
 [root@pgnode1 ~]# psql -h master -d postgres -U postgres -p 6000 -c "select pg_is_in_recovery(),inet_server_addr();"
  pg_is_in_recovery | inet_server_addr
 -------------------+------------------
  t                 | 192.168.0.142
 (1 row)
 [root@pgnode1 ~]# psql -h master -d postgres -U postgres -p 6000 -c "select pg_is_in_recovery(),inet_server_addr();"
  pg_is_in_recovery | inet_server_addr
 -------------------+------------------
  t                 | 192.168.0.143
 (1 row)

And when we tried port 6000, the requests were balanced across 192.168.0.142 and 192.168.0.143, which are the replicas.

11. We can check the status of the cluster, test failover, and perform a switchover of the Leader node using the patronictl utility.

[root@pgnode1 ~]# patronictl -c /usr/patroni/conf/patroni.yml list
 Cluster: postgres (7414524298744491291) ------+----+-----------+--------------+
 | Member  | Host          | Role    | State     | TL | Lag in MB | Tags         |
 +---------+---------------+---------+-----------+----+-----------+--------------+
 | pgnode1 | 192.168.0.141 | Leader  | running   |  1 |           | nosync: true |
 | pgnode2 | 192.168.0.142 | Replica | streaming |  1 |         0 | nosync: true |
 | pgnode3 | 192.168.0.143 | Replica | streaming |  1 |         0 | nosync: true |
 +---------+---------------+---------+-----------+----+-----------+--------------+ 

Visit http://<master-ip>:7000 for the HAProxy web dashboard.

  • Currently, the Leader node is pgnode1. Let’s mimic a failure by stopping the patroni service on pgnode1 and observe the behavior.
[root@pgnode1 ~]# systemctl stop patroni
  • Check the status now,
[root@pgnode1 ~]# patronictl -c /usr/patroni/conf/patroni.yml list
 Cluster: postgres (7414524298744491291) ------+----+-----------+--------------+
 | Member  | Host          | Role    | State     | TL | Lag in MB | Tags         |
 +---------+---------------+---------+-----------+----+-----------+--------------+
 | pgnode1 | 192.168.0.141 | Replica | stopped   |    |   unknown | nosync: true |
 | pgnode2 | 192.168.0.142 | Leader  | running   |  2 |           | nosync: true |
 | pgnode3 | 192.168.0.143 | Replica | streaming |  2 |         0 | nosync: true |
 +---------+---------------+---------+-----------+----+-----------+--------------+ 

pgnode2 has become the Leader now.

[root@pgnode1 ~]# psql -h master -d postgres -U postgres -p 5000 -c "select pg_is_in_recovery(),inet_server_addr();"
  pg_is_in_recovery | inet_server_addr
 -------------------+------------------
  f                 | 192.168.0.142
 (1 row)

The connection on port 5000 now went to pgnode2, the new Leader node.

  • Let’s start the patroni service on pgnode1.
[root@pgnode1 ~]# systemctl start patroni
  • Check the status,
[root@pgnode1 ~]# patronictl -c /usr/patroni/conf/patroni.yml list
 Cluster: postgres (7414524298744491291) ------+----+-----------+--------------+
 | Member  | Host          | Role    | State     | TL | Lag in MB | Tags         |
 +---------+---------------+---------+-----------+----+-----------+--------------+
 | pgnode1 | 192.168.0.141 | Replica | streaming |  2 |         0 | nosync: true |
 | pgnode2 | 192.168.0.142 | Leader  | running   |  2 |           | nosync: true |
 | pgnode3 | 192.168.0.143 | Replica | streaming |  2 |         0 | nosync: true |
 +---------+---------------+---------+-----------+----+-----------+--------------+ 

Now we can see that pgnode1 is back in the cluster and its current role is Replica.

  • Let’s perform a switchover and make pgnode1 the Leader again.
[root@pgnode1 ~]# patronictl -c /usr/patroni/conf/patroni.yml switchover
 Current cluster topology
 Cluster: postgres (7414524298744491291) ------+----+-----------+--------------+
 | Member  | Host          | Role    | State     | TL | Lag in MB | Tags         |
 +---------+---------------+---------+-----------+----+-----------+--------------+
 | pgnode1 | 192.168.0.141 | Replica | streaming |  2 |         0 | nosync: true |
 | pgnode2 | 192.168.0.142 | Leader  | running   |  2 |           | nosync: true |
 | pgnode3 | 192.168.0.143 | Replica | streaming |  2 |         0 | nosync: true |
 +---------+---------------+---------+-----------+----+-----------+--------------+
 Primary [pgnode2]:
 Candidate ['pgnode1', 'pgnode3'] []: pgnode1
 When should the switchover take place (e.g. 2024-09-14T22:59 )  [now]:
 Are you sure you want to switchover cluster postgres, demoting current leader pgnode2? [y/N]: y
 2024-09-14 21:59:28.48118 Successfully switched over to "pgnode1"
 Cluster: postgres (7414524298744491291) ------+----+-----------+--------------+
 | Member  | Host          | Role    | State     | TL | Lag in MB | Tags         |
 +---------+---------------+---------+-----------+----+-----------+--------------+
 | pgnode1 | 192.168.0.141 | Leader  | running   |  2 |           | nosync: true |
 | pgnode2 | 192.168.0.142 | Replica | stopped   |    |   unknown | nosync: true |
 | pgnode3 | 192.168.0.143 | Replica | streaming |  2 |         0 | nosync: true |
 +---------+---------------+---------+-----------+----+-----------+--------------+ 

We can see that pgnode1 has become the Leader. pgnode2 is briefly in the stopped state while it rejoins as a replica.

  • Let’s check the status again in a few seconds or minutes.
[root@pgnode1 ~]# patronictl -c /usr/patroni/conf/patroni.yml list
 Cluster: postgres (7414524298744491291) ------+----+-----------+--------------+
 | Member  | Host          | Role    | State     | TL | Lag in MB | Tags         |
 +---------+---------------+---------+-----------+----+-----------+--------------+
 | pgnode1 | 192.168.0.141 | Leader  | running   |  3 |           | nosync: true |
 | pgnode2 | 192.168.0.142 | Replica | streaming |  3 |         0 | nosync: true |
 | pgnode3 | 192.168.0.143 | Replica | streaming |  3 |         0 | nosync: true |
 +---------+---------------+---------+-----------+----+-----------+--------------+ 

Now we can see pgnode2 has started streaming.
