Patroni HA
Complete walkthrough of a PostgreSQL deployment with
full failover capability, virtual-IP switchover, and
near-realtime streaming replication, secured with SSL and coordinated via the Raft consensus protocol (etcd).
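At a glance, client traffic flows through a virtual IP to whichever HA-Proxy node currently holds it, and on to the Patroni-elected primary (a sketch of the topology assembled below):

```
               clients
                  |
      VIP 192.168.60.110 (Keepalived / VRRP)
                  |
    haproxy-01  haproxy-02  haproxy-03
                  |  TCP 5432, health checks on 8008
                  v
   postgres-01  postgres-02  postgres-03
   (Patroni + etcd: one primary, two replicas)
```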
Nodes/Instances
6 total compute nodes are required for this install.
Amazon Linux 2 was used for the original demonstration, but every package command below uses apt, so a Debian/Ubuntu-based system is assumed; adapt the package steps for RPM-based distributions.
HA-Proxy (IP and Load Balancing)
192.168.60.100 # haproxy-01
192.168.60.101 # haproxy-02
192.168.60.102 # haproxy-03
Postgres DB (1 Primary/2 Replicas)
192.168.60.103 # postgres-01
192.168.60.104 # postgres-02
192.168.60.105 # postgres-03
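Optionally, the hostnames can be mapped on every node for convenience (a sketch; the names follow the comments above):

```
# /etc/hosts additions (optional convenience)
192.168.60.100 haproxy-01
192.168.60.101 haproxy-02
192.168.60.102 haproxy-03
192.168.60.103 postgres-01
192.168.60.104 postgres-02
192.168.60.105 postgres-03
```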
Install Postgres
Default installation of PostgreSQL from the PGDG repository.
The stock configuration files need not be edited; Patroni will overwrite them.
Auto-start must be disabled so that Patroni (fronted by HA-Proxy) controls the service.
sudo apt update
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install -y postgresql-17 # server packages; version 17 matches the bin_dir used in the Patroni config later
sudo systemctl stop postgresql
sudo systemctl disable postgresql
etcd / RAFT HA replication
etcd maintains highly available, consistent state in a key-value store,
coordinated by Raft, a consensus algorithm
The latest version of etcd can be found here: etcd releases
sudo apt update
sudo apt-get install -y wget curl acl
wget https://github.com/etcd-io/etcd/releases/download/v3.5.17/etcd-v3.5.17-linux-amd64.tar.gz
tar xvf etcd-v3.5.17-linux-amd64.tar.gz
mv etcd-v3.5.17-linux-amd64 etcd
sudo mv etcd/etcd* /usr/local/bin/
etcd --version
If the version output confirms a successful install under /usr/local/bin, create the service user:
sudo useradd --system --home /var/lib/etcd --shell /bin/false etcd
sudo mkdir -p /etc/etcd
sudo mkdir -p /etc/etcd/ssl
Repeat etcd install across 3 postgres instances
Certificates
Secure (mutual-TLS) communication must exist between all instances for authentication.
Start on your client (jumphost) using OpenSSL; certificates from Let's Encrypt or an internal CA work as well.
brew install openssl # macOS jumphost
sudo apt install -y openssl # Linux jumphost
openssl version
Verify the install completed, then create the certificate authority:
mkdir certs
cd certs
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=etcd-ca" -days 7300 -out ca.crt
Each DB node requires a unique certificate, identified by its IP address.
Update the config on each node so no IPs are duplicated:
ie. 192.168.60.103=etcd-node1, 192.168.60.104=etcd-node2, 192.168.60.105=etcd-node3
Also copy ca.crt and ca.key from the certs directory above into /etc/etcd/ssl/ on each node so the node certificates can be signed there.
sudo mkdir -p /etc/etcd/ssl ; cd /etc/etcd/ssl/
# Generate a private key
openssl genrsa -out etcd-node1.key 2048
# Create temp file for config
cat > temp.cnf <<EOF
[ req ]
distinguished_name = req_distinguished_name
req_extensions = v3_req
[ req_distinguished_name ]
[ v3_req ]
subjectAltName = @alt_names
[ alt_names ]
IP.1 = 192.168.60.103
IP.2 = 127.0.0.1
EOF
# Create a csr (filenames must match the etcd-node1.key generated above)
openssl req -new -key etcd-node1.key -out etcd-node1.csr \
-subj "/C=US/ST=YourState/L=YourCity/O=YourOrganization/OU=YourUnit/CN=etcd-node1" \
-config temp.cnf
# Sign the cert
openssl x509 -req -in etcd-node1.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
-out etcd-node1.crt -days 7300 -sha256 -extensions v3_req -extfile temp.cnf
# Verify the cert and confirm the Subject Alternative Name is present
openssl x509 -in etcd-node1.crt -text -noout | grep -A1 "Subject Alternative Name"
# Remove temp file
rm temp.cnf
# change ownership of files
sudo chown -R etcd:etcd /etc/etcd/
sudo chmod 600 /etc/etcd/ssl/etcd-node*.key
sudo chmod 644 /etc/etcd/ssl/etcd-node*.crt /etc/etcd/ssl/ca.crt
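Only the SAN IP differs between the three node configs, so the temp.cnf used above can be generated per node with a small helper; make_san_cnf below is a hypothetical sketch (it only writes local files):

```shell
#!/bin/sh
# Emit the OpenSSL SAN config for one etcd node; only IP.1 varies per node.
make_san_cnf() {
    node_ip="$1"
    cat <<EOF
[ req ]
distinguished_name = req_distinguished_name
req_extensions = v3_req
[ req_distinguished_name ]
[ v3_req ]
subjectAltName = @alt_names
[ alt_names ]
IP.1 = ${node_ip}
IP.2 = 127.0.0.1
EOF
}

# One config per node, reusing the IP plan above
make_san_cnf 192.168.60.103 > etcd-node1.cnf
make_san_cnf 192.168.60.104 > etcd-node2.cnf
make_san_cnf 192.168.60.105 > etcd-node3.cnf
```

Feed each generated file to the openssl req/x509 commands above in place of temp.cnf.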
Configure the etcd Environment
Each DB node gets a unique environment file referencing its own name, IPs, and certificates.
Contents of /etc/etcd/etcd.env (shown for postgresql-01; update per node):
ETCD_NAME="postgresql-01"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_CLUSTER="postgresql-01=https://192.168.60.103:2380,postgresql-02=https://192.168.60.104:2380,postgresql-03=https://192.168.60.105:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.60.103:2380"
ETCD_LISTEN_PEER_URLS="https://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="https://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.60.103:2379"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt"
ETCD_CERT_FILE="/etc/etcd/ssl/etcd-node1.crt"
ETCD_KEY_FILE="/etc/etcd/ssl/etcd-node1.key"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt"
ETCD_PEER_CERT_FILE="/etc/etcd/ssl/etcd-node1.crt"
ETCD_PEER_KEY_FILE="/etc/etcd/ssl/etcd-node1.key"
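The three etcd.env files differ only in ETCD_NAME, the two advertise URLs, and the certificate index, so they can be stamped out with a helper; gen_etcd_env is a hypothetical sketch that writes local files only:

```shell
#!/bin/sh
# Generate the etcd.env contents for one node; name, IP, and cert index vary.
gen_etcd_env() {
    name="$1"; ip="$2"; idx="$3"
    cat <<EOF
ETCD_NAME="${name}"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_CLUSTER="postgresql-01=https://192.168.60.103:2380,postgresql-02=https://192.168.60.104:2380,postgresql-03=https://192.168.60.105:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${ip}:2380"
ETCD_LISTEN_PEER_URLS="https://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="https://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="https://${ip}:2379"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt"
ETCD_CERT_FILE="/etc/etcd/ssl/etcd-node${idx}.crt"
ETCD_KEY_FILE="/etc/etcd/ssl/etcd-node${idx}.key"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt"
ETCD_PEER_CERT_FILE="/etc/etcd/ssl/etcd-node${idx}.crt"
ETCD_PEER_KEY_FILE="/etc/etcd/ssl/etcd-node${idx}.key"
EOF
}

gen_etcd_env postgresql-01 192.168.60.103 1 > etcd-01.env
gen_etcd_env postgresql-02 192.168.60.104 2 > etcd-02.env
gen_etcd_env postgresql-03 192.168.60.105 3 > etcd-03.env
```

Copy each generated file to /etc/etcd/etcd.env on the matching node.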
Create etcd service on each PostgreSQL node
Contents of /etc/systemd/system/etcd.service
file:
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
EnvironmentFile=/etc/etcd/etcd.env
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000
User=etcd
Group=etcd
[Install]
WantedBy=multi-user.target
Create the ETCD_DATA_DIR directory and start the service
Note: the service will appear to "hang" on the first node until a second member comes online and quorum forms
sudo mkdir -p /var/lib/etcd
sudo chown -R etcd:etcd /var/lib/etcd
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd
sudo systemctl status etcd
Output of STATUS command should resemble:
● etcd.service - etcd key-value store
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; preset: enabled)
Verify etcd service using syslogs:
journalctl -xeu etcd.service
░░ Subject: A start job for unit etcd.service has finished successfully
░░ Defined-By: systemd
Verify Across DB Nodes
Execute the following commands on each DB node to verify participation.
Because client-certificate authentication is enabled, etcdctl must be given
the CA certificate, node certificate, and key on every call (full commands below).
Restart etcd services one final time to ensure they auto-start
sudo usermod -aG etcd $USER
sudo systemctl restart etcd
sudo etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/etcd/ssl/ca.crt \
--cert=/etc/etcd/ssl/etcd-node1.crt \
--key=/etc/etcd/ssl/etcd-node1.key \
endpoint health
sudo etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/etcd/ssl/ca.crt \
--cert=/etc/etcd/ssl/etcd-node1.crt \
--key=/etc/etcd/ssl/etcd-node1.key \
member list
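To avoid repeating the TLS flags on every invocation, etcdctl also reads them from environment variables; ETCDCTL_ENDPOINTS, ETCDCTL_CACERT, ETCDCTL_CERT, and ETCDCTL_KEY are the documented equivalents of the flags above:

```shell
# Export once per session; subsequent etcdctl commands pick these up
# in place of --endpoints/--cacert/--cert/--key.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/etcd/ssl/ca.crt
export ETCDCTL_CERT=/etc/etcd/ssl/etcd-node1.crt
export ETCDCTL_KEY=/etc/etcd/ssl/etcd-node1.key
```

With these set, sudo -E etcdctl endpoint health behaves like the fully flagged command above (-E makes sudo preserve the environment).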
Verify the leader using the following command:
sudo etcdctl \
--endpoints=https://192.168.60.103:2379,https://192.168.60.104:2379,https://192.168.60.105:2379 \
--cacert=/etc/etcd/ssl/ca.crt \
--cert=/etc/etcd/ssl/etcd-node1.crt \
--key=/etc/etcd/ssl/etcd-node1.key \
endpoint status --write-out=table
Output should resemble:
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.60.103:2379 | 59afb19d7cb2565d | 3.5.17 | 20 kB | true | false | 2 | 12 | 12 | |
| https://192.168.60.104:2379 | 6338565ebcb76aa2 | 3.5.17 | 20 kB | false | false | 2 | 12 | 12 | |
| https://192.168.60.105:2379 | 9d74b3125c745c74 | 3.5.17 | 20 kB | false | false | 2 | 12 | 12 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Postgres Certificates
PostgreSQL now needs its own SSL certificates, and the postgres user needs read access to the etcd client certificates so that Patroni can automate cluster management
## Generate 20 year Postgres SSL certificates
openssl genrsa -out server.key 2048 # private key
openssl req -new -key server.key -out server.req # csr
openssl req -x509 -key server.key -in server.req -out server.crt -days 7300 # 7300 days = 20 years
# copy files to each DB node
scp server.crt server.key server.req [email protected]:/tmp
scp server.crt server.key server.req [email protected]:/tmp
scp server.crt server.key server.req [email protected]:/tmp
# on each DB node, move the files from /tmp into place
sudo mkdir -p /var/lib/postgresql/data
sudo mkdir -p /var/lib/postgresql/ssl
sudo mv /tmp/server.crt /tmp/server.key /tmp/server.req /var/lib/postgresql/ssl
# correct permissions
sudo chmod 600 /var/lib/postgresql/ssl/server.key
sudo chmod 644 /var/lib/postgresql/ssl/server.crt
sudo chmod 600 /var/lib/postgresql/ssl/server.req
sudo chown postgres:postgres /var/lib/postgresql/data
sudo chown postgres:postgres /var/lib/postgresql/ssl/server.*
sudo setfacl -m u:postgres:r /etc/etcd/ssl/ca.crt
sudo setfacl -m u:postgres:r /etc/etcd/ssl/etcd-node1.crt
sudo setfacl -m u:postgres:r /etc/etcd/ssl/etcd-node1.key
Ensure the SSL certificate steps are executed on each DB node
Patroni Installation and YAML config
Install Patroni on each DB node (the Debian/Ubuntu package is assumed here):
sudo apt install -y patroni
Create a unique YAML file on each DB node for Patroni support.
In the file below, verify that "name" and both "connect_address" values are unique to each node;
the "password" values are the credentials Patroni uses to manage and restart services.
Contents of /etc/patroni/config.yml # unique to each DB node
scope: postgresql-cluster
namespace: /service/
name: postgresql-02 # Unique name for Node 2

etcd3:
  hosts: 192.168.60.103:2379,192.168.60.104:2379,192.168.60.105:2379 # etcd cluster nodes
  protocol: https
  cacert: /etc/etcd/ssl/ca.crt
  cert: /etc/etcd/ssl/etcd-node2.crt # Node 2's etcd certificate
  key: /etc/etcd/ssl/etcd-node2.key # Node 2's etcd key

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.60.104:8008 # IP for Node 2's REST API
  certfile: /var/lib/postgresql/ssl/server.pem

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      parameters:
        ssl: 'on'
        ssl_cert_file: /var/lib/postgresql/ssl/server.crt
        ssl_key_file: /var/lib/postgresql/ssl/server.key
      pg_hba:
        - hostssl replication replicator 127.0.0.1/32 md5
        - hostssl replication replicator 192.168.60.103/32 md5
        - hostssl replication replicator 192.168.60.104/32 md5
        - hostssl replication replicator 192.168.60.105/32 md5
        - hostssl all all 127.0.0.1/32 md5
        - hostssl all all 0.0.0.0/0 md5
  initdb:
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.60.104:5432 # IP for Node 2's PostgreSQL
  data_dir: /var/lib/postgresql/data
  bin_dir: /usr/lib/postgresql/17/bin
  authentication:
    superuser:
      username: postgres
      password: cnV2abjbDpbh64e12987wR4mj5kQ3456Y0Qf # Superuser password (provided)
    replication:
      username: replicator
      password: sad9a23jga8jsuedrwtsskj74567suiuwe23 # Replication password (provided)
  parameters:
    max_connections: 100
    shared_buffers: 256MB

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
Patroni Certificates
Create a PEM file from the certificates for Patroni's REST API (the certfile referenced in the YAML above)
- PEM = Privacy-Enhanced Mail, a container format for certificates and keys
sudo sh -c 'cat /var/lib/postgresql/ssl/server.crt /var/lib/postgresql/ssl/server.key > /var/lib/postgresql/ssl/server.pem'
sudo chown postgres:postgres /var/lib/postgresql/ssl/server.pem
sudo chmod 600 /var/lib/postgresql/ssl/server.pem
# Verify config
sudo openssl x509 -in /var/lib/postgresql/ssl/server.pem -text -noout
Start Patroni and Move the Cluster from "New" to "Existing"
sudo systemctl restart patroni
journalctl -u patroni -f
Verify the logs show a leader election and a successful bootstrap, then update /etc/etcd/etcd.env:
ETCD_INITIAL_CLUSTER_STATE="existing" # changed from "new"
Repeat this process across all three database nodes
Verify Postgres HA cluster
Find the "Leader" node using the following (only the primary returns HTTP 200):
curl -k https://192.168.60.103:8008/primary
curl -k https://192.168.60.104:8008/primary
curl -k https://192.168.60.105:8008/primary
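The three probes can be wrapped in a loop; only the current primary answers /primary with HTTP 200, replicas return 503. A sketch (-k accepts the self-signed certificates; unreachable nodes report 000):

```shell
# Probe each Patroni REST endpoint and record the HTTP status of /primary.
# 200 = primary, 503 = replica, 000 = unreachable.
for ip in 192.168.60.103 192.168.60.104 192.168.60.105; do
    code=$(curl -sk --max-time 2 -o /dev/null -w '%{http_code}' "https://${ip}:8008/primary")
    echo "${ip} -> ${code}" >> primary_probe.txt
done
cat primary_probe.txt
```

Alternatively, sudo patronictl -c /etc/patroni/config.yml list prints the full member table.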
Verify/Edit pg_hba after bootstrap
Once the cluster is bootstrapped, the configuration lives in the DCS; edit it through patronictl so changes apply to all nodes at once
sudo patronictl -c /etc/patroni/config.yml show-config
sudo patronictl -c /etc/patroni/config.yml edit-config
VIP Management for port 5432 traffic
HA-Proxy and Keepalived together provide failover.
While HA-Proxy traditionally manages HTTP traffic, here we configure it for
port 5432. Keepalived then manages the virtual IP using the
Virtual Router Redundancy Protocol (VRRP) and fails over as needed.
This must be completed on each HA-Proxy node (192.168.60.100, 101, 102)
HA Proxy Installation
Directing and managing TCP traffic (OSI Layer 4) is performed by HA-Proxy
sudo apt -y install haproxy
Update config file /etc/haproxy/haproxy.cfg
frontend postgres_frontend
bind *:5432
mode tcp
default_backend postgres_backend
backend postgres_backend
mode tcp
option httpchk GET /primary # Patroni's REST API reports each node's role
http-check expect status 200 # only the primary answers 200, so writes always land on it
timeout connect 5s
timeout client 30s
timeout server 30s
server postgresql-01 192.168.60.103:5432 port 8008 check check-ssl verify none
server postgresql-02 192.168.60.104:5432 port 8008 check check-ssl verify none
server postgresql-03 192.168.60.105:5432 port 8008 check check-ssl verify none
Validate the config, then reload the HA-Proxy service and verify log output
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl reload haproxy
sudo tail -f /var/log/syslog | grep haproxy
Keepalived Installation
sudo apt -y install keepalived
Update the contents of the following file on each HA-Proxy node:
/etc/keepalived/keepalived.conf
global_defs {
enable_script_security
script_user keepalived_script
}
vrrp_script check_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 2
fall 3
rise 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0 # update with your nic
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass tDHjh7by # change
}
virtual_ipaddress {
192.168.60.110
}
track_script {
check_haproxy
}
}
Only one node may have the following:
state MASTER
priority 100
On the two other HA-Proxy nodes use:
state BACKUP
priority 90
Create Check script on each proxy node:
Contents of: /etc/keepalived/check_haproxy.sh
#!/bin/bash
# Define the port to check (e.g., HAProxy frontend port)
PORT=5432
# Check if HAProxy is running
if ! pidof haproxy > /dev/null; then
echo "HAProxy is not running"
exit 1
fi
# Check if HAProxy is listening on the expected port
if ! ss -ltn | grep -q ":${PORT}"; then
echo "HAProxy is not listening on port ${PORT}"
exit 2
fi
# All checks passed
exit 0
Create user with permissions to execute this check script:
sudo useradd -r -s /bin/false keepalived_script
sudo chmod +x /etc/keepalived/check_haproxy.sh
sudo chown keepalived_script:keepalived_script /etc/keepalived/check_haproxy.sh
sudo chmod 700 /etc/keepalived/check_haproxy.sh
Start Services and Verify
sudo systemctl restart keepalived
sudo journalctl -u keepalived -f
ping 192.168.60.110
Test and Repeat
- Connect to the database through the virtual IP (192.168.60.110)
- Create transactions, then kill the primary node
- Client connections should re-establish once failover completes
References
etcd Quickstart
RAFT Explained
HA-Proxy Intro
Techno Tim HA Walkthrough + Video