
Patroni HA

A complete walkthrough of a PostgreSQL deployment with
full failover capability, virtual IP switchover, and
near-real-time replication, secured with SSL and coordinated via Raft.

Nodes/Instances

6 total compute nodes are required for this install
Ubuntu (or another apt-based distribution) is assumed; all package commands below use apt

HA-Proxy (IP and Load Balancing)
192.168.60.100 # haproxy-01
192.168.60.101 # haproxy-02
192.168.60.102 # haproxy-03

Postgres DB (1 Master/2 Replicas)
192.168.60.103 # postgres-01
192.168.60.104 # postgres-02
192.168.60.105 # postgres-03

Install Postgres

Default installation of PostgreSQL from the PGDG package repository.
The configuration files need not be updated; Patroni will overwrite them.
Auto-start must be disabled so Patroni (fronted by HA-Proxy) can manage the service.

sudo apt update
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install -y postgresql-17  # version 17 matches bin_dir in the Patroni config below
sudo systemctl stop postgresql
sudo systemctl disable postgresql

etcd / RAFT HA replication

etcd maintains a highly available, consistent state using a key-value store
paired with Raft, a consensus algorithm.
The latest version of etcd can be found here: etcd releases

sudo apt update
sudo apt install -y wget curl acl
wget https://github.com/etcd-io/etcd/releases/download/v3.5.17/etcd-v3.5.17-linux-amd64.tar.gz
tar xvf etcd-v3.5.17-linux-amd64.tar.gz
mv etcd-v3.5.17-linux-amd64 etcd
sudo mv etcd/etcd* /usr/local/bin/
etcd --version

If the output shows a successful install under the common path, create the app user:

sudo useradd --system --home /var/lib/etcd --shell /bin/false etcd
sudo mkdir -p /etc/etcd
sudo mkdir -p /etc/etcd/ssl

Repeat the etcd install across all three Postgres instances

Certificates

All instances must communicate securely for authentication and encryption.
Start on your client (jumphost) with OpenSSL; a certificate from Let's Encrypt or another CA also works.

brew install openssl         ## for macOS jumphost
sudo apt install -y openssl  ## for Linux jumphost
openssl version

Verify the install completed, then create a certificate authority:

mkdir certs
cd certs
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=etcd-ca" -days 7300 -out ca.crt
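To confirm the CA was created correctly before issuing node certificates, inspect its subject and validity window:

```shell
# Subject should show CN=etcd-ca; dates should span the 20-year window
openssl x509 -in ca.crt -noout -subject -dates
```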

Each DB node requires a unique certificate, identified by IP address.
Update the config file on each node so IPs are not duplicated:
ie. 192.168.60.103=etcd-node1, 192.168.60.104=etcd-node2, 192.168.60.105=etcd-node3

sudo mkdir -p /etc/etcd/ssl ; cd /etc/etcd/ssl/
# Generate a private key
openssl genrsa -out etcd-node1.key 2048

# Create temp file for config
cat > temp.cnf <<EOF
[ req ]
distinguished_name = req_distinguished_name
req_extensions = v3_req
[ req_distinguished_name ]
[ v3_req ]
subjectAltName = @alt_names
[ alt_names ]
IP.1 = 192.168.60.103
IP.2 = 127.0.0.1
EOF

# Create a CSR
openssl req -new -key etcd-node1.key -out etcd-node1.csr \
  -subj "/C=US/ST=YourState/L=YourCity/O=YourOrganization/OU=YourUnit/CN=etcd-node1" \
  -config temp.cnf

# Sign the cert
openssl x509 -req -in etcd-node1.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out etcd-node1.crt -days 7300 -sha256 -extensions v3_req -extfile temp.cnf

# Verify the cert and be sure you see the Subject Alternative Name

openssl x509 -in etcd-node1.crt -text -noout | grep -A1 "Subject Alternative Name"

# Remove temp file

rm temp.cnf

# change ownership of files
sudo chown -R etcd:etcd /etc/etcd/
sudo chmod 600 /etc/etcd/ssl/etcd-node*.key
sudo chmod 644 /etc/etcd/ssl/etcd-node*.crt /etc/etcd/ssl/ca.crt
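The same steps can be scripted for the remaining nodes. A sketch, using the node-name/IP mapping above (run where ca.crt and ca.key are available):

```shell
# Generate a key, CSR, and signed cert for each remaining node.
for entry in "etcd-node2:192.168.60.104" "etcd-node3:192.168.60.105"; do
  name=${entry%%:*}; ip=${entry##*:}
  openssl genrsa -out "$name.key" 2048
  cat > temp.cnf <<EOF
[ req ]
distinguished_name = req_distinguished_name
req_extensions = v3_req
[ req_distinguished_name ]
[ v3_req ]
subjectAltName = @alt_names
[ alt_names ]
IP.1 = $ip
IP.2 = 127.0.0.1
EOF
  openssl req -new -key "$name.key" -out "$name.csr" -subj "/CN=$name" -config temp.cnf
  openssl x509 -req -in "$name.csr" -CA ca.crt -CAkey ca.key -CAcreateserial \
    -out "$name.crt" -days 7300 -sha256 -extensions v3_req -extfile temp.cnf
  rm temp.cnf
done
```

Copy each resulting key/cert pair to /etc/etcd/ssl/ on its node and apply the same ownership and permissions as for node 1.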

Configure etcd

Each DB node requires a unique configuration referencing its own IP and certificates.
Update /etc/etcd/etcd.env on each node with this information (node 1 shown):

ETCD_NAME="postgresql-01"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_CLUSTER="postgresql-01=https://192.168.60.103:2380,postgresql-02=https://192.168.60.104:2380,postgresql-03=https://192.168.60.105:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.60.103:2380"
ETCD_LISTEN_PEER_URLS="https://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="https://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.60.103:2379"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt"
ETCD_CERT_FILE="/etc/etcd/ssl/etcd-node1.crt"
ETCD_KEY_FILE="/etc/etcd/ssl/etcd-node1.key"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/ssl/ca.crt"
ETCD_PEER_CERT_FILE="/etc/etcd/ssl/etcd-node1.crt"
ETCD_PEER_KEY_FILE="/etc/etcd/ssl/etcd-node1.key"

Create etcd service on each PostgreSQL node

Contents of /etc/systemd/system/etcd.service file:

[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
EnvironmentFile=/etc/etcd/etcd.env
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000
User=etcd
Group=etcd

[Install]
WantedBy=multi-user.target

Create the ETCD_DATA_DIR directory and start the service
Note: the service will “hang” on the first node until a second node comes alive and quorum forms

sudo mkdir -p /var/lib/etcd 
sudo chown -R etcd:etcd /var/lib/etcd
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd
sudo systemctl status etcd

Output of STATUS command should resemble:

● etcd.service - etcd key-value store
     Loaded: loaded (/etc/systemd/system/etcd.service; enabled; preset: enabled)

Verify the etcd service using the system logs:

journalctl -xeu etcd.service

░░ Subject: A start job for unit etcd.service has finished successfully
░░ Defined-By: systemd

Verify Across DB Nodes

Execute the following commands on each DB node to verify participation:

etcdctl endpoint health
etcdctl member list
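With client-certificate auth enabled, the bare commands above will be rejected; etcdctl needs the TLS material. Exporting etcdctl's standard environment variables avoids repeating the flags on every call (node 1 paths shown):

```shell
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/etcd/ssl/ca.crt
export ETCDCTL_CERT=/etc/etcd/ssl/etcd-node1.crt
export ETCDCTL_KEY=/etc/etcd/ssl/etcd-node1.key

etcdctl endpoint health
etcdctl member list
```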

Restart the etcd services one final time to ensure they will auto-start

sudo usermod -aG etcd $USER
sudo systemctl restart etcd

sudo etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/etcd/ssl/ca.crt \
--cert=/etc/etcd/ssl/etcd-node1.crt \
--key=/etc/etcd/ssl/etcd-node1.key \
endpoint health

sudo etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/etcd/ssl/ca.crt \
--cert=/etc/etcd/ssl/etcd-node1.crt \
--key=/etc/etcd/ssl/etcd-node1.key \
member list

Verify the leader using the following command:

sudo etcdctl \
  --endpoints=https://192.168.60.103:2379,https://192.168.60.104:2379,https://192.168.60.105:2379 \
  --cacert=/etc/etcd/ssl/ca.crt \
  --cert=/etc/etcd/ssl/etcd-node1.crt \
  --key=/etc/etcd/ssl/etcd-node1.key \
  endpoint status --write-out=table

output should resemble:

+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.60.103:2379 | 59afb19d7cb2565d |  3.5.17 |   20 kB |      true |      false |         2 |         12 |                 12 |        |
| https://192.168.60.104:2379 | 6338565ebcb76aa2 |  3.5.17 |   20 kB |     false |      false |         2 |         12 |                 12 |        |
| https://192.168.60.105:2379 | 9d74b3125c745c74 |  3.5.17 |   20 kB |     false |      false |         2 |         12 |                 12 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Postgres Certificates

Generate SSL certificates for Postgres itself, and grant the postgres user read access to the etcd certificates so Patroni can automate failover and restarts

## Generate 20 year Postgres SSL certificates 
openssl genrsa -out server.key 2048 # private key
openssl req -new -key server.key -out server.req # csr
openssl req -x509 -key server.key -in server.req -out server.crt -days 7300 # 7300 days = 20 years
# copy files to each DB node
scp server.crt server.key server.req user@192.168.60.103:/tmp  # replace "user" with your SSH user
scp server.crt server.key server.req user@192.168.60.104:/tmp
scp server.crt server.key server.req user@192.168.60.105:/tmp
# on each DB node, copy files to correct location
sudo mkdir -p /var/lib/postgresql/data
sudo mkdir -p /var/lib/postgresql/ssl
sudo mv server.crt server.key server.req /var/lib/postgresql/ssl
# correct permissions
sudo chmod 600 /var/lib/postgresql/ssl/server.key
sudo chmod 644 /var/lib/postgresql/ssl/server.crt
sudo chmod 600 /var/lib/postgresql/ssl/server.req
sudo chown postgres:postgres /var/lib/postgresql/data
sudo chown postgres:postgres /var/lib/postgresql/ssl/server.*
sudo setfacl -m u:postgres:r /etc/etcd/ssl/ca.crt
sudo setfacl -m u:postgres:r /etc/etcd/ssl/etcd-node1.crt
sudo setfacl -m u:postgres:r /etc/etcd/ssl/etcd-node1.key
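Before moving on, it is worth confirming that the copied key and certificate actually pair up; their public-key moduli must produce identical digests:

```shell
# The two digests must match, otherwise Postgres will reject the pair
sudo openssl x509 -noout -modulus -in /var/lib/postgresql/ssl/server.crt | openssl md5
sudo openssl rsa  -noout -modulus -in /var/lib/postgresql/ssl/server.key | openssl md5
```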

Ensure the SSL certificate steps above are executed on each DB node

Patroni Installation and YAML config

Create a unique YAML file on each DB node for Patroni.
In the file below, verify that “name” and “connect_address” are unique to each node;
the “password” values are the credentials Patroni uses to manage and restart services.

Contents of /etc/patroni/config.yml (node 2 shown; unique per DB node):

scope: postgresql-cluster
namespace: /service/
name: postgresql-02  # Unique name for Node 2

etcd3:
  hosts: 192.168.60.103:2379,192.168.60.104:2379,192.168.60.105:2379  # etcd cluster nodes
  protocol: https
  cacert: /etc/etcd/ssl/ca.crt
  cert: /etc/etcd/ssl/etcd-node2.crt  # Node 2's etcd certificate
  key: /etc/etcd/ssl/etcd-node2.key  # Node 2's etcd key

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.60.104:8008  # IP for Node 2's REST API
  certfile: /var/lib/postgresql/ssl/server.pem

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      parameters:
        ssl: 'on'
        ssl_cert_file: /var/lib/postgresql/ssl/server.crt
        ssl_key_file: /var/lib/postgresql/ssl/server.key
      pg_hba:
        - hostssl replication replicator 127.0.0.1/32 md5
        - hostssl replication replicator 192.168.60.103/32 md5
        - hostssl replication replicator 192.168.60.104/32 md5
        - hostssl replication replicator 192.168.60.105/32 md5
        - hostssl all all 127.0.0.1/32 md5
        - hostssl all all 0.0.0.0/0 md5
  initdb:
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.60.104:5432  # IP for Node 2's PostgreSQL
  data_dir: /var/lib/postgresql/data
  bin_dir: /usr/lib/postgresql/17/bin
  authentication:
    superuser:
      username: postgres
      password: cnV2abjbDpbh64e12987wR4mj5kQ3456Y0Qf  # Superuser password (provided)
    replication:
      username: replicator
      password: sad9a23jga8jsuedrwtsskj74567suiuwe23  # Replication password (provided)
  parameters:
    max_connections: 100
    shared_buffers: 256MB

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
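The walkthrough assumes Patroni itself is installed as a systemd service. If it is not, one common approach is pip; the package names and the unit file below are a sketch under that assumption, not part of the original install:

```shell
# Install Patroni with etcd v3 support (pip installs to /usr/local/bin by default)
sudo apt install -y python3-pip python3-psycopg2
sudo pip3 install 'patroni[etcd3]'
sudo mkdir -p /etc/patroni

# Minimal systemd unit (sketch); adjust ExecStart if patroni lives elsewhere
sudo tee /etc/systemd/system/patroni.service > /dev/null <<'EOF'
[Unit]
Description=Patroni PostgreSQL HA
After=network-online.target etcd.service

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni/config.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable patroni  # started later, once the PEM file exists
```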

Patroni Certificates

Create a PEM file from the certificates for Patroni communication

  • PEM = Privacy Enhanced Mail, a container format for certificates and keys

sudo sh -c 'cat /var/lib/postgresql/ssl/server.crt /var/lib/postgresql/ssl/server.key > /var/lib/postgresql/ssl/server.pem'
sudo chown postgres:postgres /var/lib/postgresql/ssl/server.pem
sudo chmod 600 /var/lib/postgresql/ssl/server.pem
# Verify config
sudo openssl x509 -in /var/lib/postgresql/ssl/server.pem -text -noout
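A quick sanity check that the PEM really contains both pieces:

```shell
# Expect two BEGIN markers: one CERTIFICATE and one PRIVATE KEY
sudo grep -c "BEGIN" /var/lib/postgresql/ssl/server.pem
```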

Restart Patroni and Cluster from “New” to “Existing”

 sudo systemctl restart patroni
 journalctl -u patroni -f

Verify logs show leader and bootstrap success

Update /etc/etcd/etcd.env:

ETCD_INITIAL_CLUSTER_STATE="existing" # from "new"

Repeat this process across all three database nodes

Verify Postgres HA cluster

Find “Leader” node using the following:

curl -k https://192.168.60.103:8008/primary
curl -k https://192.168.60.104:8008/primary
curl -k https://192.168.60.105:8008/primary

Verify/Edit pg_hba after bootstrap

With the cluster in HA mode, use patronictl so config edits apply to all nodes concurrently:

sudo patronictl -c /etc/patroni/config.yml show-config
sudo patronictl -c /etc/patroni/config.yml edit-config

VIP Management for port 5432 traffic

HA-Proxy and Keepalived together provide failover.
While HA-Proxy traditionally manages HTTP traffic, here it is configured for
port 5432. Keepalived then manages the virtual IP using the
Virtual Router Redundancy Protocol (VRRP) and fails it over as needed.

This must be completed on each HA-Proxy Node (192.168.60.100, 101, 102)

HA Proxy Installation

HA-Proxy directs and manages TCP traffic (OSI Layer 4)

sudo apt -y install haproxy

Update config file /etc/haproxy/haproxy.cfg

frontend postgres_frontend
    bind *:5432
    mode tcp
    timeout client 30s
    default_backend postgres_backend

backend postgres_backend
    mode tcp
    option httpchk GET /primary  # Patroni provides an endpoint to check node roles
    http-check expect status 200  # only the primary returns 200
    timeout connect 5s
    timeout server 30s
    server postgresql-01 192.168.60.103:5432 port 8008 check check-ssl verify none
    server postgresql-02 192.168.60.104:5432 port 8008 check check-ssl verify none
    server postgresql-03 192.168.60.105:5432 port 8008 check check-ssl verify none

Start HA-Proxy service and verify log output

sudo haproxy -c -f /etc/haproxy/haproxy.cfg  # validate config first
sudo systemctl reload haproxy
sudo tail -f /var/log/syslog | grep haproxy

KeepAliveD Installation

sudo apt -y install keepalived

Update the contents of the following file on each HA-Proxy node:
/etc/keepalived/keepalived.conf

global_defs {
    enable_script_security
    script_user keepalived_script
}

vrrp_script check_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
    interval 2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0 # update with your nic
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass tDHjh7by # change
    }
    virtual_ipaddress {
        192.168.60.110
    }
    track_script {
        check_haproxy
    }
}

Only one node must have the following:

state MASTER
priority 100

On the other two HA-Proxy nodes use:

state BACKUP
priority 90

Create Check script on each proxy node:
Contents of: /etc/keepalived/check_haproxy.sh

#!/bin/bash

# Define the port to check (e.g., HAProxy frontend port)
PORT=5432

# Check if HAProxy is running
if ! pidof haproxy > /dev/null; then
    echo "HAProxy is not running"
    exit 1
fi

# Check if HAProxy is listening on the expected port
if ! ss -ltn | grep -q ":${PORT}"; then
    echo "HAProxy is not listening on port ${PORT}"
    exit 2
fi

# All checks passed
exit 0

Create a user with permission to execute this check script:

sudo useradd -r -s /bin/false keepalived_script
sudo chmod +x /etc/keepalived/check_haproxy.sh
sudo chown keepalived_script:keepalived_script /etc/keepalived/check_haproxy.sh
sudo chmod 700 /etc/keepalived/check_haproxy.sh

Start Services and Verify

sudo systemctl restart keepalived
sudo journalctl -u keepalived -f
ping 192.168.60.110

Test and Repeat

  1. Connect to the database using the virtual IP (192.168.60.110)
  2. Create transactions and kill the primary node
  3. Client connections should reconnect based on TCP retry timeouts

References

etcd Quickstart
RAFT Explained
HA-Proxy Intro
Techno Tim HA Walkthrough + Video

Last updated on 28 Jun 2025
Published on 28 Jun 2025