What are the steps to configure a highly available PostgreSQL cluster using Patroni?

13 June 2024

Businesses depend on their databases being available at all times, and a single PostgreSQL server is a single point of failure. Patroni addresses this by managing a cluster of PostgreSQL nodes and promoting a replica automatically when the primary fails. In this article, we will walk you through the steps to configure a highly available PostgreSQL cluster using Patroni.

Understanding Patroni and Its Components

Before diving into the configuration process, it is important to understand what Patroni is and the components it interacts with. Patroni is an open-source tool designed to manage PostgreSQL clusters. It simplifies the management of high-availability PostgreSQL clusters by automating failover and replication.

Core Components of a Patroni Cluster

A typical Patroni cluster consists of the following core components:

  • PostgreSQL: The database management system that stores your data.
  • etcd: A distributed key-value store that Patroni uses for cluster coordination and leader election.
  • HAProxy: A load balancer used to direct traffic to the correct database node.
  • Patroni: The orchestration tool that ensures high availability.

Understanding these components is paramount to successfully setting up a highly available PostgreSQL cluster.

Installing and Configuring etcd

The first step in setting up a Patroni cluster is to install and configure etcd. This component plays a crucial role in maintaining cluster state and leader election.
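
To make the leader-election role concrete: etcd is built on the Raft consensus protocol, which accepts a write (such as an update to the leader key) only when a majority of members agree. The following sketch is plain arithmetic with illustrative function names, and shows why three etcd nodes is the usual minimum for a cluster that must survive a node failure.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(
            f"{n} member(s): quorum={quorum(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

A three-member cluster keeps working with one member down, while a two-member cluster tolerates no failures at all. Adding a fourth member does not improve on three, which is why odd cluster sizes are preferred.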

Installing etcd

Begin by installing etcd on your servers. Use the following command to install etcd on an Ubuntu system (on some newer releases the package is split into etcd-server and etcd-client):

sudo apt install etcd

Configuring etcd for High Availability

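Once installed, configure etcd as a three-member cluster so that it keeps a quorum when one member is lost. Create an etcd configuration file at /etc/etcd/etcd.conf.yml (if the packaged systemd unit does not read this path by default, pass it to etcd with the --config-file option) and set the following parameters: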

name: 'etcd-node1'
data-dir: '/var/lib/etcd'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster'
initial-cluster: 'etcd-node1=http://<IP1>:2380,etcd-node2=http://<IP2>:2380,etcd-node3=http://<IP3>:2380'
initial-advertise-peer-urls: 'http://<IP1>:2380'
advertise-client-urls: 'http://<IP1>:2379'
listen-peer-urls: 'http://<IP1>:2380'
listen-client-urls: 'http://<IP1>:2379'

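Replace <IP1>, <IP2>, and <IP3> with the IP addresses of your etcd nodes. The example shows the file for the first node; on the other nodes, change name and the four URL settings to that node's own name and address, and keep initial-cluster identical everywhere. Start etcd on each of your nodes: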

sudo systemctl start etcd
sudo systemctl enable etcd
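
Since the etcd file differs between nodes only in the node name and its own address, the per-node files can be generated rather than edited by hand. The sketch below is one possible way to do that; the function name and the sample addresses (taken from the 192.0.2.0/24 documentation range) are assumptions, and the template simply mirrors the settings shown above.

```python
from typing import Dict

# Same keys as the example configuration; only name and IP vary per node.
ETCD_TEMPLATE = """\
name: '{name}'
data-dir: '/var/lib/etcd'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster'
initial-cluster: '{initial_cluster}'
initial-advertise-peer-urls: 'http://{ip}:2380'
advertise-client-urls: 'http://{ip}:2379'
listen-peer-urls: 'http://{ip}:2380'
listen-client-urls: 'http://{ip}:2379'
"""


def render_etcd_configs(nodes: Dict[str, str]) -> Dict[str, str]:
    """Return {node name: config text}; `nodes` maps node name -> IP address."""
    # The member list must be identical on every node.
    initial_cluster = ",".join(
        f"{name}=http://{ip}:2380" for name, ip in nodes.items()
    )
    return {
        name: ETCD_TEMPLATE.format(name=name, ip=ip, initial_cluster=initial_cluster)
        for name, ip in nodes.items()
    }


if __name__ == "__main__":
    # Hypothetical addresses; replace with the real ones.
    example = {
        "etcd-node1": "192.0.2.11",
        "etcd-node2": "192.0.2.12",
        "etcd-node3": "192.0.2.13",
    }
    for name, text in render_etcd_configs(example).items():
        print(f"# --- {name} ---")
        print(text)
```

Each rendered text can then be written to /etc/etcd/etcd.conf.yml on the matching node.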

Installing and Configuring PostgreSQL

The next step involves installing and setting up PostgreSQL on your nodes. PostgreSQL is the backbone of your cluster and must be correctly configured to work with Patroni.

Installing PostgreSQL

Use the following command to install PostgreSQL on each of your nodes:

sudo apt install postgresql

Configuring PostgreSQL for Replication

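After installation, configure PostgreSQL for replication. Patroni starts and stops PostgreSQL itself, so stop and disable the packaged postgresql service on each node before Patroni takes over, and keep the parameters below in agreement with those listed in patroni.yml. The relevant postgresql.conf and pg_hba.conf settings are:

# postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
synchronous_commit = 'local'

# pg_hba.conf
# One replication line per node, since any node can become a replica after a failover.
host replication replicator <replica_ip>/32 md5
# Narrow 0.0.0.0/0 to your application subnet where possible.
host all all 0.0.0.0/0 md5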

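Create a replication user. When Patroni bootstraps a new cluster it creates this user itself from the authentication section of patroni.yml, so the statement is only needed if you are bringing an existing database under Patroni's control. Use a strong password in place of the example value: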

CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'password';
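
Because any node can end up as a replica after a failover, pg_hba.conf on every node needs a replication line for each of the other nodes. As a minimal sketch (the helper name and the sample addresses are hypothetical), the lines can be generated from a list of addresses, with Python's ipaddress module rejecting malformed input:

```python
import ipaddress
from typing import Iterable, List


def replication_hba_lines(
    replica_ips: Iterable[str], user: str = "replicator", method: str = "md5"
) -> List[str]:
    """Build one `host replication` pg_hba.conf line per replica address."""
    lines = []
    for raw in replica_ips:
        ip = ipaddress.ip_address(raw)           # raises ValueError on a bad address
        prefix = 32 if ip.version == 4 else 128  # single-host mask for IPv4 / IPv6
        lines.append(f"host replication {user} {ip}/{prefix} {method}")
    return lines


if __name__ == "__main__":
    # Hypothetical node addresses from the documentation range.
    for line in replication_hba_lines(["192.0.2.21", "192.0.2.22", "192.0.2.23"]):
        print(line)
```

The default user name and authentication method match the pg_hba.conf example above; both are parameters so they can be changed in one place.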

Installing and Configuring Patroni

With PostgreSQL and etcd in place, it’s time to install and configure Patroni to orchestrate the cluster.

Installing Patroni

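Install Patroni using pip. Patroni does not install a PostgreSQL driver on its own, so psycopg2 is added explicitly, and the extra is quoted so the shell does not interpret the square brackets. The etcd extra uses the etcd v2 API, which newer etcd releases disable by default; if that applies to your etcd version, either enable the v2 API in etcd or install the etcd3 extra and name the section in patroni.yml etcd3 instead of etcd:

sudo apt install python3-pip
pip3 install 'patroni[etcd]' psycopg2-binary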

Configuring Patroni

Create a Patroni configuration file named patroni.yml for each node. A basic configuration looks like this:

scope: postgres-cluster
namespace: /service/
name: pg-node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: <IP>:8008

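etcd:
  # List every etcd member so Patroni keeps working when one of them is down.
  hosts: <etcd_ip1>:2379,<etcd_ip2>:2379,<etcd_ip3>:2379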

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_connections: 100
        max_wal_senders: 10
        wal_keep_size: 128MB  # PostgreSQL 13+; on version 12 and earlier use wal_keep_segments: 8
        archive_mode: "on"
        archive_command: 'cp %p /var/lib/postgresql/data/archive/%f'
        archive_timeout: 1800s
  initdb:
  - encoding: 'UTF8'
  - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: <IP>:5432
  data_dir: /var/lib/postgresql/data
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: password
    superuser:
      username: postgres
      password: postgres
  parameters:
    unix_socket_directories: '/var/run/postgresql'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

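Replace <IP> and the etcd address placeholder with the respective IP addresses, give each node its own name (pg-node1, pg-node2, and so on), and change the example passwords. Installing with pip does not create a systemd unit, so first create a patroni service unit that runs the patroni command with the path to patroni.yml as its argument. Then start the Patroni service: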

sudo systemctl start patroni
sudo systemctl enable patroni
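
The three timing values under bootstrap.dcs are related to one another. The Patroni documentation gives the rule loop_wait + 2 * retry_timeout <= ttl, and the following sketch (function names are illustrative, not part of Patroni) checks that rule and converts maximum_lag_on_failover from bytes to MiB.

```python
from typing import List


def check_dcs_timing(ttl: int, loop_wait: int, retry_timeout: int) -> List[str]:
    """Return a list of problems with the timing settings (empty if none)."""
    problems = []
    if min(ttl, loop_wait, retry_timeout) <= 0:
        problems.append("ttl, loop_wait and retry_timeout must all be positive")
    if loop_wait + 2 * retry_timeout > ttl:
        problems.append(
            f"loop_wait + 2 * retry_timeout = {loop_wait + 2 * retry_timeout} "
            f"exceeds ttl = {ttl}"
        )
    return problems


def bytes_to_mib(n: int) -> float:
    """Convert a byte count such as maximum_lag_on_failover to MiB."""
    return n / (1024 * 1024)


if __name__ == "__main__":
    # Values from the example patroni.yml.
    print(check_dcs_timing(ttl=30, loop_wait=10, retry_timeout=10))
    print(bytes_to_mib(1048576))
```

With the values from the example configuration the rule holds exactly (10 + 2 × 10 = 30), and the lag limit of 1048576 bytes is 1 MiB, meaning a replica more than 1 MiB behind the primary is not considered for promotion.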

Setting Up HAProxy

To ensure traffic is directed to the correct database node, install and configure HAProxy.

Installing HAProxy

Install HAProxy using:

sudo apt install haproxy

Configuring HAProxy

Edit the HAProxy configuration file at /etc/haproxy/haproxy.cfg to include the PostgreSQL cluster:

frontend pgsql
   bind *:5000
   mode tcp
   default_backend pgsql-backend

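backend pgsql-backend
   mode tcp
   # Patroni's REST API answers 200 on /primary only on the current leader,
   # so replicas are taken out of rotation; a bare TCP check would pass on every node.
   option httpchk GET /primary
   http-check expect status 200
   default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
   server pg-node1 <node1_ip>:5432 maxconn 100 check port 8008
   server pg-node2 <node2_ip>:5432 maxconn 100 check port 8008
   server pg-node3 <node3_ip>:5432 maxconn 100 check port 8008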

Replace <node1_ip>, <node2_ip>, and <node3_ip> with the IP addresses of your nodes. Restart HAProxy:

sudo systemctl restart haproxy
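
To see which node HAProxy should be sending writes to, the REST API that Patroni exposes on port 8008 can be queried directly. A GET request to /primary returns HTTP 200 on the current leader and 503 elsewhere, whereas a plain TCP connection to port 8008 succeeds on every running node. The sketch below is an illustration only; the function names and the fetch_status parameter are assumptions made so the selection logic can be exercised without a live cluster.

```python
from typing import Callable, Dict, List
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def primary_status(host: str, port: int = 8008, timeout: float = 2.0) -> int:
    """HTTP status of GET /primary on a node's Patroni REST API (0 if unreachable)."""
    try:
        with urlopen(f"http://{host}:{port}/primary", timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:     # e.g. 503 from a node that is not the primary
        return err.code
    except (URLError, OSError):  # connection refused, timeout, DNS failure
        return 0


def routable_nodes(
    nodes: Dict[str, str], fetch_status: Callable[[str], int] = primary_status
) -> List[str]:
    """Names of the nodes an 'expect status 200' health check would keep in rotation."""
    return [name for name, host in nodes.items() if fetch_status(host) == 200]


if __name__ == "__main__":
    # Hypothetical addresses; replace with the real node IPs.
    cluster = {
        "pg-node1": "192.0.2.21",
        "pg-node2": "192.0.2.22",
        "pg-node3": "192.0.2.23",
    }
    print(routable_nodes(cluster))
```

In a healthy cluster the returned list should contain exactly one name. An empty list during a failover is expected for a short time while a new leader is elected.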

By following these steps, you can configure a highly available PostgreSQL cluster using Patroni. With Patroni handling failover and replication, etcd holding the cluster state, and HAProxy routing clients to the current primary, the database stays reachable when a single node fails.

In conclusion, each component has a distinct job: etcd provides leader election, PostgreSQL stores the data, Patroni orchestrates the nodes, and HAProxy directs traffic. Before relying on the cluster, test a failover by stopping the primary and confirming that a replica is promoted and that clients reconnect through HAProxy.