In today's digital landscape, data availability is crucial. Businesses depend on databases to be up and running at all times. If you're relying on PostgreSQL, ensuring that your database is highly available is absolutely essential. This is where Patroni steps in, providing a robust solution for managing a PostgreSQL cluster. In this article, we will walk you through the comprehensive steps to configure a highly available PostgreSQL cluster using Patroni.
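Before diving into the configuration process, it helps to understand what Patroni is and the components it interacts with. Patroni is an open-source tool designed to manage PostgreSQL clusters. It simplifies the management of high-availability PostgreSQL clusters by automating failover and replication.
A typical Patroni cluster consists of the following core components:
etcd: a distributed key-value store that maintains the cluster state and is used for leader election.
PostgreSQL: the database itself, running on each node.
Patroni: the tool that orchestrates the cluster, managing replication and failover.
HAProxy: a proxy that directs client traffic to the correct database node.
Understanding these components makes the rest of the setup easier to follow.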
The first step in setting up a Patroni cluster is to install and configure etcd. This component plays a crucial role in maintaining cluster state and leader election.
Begin by installing etcd on your servers. Use the following command to install etcd on an Ubuntu system:
sudo apt install etcd
Once installed, configure etcd to ensure it can handle the demands of a highly available system. Create an etcd configuration file at /etc/etcd/etcd.conf.yml and set the following parameters:
name: 'etcd-node1'
data-dir: '/var/lib/etcd'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster'
initial-cluster: 'etcd-node1=http://<IP1>:2380,etcd-node2=http://<IP2>:2380,etcd-node3=http://<IP3>:2380'
initial-advertise-peer-urls: 'http://<IP1>:2380'
advertise-client-urls: 'http://<IP1>:2379'
listen-peer-urls: 'http://<IP1>:2380'
listen-client-urls: 'http://<IP1>:2379'
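Replace <IP1>, <IP2>, and <IP3> with the IP addresses of your etcd nodes. The file above is written for the first node; on the other nodes, set name and the advertise and listen URLs to that node's own name and IP address. Start etcd on each of your nodes: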
sudo systemctl start etcd
sudo systemctl enable etcd
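The initial-cluster value and the choice of three etcd nodes can be illustrated with a short sketch. The Python below is an illustration only and is not part of etcd or Patroni: the function names and the 10.0.0.x addresses are hypothetical placeholders. It builds the initial-cluster string from a name-to-IP mapping and shows the majority (quorum) arithmetic that lets a three-member cluster keep working after losing one member.

```python
from typing import Mapping


def initial_cluster(nodes: Mapping[str, str], peer_port: int = 2380) -> str:
    """Build the value of etcd's initial-cluster setting from name -> IP."""
    return ",".join(
        f"{name}=http://{ip}:{peer_port}" for name, ip in nodes.items()
    )


def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Placeholder addresses (assumption); substitute your own node IPs.
    nodes = {
        "etcd-node1": "10.0.0.1",
        "etcd-node2": "10.0.0.2",
        "etcd-node3": "10.0.0.3",
    }
    print(initial_cluster(nodes))
    for size in (1, 2, 3, 5):
        print(size, "members tolerate", fault_tolerance(size), "failure(s)")
```

Note that a two-member cluster tolerates no failures at all, which is why an odd number of etcd nodes, three at minimum, is the usual choice.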
The next step involves installing and setting up PostgreSQL on your nodes. PostgreSQL is the backbone of your cluster and must be correctly configured to work with Patroni.
Use the following command to install PostgreSQL on each of your nodes:
sudo apt install postgresql
After installation, configure PostgreSQL for replication. Edit the postgresql.conf and pg_hba.conf files accordingly:
# postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
synchronous_commit = 'local'
# pg_hba.conf
host replication replicator <replica_ip>/32 md5
host all all 0.0.0.0/0 md5
Create a replication user:
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'password';
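Because any node may become the primary after a failover, each node generally needs to accept replication connections from every other node, not just from a single replica. As a minimal sketch of that idea, the Python below generates one pg_hba.conf replication line per peer address, in the same format as the line shown above. The function name and the example addresses are hypothetical, and md5 is used only because the article uses it.

```python
import ipaddress
from typing import Iterable, List


def replication_hba_lines(
    node_ips: Iterable[str], user: str = "replicator", method: str = "md5"
) -> List[str]:
    """Return one pg_hba.conf replication entry per peer IP address."""
    lines = []
    for ip in node_ips:
        # ip_address() raises ValueError for malformed input, so a typo
        # is caught here instead of ending up in pg_hba.conf.
        addr = ipaddress.ip_address(ip)
        prefix = 32 if addr.version == 4 else 128  # single-host mask
        lines.append(f"host replication {user} {addr}/{prefix} {method}")
    return lines


if __name__ == "__main__":
    # Placeholder peer addresses (assumption); use your own node IPs.
    for line in replication_hba_lines(["10.0.0.2", "10.0.0.3"]):
        print(line)
```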
With PostgreSQL and etcd in place, it’s time to install and configure Patroni to orchestrate the cluster.
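Install Patroni using pip. Patroni does not install a PostgreSQL driver as a hard dependency, so install psycopg2 alongside it:
sudo apt install python3-pip
pip3 install 'patroni[etcd]' psycopg2-binary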
Create a Patroni configuration file named patroni.yml for each node. A basic configuration looks like this:
scope: postgres-cluster
namespace: /service/
name: pg-node1
restapi:
  listen: 0.0.0.0:8008
  connect_address: <IP>:8008
etcd:
  host: <etcd_ip>:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_connections: 100
        max_wal_senders: 10
        # wal_keep_segments was removed in PostgreSQL 13; use wal_keep_size there
        wal_keep_segments: 8
        archive_mode: "on"
        archive_command: 'cp %p /var/lib/postgresql/data/archive/%f'
        archive_timeout: 1800s
  initdb:
    - encoding: 'UTF8'
    - data-checksums
postgresql:
  listen: 0.0.0.0:5432
  connect_address: <IP>:5432
  data_dir: /var/lib/postgresql/data
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: password
    superuser:
      username: postgres
      password: postgres
  parameters:
    unix_socket_directories: '/var/run/postgresql'
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
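Replace <IP> and <etcd_ip> with the respective IP addresses, and give each node its own name. Installing Patroni with pip does not create a systemd service, so the commands below assume you have already set up a patroni unit that runs Patroni with this configuration file. Start the Patroni service: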
sudo systemctl start patroni
sudo systemctl enable patroni
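The ttl, loop_wait, and retry_timeout values in the dcs section are related: Patroni's documentation asks that loop_wait + 2 * retry_timeout not exceed ttl, and maximum_lag_on_failover is expressed in bytes. The following Python is a small sanity check for those settings, written as a sketch; the function names are hypothetical and it is not part of Patroni.

```python
from typing import List


def check_dcs_timing(ttl: int, loop_wait: int, retry_timeout: int) -> List[str]:
    """Return a list of problems found in the DCS timing settings."""
    problems = []
    # Rule from the Patroni documentation:
    # loop_wait + 2 * retry_timeout <= ttl
    needed = loop_wait + 2 * retry_timeout
    if needed > ttl:
        problems.append(
            f"loop_wait + 2 * retry_timeout = {needed} exceeds ttl = {ttl}"
        )
    return problems


def lag_in_mib(lag_bytes: int) -> float:
    """Convert maximum_lag_on_failover from bytes to MiB."""
    return lag_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Values taken from the configuration in this article.
    print(check_dcs_timing(ttl=30, loop_wait=10, retry_timeout=10))
    print(lag_in_mib(1048576), "MiB of allowed replica lag")
```

With the values used in this article the rule is met exactly (10 + 2 * 10 = 30), so lowering ttl or raising either of the other two settings would need a matching adjustment.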
To ensure traffic is directed to the correct database node, install and configure HAProxy.
Install HAProxy using:
sudo apt install haproxy
Edit the HAProxy configuration file at /etc/haproxy/haproxy.cfg
to include the PostgreSQL cluster:
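frontend pgsql
    bind *:5000
    mode tcp
    default_backend pgsql-backend
backend pgsql-backend
    mode tcp
    # Patroni's REST API on port 8008 returns HTTP 200 only on the current primary,
    # so check the HTTP status rather than just the open TCP port.
    option httpchk
    http-check expect status 200
    server pg-node1 <node1_ip>:5432 maxconn 100 check port 8008
    server pg-node2 <node2_ip>:5432 maxconn 100 check port 8008
    server pg-node3 <node3_ip>:5432 maxconn 100 check port 8008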
Replace <node1_ip>, <node2_ip>, and <node3_ip> with the IP addresses of your nodes. Restart HAProxy:
sudo systemctl restart haproxy
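The health check on port 8008 relies on Patroni's REST API, which answers a request for / with HTTP 200 on the node that holds the leader lock and with 503 on the others. To see which node a status-based check would select, the same probe can be run by hand. The Python below is a sketch under that assumption: the function names and the example addresses are hypothetical, and the probe argument exists only so the logic can be exercised without a live cluster.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Mapping, Optional


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (such as 503 from a replica) arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def classify(status: Optional[int]) -> str:
    """Interpret the status code returned by GET / on the REST API."""
    if status == 200:
        return "primary"
    if status is None:
        return "unreachable"
    return "not primary"


def find_primary(
    nodes: Mapping[str, str],
    port: int = 8008,
    probe: Callable[[str], Optional[int]] = http_status,
) -> List[str]:
    """Return the names of nodes whose REST API reports them as primary."""
    return [
        name
        for name, ip in nodes.items()
        if classify(probe(f"http://{ip}:{port}/")) == "primary"
    ]


if __name__ == "__main__":
    # Placeholder addresses (assumption); use your own node IPs.
    nodes = {"pg-node1": "10.0.0.1", "pg-node2": "10.0.0.2", "pg-node3": "10.0.0.3"}
    print(find_primary(nodes))
```

In a healthy cluster the returned list should contain exactly one name; an empty list or more than one entry is worth investigating before sending traffic through the proxy.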
By following these steps, you can configure a highly available PostgreSQL cluster using Patroni. This setup keeps your PostgreSQL database available and resilient against node failures. With Patroni handling failover and replication, your data remains safe and accessible.
In conclusion, each component has its part to play: etcd maintains the cluster state, PostgreSQL holds the data, Patroni manages replication and failover, and HAProxy directs client traffic to the correct node. Together they provide a reliable and resilient database system, essential for any business relying on PostgreSQL.