Postgres in Kubernetes - Automation with the Postgres Operator

This article was automatically translated from the German original using AI.

“Persistence is hard” - as a Kubernetes administrator, you are well aware of this fact. Databases are a core component of many applications and must be operated reliably and securely. Support is always welcome, especially when you need to create and manage many instances. Automating databases in Kubernetes is a challenge that can be solved with operators. In this article, we demonstrate how this can be achieved using the “Crunchy Data Postgres Operator” as an example.

Architecture

The goal of the Crunchy Data Postgres Operator (PGO) is to provide a quick and straightforward way to run applications with PostgreSQL in both development and production environments. That sounds interesting :-)

To better understand how PGO accomplishes this, you should take a look at the operator’s architecture.

PGO Architecture (Source: CrunchyData)

The Crunchy PostgreSQL Operator extends Kubernetes with a higher level of abstraction for easier creation and management of PostgreSQL clusters. It uses “Custom Resources” and defines special Custom Resource Definitions (CRDs) to control itself and the PostgreSQL clusters it creates.

PGO itself runs as a Kubernetes Deployment with a single container, the Operator. The Operator uses Kubernetes controllers that watch all events on standard Kubernetes resources (Jobs, Pods) as well as on the Custom Resources (PostgresCluster, PGUpgrade), and reconcile the actual state with the declared desired state.

PGO is very capable and can provision highly available clusters with multiple replicas, backup options (local & S3), as well as monitoring and connection pooling.

Installation

Getting started is easy thanks to the excellent documentation. The Quickstart Guide walks you through the installation of the Postgres Operator all the way to provisioning your first Postgres database.

The only prerequisite is a Kubernetes cluster that is reachable via kubectl.

The Operator is installed using Helm or kustomize. Only a few steps are required:

Clone the example repository with default configurations:

git clone https://github.com/CrunchyData/postgres-operator-examples.git
cd postgres-operator-examples

The example repo contains various sample configurations for deploying the Operator and databases via Helm or Kustomize.

Installing the Operator in the “postgres-operator” namespace via Kustomize is done with:

kubectl apply -k kustomize/install/namespace
kubectl apply --server-side -k kustomize/install/default
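
Whether the installation went through can be checked with two read-only queries. The following is a minimal sketch, assuming the Operator was installed into the default namespace "postgres-operator"; the helper name check_operator is purely illustrative.

```shell
# Sketch: verify that the Operator and its CRD are present.
# Assumption: the Operator was installed into "postgres-operator" (the
# default of the example repo); pass another namespace as first argument.
check_operator() {
  ns="${1:-postgres-operator}"
  # The PostgresCluster CRD is registered cluster-wide.
  kubectl get crd postgresclusters.postgres-operator.crunchydata.com &&
  # The Operator itself runs as a regular Deployment with one Pod.
  kubectl -n "$ns" get deployments,pods
}
```

Calling check_operator without arguments should list the CRD and a running Operator Pod once the installation has finished.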

The First Postgres Database

After installation, you can create your first Postgres database. This is done using a Custom Resource of kind PostgresCluster that describes the database configuration. An example for a simple database is included in the kustomize/postgres directory:

kubectl apply -k kustomize/postgres

Essentially, only a Custom Resource is created that describes the database. The Postgres Operator ensures that the database instance is created and all necessary additional resources (backup pod, ConfigMaps, Secrets, Services, PVCs, …) are provisioned.

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  namespace: mynamespace
spec:
  postgresVersion: 16
  users:
    - name: rhino
      databases:
        - zoo
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi

Tip

If you want to run the Operator with a backup repo, you need at least one additional node, since the repo container must by default run on a different node than the database instance. Alternatively, you can adjust the topology-spread-constraints for the backup repos accordingly.

Shortly after, we can already see the containers and Secrets in our namespace “mynamespace” and can start using our database.
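
To see what the Operator actually created, the resources can be filtered by the cluster label that PGO attaches to them. This is a small sketch, assuming the namespace "mynamespace" and the cluster name "hippo" from the manifest above; show_cluster is a hypothetical helper name.

```shell
# Sketch: list what the Operator created for one PostgresCluster.
# Assumptions: namespace "mynamespace" and cluster "hippo" as in the
# manifest above; show_cluster is a hypothetical helper name.
show_cluster() {
  ns="${1:-mynamespace}"
  cluster="${2:-hippo}"
  # The custom resource itself.
  kubectl -n "$ns" get postgrescluster "$cluster"
  # Resources labelled by the Operator with the cluster name.
  kubectl -n "$ns" get pods,services,pvc,secrets \
    --selector="postgres-operator.crunchydata.com/cluster=$cluster"
}
```

Running show_cluster should list, among others, the instance Pod, the Services and the Secret hippo-pguser-rhino.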

Access!

To access the database, we start a container from which we can connect to the database using psql. We reference the Secret hippo-pguser-rhino, which the Operator created for our database user, via envFrom. This exposes all of its keys as environment variables, which we can then use for the psql command.

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pgclient
  namespace: mynamespace
spec:
  containers:
  - name: pgclient
    image: postgres:latest
    command: ["sleep", "infinity"]
    envFrom:
    - secretRef:
        name: hippo-pguser-rhino
    stdin: true
    tty: true
EOF

Then we connect to the container and start a shell:

kubectl exec -n mynamespace -it pgclient -- bash

Because the Secret entries are available as environment variables, we can conveniently access the database:

PGPASSWORD="$password" psql -U "$user" -d "$dbname" -h "$host"

And off we go - now we can interact with our database instance.

-- List users
\du+
-- List databases
\l

If these commands work, then an application can also access the database using the information from the Secret.
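
For a quick look at what an application would receive, a single entry of the Secret can be read and decoded from the command line. The following is a sketch under the assumption that the names from the example above are used; secret_value is a hypothetical helper, and the key is expected to be a simple name such as "user", "host" or "uri".

```shell
# Sketch: print one decoded value from the user Secret.
# Assumptions: the defaults match the example above; secret_value is a
# hypothetical helper, and the key must be a simple name such as "uri".
secret_value() {
  key="$1"
  ns="${2:-mynamespace}"
  secret="${3:-hippo-pguser-rhino}"
  # Secret data is base64-encoded, so decode it after extraction.
  kubectl -n "$ns" get secret "$secret" -o "jsonpath={.data.$key}" | base64 -d
}
```

For example, secret_value uri prints the complete connection string, which spares the application from assembling it out of the individual fields.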

Using the Public Schema

Please note: for security reasons, Postgres has long recommended revoking the CREATE permission on the “public” schema. With Postgres 15, this recommendation became the default behavior. This change improves security but causes difficulties for anyone who connects as a regular user and immediately tries to create tables.

To set up a schema for the user, the spec.databaseInitSQL field can be used to create appropriate schemas via an init SQL script.

Alternatively, you can use the automatic user schema creation feature, which can simply be enabled via an annotation on the PostgresCluster resource. More information can be found in the User / Database Management Tutorial.

metadata:
  name: hippo
  annotations:
    postgres-operator.crunchydata.com/autoCreateUserSchema: "true"

This automatically creates a schema with the user’s name when the user is created, which Postgres also prioritizes in the search_path, as the following statement shows:

SHOW search_path;
   search_path   
-----------------
"$user", public
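
The annotation can also be set on a cluster that is already running. This is a minimal sketch, assuming the cluster "hippo" in "mynamespace"; enable_user_schema is a hypothetical helper name. Whether schemas are also created for users that already exist is something to verify against the Operator version in use.

```shell
# Sketch: enable automatic user schemas on an existing cluster.
# Assumptions: namespace and cluster name as above; the helper name is
# hypothetical. kubectl annotate always stores the value as a string.
enable_user_schema() {
  ns="${1:-mynamespace}"
  cluster="${2:-hippo}"
  kubectl -n "$ns" annotate postgrescluster "$cluster" \
    postgres-operator.crunchydata.com/autoCreateUserSchema=true --overwrite
}
```

If the cluster is managed declaratively, the annotation should also be added to the manifest so that it is not lost on the next apply.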

Configuration

Backup

Backup is enabled by default and uses pgBackRest. The configuration is described in the Backup Guide. The backup is performed by a separate container and can be configured in various ways. For example, you can define the number of backups and the retention period. Encryption of backups is also possible. Configuration is done in the spec.backups section of the PostgresCluster resource.
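
As an illustration of such a setting, the retention period for full backups can be adjusted with a merge patch. This is a sketch, assuming the cluster "hippo" in "mynamespace" with the repository "repo1"; the helper name and the default of 14 days are illustrative choices, not recommendations.

```shell
# Sketch: keep full backups in repo1 for a number of days.
# Assumptions: cluster "hippo" in "mynamespace"; the helper name and the
# default of 14 days are illustrative. The keys under "global" are passed
# to pgBackRest as configuration options.
set_backup_retention() {
  days="${1:-14}"
  ns="${2:-mynamespace}"
  cluster="${3:-hippo}"
  patch=$(printf '{"spec":{"backups":{"pgbackrest":{"global":{"repo1-retention-full":"%s","repo1-retention-full-type":"time"}}}}}' "$days")
  # A merge patch only touches the "global" map and leaves "repos" as is.
  kubectl -n "$ns" patch postgrescluster "$cluster" --type merge -p "$patch"
}
```

For clusters deployed from a manifest, the same two keys would be placed under spec.backups.pgbackrest.global there instead of being patched in.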

For a long time, it was not possible to skip backups for, say, test databases.

Optional Backup

A relatively new feature in version 5.7 is the frequently requested “Optional Backup”. This allows databases to be excluded from backup. This is useful, for example, when a database only contains temporary data or does not need to be backed up, such as a development database. This, too, is configured in the PostgresCluster resource. More information can be found in the Optional Backup Guide.

To run our already deployed example database (see above) without backup, we remove the backup section from the spec and add an annotation that allows the Operator to discard an existing backup. If a database is newly created without backup, this annotation is not needed.

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  annotations:
    postgres-operator.crunchydata.com/authorizeBackupRemoval: "true"
spec:
  postgresVersion: 16
  users:
    - name: rhino
      databases:
        - zoo
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi

Day Two Operations

Of course, operating a database involves much more than just creating and deleting databases. The Operator offers many features that simplify the daily operation of databases in Kubernetes.

These include:

  • Monitoring
  • Cluster management
  • Scaling/clustering
  • Configuration changes
  • Version upgrades
  • Migration

CrunchyData has created tutorials for Day Two Operations and documented typical questions with numerous Guides. This demonstrates the true strength and maturity of the solution.

Kubernetes - Helm Deployment

In addition to installing the Postgres Operator via kustomize, CrunchyData also offers a Helm chart. The chart is very flexible and provides many configuration options.

Installing the Operator with Helm is somewhat cumbersome, as the charts are no longer(?) available in a repository. Instead, the chart must be downloaded and included as a dependency in your own chart, for example. The default values are well suited for getting started and can be easily adjusted for specific requirements.

CrunchyData also provides a Helm chart for deploying database instances, which can be integrated into the deployments of applications that require Postgres databases.
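
For a first test, the Operator chart can also be installed directly from a local checkout of the example repository. The following is a sketch, assuming the current directory is the root of postgres-operator-examples, which contains the chart in helm/install; the release name "pgo" and the helper name are illustrative.

```shell
# Sketch: install the Operator chart from the cloned example repository.
# Assumptions: the current directory is the repository root, which
# contains the chart in helm/install; the release name "pgo" and the
# helper name are illustrative.
install_operator_with_helm() {
  ns="${1:-postgres-operator}"
  helm install pgo helm/install --namespace "$ns" --create-namespace
}
```

This is convenient for trying things out; for GitOps-style deployments, the approach of including the chart as a dependency remains the more maintainable one.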

Tip

When installing the Operator with Helm via ArgoCD, the error “Too long: must have at most 262144 bytes” may occur when synchronizing Custom Resource Definitions (CRDs). This is because Kubernetes has a 256 KB size limit for annotations. By default, Argo CD uses Client-Side Apply, which adds the last-applied-configuration annotation, and with extensive CRDs, this can cause the limit to be exceeded. The recommended solution is to enable Server-Side Apply, which does not add this annotation. In Argo CD version 2.5 and later, this can be achieved by setting the sync option ServerSideApply=true. For earlier versions, kubectl replace can be used as an alternative, but with caution, as it can lead to unexpected results when multiple clients modify an object. This article describes the problem and solution very well.
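
To get a feeling for which manifests are affected, their size can be compared against the limit from the error message. This is only a rough sketch: it assumes that the file size is a usable stand-in for the size of the last-applied-configuration annotation, which stores the object as JSON and can therefore differ somewhat. The helper name is hypothetical.

```shell
# Sketch: flag manifests that are too large for client-side apply.
# Assumption: the file size is used as a rough stand-in for the size of
# the last-applied-configuration annotation, which stores the object as
# JSON and may therefore differ somewhat. The helper name is hypothetical.
ANNOTATION_LIMIT=262144  # bytes, as quoted in the error message

exceeds_annotation_limit() {
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -gt "$ANNOTATION_LIMIT" ]
}
```

The function returns success for files above the limit, so it can be used in a loop over the CRD files of a chart to list the candidates that need Server-Side Apply.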

Conclusion

The Crunchy Data Postgres Operator is a mature system for running Postgres databases in Kubernetes. Installation is straightforward, and configuration is done declaratively via Custom Resources. The Operator offers many features that simplify operating databases in Kubernetes. The documentation is excellent and contains many examples and tutorials. The Operator is a great choice for automating Postgres databases in Kubernetes.

The Operator provides support for administering Postgres databases in Kubernetes and hides much of the complexity. Nevertheless, the stable operation of one (or many!) database(s) remains a challenge and requires a certain level of experience and knowledge.

And remember: no backup, no sympathy!

Authors

Alex Bloss

Alex Bloss - Head of attempto-Lab.