Introducing the Kubernetes Autoscaling for Apache Pulsar Operator and the KAAP Stack

TL;DR You can now use a Kubernetes operator to deploy, manage, and horizontally scale an Apache Pulsar cluster. Use the Helm chart to build resource sets, autoscale rules, and Kubernetes pod rack-awareness for your Pulsar components.

It’s hard not to agree that Apache Pulsar is an amazingly powerful tool. Its cloud-native, high-performance design takes an application’s messaging to the next level. Sending events at scale means you need single-digit millisecond latency, blazing-fast message processing, and a community that supports its members’ needs. Pulsar has been growing fast; with every version, its feature count expands and its performance improves.

A typical Pulsar cluster includes proxies, brokers, bookies, and zookeepers. It could also have function workers and administration consoles. Each component comes with its own set of configurations. Most also share global services like identity management and observability. As you can imagine, orchestrating a cluster’s deployment, scaling components up and down, and maintaining its health is not a trivial task. If you’ve ever managed a cloud-native service, then you know keeping up with the “moving parts” is like herding cats.

The most popular way to run Pulsar is via its Helm chart with Kubernetes. Each component in the chart is configured manually. Distributing a role (identity) throughout the cluster or securing all communications within the cluster takes some learning. Features like rack awareness and pod affinities are possible, but require some manual “helming” to get them right.

Enter Kubernetes Autoscaling for Apache Pulsar (KAAP), which provides automated management and scaling of your Pulsar cluster, simplifying configuration, deployment, and maintenance.

Autoscaling Pulsar clusters with KAAP

One of KAAP’s strengths is taking a cluster to production. There are many possible configurations, and only those who have read the documentation end-to-end and spent a lot of time managing clusters know how all these settings perform. KAAP has Pulsar best practices built in. Its maintainers are also part of the Pulsar open source project. So you can rest assured that clusters created with KAAP are ready to run at scale.

To get more familiar with KAAP, here are a few of its notable features.

Resource sets

Resource sets are a new concept introduced with KAAP. When we designed the operator, we considered common needs of our customers, like the ability to group certain cluster components together as a set and apply rules like DNS or identity to only those instances.

Say your Pulsar cluster has ten brokers, and of all the applications using the cluster, there is one particular app that accounts for a big percentage of traffic to the brokers. Prior to resource sets, your only real option would have been to give that app its own Pulsar cluster. But doing so would disconnect the stream of data from the business, which defeats one of the driving principles of using Pulsar.

With resource sets, we can simply segment a few of the ten brokers to a different DNS address dedicated to that very popular app. Resource sets also aid in segmenting configurations, so that same resource set of brokers could run on a different configuration tailored to the characteristics of the app. But because all the brokers are still a part of the same cluster, data can be streamed just as any other application would, so the application’s consumers wouldn’t have to make any special provisions.
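To make the idea concrete, here is a minimal Python sketch of how a per-set configuration could be layered on top of cluster-wide broker settings. It is only an illustration of the concept, not KAAP’s implementation or its CRD schema: the `ResourceSet` class, the `effective_config` helper, the DNS names, and the chosen values are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceSet:
    """Hypothetical model of a named group of brokers inside one cluster."""
    name: str
    replicas: int
    dns_name: str
    config_overrides: dict = field(default_factory=dict)


def effective_config(base: dict, resource_set: ResourceSet) -> dict:
    """Overlay a set's overrides on the cluster-wide broker config."""
    merged = dict(base)  # copy so the shared base is never mutated
    merged.update(resource_set.config_overrides)
    return merged


# Cluster-wide broker settings (illustrative values).
base_config = {"managedLedgerDefaultAckQuorum": "2", "maxMessageSize": "5242880"}

# Ten brokers in one cluster: seven shared, three dedicated to the busy app.
shared = ResourceSet("shared", replicas=7, dns_name="pulsar.example.com")
dedicated = ResourceSet(
    "busy-app",
    replicas=3,
    dns_name="busy-app.pulsar.example.com",
    config_overrides={"maxMessageSize": "10485760"},
)

for rs in (shared, dedicated):
    print(rs.name, rs.dns_name, effective_config(base_config, rs))
```

Both sets belong to the same cluster in this model; only the address and the overridden settings differ, which mirrors the segmentation described above.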

Rack awareness

Resource sets can also be given rules that make components “aware” of the zone they are running in. Say you want to create failure domains in your Pulsar cluster, where certain components prefer to communicate with other components in the same availability zone. Make these components part of a resource set and assign that set to one or more zones.

You’re still managing a single Pulsar cluster, so the consumers can stream data from multiple sources as they always have. But internally you can create a resilient, performant service that is adapted to its workloads.
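The zone-affinity idea can be sketched in a few lines of Python. This is a simplified illustration under assumed names (`prefer_same_zone`, the broker names, and the zone labels are hypothetical) and does not reflect how KAAP or Pulsar actually select peers: components in the caller’s zone are tried first, and components in other zones remain available as a fallback.

```python
def prefer_same_zone(caller_zone: str, candidates: dict[str, str]) -> list[str]:
    """Order candidates so those in the caller's zone come first.

    `candidates` maps a component name to the zone it runs in.
    """
    same_zone = sorted(n for n, zone in candidates.items() if zone == caller_zone)
    other_zones = sorted(n for n, zone in candidates.items() if zone != caller_zone)
    return same_zone + other_zones


brokers = {
    "broker-0": "zone-a",
    "broker-1": "zone-b",
    "broker-2": "zone-a",
    "broker-3": "zone-c",
}

# A proxy running in zone-a tries zone-a brokers before any others.
print(prefer_same_zone("zone-a", brokers))
```

Keeping the other zones in the list is what preserves the single-cluster behavior: a zone outage changes which components are preferred, not which ones are reachable.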

Scaling rules

“Autoscaling” is in the name of the project, so you can imagine scaling is a foundational piece of its design. KAAP focuses on the two key components of any cluster: the broker and BookKeeper.

Prior to KAAP, creating a simple scaling rule for Pulsar brokers meant watching the CPU load of each instance. The focus is on processing a message and keeping a consistent amount of throughput. No cluster in production has just one broker, so the challenge is to aggregate all CPU percentages and make a more informed scaling decision. That simple rule just got very complex.

When the team designed KAAP, they knew a better way to approach the scaling challenge. Instead of relying on each broker instance’s CPU load, they hooked into Pulsar’s internal load balancer. Wait, did you know Pulsar has an internal load balancer?! When KAAP sees that the Pulsar load balancer is having trouble finding brokers to assign topic bundles to, it scales up the number of brokers to handle the load. When the operator sees that the CPU usage of all brokers is low, it scales down the number of brokers to save resources.

If you’re worried about a new paradigm where brokers are constantly scaling, rest easy that KAAP uses windowing while deciding when to scale.
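As a rough picture of what windowed scaling means, the Python sketch below only changes the broker count when every sample in a fixed-size window agrees. It is a toy model built on assumptions, not KAAP’s algorithm: the `BrokerScaler` class, its parameters, and the boolean `overloaded` signal (standing in for the load balancer’s state) are all hypothetical.

```python
from collections import deque


class BrokerScaler:
    """Toy scaler that acts only on a full window of consistent samples."""

    def __init__(self, window: int, low_cpu: float, min_brokers: int, max_brokers: int):
        self.samples = deque(maxlen=window)
        self.low_cpu = low_cpu
        self.min_brokers = min_brokers
        self.max_brokers = max_brokers

    def observe(self, overloaded: bool, broker_cpus: list[float]) -> None:
        """Record the overload signal and whether every broker's CPU is low."""
        all_idle = all(cpu < self.low_cpu for cpu in broker_cpus)
        self.samples.append((overloaded, all_idle))

    def decide(self, current: int) -> int:
        """Return the desired broker count for the samples seen so far."""
        if len(self.samples) < self.samples.maxlen:
            return current  # not enough history yet
        if all(overloaded for overloaded, _ in self.samples):
            desired = min(current + 1, self.max_brokers)
        elif all(idle for _, idle in self.samples):
            desired = max(current - 1, self.min_brokers)
        else:
            return current  # mixed signals: hold steady
        if desired != current:
            self.samples.clear()  # start a fresh window after acting
        return desired


scaler = BrokerScaler(window=3, low_cpu=0.3, min_brokers=2, max_brokers=10)
brokers = 4
for overloaded, cpus in [
    (True, [0.90, 0.80, 0.95, 0.85]),
    (True, [0.90, 0.90, 0.90, 0.90]),
    (True, [0.85, 0.90, 0.92, 0.88]),
]:
    scaler.observe(overloaded, cpus)
    brokers = scaler.decide(brokers)
print(brokers)  # 5: scaled up once, after three consecutive overloaded samples
```

Clearing the window after each change means a single spike or lull right after a scaling event cannot trigger another one immediately.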

Unlike brokers, though, bookkeepers (bookies) are not stateless. In fact, they store the messages in a Pulsar cluster. If you have ever tried to tie scaling rules to a stateful store, you’ll know how difficult this is. Growing the number of bookies is simple because Pulsar is cloud-native, so “teaching” the cluster about a new bookie available for storage is a matter of adding it to the list.

Removing a bookie from the cluster is an entirely different matter. The process is known as decommissioning. The basic steps are to first remove the instance from active broker interactions. Then all the data stored on the instance needs to be drained to other instances, and the brokers need to be made aware of where that data went.

Traditionally, decommissioning a bookie was a manual process, and it wasn’t for the untrained cluster manager. KAAP has all this knowledge built in. When storage is running low, KAAP will sense this and scale up bookies. Because of BookKeeper’s segment-based design, the new storage is available immediately for use by the cluster, with no log stream rebalancing required.

When KAAP sees an abundance of open storage on a BookKeeper node, the node is automatically scaled down (decommissioned) to free up volume usage and reduce storage costs. This scale-down is done in a safe, controlled manner that ensures no data loss and preserves the configured replication factor for all messages.

Like all KAAP features, these scaling rules are configurable to meet your specific needs. Choose a percentage of disk and a minimum number of nodes and let KAAP do the rest!
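A disk-percentage rule with a minimum node count might look something like the following Python sketch. The function name, the high and low thresholds, and the one-bookie-at-a-time step are assumptions chosen for illustration; KAAP’s real rules and setting names may differ, and the sketch deliberately leaves out the data draining that a real decommission performs.

```python
def desired_bookies(
    disk_usage: dict[str, float], high: float, low: float, min_bookies: int
) -> int:
    """Return the desired bookie count from per-bookie disk usage (0.0 to 1.0).

    A bookie counts as writable while its usage is below `high`.
    """
    current = len(disk_usage)
    writable = [u for u in disk_usage.values() if u < high]

    # Scale up when too few bookies still have room for new writes.
    if len(writable) < min_bookies:
        return current + 1

    # Scale down only when every bookie has plenty of free space
    # and removing one keeps the cluster at or above the minimum.
    if current > min_bookies and all(u < low for u in disk_usage.values()):
        return current - 1

    return current


# Two of three bookies are nearly full, so one more is requested.
print(desired_bookies({"bk-0": 0.95, "bk-1": 0.93, "bk-2": 0.50}, 0.92, 0.75, 3))

# Four lightly used bookies with a minimum of three: one can be removed.
print(desired_bookies({"bk-0": 0.2, "bk-1": 0.3, "bk-2": 0.1, "bk-3": 0.4}, 0.92, 0.75, 3))
```

Using two separate thresholds leaves a band between them where nothing happens, which keeps the bookie count from oscillating around a single cutoff.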

Managing Apache Kafka workloads with Kubernetes Autoscaling for Pulsar

KAAP comes with the DataStax Starlight for Kafka extension built in. You can run it as a standalone pod or as part of Pulsar’s proxy. As a standalone pod, it scales as demand grows or shrinks. You could also create a resource set and route a special DNS entry to those pods.

Starlight for Kafka enables your Apache Kafka clients to connect to a Pulsar cluster. It brings native Kafka protocol support to Pulsar by introducing a Kafka protocol handler. By adding the Starlight for Kafka protocol handler to a Pulsar cluster, existing workloads can be migrated to Pulsar without modifying any code!

Learn more about Starlight for Kafka here.

The KAAP stack

Taking a Pulsar cluster to production is more than just the cluster’s components. It’s a combination of the cluster and other services working together to provide a resilient, low-latency platform. Identity and certificate management are some of the most popular additions.

Yet again, each of these services has its own set of configurations. To get the services to work in harmony with the cluster, you need an understanding of both Pulsar and the given service. This goes beyond reading the documentation: integrating third-party services with a Pulsar cluster means knowing how things are implemented and balancing best practices with intended design. Latency is a big deal in a messaging service; introducing a third-party service could quickly increase the broker’s response times.

DataStax has taken the KAAP operator one step further and introduced the KAAP Stack. It’s a stack of technologies that run alongside the operator and are integrated with one another following best practices. The services and components in the stack share one goal: getting the most out of Pulsar.

The KAAP stack includes the following elements:

Pulsar Operator, for driving a healthy Pulsar cluster
Prometheus (optionally Grafana), for pre-wired observability into each component in the cluster. If you also include Grafana, pre-validated dashboards are included.
Cert Manager, a Kubernetes-native resource that establishes a root certificate authority in the cluster to issue and renew certificates
Keycloak, to manage identities within the cluster using OpenID best practices, or to integrate with external identity providers over protocols such as SAML 2.0

The KAAP stack comes with its own Helm chart, which offers customizations for running its services in concert with your organization’s existing services.

Read more about the KAAP stack in DataStax documentation. Get started now with these simple Helm commands or visit the examples folder for more learning:

helm repo add kaap https://datastax.github.io/kaap
helm repo update
helm install pulsar kaap/kaap-stack

Migrating your Pulsar clusters to KAAP

If you’ve ever deployed Pulsar’s existing Helm chart, you know there’s an almost endless number of configurations, which makes migrating to an entirely new chart very intimidating.

The DataStax KAAP team ran into the same challenge during the development of the operator. They had existing clusters running production workloads that needed updating, so they developed a standalone CLI to aid in this migration.

You might expect a tool like this to take in the values of the Pulsar Helm chart and output a compatible KAAP Helm values file. That approach would assume that the cluster is in complete sync with the chart values and that the chart is aware of all component configurations, which is typically not the case.

Instead, the KAAP migration tool does a deep analysis of a running Pulsar cluster, creates a Kubernetes manifest using the PulsarCluster CRD specification, and then runs a simulation to see what would happen if that manifest were applied. A report of that simulation is provided as output from the tool along with the proposed manifest.

You have the chance to review the results for oddities or further customizations, implementing additional concepts with KAAP—like resource sets and scaling rules.

The migration tool’s JAR can be downloaded from the KAAP project’s releases. You’ll need Java 17 and an existing Pulsar cluster.

Learn more about the KAAP migration tool in the DataStax documentation.

Getting started with the Kubernetes Autoscaling for Pulsar Operator

Getting started with KAAP works just like any other Helm chart. Using the Helm CLI, issue these commands:

helm repo add kaap https://datastax.github.io/kaap
helm repo update
helm install kaap kaap/kaap

This will install the KAAP operator and its CRDs. Then you can use the PulsarCluster CRD to create a Pulsar cluster. This gives you a basic configuration that is good for development and sandbox environments. When you’re ready for higher environments, visit the official documentation to learn about secure communications, authentication, and managing resource sets.

We’re excited to learn about how KAAP simplifies management and scaling of your Pulsar cluster. Got feedback or questions? Join our GitHub discussion. It’s a great place to share your ideas and let us know about potential issues. Visit the GitHub project to see what’s in development and contribute to the project.

DataStax Blog