Scaling Event-Driven Architectures on Google Cloud with Pub/Sub and Containers
April 30, 2025 | 5 minute read
Written By Fatih Genc
As event-driven architectures grow more complex, selecting the right container platform becomes critical for ensuring scalability, reliability, and cost-efficiency. Google Cloud offers two primary container platforms: Cloud Run and Google Kubernetes Engine (GKE), each with distinct features and nuances to be considered relative to their event consumption models.
Previously, we discussed GCP Pub/Sub consumption models. This article continues the conversation by exploring how to select a container platform based on the message consumption model (Push or Pull) used by the containerised service.
Scaling Considerations
When it comes to dynamically scaling the system to handle changing event loads, we need to consider both the capabilities of the underlying platform and the key metrics that are available to scale against.
1. Scaling Properties of GCP Pub/Sub
GCP Pub/Sub is inherently designed for scalability and reliability, making it ideal for event-driven architectures:
Scalability: Handles millions of messages per second with thousands of subscribers.
Durability: Messages are stored reliably until acknowledged, and can optionally be retained even after subscribers have processed them, ensuring reliable delivery.
Global Availability: Low-latency delivery across regions supports global applications.
Metrics for Scaling: Key metrics, such as subscriber backlog size and message processing latency, are available to scale workloads against dynamically.
2. Key Metrics for Identifying When Scaling Is Needed
Scaling decisions are typically based on certain metrics that indicate resource constraints or system inefficiencies. The most commonly used metrics include:
Compute Resource Metrics: These include CPU utilisation and memory consumption, which help determine when additional compute capacity is required.
Pub/Sub Metrics: Metrics related to messaging systems, such as topic or subscription performance indicators. Examples include:
Event backlog (queue depth): The accumulation of unacknowledged messages in a Pub/Sub subscription.
Events per second: The rate at which events are processed.
Best Practice: To ensure reliable scaling, monitor at least one compute resource metric alongside one Pub/Sub metric. For instance, where processing is not compute-intensive, service concurrency may be fully utilised while CPU usage remains below the scaling threshold; if an event backlog then builds up, the system should scale. Scaling solely on CPU utilisation, without considering Pub/Sub metrics, risks the system failing to scale when it needs to.
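To make this concrete, the sketch below (Python, using the Cloud Monitoring client library) reads the subscription backlog metric that a scaler or monitoring job could act on; the project and subscription names are placeholders, and the logic is only a minimal illustration rather than a production scaler.

```python
# Minimal sketch: read the Pub/Sub backlog (unacknowledged messages) for a
# subscription from Cloud Monitoring. Project and subscription IDs are
# placeholders; assumes the google-cloud-monitoring client library is installed.
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-gcp-project"        # placeholder
SUBSCRIPTION_ID = "my-subscription"  # placeholder


def current_backlog(project_id: str, subscription_id: str) -> int:
    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {
            "end_time": {"seconds": now},
            "start_time": {"seconds": now - 120},  # look back two minutes
        }
    )
    results = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": (
                'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" '
                f'AND resource.labels.subscription_id = "{subscription_id}"'
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    # Time series points are returned newest first; take the latest value if present.
    for series in results:
        if series.points:
            return int(series.points[0].value.int64_value)
    return 0


if __name__ == "__main__":
    backlog = current_backlog(PROJECT_ID, SUBSCRIPTION_ID)
    print(f"Unacknowledged messages: {backlog}")
```

The same metric is what an autoscaler would typically consume alongside a compute metric such as CPU utilisation.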
Host Container Options for GCP Pub/Sub
Now that we've established key scaling principles, let's explore how the two container platforms, Cloud Run and GKE, align with these needs.
Cloud Run: A Good Fit for Push Subscriptions
Google Cloud Run is a fully managed, serverless platform designed for running stateless containerised microservices and short-lived Cloud Run Functions. It automatically scales up and down based on HTTP traffic, even scaling to zero during idle periods, making it ideal for Push subscriptions.
Why Cloud Run Works Well for Push Subscriptions:
Dynamic Autoscaling: Push subscriptions deliver event messages via HTTP POST to a defined endpoint, and Cloud Run has rapid autoscaling against the rate of HTTP requests baked in, allowing it to handle traffic spikes effectively (a minimal push-handler sketch follows this list).
Cost Efficiency: Pay only for resources used during processing, making it ideal for applications with variable or unpredictable traffic. Can take advantage of the scale-to-zero serverless pattern when the request rate is low or very uneven.
Minimal Infrastructure Management: Developers can focus on code while Cloud Run handles infrastructure.
Not Ideal for Pull Subscriptions under high-load, high-throughput scenarios: Cloud Run doesn’t currently offer options to scale against backlog metrics such as unacknowledged messages on a subscription. It scales only on CPU utilisation and HTTP traffic, which doesn’t align well with Pull subscription workloads because there is no HTTP traffic. You would have to rely on the CPU utilisation threshold alone, which is fixed at 60% and can’t be changed; hence the risk of unreliable scaling when only one scaling metric applies.
HTTP Dependency: Push subscriptions require an always-available HTTPS endpoint, which introduces additional infrastructure management overhead external to Pub/Sub: network security and access rules need to be configured and maintained.
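To illustrate the push model, here is a minimal handler sketch in Python (Flask) that accepts the standard Pub/Sub push envelope; the route and processing logic are placeholders under assumed names.

```python
# Minimal sketch of a Cloud Run service receiving Pub/Sub push deliveries.
# Assumes the standard push envelope ({"message": {...}, "subscription": ...});
# the endpoint path and processing logic are illustrative placeholders.
import base64
import os

from flask import Flask, request

app = Flask(__name__)


@app.route("/pubsub/push", methods=["POST"])
def receive_event():
    envelope = request.get_json(silent=True)
    if not envelope or "message" not in envelope:
        # Malformed request: reject it rather than acknowledge it.
        return "Bad Request: invalid Pub/Sub push envelope", 400

    message = envelope["message"]
    payload = base64.b64decode(message.get("data", "")).decode("utf-8")

    # Placeholder for real event processing.
    print(f"Processing event {message.get('messageId')}: {payload}")

    # A 2xx response acknowledges the message; a non-2xx response triggers redelivery.
    return "", 204


if __name__ == "__main__":
    # Cloud Run injects the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

Because each delivery is just an HTTP request, Cloud Run's request-based autoscaling applies to it directly.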
Best Use Cases:
Stateless APIs and integration with systems already using HTTP request/response patterns.
Event-driven apps that need to scale rapidly with changes in HTTP traffic/load, such as uneven web traffic or frequent spikes throughout the day.
Systems requiring low operational overhead.
Systems with periods of low to zero traffic that don’t run 24/7 – Cloud Run scales down to zero, at no cost, when not processing anything.
Short-lived processes (Cloud Run now also offers Cloud Run Functions for short-lived serverless capability).
Google Kubernetes Engine (GKE): The Versatile Option
Google Kubernetes Engine (GKE) provides a managed Kubernetes environment for deploying and managing containerised applications. With its Autopilot mode, much of the operational overhead is abstracted away, making it easier to manage than standard Kubernetes clusters.
A strong point for GKE is that it provides flexibility in configuring metrics to scale against. GKE is not limited to scaling only against CPU utilisation and HTTP traffic, it can be configured to scale against many other metrics, such as memory usage and message backlog. You can even define custom metrics such as queries per second or network throughput. As such it can be used to scale for both Push and Pull type consumption.
Why GKE is a Good Fit for Pull Subscriptions:
Customisable Scaling: Scales based on multiple metrics, including the number of unacknowledged messages on a subscription, ensuring reliable processing of event backlogs.
Support for Complex Workloads: Handles high-throughput scenarios and long-running microservices efficiently (a minimal pull-consumer sketch follows this list).
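As an illustration of the pull model, the sketch below shows a minimal long-running Python consumer using the Pub/Sub client library's streaming pull with flow control; the project and subscription names are placeholders.

```python
# Minimal sketch of a long-running pull consumer, as might run in a GKE pod.
# Project and subscription IDs are placeholders; assumes the google-cloud-pubsub
# client library is installed.
from concurrent import futures

from google.cloud import pubsub_v1

PROJECT_ID = "my-gcp-project"        # placeholder
SUBSCRIPTION_ID = "my-subscription"  # placeholder


def handle_message(message: pubsub_v1.subscriber.message.Message) -> None:
    # Placeholder for real event processing.
    print(f"Processing {message.message_id}: {message.data!r}")
    message.ack()  # unacknowledged messages count towards the backlog metric


def main() -> None:
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

    # Flow control caps how many messages a single pod holds in memory at once,
    # so excess load shows up as subscription backlog that the autoscaler can
    # react to, rather than overwhelming one consumer.
    flow_control = pubsub_v1.types.FlowControl(max_messages=100)

    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=handle_message, flow_control=flow_control
    )
    print(f"Listening on {subscription_path}...")

    with subscriber:
        try:
            streaming_pull_future.result()
        except (KeyboardInterrupt, futures.TimeoutError):
            streaming_pull_future.cancel()
            streaming_pull_future.result()


if __name__ == "__main__":
    main()
```

Scaling the number of these pods against a backlog metric (for example via an external-metric autoscaler) is what makes the pull model reliable under heavy load.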
While GKE is also a good fit for Push subscriptions, Cloud Run is generally preferred when its autoscaling metrics (HTTP traffic and CPU utilisation) are sufficient, as it offers more simplicity, quicker setup, and lower operational overhead and maintenance cost.
Operational Overhead: Although Autopilot simplifies management, GKE still requires more setup and expertise compared to Cloud Run.
Cost of underutilised resources: GKE charges for the resources provisioned, which may be more costly than Cloud Run’s pay-per-use model depending on the traffic load. For example, long periods of low to no traffic or sporadic loading can leave much of the allocated compute unused or idle. Vertical autoscaling can be set up to mitigate this by automatically re-provisioning compute resources, although this brings other concerns and additional overhead.
Scaling for No Traffic Periods: Unlike Cloud Run, scale to zero is not the default behaviour in GKE. A specific setup, such as using KEDA (Kubernetes Event-driven Autoscaling), is needed to achieve this functionality. Without scale-to-zero capability, additional running costs will be incurred over periods of no traffic.
No options for short-lived functions or lambda-type capability (unlike Cloud Run, which offers Cloud Run Functions).
Best Use Cases:
Complex architectures or workloads that are mostly on, with little downtime and minimal periods of highly fluctuating load.
Long-running, high-load, high-throughput scenarios that utilise pull consumption models.
Batch processing systems where you want to pull and rapidly process large amounts of data on a schedule.
If you need more granular control over resource configuration or use of custom metrics.
You already have a strong Kubernetes practice/knowledge in your organisation.
Hybrid Approaches: Combining Push and Pull Mechanisms
Often a complex architecture may combine both Push and Pull models. For example, a push mechanism may be used to process data which arrives sporadically throughout the day while a primarily pull-based system performs the bulk of the steady-state processing.
In this case, you might use:
Cloud Run for handling HTTP-based triggers efficiently – taking advantage of its out-of-the-box scale-to-zero cost savings and rapid scaling in response to sporadic, heavily fluctuating loads.
GKE with Pull consumption for processing very high-volume, high-throughput event backlogs or complex, long-running, “mostly on” workloads.
Closing Thoughts
To effectively scale event-driven architectures, it's essential to align the selected Pub/Sub subscription model to the capabilities of the selected container platform.
Cloud Run stands out for its minimal setup and operational cost-efficiency, making it a good fit for Push based consumption. It provides advanced out-of-the-box features that enable it to rapidly scale with fluctuating loads and scale to zero when there is no traffic. In addition to running microservice containers, it also offers Cloud Run Functions for executing short-lived serverless processes.
GKE offers high flexibility and customisation at the cost of some operational overhead, which can be minimised by using its Autopilot mode. GKE is well suited to high-volume pull consumption where efficiency and high throughput are essential.
GKE also scales effectively against Push-based subscriptions and may be preferred in some scenarios, for example when GKE is already in use and the request traffic is mostly uniform, or when there is a strong Kubernetes practice within the organisation.
In many event-driven architectures, where different parts of the system have varying technical needs, taking a hybrid approach may be advantageous. By utilising both Push and Pull consumption models with Cloud Run and GKE container platforms, we can leverage their respective strengths and align their capabilities with specific functional requirements.
By understanding the strengths of each platform and subscription model, you can design a resilient, scalable architecture tailored to your unique needs. Whether you're processing real-time events, ingesting and analysing large streams of data, or building machine learning pipelines, Google Cloud's Pub/Sub and its container platform offerings provide the tools to succeed.
Author
Fatih Genc, Senior Consultant
Fatih is an experienced software developer specialising in the design and development of core back-end systems, with a strong focus on agile development practices. He is passionate about working closely with businesses and technical teams to drive innovation and deliver high-quality software.